The HAT map is way of associating named overlays with HATs whose
EEPROMs were programmed with the contents of the overlay.
Unfortunately, change in the DT and kernel drivers has meant that some
of these embedded overlays no longer function, or even don't apply.
The HAT map is a mapping from HAT UUIDs to overlay names. If a HAT with
a listed UUID is detected, the embedded overlay is ignored and the
overlay named in the mapping is loaded in its place.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Commit 4e08713336 ("ASoC: hdmi-codec: fix channel info for
compressed formats") accidentally changed hcp->chmap_idx from
ca_id, the CEA channel allocation ID, to idx, the index to
the table of channel mappings ordered by preference.
This resulted in wrong channel maps being reported to userspace,
eg for 5.1 "FL,FR,LFE,FC" was reported instead of the expected
"FL,FR,LFE,FC,RL,RR":
~ # speaker-test -c 6 -t sine
...
0 - Front Left
3 - Front Center
1 - Front Right
2 - LFE
4 - Unknown
5 - Unknown
~ # amixer cget iface=PCM,name='Playback Channel Map' | grep ': values'
: values=3,4,8,7,0,0,0,0
Revert this incorrect change so that channel maps are properly
reported again.
Fixes: 4e08713336 ("ASoC: hdmi-codec: fix channel info for compressed formats")
Cc: stable@vger.kernel.org
Signed-off-by: Matthias Reichl <hias@horus.com>
Partition the code to separate atomic and non-atomic methods so that
none of them have to handle both cases. The result avoids using deferred
work unless necessary, and should be easier to understand.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Move some functions into a more logical ordering. This change causes
no functional change and is essentially cosmetic.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The debug message from bcm2835_codec_buf_prepare when the buffer
size is incorrect can be a little spammy if the application isn't
careful on how it drives it, therefore drop the priority of the
message.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The SPI core logs error messages if a compatible string device
name is not also present as an spi_device_id.
No spi_device_id values are specified by the driver, therefore
we get 4 log lines every time it is loaded:
SPI driver ads7846 has no spi_device_id for ti,tsc2046
SPI driver ads7846 has no spi_device_id for ti,ads7843
SPI driver ads7846 has no spi_device_id for ti,ads7845
SPI driver ads7846 has no spi_device_id for ti,ads7873
Add the spi_device_id values for these devices.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
For "Really Good Reasons" [1] the SPI core requires a match
between compatible device strings and the name in spi_device_id.
The ili9486 driver uses compatible strings "waveshare,rpi-lcd-35"
and "ozzmaker,piscreen", but "rpi-lcd-35" and "piscreen" are missing,
so add them.
Compatible string "ilitek,ili9486" is already used by
staging/fbtft/fb_ili9486, therefore leaving it present in ili9486 as an
spi_device_id causes the incorrect module to be loaded, therefore remove
this id.
[1] https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi.c#L487
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Adds the option of selecting the DRM/KMS TinyDRM driver for
this panel, rather than the deprecated FBTFT one.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The commit shown in Fixes: aims to improve interrupt throughput by
getting the handlers invoked on different CPU cores. It does so (*) by
using an irq_ack hook to change the interrupt routing.
Unfortunately, the IRQ status bits must be cleared at source, which only
happens once the interrupt handler has run - there is no easy way for
one core to claim one of the IRQs before sending the remainder to the
next core on the list, so waking another core immediately results in a
race with a chance of both cores handling the same IRQ. It is probably
for this reason that the routing change is deferred to irq_ack, but that
doesn't guarantee no clashes - after irq_ack is called, control returns
to bcm2836_chained_handler_irq which proceeds to check for other pending
IRQs at a time when the next core is probably doing the same thing.
Since the whole point of the original commit is to distribute the IRQ
handling, there is no reason to attempt to handle multiple IRQs in one
interrupt callback, so the problem can be solved (or at least made much
harder to reproduce) by changing a "while" into an "if", so that each
invocation only handles one IRQ.
(*) I'm not convinced it's as effective as claimed since irq_ack is
called _after_ the interrupt handler, but the author thought it made a
difference.
See: https://github.com/raspberrypi/linux/issues/5214https://github.com/raspberrypi/linux/pull/1794
Fixes: fd4c9785bd ("ARM64: Round-Robin dispatch IRQs between CPUs.")
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Clang 16 warns:
../drivers/staging/vc04_services/vc-sm-cma/vc_sm.c:816:19: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
buffer->imported = 1;
^ ~
../drivers/staging/vc04_services/vc-sm-cma/vc_sm.c:822:17: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
buffer->in_use = 1;
^ ~
2 warnings generated.
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Updated imx296 driver to support external trigger mode via XTR pin.
Added module parameter to control this mode.
Signed-off-by: Ben Benson <ben.benson@raspberrypi.com>
Add a 2-5ms delay when coming out of standby and before reading the
sensor info register durning probe, as instructed by the datasheet. This
standby delay is already present when the sensor starts streaming.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
The RV3032 RTC requires an additional DT property to enable trickle
charging. Add a parameter - trickle-voltage-mv - to the i2c-rtc
and i2c-rtc-gpio overlays to set it.
See: https://github.com/raspberrypi/linux/issues/5547
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The Raspberry Pi Codec Zero (and IQaudIO Codec) don't notify the DA7213
codec when it needs to change PLL frequencies. As a result, audio can
be played at the wrong rate - play a 48kHz sound immediately after a
44.1kHz sound to see the effect, but in some configurations the codec
can lock into the wrong state and always get some rates wrong.
Add the necessary notification to fix the issue.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Loading the regulatory database from the debian files fails with
"loaded regulatory.db is malformed or signature is missing/invalid"
due to missing certificates. Add these debian-specific certificates
from upstream to fix this error. See #5536 for details.
The certificates have been imported as following:
patch -p1 <<<$(
curl https://salsa.debian.org/kernel-team/linux/-/raw/\
master/debian/patches/debian/\
wireless-add-debian-wireless-regdb-certificates.patch
)
Signed-off-by: Nicolai Buchwitz <n.buchwitz@kunbus.com>
For some reason, TASK_DELAY_ACCT was enabled in all our defconfigs
except arm64/bcm2711_defconfig. Correct the oversight.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Enable PDAF output for all modes, and also need to modify Embedded Line
Width to 11560 * 3 (two lines of Embedded Data + one line of PDAF).
Signed-off-by: Lee Jackson <lee.jackson@arducam.com>
Arducam 64MP has specific requirements for the line length, and if these
conditions are not met, the camera will not function properly. Under the
previous configuration, once a stream off operation is performed, the
camera will not output any data, even if a stream on operation is
performed. This prevents us from switching from 1280x720 to another
resolution.
Signed-off-by: Lee Jackson <lee.jackson@arducam.com>
commit 4e08713336 upstream.
According to CTA 861 the channel/speaker allocation info in the
audio infoframe only applies to uncompressed (PCM) audio streams.
The channel count info should indicate the number of channels
in the transmitted audio, which usually won't match the number of
channels used to transmit the compressed bitstream.
Some devices (eg some Sony TVs) will refuse to decode compressed
audio if these values are not set correctly.
To fix this we can simply set the channel count to 0 (which means
"refer to stream header") and set the channel/speaker allocation to 0
as well (which would mean stereo FL/FR for PCM, a safe value all sinks
will support) when transmitting compressed audio.
Signed-off-by: Matthias Reichl <hias@horus.com>
Link: https://lore.kernel.org/r/20230624165232.5751-1-hias@horus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
commit 04b49b90ca upstream.
The SADs of compressed formats contain the channel and sample rate
info of the audio data inside the compressed stream, but when
building constraints we must use the rates and channels used to
transport the compressed streams.
eg 48kHz 6ch EAC3 needs to be transmitted as a 2ch 192kHz stream.
This patch fixes the constraints for the common AC3 and DTS formats,
the constraints for the less common MPEG, DSD etc formats are copied
directly from the info in the SADs as before as I don't have the specs
and equipment to test those.
Signed-off-by: Matthias Reichl <hias@horus.com>
Link: https://lore.kernel.org/r/20230624165216.5719-1-hias@horus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add a speed parameter to the jedec-spi-nor overlay to allow much
faster accesses, taking the opportunity to simplify the internals.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The overlays configured both irq-gpio and an interrupts/
interrupt-parent configuration for the stmpe MFD device.
irq-gpio was reworked in 6.1 and has issues with the configuration
as provided. Removing it and using the interrupts/interrupt-parent
configuration works fine, so do that.
See: https://forums.raspberrypi.com/viewtopic.php?t=351394
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Ensure that the frame sequence counter is incremented only if a previous
frame start interrupt has occurred, or a frame start + frame end has
occurred simultaneously.
This corresponds the sequence number with the actual number of frames
produced by the sensor, not the number of frame buffers dequeued back
to userland.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
At a high FPS with RAW10, there is frame corruption for 480p because the
rate_factor of 2 is used with the normal 2x2 bining [1]. This commit
ties the rate_factor to the selected binning mode. For the 480p mode,
analog 2x2 binning mode with a rate_factor of 2 is always used. For the
1232p mode the normal 2x2 binning mode is used for RAW10 while analog
2x2 binning mode is used for RAW8.
[1] https://github.com/raspberrypi/linux/issues/5493
Signed-off-by: Vinay Varma <varmavinaym@gmail.com>
A DSI write is issued in st7701_prepare even when the probed panel
runs on SPI. In practice, this results in a panic with the
vc4-kms-dpi-hyperpixel2r overlay active.
Perform the equivalent write over SPI in this case.
Signed-off-by: Jack Andersen <jackoalan@gmail.com>
Disable enumerating and setting of the 2x2 binned mode entirely as it
does not seem to work for either mono or colour sensor variants.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
There is a problem with the current abort scheme
when dma is blocked on a DREQ which prevents halting.
This is triggered by SPI driver which aborts dma
in this state and so leads to a halt timeout.
Discussion with Broadcom suggests the sequence:
CS.ACTIVE=0
while (CS.OUTSTANDING_TRANSACTIONS == 0)
wait()
DEBUG.RESET=1
should be safe on a dma40 channel.
Unfortunately the non-dma40 channels don't have
OUTSTANDING_TRANSACTIONS, so we need a more
complicated scheme.
We attempt to abort the channel, which will work
if there is no blocked DREQ.
It it times out, we can assume there is no AXI
transfer in progress and reset anyway.
The length of the timeout is observed at ~20us.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
As of [1], using PPS_FETCH on a 64-bit ARM kernel with a 32-bit userland
is broken, returning a timeout. This is because the requested 4-byte
alignment for struct pps_ktime_compat (illegal on arm64) results in the
timeout flags field being uninitialised.
Make the hack specific to X86_64 builds with CONFIG_COMPAT defined.
[1] commit c2a49fe8ee ("pps: fix padding issue with PPS_FETCH for
ioctl_compat")
See: https://github.com/raspberrypi/linux/issues/5430
Fixes: c2a49fe8ee ("pps: fix padding issue with PPS_FETCH for ioctl_compat")
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Now that transfers aren't aborted immediately (and uncleanly) on
errors, and the FIFOs are always drained after all transfers, we
can implement I2C_M_IGNORE_NAK by ignoring the returned error
value.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
If a transaction is aborted immediately on ERR being reported,
then the bus is not returned to the STOP condition, and devices
generally get very upset.
Handle the ERR and CLKT conditions only when TA is not set.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
On error condition, note the error return code, but still
handle the FIFOs in the normal way rather than relying on
C_CLEAR flushing everything cleanly.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Rather than always reading the maximum number of points supported
by the chip (which may be as high as 10), read the number of
active points first, and read data for just those.
In most cases this will result in less data on the I2C bus,
with only the maximum touch points taking more due to a second
read that has to configure the start address.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Rather than deleting the upstream stdout-path declaration, overwrite
it with one selecting serial0, which will always be the UART mapped
to the 40-pin header (provided enable_uart=1 is specified). Doing
so has the advantage that earlycon can be configured just by adding
"earlycon" to the command line.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The arm64 initialisation uses the physical address reachable by all
DMA controllers to set the size of ZONE_DMA. This fails on BCM2711
with the current dts files because the declaration of the I/O space
fools it into thinking the legacy 30-bit DMA channels can see most
of the 32-bit space.
Take advantage of the simple nature of the implementation by adding
a node with a restricted dma-ranges property solely so to act as the
limiting factor in the calculation.
See: https://github.com/raspberrypi/linux/issues/5470
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Now we wait for write responses and have a burst
size of 4, we can set the fifo threshold much higher.
Set it to 28 (of the 32 entry size) to keep fifo
fuller and reduce chance of underflow.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Also tweak the flags:
Remove NO_WAIT_RESP (27)
Add BURST_LENGTH (30)
The AXI path from DMA controller to HDMI audio fifo
is long, and may have considerable delay.
When using DMA without waiting for responses it is
very easy to overfill the fifo as when the fifo
removes DREQ there may be large numbers of writes
in flight.
This means the DREQ fifo threshold must be set low
enough to accommodate the maximum number of in flight
writes (unknown by something like 24),
which means the 32 element fifo only requests data
when it contains fewer than 8 entries, making it
susceptable to underflow.
If we wait for write responses we can set the DREQ
fifo threshold much higher as there are a controlled
number of writes in flight.
However the overall bandwidth is reduced by setting
this, so also enable a burstsize of 4 to improve
bandwidth.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Add a control bit to enable a multi-beat burst on a DMA.
This improves DMA performance and is required for HDMI audio.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
The sequence we were doing was not safe.
Clearing CS meant BCM2835_DMA_WAIT_FOR_WRITES was cleared
and so polling BCM2835_DMA_WAITING_FOR_WRITES has no benefit
Broadcom have provided a recommended sequence to abort
a dma lite channel, so switch to that.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
It wasn't aborting the transfer and caused stop/start
of hdmi audio dma to be unreliable.
New sequence approved by Broadcom.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
The bcm2835_dma_create_cb_chain() function is in charge of building up
the descriptors chain for a given transfer.
It was initially supporting only the BCM2835-style DMA controller, and
was later expanded to support controllers with 40-bits channels that use
a different descriptor layout.
However, some part of the function only use the old style descriptor,
even when building a chain of new-style descriptors, resulting in weird
bugs.
Fixes: 9a52a99183 ("bcm2835-dma: Add proper 40-bit DMA support")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
bcm2711_dma40_memcpy has some code strictly equivalent to the
to_bcm2711_cbaddr() function. Let's use it instead.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
For 40 bits channels, the position is reported by reading the upper byte
in the SRCI/DESTI registers. However the driver adds that upper byte
with an 8-bits left shift, while it should be 32.
Fixes: 9a52a99183 ("bcm2835-dma: Add proper 40-bit DMA support")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Add an i2s_dma4 parameter to make the I2S interface use 40-bit DMA
channels, taking the opportunity to remove some duplication.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Contrary to what struct snd_dmaengine_dai_dma_data suggests, the
configuration of addresses of DMA slave interfaces should be done in
CPU physical addresses.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Contrary to what struct snd_dmaengine_dai_dma_data suggests, the
configuration of addresses of DMA slave interfaces should be done in
CPU physical addresses.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Contrary to what struct snd_dmaengine_dai_dma_data suggests, the
configuration of addresses of DMA slave interfaces should be done in
CPU physical addresses.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Although the system timer is largely ignored in favour of the ARM local
timers, retain the DT node so that the bcm2835-sdhost logging can find
the timer in a cleaner fashion.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Contrary to what struct snd_dmaengine_dai_dma_data suggests, the
configuration of addresses of DMA slave interfaces should be done in
CPU physical addresses.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Contrary to what struct snd_dmaengine_dai_dma_data suggests, the
configuration of addresses of DMA slave interfaces should be done in
CPU physical addresses.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Slave addresses for DMA are meant to be supplied as physical addresses
(contrary to what struct snd_dmaengine_dai_dma_data does).
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Contrary to what struct snd_dmaengine_dai_dma_data suggests, the
configuration of addresses of DMA slave interfaces should be done in
CPU physical addresses.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Slave addresses for DMA are meant to be supplied as physical addresses
(contrary to what struct snd_dmaengine_dai_dma_data does). It is up to
the DMA controller driver to perform the translation based on its own
view of the world, as described in Device Tree.
Now that the Pi Device Trees have the correct peripheral mappings,
replace the hacky address munging with phys_to_dma().
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Step one of using DMA addresses the intended way is to make sure that
the mapping between CPU physical addresses and DMA addresses in Device
Tree is complete and correct.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Two new debugfs interfaces are implemented:
- gpu_usage: exposes the total runtime since boot of each
of the 5 scheduling queues available at V3D (BIN, RENDER,
CSD, TFU, CACHE_CLEAN). So if the interface is queried at
two different points of time the usage percentage of each
of the queues can be calculated.
- gpu_pid_usage: exposes the same information but to the
level of detail of each process using the V3D driver. The
runtime for process using the driver is stored. So the
percentages of usage by PID can be calculated with
measures at different timestamps.
The storage of gpu_pid_usage stats is only done if
the debugfs interface is polled during the last 70 seconds.
If a process does not submit a GPU job during last 70
seconds its stats will also be purged.
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
The sensor supports H & V flips, but the controls were READ_ONLY.
Note that the Bayer order changes with these flips, therefore
they set the V4L2_CTRL_FLAG_MODIFY_LAYOUT property.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Sony have advised that there are variants of the IMX258 sensor which
require slightly different register configuration to the mainline
imx258 driver defaults.
There is no available run-time detection for the variant, so add
configuration via the DT compatible string.
The Vision Components imx258 module supports PDAF, so add the
register differences for that variant
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
There are a number of variants of the imx258 modules that can not
be differentiated at runtime, so add compatible strings for them.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
imx258.yaml doesn't include the vendor prefix of sony, so
rename to add it.
Update the id entry and MAINTAINERS to match.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
With the binned modes, there is little point in faithfully
reproducing the horizontal line length of 5352 pixels on the CSI2
bus, and the FIFO between the pixel array and MIPI serialiser
allows us to remove that dependency.
Allow the pixel array to run with the normal settings, with the MIPI
serialiser at half the rate. This requires some additional
information for the link frequency to pixel rate function that
needs to be added to the configuration tables.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
With a read only control there is limited point in advertising
a minimum and maximum for the control, so change to set the
value, min, and max all to the selected pixel rate.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Whilst not documented, register 0x0103 bit 0 is the soft
reset for the sensor, so send it before trying to configure
the sensor.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The sensor has a register CIT_LSHIFT which extends the exposure
and frame times by the specified power of 2 for longer
exposure times.
Add support for this by configuring this register via V4L2_CID_VBLANK
and extending the V4L2_CID_EXPOSURE range accordingly.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The data sheet states that the maximum value for registers
0x0340/0x0341 FRM_LENGTH_LINES is 65525(decimal), not the
0xFFFF defined in this driver. Correct this limit.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The sensor supports the clock lane either remaining in HS mode
during frame blanking, or dropping to LP11.
Add configuration of the mode via V4L2_MBUS_CSI2_NONCONTINUOUS_CLOCK.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Libcamera requires the cropping information for each mode, so
add this information to the driver.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
V4L2 sensor drivers are expected are expected to clip the supported
exposure range based on the VBLANK configured.
IMX258 wasn't doing that as register 0x350 (FRM_LENGTH_CTL)
switches it to a mode where frame length tracks coarse exposure time.
Disable this mode and clip the range for V4L2_CID_EXPOSURE appropriately
based on V4L2_CID_VBLANK.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Extends the driver to also support 2 data lanes.
Frame rates are obviously more restricted on 2 lanes, but some
hardware simply hasn't wired more up.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
There's no reason why the clock must be 19.2MHz and nothing
else (indeed this isn't even a frequency listed in the datasheet),
so add support for 24MHz as well.
The PLL settings result in slightly different link frequencies,
so parameterise those.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The values and ranges of V4L2_CID_VBLANK are all computed,
so there is no reason for it to be a read only control.
Remove the register values from the mode lists, add the
handler, and remove the read only flag.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The device tree bindings define the relevant regulators for the
sensor, so update the driver to request the regulators and control
them at the appropriate times.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Registers 0x0202 and 0x0203 are written via the control handler
for V4L2_CID_EXPOSURE, so are not needed from the mode lists.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The binned modes set DIG_CROP_X_OFFSET and DIG_CROP_IMAGE_WIDTH
to less than the full image, even though the image being captured
is meant to be a scaled version of the full array size.
Reduce X_OFFSET to 0, and increase IMAGE_WIDTH to the full array.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The output image is defined as being 4208x3118 pixels in size.
Y_ADD_STA register was set to 0, and Y_ADD_END to 3118, giving
3119 lines total.
The datasheet lists a requirement for Y_ADD_STA to be a multiple
of a power of 2 depending on binning/scaling mode (2 for full pixel,
4 for x2-bin/scale, 8 for (x2-bin)+(x2-subsample) or x4-bin, or 16
for (x4-bin)+(x2-subsample)).
(Y_ADD_END – Y_ADD_STA + 1) also has to be a similar power of 2.
The current configuration for the full res modes breaks that second
requirement, and we can't increase Y_ADD_STA to 1 to retain exactly
the same field of view as that then breaks the first requirement.
For the binned modes, they are worse off as 3118 is not a multiple of
4.
Increase the main mode to 4208x3120 so that it is the same FOV as the
binned modes, with Y_ADD_STA at 0.
Fix Y_ADD_STA and Y_ADD_END for the binned modes so that they meet the
sensor requirements.
This does change the Bayer order as the default configuration is for
H&V flips to be enabled, so readout is from Y_STA_END to Y_ADD_STA,
and this patch has changed Y_STA_END.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add NETFILTER_XTABLES_COMPAT=y, which is no longer enabled by default.
And regenerate the defconfigs following the usual version bump churn.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
It has been observed that edge events can be lost when GPIO edges occur
close to each other. Investigation suggests this is due to a hardware
bug, although no mechanism has been identified.
Work around the event loss by moving the IRQ acknowledgement into the
main ISR, adding missing events by explicit level-change detection.
See: https://forums.raspberrypi.com/viewtopic.php?t=350295
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The kernel needs to recognise the default BDADDRs used by the Bluetooth
modems, so add a few more that we care about.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The kernel Bluetooth framework understands that devices may not
be programmed with valid Bluetooth addresses. It also has the ability
to override a Bluetooth address with the value of the local-bd-address
DT property, but it ignores the validity of the existing address when
doing so.
Add a new boolean property, fallback-bd-address, which indicates that
the given local-bd-address property should only be used if the device
does not already have a valid BDADDR.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The BCM2835 mini-UART has no modem status interrupt, causing all
transmission to stop, never to use, if a speed change ever happens
while the CTS signal is high.
Add a simple polling mechanism in order to allow recovery in that
situation.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
There is no DT binding for emc2305 as mainline are still
discussing how to do a generic fan binding.
The 5.15 driver was reading the "emc2305," properties
"cooling-levels", "pwm-max", "pwm-min", and "pwm-channel" as u8.
The overlay was writing them as u16 (;) so it was working.
The 6.1 driver was reading as u32, which failed as there is
insufficient data.
As this is all downstream only, revert to u8 to match 5.15.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The interrupt line from the touch controller is not necessarily
connected to the SoC, so add the option to poll for touch info.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Waveshare sell a range of DSI panels of varying sizes, all
using a common MCU to control the panel and backlight.
Add a panel driver that supports these panels.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Commit 46ef9d4ed2 ("hwmon: emc2305: fixups for driver submitted to
mailing lists") missed adding the call to thermal_of_cooling_device_register
required to configure any cooling maps for the device, hence stopping it
from actually ever changing speed.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The 640x480 mode uses a special binning mode for high framerate operation where
the pixel rate is effectively doubled. Account for this when setting up the
pixel clock rate, and applying the vblank and exposure controls.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
The patch adding image encode (JPEG) to the driver missed adding
the alignment constraint for V4L2_PIX_FMT_RGBA32, which meant
it ended up giving a stride and size of 0.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
media_pipeline_start and media_pipeline_stop now validate that
the pipeline is being started and stopped with the same pipe
and pad handles.
When running with embedded metadata (eg imx477 and imx708), the
start typically happens from the metadata pad, whilst stop is
always from the image pad.
Always pass the image pad to media_pipeline_start to ensure
that the calls are balanced.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Create a cluster for the HVLIP and VFLIP controls so they are treated
as a single composite control.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Switch the system clock name from "xclk" to "inclk".
Use lower case lables for all regulator names.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Replace the existing imx708.yaml file with sony,imx708.yaml that follows
the latest devicetree conventions for camera sensors.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
This commit addresses vaious tidy-ups requesed for upstreaming the
IMX708 driver. Notably:
- Remove #define IMX708_NUM_SUPPLIES and use ARRAY_SIZE() directly
- Use dev_err_probe where possible in imx708_probe()
- Fix error handling paths in imx708_probe()
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Add support for three different usable link frequencies (default 450Mhz,
447Mhz, and 453MHz) for the IMX708 camera sensor. The choice of
frequency is handled thorugh the "link-frequency" overlay parameter.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Setting both the Drop and Add bits on the input context prevents the
corruption of split transactions seen with the BCM2711 XHCI controller,
which is a dwc3 variant.
This is a downstream feature that allows usbhid to restrict polling
intervals on mice and keyboards, and was only tested on a VL805 which
didn't complain about the fact the endpoint got added twice.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Since [1], the fbdev deferred IO framework is careful to cancel
pending updates on close to prevent dirty pages being accessed after
they may have been reused. However, this is not necessary in the case
that the pagelist is empty, and drivers that don't make use of the
pagelist may have wanted updates cancelled for no good reason.
Avoid penalising fbdev drivers that don't make use of the pagelist by
making the cancelling of deferred IO on close conditional on there
being a non-empty pagelist.
See: https://github.com/raspberrypi/linux/issues/5398
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
[1] 3efc61d952 ("fbdev: Fix invalid page access after closing deferred I/O devices")
For H264, V4L2_CID_MPEG_VIDEO_H264_I_PERIOD is meant to be the intra
I-frame period, whilst V4L2_CID_MPEG_VIDEO_GOP_SIZE is the intra IDR
frame period.
The firmware encoder doesn't produce I-frames that aren't IDR as well,
therefore V4L2_CID_MPEG_VIDEO_GOP_SIZE is technically the correct
control, however users may have adopted V4L2_CID_MPEG_VIDEO_H264_I_PERIOD.
Add support for V4L2_CID_MPEG_VIDEO_GOP_SIZE controlling the encoder,
and have VIDIOC_S_CTRL for V4L2_CID_MPEG_VIDEO_H264_I_PERIOD update
the value for V4L2_CID_MPEG_VIDEO_GOP_SIZE (the reverse is not
implemented).
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
FFmpeg insists on trying to set V4L2_CID_MPEG_VIDEO_B_FRAMES to
0, and generates an error should it fail.
As our encoder doesn't support B frames, add a stub handler for
it to silence FFmpeg.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
V4L2_CID_MPEG_VIDEO_MPEG2_LEVEL was missed from
"vc04_services: bcm2835_codec: Ignore READ_ONLY ctrls in s_ctrl"
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
While waiting for random data, use sleeps that are proportional
to the amount of data expected. Prevent indefinite waits by
giving up if nothing is received for a second.
See: https://github.com/raspberrypi/linux/issues/5390
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
As of commit [1], introduced in 5.18, fbdev drivers that use
deferred IO and need mmap support must include an explicit fb_mmap
pointer to the fb_deferred_io_mmap.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
[1] 5905585103 ("fbdev: Put mmap for deferred I/O into drivers")
Setting V4L2_CID_WIDE_DYNAMIC_RANGE was causing the long exposure
shift count to be reset, which is incorrect if the user has already
changed the frame length to cause it to have a non-zero value.
Because it only updates control ranges and doesn't set any registers,
the control can also be applied when the sensor is not powered on.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
The RaspberryPi firmware clocks driver uses in several instances a
container_of to retrieve the struct raspberrypi_clk_data from a pointer
to struct clk_hw. Let's create a small function to avoid duplicating it
all over the place.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Busy spinning after halt is dumb
We've previously applied this patch to arch/arm
but it is currenltly missing in arch/arm64
Pi4 after "sudo halt" uses 520mA
Pi4 after "sudo shutdown now" uses 310mA
Make them both use the lower powered option
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
This commit updates the imx219 driver to adverise support for embedded
data streams. This can then be used by the bcm2835-unicam driver, which
has recently been updated to expose the embedded data stream to
userland.
The imx219 sensor subdevice overloads the media pad to differentiate
between image stream (pad 0) and embedded data stream (pad 1) when
performing the v4l2_subdev_pad_ops functions.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The HBLANK control was read-only, and always configured such
that the sensor HTS register was 3448. This limited the maximum
exposure time that could be achieved to around 1.26 secs.
Make HBLANK read/write so that the line time can be extended,
and thereby allow longer exposures (and slower frame rates).
Retain the overall HTS setting when changing modes rather than
resetting it to a default.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The datasheet for this sensor documents the minimum vblanking as being
32 lines. It does fix some problems with occasional black lines at the
bottom of images (tested on Raspberry Pi).
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
This is currently running on defaults, so the --strict desired
for media drivers and similar won't be observed. That may be
possible to add later.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The ST7701 supports numerous different interface mechanisms for
MIPI DSI, RGB, or SPI. The driver was only implementing DSI input,
so add RGB parallel input with SPI configuration.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
In adding the MPEG2/MPEG4/H264 level and profile controls to
the decoder, they weren't declared as read-only, nor handlers
added to bcm2835_codec_s_ctrl. That resulted in an error message
"Invalid control" being logged every time v4l2_ctrl_handler_setup
was called from bcm2835_codec_create_component.
Define those controls as read only, and exit early from s_ctrl
on read only controls.
Fixes: "media: bcm2835-v4l2-codec: Add profile & level ctrls to decode"
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
In order to support discovery of what profile & levels are supported by
stateful decoders implement the profile and level controls where they
are defined by V4L2.
Signed-off-by: John Cox <jc@kynesim.co.uk>
The ISP cases do nothing. Remove the break that separates them from the
deinterlace case so they now do the same as deinterlace. This enables
simple width & height setting, but does not enable setting left and
top coordinates.
Signed-off-by: John Cox <jc@kynesim.co.uk>
With the RAW16 formats now having a defined CSI2 data type ID,
they can be added to the driver.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The MIPI CSI2 data type ID values are now defined in the
mipi-csi2.h header, so use those defines instead of hard
coding them in the driver.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Commit d3292daee3 ("dma-buf: Make locking consistent in dma_buf_detach()")
added checking that the same dmabuf for which dma_buf_attach
was called is passed into dma_buf_detach, which flagged up
that vcsm-cma was passing in the wrong dmabuf.
Correct this so that we don't get the WARN on every dma_buf
release.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Modify the kernel build workflow to create artifacts with the correct
names and structure, both as an example of what we expect and in case
anyone wants to use the output.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
To avoid code with build warnings being introduced into the tree, force
CONFIG_WERROR=y in the build workflow.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Enable long exposure modes by using the long exposure shift register setting
in the imx708 sensor.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
CM4 uses BCM54210PE, which supports 2 additional LEDs, choosing LED3
for the amber LED because it shows activity by default (LED4 is not
connected). However, this makes it uncontrollable by the eth_led<n>
dtparams which target LEDs 1+2.
Solve the problem by making LEDs 3+4 mirror LEDs 1+2 (which is much
simpler than adding baseboard-specific overrides, but comes with a
risk of making one of the LEDs redundant).
See: https://github.com/raspberrypi/linux/issues/5289
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The imx708 is a 12MP MIPI sensor with a 16:9 aspect ratio, here using
two CSI-2 lanes. It is a "quad Bayer" sensor with all 3 modes offering
10-bit output:
12MP: 4608x2592 up to 14.35fps (full FoV)
1080p: 2304x1296 up to 56.02fps (full FoV)
720p: 1536x864 up to 120.12fps (cropped)
This imx708 sensor driver is based heavily on the imx477 driver and
has been tested on the Raspberry Pi platform using libcamera.
Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add YAML devicetree binding for IMX708 CMOS image sensor.
Let's also add a MAINTAINERS entry for the binding and driver.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The power up/down sequence is already ramped. Extend this to
the first user movement as well, as this will generally avoid
the "tick" noises due to rapid movements and overshooting.
Subsequent movements are generally smaller and so don't cause
issues.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Uses the regulator notifier framework so that the current
focus position will be restored whenever any user of the
regulator powers it up. This means that should the VCM
and sensor share a common regulator then starting the sensor
will automatically restore the default position. If they
have independent regulators then it will behave be powered
up when the VCM subdev is opened.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The VCM driver will often be controlled via a regulator,
therefore add in the relevant DT hooks.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The DW9817 is effectively the same as DW9807 from a programming
interface, however it drives +/-100mA instead of 0-100mA. This means
that the power on ramp needs to take the lens from the midpoint, and
power off return it there. It also changes the default position for
the module.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The DW9817 is programmatically the same as DW9807, but
the output drive is a bi-directional -100 to +100mA
instead of 0-100mA.
Add the appropriate compativle string.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
On some switches, having EEE enabled causes the link to become
unstable. With this patch, adding 'genet.eee=N' to the kernel command
line will cause EEE to be disabled on the link.
See: https://github.com/raspberrypi/linux/issues/4289
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This is a copy of README with the tags added.
You can not delete the file README as then checkpatch complains
you aren't in a kernel tree.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Builds the bcmrpi, bcm2709, bcm2711, and bcm2835 32 bit kernels,
and defconfig and bcm2711 64bit kernels, saving the artifacts for
7 days.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Now that we have some KUnit coverage, let's add a github actions file to
run them on each push or pull request.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
As there isn't currently a defined mechanism for selecting an
external trigger mode on image sensors, copy the imx477
approach of using a module parameter to enable ext trig.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The integrated USB2.0 hub in the VL805 chipset has a bug where it
incorrectly determines the remaining available frame time before the
host next sends a SOF packet with an incremented frame_number.
See the USB2.0 specification sections 11.3 and 11.14.2.3.
The hub's non-periodic TT handler can transmit the IN/OUT handshake
token too late, so a following 64-byte DATA0/1 packet causes the ACK
handshake to collide with the propagated SOF. This causes port babble.
Avoid ringing doorbells for vulnerable endpoints during uFrame 7 if the
TR is Idle to stop one source of babble. An IN transfer for a Running TR
may happen at any time, so there's not much we can do about that.
Ideally a hub firmware update to properly implement frame timeouts is
needed, and to avoid spinning for up to 125us when submitting TDs to
Idle rings.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
xhci: constrain XHCI_VLI_HUB_TT_QUIRK to old firmware versions
VLI have a firmware update for the VL805 which resolves the incorrect
frame time calculation in the hub's TT. Limit applying the quirk to
known-bad firmwares.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Whilst the codecs are restricted to 1920x1080 / 1080x1920, the ISP
isn't, but the limits advertised via V4L2 was 1920x1920 for all
roles.
Increase the limit to 16k x 16k for the ISP.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Whilst the adv7180 driver support s_routing, nothing else
does, and there is a missing lump of framework code to
define the mapping from connectors on a board to the inputs
they represent on the ADV7180.
Add a nasty hack to take a module parameter that is passed in
to s_routing on any call to G_STD, or S_STD (or subdev
g_input_status call).
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
BCM54213PE is an Ethernet PHY that supports PTP hardware timestamping.
BCM54210PW ia another Ethernet PHY, but one without PTP support.
Unfortunately the two PHYs return the same ID when queried, so some
extra information is required to determine whether the PHY is PTP-
capable.
There are two Raspberry Pi products that use these PHYs - Pi 4B and
CM4 - and fortunately they use different PHY addresses, so use that as
a differentiator. Choose to treat a PHY with the same ID but another
address as a BCM54210PE, which seems more common.
See: https://github.com/raspberrypi/linux/issues/5104
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Configuring GCC to use task stack protector canaries means it will
insert calls to check functions in FIQ code. This is bad, as a) the
FIQ's stack is banked and b) the failure invokes __stack_chk_fail which
eventually tries to call printk(). Printing to the console inside the
FIQ is generally fatal.
Add CFLAGS to stop this happening in FIQ code.
Also catch one function where notrace wasn't specified.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The pointer (struct usb_host_endpoint *)->hcpriv should contain a
reference to dwc_otg_qh_t if the driver has already seen a URB submitted
to this endpoint.
It then checks whether the qh exists and is already in a schedule in
order to decide whether to allocate periodic bandwidth or not. Passing a
pointer to an offset inside of struct usb_host_endpoint instead of just
the pointer means it dereferences bogus addresses.
Rationalise (delete) a variable while we're at it.
See https://github.com/raspberrypi/linux/issues/5189
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The driver had a number of issues, checkpatch warnings/errors,
and other limitations, so fix these up to make it usable.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
As a result of [1], DRM_GEM_CMA_HELPER has been replaced by
DRM_GEM_CMA_HELPER.
[1] 4a83c26a1d ("drm/gem: rename GEM CMA helpers to GEM DMA helpers")
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add the ability to load the names of alternative firmwares from the
Device Tree node. This permits separate firmwares for 43436s and 43438
and allows downstream firmwares to coexist with upstream.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.
This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.
Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().
There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.
See https://github.com/raspberrypi/linux/issues/5088
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The VL805 can cause data corruption if a SS Bulk OUT endpoint enters a
flow-control condition and there are TRBs in the transfer ring that are
not an integral size of wMaxPacket and the endpoint is behind one or more
hubs.
This is frequently the case encountered when FAT32 filesystems are
present on mass-storage devices with cluster sizes of 1 sector, and the
filesystem is being written to with an aggregate of small files.
The initial implementation of this quirk separated TRBs that didn't
adhere to this limitation into two - the first a multiple of wMaxPacket
and the second the 512-byte remainder - in an attempt to force TD
fragments to align with packet boundaries. This reduced the incidence
rate of data corruption but did not resolve it.
The fix as recommended by VIA is to disable bursts if this sequence of
TRBs can occur.
Limit turning off bursts to just USB mass-storage devices by searching
the device's configuration for an interface with a class type of
USB_CLASS_MASS_STORAGE.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The Unicam hardware has been observed to cause a buffer overrun when using the
dummy buffer as a circular buffer. The conditions that cause the overrun are not
fully known, but it seems to occur when the memory bus is heavily loaded.
To avoid the overrun, program the hardware with a buffer size of 0 when using
the dummy buffer. This will cause overrun into the allocated dummy buffer, but
avoid out of bounds writes.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
There is no obligation for all source devices on a video-mux to
require the same bus configuration, so read the configuration
from the sink ports, and relay via get_mbus_config on the source
port.
If the sources support get_mbus_config, then call that first.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
If we get a simultaneous FS + FE interrupt for the same frame, it cannot be
marked as completed and returned to userland as the framebuffer will be refilled
by Unicam on the next sensor frame. Additionally, the timestamp will be set to 0
as the FS interrupt handling code will not have run yet.
To avoid these problems, the frame is considered dropped in the FE handler,
and will be returned to userland on the subsequent sensor frame.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
The clk-bcm2835 handling of the pixel clock does not function
correctly when the HDMI power domain is disabled.
The firmware supports it correctly, so add it to the
firmware clock driver.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The BCM54213PE identifier is a RPI-specific addition.
Populate the remainder of the driver functions, including the
required probe routine.
Add a version of bcm54xx_suspend, from upstream.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Also check for which HDMI devices are connected and only create
devices for those that are present.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
snd_bcm2835: disable HDMI audio when vc4 is used (#3640)
Things don't work too well when both the vc4 driver and the firmware
driver are trying to control the same audio output:
[ 763.569406] bcm2835_audio bcm2835_audio: vchi message timeout, msg=5
Hence, when the vc4 HDMI driver is used, let it control audio. This is done
by introducing a new device tree property to the audio node, and
extending the vc4-kms-v3d overlays to set it appropriately.
Signed-off-by: Hristo Venev <hristo@venev.name>
staging: bcm2835-audio: Add disable-headphones flag
Add a property to allow the headphone output to be disabled. Use an
integer property rather than a boolean so that an overlay can clear it.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
staging: bcm2835-audio: Find compatible firmware node
Commit "ARM: dts: Adopt the upstream snd_bcm2835 handling" removed the
audio section from the DT and the driver can no longer access the
referenced firmware node 'brcm,firmware'. Fix that by searching for a
compatible firmware node instead, similar to drivers/gpu/drm/vc4.
Fixes: b9e62329e096 ("ARM: dts: Adopt the upstream snd_bcm2835 handling")
Signed-off-by: Juerg Haefliger <juergh@proton.me>
staging: bcm2835-audio: Fix firmware node refcounting
Decrement firmware node refcounts on all exit paths in set_hdmi_enables().
Signed-off-by: Juerg Haefliger <juergh@proton.me>
staging: bcm2835-audio: Log errors in case of firmware query failures
The driver queries the firmware for the number of detected HDMI displays
and their IDs. Log error messages if queries fail.
Signed-off-by: Juerg Haefliger <juergh@proton.me>
staging: bcm2835-audio: Fix unused enable_hdmi module parameter
The commit "Add HDMI1 facility to the driver." made the enable_hdmi module
parameter unused. Fix that by making it a global switch for all available
HDMI audio outputs.
Fixes: 755f336608 ("Add HDMI1 facility to the driver.")
Signed-off-by: Juerg Haefliger <juergh@proton.me>
staging: bcm2835-audio: Fix unused enable_headphones module parameter
Since commit "staging: bcm2835-audio: Add disable-headphones flag" the
enabling/disabling of the headphones output is solely determined by the
presence of the DT property 'brcm,disable-headphones' and the
enable_headphones module parameter is unused. Fix that by making it a
global switch.
Fixes: ee90e47d88 ("staging: bcm2835-audio: Add disable-headphones flag")
Signed-off-by: Juerg Haefliger <juergh@proton.me>
This commit updates the arducam_64mp driver to adverise support for
embedded data streams.
The arducam_64mp sensor subdevice overloads the media pad to differentiate
between image stream (pad 0) and embedded data stream (pad 1) when
performing the v4l2_subdev_pad_ops functions.
Signed-off-by: Lee Jackson <info@arducam.com>
Add a driver for the Arducam 64MP camera sensor.
Whilst the sensor supports 2 or 4 CSI2 data lanes, this driver
currently only supports 2 lanes.
The following Bayer modes are currently available:
9152x6944 10-bit @ 2.7fps
4624x3472 10-bit (binned) @ 10fps
3840x2160 10-bit (cropped/binned) @ 20fps
2312x1736 10-bit (binned) @ 30fps
1920x1080 10-bit (cropped/binned) @ 60fps
1280x720 10-bit (cropped/binned) @ 120fps
Signed-off-by: Lee Jackson <info@arducam.com>
[ I would like to pursue fixing this more directly first before actually
merging this, but I thought I'd send this to the list now anyway as a
the "backup" plan. If I can't figure out how to make headway on the
main plan in the next few days, it'll be easy to just do this. ]
Stephen reported that a static key warning splat appears during early
boot on systems that credit randomness from device trees that contain an
"rng-seed" property, because because setup_machine_fdt() is called
before jump_label_init() during setup_arch():
static_key_enable_cpuslocked(): static key '0xffffffe51c6fcfc0' used before call to jump_label_init()
WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xb0/0xb8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0+ #224 44b43e377bfc84bc99bb5ab885ff694984ee09ff
pstate: 600001c9 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : static_key_enable_cpuslocked+0xb0/0xb8
lr : static_key_enable_cpuslocked+0xb0/0xb8
sp : ffffffe51c393cf0
x29: ffffffe51c393cf0 x28: 000000008185054c x27: 00000000f1042f10
x26: 0000000000000000 x25: 00000000f10302b2 x24: 0000002513200000
x23: 0000002513200000 x22: ffffffe51c1c9000 x21: fffffffdfdc00000
x20: ffffffe51c2f0831 x19: ffffffe51c6fcfc0 x18: 00000000ffff1020
x17: 00000000e1e2ac90 x16: 00000000000000e0 x15: ffffffe51b710708
x14: 0000000000000066 x13: 0000000000000018 x12: 0000000000000000
x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000
x8 : 0000000000000000 x7 : 61632065726f6665 x6 : 6220646573752027
x5 : ffffffe51c641d25 x4 : ffffffe51c13142c x3 : ffff0a00ffffff05
x2 : 40000000ffffe003 x1 : 00000000000001c0 x0 : 0000000000000065
Call trace:
static_key_enable_cpuslocked+0xb0/0xb8
static_key_enable+0x2c/0x40
crng_set_ready+0x24/0x30
execute_in_process_context+0x80/0x90
_credit_init_bits+0x100/0x154
add_bootloader_randomness+0x64/0x78
early_init_dt_scan_chosen+0x140/0x184
early_init_dt_scan_nodes+0x28/0x4c
early_init_dt_scan+0x40/0x44
setup_machine_fdt+0x7c/0x120
setup_arch+0x74/0x1d8
start_kernel+0x84/0x44c
__primary_switched+0xc0/0xc8
---[ end trace 0000000000000000 ]---
random: crng init done
Machine model: Google Lazor (rev1 - 2) with LTE
A trivial fix went in to address this on arm64, 73e2d827a5 ("arm64:
Initialize jump labels before setup_machine_fdt()"). But it appears that
fixing it on other platforms might not be so trivial. Instead, defer the
setting of the static branch until later in the boot process.
Fixes: f5bda35fba ("random: use static branch for crng_ready()")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
There is a flags field in struct mmal_es_format, but the defines
for what the bits meant weren't included in the headers.
For V4L2_PIX_FMT_NV12_COL128 support we need them, so add them in.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Mainline APIs have changed the way in which the bus flags and
number of active CSI2 data lanes is signalled, so fix the driver
up accordingly.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
It is quite common for the devm_thermal_zone_of_sensor_register
to need to defer, so avoid spamming the log by using
dev_err_probe instead of dev_err.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add a driver for the Arducam Pivariety series CSI2 camera sensor.
Signed-off-by: Lee Jackson <info@arducam.com>
SQUASH: Fix VIDEO_ARDUCAM_PIVARIETY Kconfig entry
The cherry-pick from rpi-5.17.y put it in the wrong section, and failed
to update it for 5.18.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add YAML device tree binding for Arducam Pivariety CMOS image sensor, and
the relevant MAINTAINERS entries.
Signed-off-by: Lee Jackson <info@arducam.com>
The panel sends MIPI DCS commands during prepare and is expecting
the bus to remain in LP-11 state in-between.
Set the prepare_upstream_first flag so that the upstream DSI host
is prepared / pre_enabled first, and therefore the bus is in a
defined state.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
mipi_dsi_attach is allowed to fail, and currently the probe
code doesn't clean up (mainly drm_panel_remove) if this happens.
Add cleanup code on failure.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Allowing GPIO state to persist allows the use of gpioset to control
GPIO levels without having to use the --mode=wait feature.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
On some platforms the cma area can be half the entire system memory,
meaning that allocations start happening in the cma area immediately.
This leads to fragmentation and subsequent fatal cma_alloc failures.
We introduce an "alloc_in_cma_threshold" parameter which requires that
this many sixteenths of the free pages must be in cma before it will
try to use them. By default this is set to 12, but the previous
behaviour can be restored by setting it to 8 on startup.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
When all nodes have stopped streaming, ensure the firmware has released its
handle on the LS table dmabuf. This is done by passing a null handle in the
LS params.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
There's already a regulator module called ad5398 that exposes
this device through the regulator API. That is meaningless in
the terms that it uses and how it maps to V4L2, so a new driver
was added. However the module name collision wasn't noted, so
rename it now.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
v4l2_async_register_subdev doesn't bind in lens or flash drivers,
but v4l2_async_register_subdev_sensor does.
Switch to using v4l2_async_register_subdev_sensor.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Adds a driver for the Analog Devices AD5398 10 bit
I2C DAC which is commonly used for driving VCM lens
mechanisms.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add a binding for Analog Devices AD5398 10bit current
sinking DAC when used as a lens VCM driver.
FIXME: Convert to YAML
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup
code. This will subsequently cause and error if userland re-queues the same
buffer on the next start_streaming() call.
Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the
cleanups instead.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
ISP outputs actually support all colour spaces that are fundamentally
sRGB underneath, regardless of whether an RGB or YUV output format is
actually requested.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
Fix incorrectly applying the quirk for bulk IN endpoints and remove the
commentary which is not completely accurate based on observed behaviour.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
"media: i2c: Remove .s_power() from ov7251" removed the call that
sent ov7251_global_init_setting to the sensor. Send it when starting
streaming.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The VL805 has a bug in its internal FIFO space accounting that results
in bulk OUT babble if a TRB in a large multi-element TD has a data
buffer size that is larger than and not a multiple of wMaxPacketSize for
the endpoint. If the downstream USB3.0 link is congested, or latency is
increased through an intermediate hub, then the VL805 enters a suspected
FIFO overflow condition and transmits repeated, garbled data to the
endpoint.
TDs with TRBs of exact multiples of wMaxPacketSize and smaller than
wMaxPacketSize appear to be handled correctly, so split buffers at a
1024-byte length boundary and put the remainder in a separate smaller
TRB.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
In the case of 2 frame starts being received with no frame end
between, the queued buffer held in next_frm was lost as the
pointer was overwritten with the dummy buffer.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The sck-idle-input property indicates that the spi-gpio driver should
return the SCK line to an input when the chip select signals are
inactive.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The Top DisplayOptoelectronics (TDO) T17B driver chip is used
in the TL040HDS20CT panel (found in the Pimoroni HyperPixel4
Square display) and potentially other displays.
The driver chip supports SPI for configuration and DPI
video data.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The Ilitek ILI9806E driver is used in the Pimoroni HyperPixel4
and potentially other displays. Whilst it can support multiple
interfaces, this driver only accounts for SPI configuration and
DPI video data.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Panel orientation is now handled by the drm_panel and
panel_bridge frameworks, so remove the custom handling.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The Geekworm MZP280 panel is a 480x640 (portrait) panel with a
capacitive touch interface and a 40 pin header meant to interface
directly with the Raspberry Pi. The screen is 2.8 inches diagonally,
and there appear to be at least 4 distinct versions all with the same
panel timings.
Timings were derived from drivers posted on the github located here:
https://github.com/tianyoujian/MZDPI/tree/master/vga
Additional details about this panel family can be found here:
https://wiki.geekworm.com/2.8_inch_Touch_Screen_for_Pi_zero
Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Add support for MEDIA_BUS_FMT_RGB565_1X24_CPADHI. This format is used
by the Geekworm MZP280 panel which interfaces with the Raspberry Pi.
Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
The default clock-stretch timeout is 35 mS, which works well for
SMBus, but there are some I2C devices which can stretch the clock even
longer. Rather than trying to prescribe a safe default for everyone,
allow the timeout to be configured.
Signed-off-by: Alex Crawford <raspberrypi/linux@code.acrawford.com>
Clang warns:
drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (!source_pads) {
^~~~~~~~~~~~
drivers/media/platform/bcm2835/bcm2835-unicam.c:3152:9: note: uninitialized use occurs here
return ret;
^~~
drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:2: note: remove the 'if' if its condition is always false
if (!source_pads) {
^~~~~~~~~~~~~~~~~~~
drivers/media/platform/bcm2835/bcm2835-unicam.c:3091:9: note: initialize the variable 'ret' to silence this warning
int ret;
^
= 0
1 warning generated.
When the if condition is true, ret will be used uninitialized, which
could result in undesirable behavior. Set ret to -ENODEV on the error
path, which is a standard error code for the ->complete() callback.
Fixes: d056e86eb3 ("media/bcm2835-unicam: Parse pad numbers correctly")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
On a Pi3 B/B+ platform the imx219 sensor frequently generates a single corrupt
frame when the sensor first starts. This can either be a missing line, or
invalid samples within the line. This only occurrs using the Unicam kernel
driver.
Disabling trigger mode elimiates this corruption. Since trigger mode is a
legacy feature copied from the firmware driver and not expected to be needed,
remove it. Tested on the Raspberry Pi cameras and shows no ill effects.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
The firmware can only use I2C0 if the kernel isn't, therefore
with libcamera and DRM using it the PoE HAT fan control needs
to move to the kernel.
Add the option for the driver to be created by the PoE HAT core
MFD driver, and use the I2C regmap that provides to control fan
functions.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
As documented the _AVG parameters are meant to be hardware
averaged, but the implementation for the PoE+ HAT was done in
software in the firmware.
Drop the property.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The firmware can only use I2C0 if the kernel isn't, therefore
with libcamera and DRM using it the PoE HAT fan control needs
to move to the kernel.
Add the option for the driver to be created by the PoE HAT core
MFD driver, and use the I2C regmap that provides to control fan
functions.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The Raspbery Pi PoE+ HAT exposes a fan controller and power
supply status reporting via a single I2C address.
Create an MFD device that allows loading of the relevant
sub-drivers, with a shared I2C regmap.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Use GitHubs issue form for bug reports.
- modern look
- user don't need to mess with given markdown parts while filling the issue template
Setup config.yml for general questions and problems with the Raspbian distribution packages.
https://github.com/raspberrypi/linux/issues/4440
Upstream has added additional device specific controls, so the
V4L2_CID_USER_BASE + 0x10e0 value that had been defined for use with
the ISP has been taken by something else (and +0x10f0 has been used as
well)
Duplicate the use on V4L2_CID_USER_BASE + 0x10e0 so that userspace
(libcamera) doesn't need to change. Once the driver is upstream, then
we'll update libcamera to adopt the new value as it then won't change.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
bcm2835_isp_remove is called on an initialisation failure, but at that
point the drvdata hasn't been set. This causes a crash when e.g. using
the cutdown firmware (gpu_mem=16).
Move platform_set_drvdata before the instance probing loop to avoid the
problem.
See: https://github.com/raspberrypi/linux/issues/4774
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Not all implementations wire up the enable GPIO and may just tie
it to a supply rail.
Make it optional.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The driver supported using GPIOs to control the shutdown line,
but no regulator control.
Add regulator hooks.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The VL805 fetches up to 4 transfer TRBs at a time. TRB reads don't cross
a 64B boundary, and if a TRB is fetched and is not on a 64B boundary,
the read is sized up to the next 64B boundary.
However the VL805 implements a readahead prefetch for TRBs on a transfer
ring. This fetches the next 64B after any TRB read has happened. Near
the end of a ring segment, the prefetcher can read the first 64B of the
next page in physical memory and this is where the behaviour causes a
bug.
The controller does not tag reads with which endpoint they are for, so
if the start of the next page is a ring segment used by a victim
endpoint, and the victim endpoint is about to fetch TRBs from the start
of the segment, the victim endpoint will read from the prefetched data
and not perform a read to main memory. If the data is stale, the ring
cycle state bit may not be correct and the endpoint will silently halt.
Adjust trbs_per_seg for transfer rings allocated for this controller.
See https://github.com/raspberrypi/linux/issues/4685
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
In anticipation of adjusting the number of utilised TRBs in a ring
segment, add trbs_per_seg to struct xhci_ring and use this instead
of a compile-time define.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Don't calculate space based on the number of TRBs in the current segment,
as it's OK to wrap to the start (and flip the cycle state bit).
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The VL805 controller can't cope with the TR Dequeue Pointer for an endpoint
being set to a Link TRB. The hardware-maintained endpoint context ends up
stuck at the address of the Link TRB, leading to erroneous ring expansion
events whenever the enqueue pointer wraps to the dequeue position.
If the search for the end of the current TD and ring cycle state lands on
a Link TRB, move to the next segment.
See: https://github.com/raspberrypi/linux/issues/3919
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The "panel-dpi" compatible string configures panel from device tree,
but it doesn't provide any way of configuring the bus format (colour
representation), nor does it populate it.
Add a DT parameter "bus-format" that allows the MEDIA_BUS_FMT_xxx value
to be specified from device tree.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
panel-dpi doesn't know the bit depth, so in the same way that
DPI is guessed for the connector type, guess that it'll be 8bpc.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Register 0x02 in the FT5x06 is TD_STATUS containing the number
of valid touch points being reported.
Iterate over that number of points rather than all that are
supported on the device.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The I2C to the Atmel is very fussy, and locks up easily on
Pi0-3 particularly on reads.
The LCD power status is controlled solely by this driver, so
rather than reading it back from the Atmel, use the cached
status last set.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
If a dummy buffer is still active on a frame start, it indicates that this frame
will be dropped. The explicit logging helps users identify performance issues.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
The ft5x06 is unreliable in sending touch up events, so some
touch IDs can become stuck in the detected state.
Ensure that IDs that are unreported by the controller are
released.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
As happens occasionally, an upstream change has once again prevented
spidev from being loaded via Device Tree. We now need "spidev" to be
included in the new spi_device_id list, otherwise although the
spidev driver gets loaded no /dev/spidev*.* entries will appear.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add a second (identical) set of device nodes to allow concurrent use of the ISP
hardware by another user. This change effectively creates a second state
structure (struct bcm2835_isp_dev) to maintain independent state for the second
user. Node and media entity names are appened with the instance index
appropriately.
Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Add these missing V4L2 controls. Tested binned and full resolution
modes in all four orientations using Raspberry Pi running libcamera.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
An unwanted side effect of enabling the BRCMDBG config setting is
redefining brcmf_info to be brcmf_err. This can be alarming to users
and makes it harder to spot real errors, so don't do it.
See: https://github.com/raspberrypi/linux/issues/4663
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Adds Media Controller API support for more complex pipelines.
libcamera is about to switch to using this mechanism for configuring
sensors.
This can be enabled by either a module parameter, or device tree.
Various functions have been moved to group video-centric and
mc-centric functions together.
Based on a similar conversion done to ti-vpe.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
media: bcm2835-unicam: Fixup for 5.18 and new get_mbus_config struct
The number of active CSI2 data lanes has moved within the struct
v4l2_mbus_config used by the get_mbus_config API call.
Update the driver to match the changes in mainline.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The driver was making big assumptions about the source device
using pad 0 and 1, which doesn't follow for more complex
devices where Unicam's source device may be a sink device for
something else.
Read the pad numbers through media controller, and reference
them appropriately.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
RAW color spaces are more usually reported as having full range
quantization.
Tested using libcamera.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
For cases where spare PWM outputs are available, but are desired
to be addressed a standard outputs instead.
Wraps a PWM channel as a new GPIO chip with the one output.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Should we go through the timeout failure case with port_disable
not returning all buffers for whatever reason, the
buffers_with_vpu counter gets left at a non-zero value, which
will cause reference counting issues should the instance be
reused.
Reset the count when the port is enabled again, but before
any buffers have been sent to the VPU.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
As we're wanting to wrap the image_fx component for deinterlacing,
add the deinterlace algorithm values to enum mmal_parameter_imagefx
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Adds enum mmal_interlace_type and struct
mmal_parameter_video_interlace_type to allow for querying the
interlacing mode on decoders.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
If no interrupt is defined then a timer and workqueue are used
to poll the controller.
On remove these were not being cleaned up correctly.
Fixes: ca61fdaba7 "Input: edt-ft5x06: Poll the device if no interrupt is
configured."
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The Raspberry Pi 7" 800x480 panel uses a Toshiba TC358762 DSI
to DPI bridge chip, so there is a requirement for the timings
to be specified for the end panel. Add such a definition.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
rpi_touchscreen_i2c_read returns any errors from i2c_transfer,
or the 8 bit received value.
Check for error values before trying to process the data as
valid.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
During a bulk transfer we request a DMA allocation to hold the
scatter-gather list. Most of the time, this allocation is small
(<< PAGE_SIZE), however it can be requested at a high enough frequency
to cause fragmentation and/or stress the CMA allocator (think time
spent in compaction here, or during allocations elsewhere).
Implement a pool to serve up small DMA allocations, falling back
to a coherent allocation if the request is greater than
VCHIQ_DMA_POOL_SIZE.
Signed-off-by: Oliver Gjoneski <ogjoneski@gmail.com>
Although it is no longer necessary for vchiq's children to have a
different DMA configuration to the parent, they do still need to
explicitly to have their DMA configuration set - to be that of the
parent.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Conditional on a new compatible string, change the pagelist encoding
such that the top 24 bits are the pfn, leaving 8 bits for run length
(-1), giving a 36-bit address range.
Manage the split between addresses for the VPU and addresses for the
40-bit DMA controller with a dedicated DMA device pointer that on non-
BCM2711 platforms is the same as the main VCHIQ device. This allows
the VCHIQ node to stay in the usual place in the DT.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
See https://github.com/raspberrypi/linux/issues/3981
An unknown unsafe memory access can result in the ep_state variable
in xhci_virt_ep being trampled with a stuck SET_DEQ_PENDING state
despite successful completion of a Set TR Deq Pointer command.
All URB enqueue/dequeue calls for the endpoint will fail in this state
so no transfers are possible until the device is reconnected.
As a workaround, clear the flag if we see it set and issue a new Set
TR Deq command anyway - this should be harmless, as a prior Set TR Deq
command will only have been issued in the Stopped state, and if the
endpoint is Running then the controller is required to ignore it and
respond with a Context State Error event TRB.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The vidioc_enum_input() v4l2 ioctl is capable of returning
sensor/input status as well. This is used in current
GStreamer HEAD for signal detection [1].
bcm2835-unicam does handle this syscall, but it didn't ask
the subdevice driver about the input status. The input then
appeared as always present.
This commit adds the necessary query. There is a precedent for
this - the R-Car VIN V4L2 driver does a similar call [2].
[1]: ce0be27caf/sys/v4l2/gstv4l2src.c (L553)
[2]: 7fb9d006d3/drivers/media/platform/rcar-vin/rcar-v4l2.c (L548)
Signed-off-by: Jakub Vaněk <linuxtardis@gmail.com>
The bcm2835 ISP requires the base address of all input/output planes to have 32
byte alignment. Using a Y stride of 32 bytes would not guarantee that the V
plane would fulfil this, e.g. a height of 650 lines would mean the V plane
buffer is not 32 byte aligned for YUV420 formats.
Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte
alignment, as the luma height will always be an even number of lines.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Fixes the following v4l2-compliance failure:
fail: v4l2-test-controls.cpp(871): subscribe event for control 'User Controls' failed test
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
Trial and error reveals that the minimum vblank value appears to be 24
(the OV5647 data sheet does not give any clues). This fixes streaming
lock-ups in full resolution mode.
Fixes: 9b5a5ebedc ("media: i2c: ov5647: Add support for V4L2_CID_VBLANK")
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
The top offset in the pixel array is actually 6 (see page 3-1 of the
OV5647 data sheet).
Fixes: f2f7ad5ce5 ("media: i2c: ov5647: Selection compliance fixes")
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
Keeping a copy of the old poweroff handler allows it to be restored
should this module be unloaded, but also provides a fallback if the
power hasn't been removed when the timeout elapses.
See: https://github.com/raspberrypi/rpi-eeprom/issues/330
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The result of dividing a u32 by a size_t is an unsigned int on arm32
and a long unsigned int on arm64. Use "%zu" (the size_t format) to
remove the build warning for 64-bit builds.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Whilst the hardware can't achieve the limits of level 4.2 under
all situations, it can exceed level 4.0.
Allow selection of levels 4.1 and 4.2.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
There is no reason why the control GPIOs for the panel can not
be connected to I2C or similar GPIO interfaces that may need to
sleep, therefore switch from gpiod_set_value to
gpiod_set_value_cansleep calls to configure them.
Without that you get complaints from gpiolib every time the state
is changed.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The Adafruit Mini-PiTFT13 display needs offsets applying when rotated,
so use the "variant" mechanism to select a custom set_addr_win method
using a dedicated compatible string of "fbtft,minipitft13".
See: https://github.com/raspberrypi/firmware/issues/1524
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
DMABUFs are all handled by videobuf2, so there is no reason not
to enable support for them.
Note that this driver is still using the vmalloc allocator, so
the buffers it allocates will not be compatible with the codec
or ISP driver that require contiguous buffers. However this
driver should be able to import the buffers allocated by them.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Parse device properties and register controls for them using the V4L2
fwnode properties helpers.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Previously they were returning positive non-zero codes for success,
which were getting passed up the call stack. Since release 4.19,
do_dentry_open (fs/open.c) has been catching these and flagging an
error. (So this driver has been broken since that date.)
Fixes: 3c2472a [media] media: i2c: Add support for OV5647 sensor
Signed-off-by: David Plowman <david.plowman@raspberrypi.org>
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1
Signed-off-by: Zefa Chen <zefa.chen@rock-chips.com>
media: i2c: ov9281: fix mclk issue when probe multiple camera.
Takes the ov9281 part only from the Rockchip's patch.
Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4
Signed-off-by: Zefa Chen <zefa.chen@rock-chips.com>
media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3
Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to
a large number of drivers.
Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c
Signed-off-by: Zefa Chen <zefa.chen@rock-chips.com>
media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code
The Rockchip driver was based on a 4.4 kernel, and had several custom
Rockchip parts.
Update to 5.4 kernel APIs, with the relevant controls required by
libcamera, and remove custom Rockchip parts.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
media: i2c: ov9281: Read chip ID via 2 reads
Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.
Use two reads and manually combine the results.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
media: i2c: ov9281: Add support for 8 bit readout
The sensor supports 8 bit mode as well as 10bit, so add the
relevant code to allow selection of this.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
media: ov9281: Add 1280x720 and 640x480 modes
Breaks out common register set and adds the different registers
for 1280x720 (cropped) and 640x480 (skipped) modes
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Fixed picture line bug in all ov9281 modes
Signed-off-by: Mathias Anhalt <mathiasanhalt@web.de>
Added hflip and vflip controls to ov9281
Signed-off-by: Mathias Anhalt <mathiasanhalt@web.de>
media: i2c: ov9281: Remove override of subdev name
From the original Rockchip driver, the subdev was renamed
from the default to being "mov9281 <dev_name>" whereas the
default would have been "ov9281 <dev_name>".
Remove the override to drop back to the default rather than
a vendor custom string.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
media: v4l2-subdev: add subdev-wide state struct
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
media: i2c: ov9281: Add fwnode properties controls
Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
media: i2c: ov9281: Sensor should report RAW color space
Tested on Raspberry Pi running libcamera.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
Partial revert "media: i2c: add ov9281 driver."
This partially reverts commit 84e98e3a4f.
The commit had merged some changes to other drivers with adding the ov9281
driver. Only the ov9281 parts have been reverted.
In order to get the intended behaviour of the stateful video
decoder API where only the OUTPUT queue needs to be enabled and fed
buffers in order to get the SOURCE_CHANGED event that configures the
CAPTURE queue, we want the device to run should either queue be
streaming.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The kernel modules aes-neon-blk and aes-neon-bs perform poorly, at least on
Cortex-A72 without crypto extensions. In fact, aes-arm64 outperforms them
on benchmarks, despite it being a simpler implementation (only accelerating
the single-block AES cipher).
For modes of operation where multiple cipher blocks can be processed in
parallel, aes-neon-bs outperforms aes-neon-blk by around 60-70% and aes-arm64
is another 10-20% faster still. But the difference is even more marked with
modes of operation with dependencies between neighbouring blocks, such as
CBC encryption, which defeat parallelism: in these cases, aes-arm64 is
typically around 250% faster than either aes-neon-blk or aes-neon-bs.
The key trade-off with aes-arm64 is that the look-up tables are situated in
RAM. This leaves them potentially open to cache timing attacks. The two other
modules, by contrast, load the look-up tables into NEON registers and so are
able to perform in constant time.
This patch aims to load aes-arm64 more often.
If none of the currently-loaded crypto modules implement a given algorithm,
a new one is typically selected for loading using a platform-neutral alias
describing the required algorithm. To enable users to still
load aes-neon-blk or aes-neon-bs if they really want them, while still
ensuring that aes-arm64 is usually selected, remove the aliases from
aes-neonbs-glue.c and aes-glue.c and apply them to aes-cipher-glue.c, but
still build the two NEON modules.
Since aes-glue.c can also be used to build aes-ce-blk, leave them enabled
if USE_V8_CRYPTO_EXTENSIONS is defined, to ensure they are selected if we
in future use a CPU which has the crypto extensions enabled.
Note that the algorithm priority specifiers are unchanged, so if
aes-neon-bs is loaded at the same time as aes-arm64, the former will be
used in preference. However, aes-neon-blk and aes-arm64 have tied priority,
so whichever module was loaded first will be used (assuming aes-neon-bs is
not loaded).
Signed-off-by: Ben Avison <bavison@riscosopen.org>
If multiple sets of interrupts occur simultaneously, it may be unsafe
to swap buffers, as the hardware may already be re-using the current
buffers. In such cases, avoid swapping buffers, and wait for the next
opportunity at the Frame End interrupt to signal completion.
Additionally, check the packet compare status when watching for frame
end for buffers swaps, as this could also signify a frame end event.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Each supported format now includes a mask showing the allowed colour
spaces, as well as a default colour space for when one was not
specified.
Additionally we translate the colour space to mmal format and pass it
over to the VideoCore.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
Much effort has been put into finding ways to avoid warnings from dtc
about overlays, usually to do with the presence of #address-cells and
size-cells, but not exclusively so. Since the issues being warned about
are harmless, suppress the warnings to declutter the build output and
to avoid alarming users.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
A relatively recent commit ([1]) contained optimisation for the PIO
SPI FIFO-filling functions. The commit message includes the phrase
"[t]he blind and counted loops are always called with nonzero count".
This is technically true, but it is still possible for count to become
zero before the loop is entered - if tfr->len is zero. Moving the loop
exit condition to the end of the loop saves a few cycles, but results
in a near-infinite loop should the revised count be zero on entry.
Strangely, zero-lengthed transfers aren't filtered by the SPI framework
and, even more strangely, the Python3 spidev library is triggering them
for no obvious reason.
Avoid the problem completely by bailing out of the main transfer
function early if trf->len is zero, although there may be a case for
moving the mitigation into the framework.
See: https://github.com/raspberrypi/linux/issues/4100
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
[1] 26751de25d ("spi: bcm2835: Micro-optimise FIFO loops")
Add colour denoise control to the bcm2835 driver through a new v4l2
control: V4L2_CID_USER_BCM2835_ISP_CDN.
Add the accompanying MMAL configuration structure definitions as well.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Use (reserved) bits 24 and 25 of the dreq value
(the second cell of the DT DMA descriptor) to request
that wide source reads or wide dest writes are required
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Support has been added for the unpacked (16bpp) versions of
the MIPI raw 10, 12, and 14 formats, so add the 4CCs for them.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
error logs:
drivers/staging/vc04_services/vc-sm-cma/Kconfig:1:error: recursive dependency detected!
drivers/staging/vc04_services/vc-sm-cma/Kconfig:1: symbol BCM_VC_SM_CMA is selected by BCM2835_VCHIQ_MMAL
drivers/staging/vc04_services/vchiq-mmal/Kconfig:1: symbol BCM2835_VCHIQ_MMAL depends on BCM2835_VCHIQ
drivers/staging/vc04_services/Kconfig:14: symbol BCM2835_VCHIQ is selected by BCM_VC_SM_CMA
For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
Tested-by: make ARCH=arm64 bcm2711_defconfig
Test platform: fedora 33
Branch: rpi-5.10.y
lan78xx_link_reset explicitly clears the MAC's view of the PHY's IRQ
status. In doing so it potentially leaves the PHY with a pending
interrupt that will never be acknowledged, at which point no further
interrupts will be generated.
Avoid the problem by acknowledging any pending PHY interrupt after
clearing the MAC's status bit.
See: https://github.com/raspberrypi/linux/issues/2937
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Although the BRCMSTB PCIe interface doesn't technically support the
MSI-X spec, in practise it seems to work provided no more than 32
MSI-Xs are required. Add the required flag to the driver to allow
experimentation with devices that demand MSI-X support.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Commit 65e08c4650 failed to clear the
clock state when the device stopped streaming. Fix this, as it might
again cause the same problems when doing an unprepare.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
clk_disable_unprepare() is called unconditionally in stop_streaming().
This is incorrect in the cases where start_streaming() fails, and
unprepares all clocks as part of the failure cleanup. To avoid this,
ensure that clk_disable_unprepare() is only called in stop_streaming()
if the clocks are in a prepared state.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
On a failure in start_streaming(), the error code would not propagate to
the calling function on all conditions. This would cause the userland
caller to not know of the failure.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
DSI1 on BCM2711 doesn't require the DMA workaround that is used
on BCM2835/6/7, therefore it needs a new compatible string.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
In switching to the hardware I2C controller there is an issue
where we seem to not get back the correct state from the Pi
touchscreen.
Insert a delay before polling to avoid this condition.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
We now have the hardware I2C controller pinmuxed to the drive the
display I2C, but this controller does not support clock stretching.
The Atmel micro-controller in the panel requires clock stretching
to allow it to prepare any data to be read.
Split the rpi_touchscreen_i2c_read into two independent transactions with
a delay between them for the Atmel to prepare the data.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Not all systems have the interrupt line wired up, so switch to
polling the touchscreen off a timer if no interrupt line is
configured.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
[1] replaced a single reset function with a pointer to one of two
implementations, but also removed the call asserting the reset
at the start of brcm_pcie_setup. Doing so breaks Raspberry Pis with
VL805 XHCI controllers lacking dedicated SPI EEPROMs, which have been
used for USB booting but then need to be reset so that the kernel
can reconfigure them. The lack of a reset causes the firmware's loading
of the EEPROM image to RAM to fail, breaking USB for the kernel.
See: https://www.raspberrypi.org/forums/viewtopic.php?p=1758157#p1758157
Fixes: 04356ac307 ("PCI: brcmstb: Add bcm7278 PERST# support")
[1] 04356ac307 ("PCI: brcmstb: Add bcm7278 PERST# support")
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The last nibble is a revision ID, and the 54213pe is a later rev
than the 54210e. Running the 54210e setup code on a 54213pe results
in a broken RGMII interface.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
Define a new mailbox (SET_REBOOT_FLAGS) which may be used to
pass optional flags to the Raspberry Pi firmware that changes
the behaviour of the bootloader and firmware during a reboot.
Currently this just defines the 'tryboot' flag which causes
the firmware to load tryboot.txt instead config.txt. This
alternate configuration file can be used to specify the
path of an alternate firmware and kernels allowing a fallback
mechanism to be implemented for OS upgrades.
The gpio-fsm driver implements simple state machines that allow GPIOs
to be controlled in response to inputs from other GPIOs - real and
soft/virtual - and time delays. It can:
+ create dummy GPIOs for drivers that demand them,
+ drive multiple GPIOs from a single input, with optional delays,
+ add a debounce circuit to an input,
+ drive pattern sequences onto LEDs
etc.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
gpio-fsm: Fix a build warning
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
gpio-fsm: Rename 'num-soft-gpios' to avoid warning
As of 5.10, the Device Tree parser warns about properties that look
like references to "suppliers" of various services. "num-soft-gpios"
resembles a declaration of a GPIO called "num-soft", causing the value
to be interpreted as a phandle, the owner of which is checked for a
"#gpio-cells" property.
To avoid this warning, rename the gpio-fsm property to "num-swgpios".
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
gpio-fsm: Show state info in /sys/class/gpio-fsm
Add gpio-fsm sysfs entries under /sys/class/gpio-fsm. For each state
machine show the current state, which state (if any) will be entered
after a delay, and the current value of that delay.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
gpio-fsm: Fix shutdown timeout handling
The driver is intended to jump directly to a shutdown state in the
event of a timeout during shutdown, but the sense of the test was
inverted.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
gpio-fsm: Clamp the delay time to zero
The sysfs delay_ms value is calculated live, and it is possible for
the time left to appear to be negative briefly if the timer handling
hasn't completed. Ensure the displayed value never goes below zero,
for the sake of appearances.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Driver for the BCM2835 ISP hardware block. This driver uses the MMAL
component to program the ISP hardware through the VC firmware.
The ISP component can produce two video stream outputs, and Bayer
image statistics. This can't be encompassed in a simple V4L2
M2M device, so create a new device that registers 4 video nodes.
This patch squashes all the development patches from the earlier
rpi-5.4.y branch into one
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
This file defines the userland interface to the bcm2835-isp driver
that will follow in a separate commit.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
If CONFIG_DMA_BCM2708 isn't enabled there's no need to mask out
one of the already scarce DMA channels.
Signed-off-by: Matthias Reichl <hias@horus.com>
This adds a V4L2 memory to memory device that wraps the MMAL
video decode and video_encode components for H264 and MJPEG encode
and decode, MPEG4, H263, and VP8 decode (and MPEG2 decode
if the appropriate licence has been purchased).
This patch squashes all the work done in developing the driver
on the Raspberry Pi rpi-5.4.y kernel branch.
Thanks to Kieran Bingham, Aman Gupta, Chen-Yu Tsai, and
Marek Behún for their contributions. Please refer to the
rpi-5.4.y branch for the full history.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Ensure OUTPUT timestamps are always forwarded
The firmware by default tries to ensure that decoded frame
timestamps always increment. This is counter to the V4L2 API
which wants exactly the OUTPUT queue timestamps passed to the
CAPTURE queue buffers.
Disable the firmware option.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/vc04_services/codec: Add support for CID MPEG_HEADER_MODE
Control V4L2_CID_MPEG_VIDEO_HEADER_MODE controls whether the encoder
is meant to emit the header bytes as a separate packet or with the
first encoded frame.
Add support for it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/vc04_services/codec: Clear last buf dequeued flag on START
It appears that the V4L2 M2M framework requires the driver to manually
call vb2_clear_last_buffer_dequeued on the CAPTURE queue during a
V4L2_DEC_CMD_START.
Add such a call.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/vc04-services/codec: Fix logical precedence issue
Two issues identified with operator precedence in logical
expressions. Fix them.
https://github.com/raspberrypi/linux/issues/4040
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc04_services: bcm2835-codec: Switch to s32fract
staging/bcm2835-codec: Add the unpacked (16bpp) raw formats
Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Log the number of excess supported formats
When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Add support for pixel aspect ratio
If the format is detected by the driver and a V4L2_EVENT_SOURCE_CHANGE
event is generated, then pass on the pixel aspect ratio as well.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Implement additional g_selection calls for decode
v4l_cropcap calls our vidioc_g_pixelaspect function to get the pixel
aspect ratio, but also calls g_selection for V4L2_SEL_TGT_CROP_BOUNDS
and V4L2_SEL_TGT_CROP_DEFAULT. Whilst it allows for vidioc_g_pixelaspect
not to be implemented, it doesn't allow for either of the other two.
Add in support for the additional selection targets.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Add VC-1 support.
Providing the relevant licence has been purchased, then Pi0-3
can decode VC-1.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Fix support for levels 4.1 and 4.2
The driver said it supported H264 levels 4.1 and 4.2, but
was missing the V4L2 to MMAL mappings.
Add in those mappings.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Set the colourspace appropriately for RGB formats
Video decode supports YUV and RGB formats. YUV needs to report SMPTE170M
or REC709 appropriately, whilst RGB should report SRGB.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Pass corrupt frame flag.
MMAL has the flag MMAL_BUFFER_HEADER_FLAG_CORRUPTED but that
wasn't being passed through, so add it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Do not update crop from S_FMT after res change
During decode, setting the CAPTURE queue format was setting the crop
rectangle to the requested height before aligning up the format to
cater for simple clients that weren't expecting to deal with cropping
and the SELECTION API.
This caused problems on some resolution change events if the client
didn't also then use the selection API.
Disable the crop update after a resolution change.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
bcm2835: Allow compressed frames to set sizeimage (#4386)
Allow the user to set sizeimage in TRY_FMT and S_FMT if the format
flags have V4L2_FMT_FLAG_COMPRESSED set
Signed-off-by: John Cox <jc@kynesim.co.uk>
staging/bcm2835-codec: Change the default codec res to 32x32
In order to effectively guarantee that a V4L2_EVENT_SOURCE_CHANGE
event occurs, adopt a default resolution of 32x32 so that it
is incredibly unlikely to be decoding a stream of that resolution
and therefore failing to note a "change" requiring the event.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Add support for decoding interlaced streams
The video decoder can support decoding interlaced streams, so add
the required plumbing to signal this correctly.
The encoder and ISP do NOT support interlaced data, so trying to
configure an interlaced format on those nodes will be rejected.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Correct ENUM_FRAMESIZES stepsize to 2
Being YUV420 formats, the step size is always 2 to avoid part
chroma subsampling.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Return buffers to QUEUED not ERROR state
Should start_streaming fail, or buffers be queued during
stop_streaming, they should be returned to the core as QUEUED
and not (as currently) as ERROR.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835_codec: Log MMAL flags in hex
The flags is a bitmask, so it's far easier to interpret as hex
data instead of decimal.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: bcm2835-codec: Allow custom specified strides/bytesperline.
If the client provides a bytesperline value in try_fmt/s_fmt then
validate it and correct if necessary.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835_codec: Add support for image_fx to deinterlace
Adds another /dev/video node wrapping image_fx doing deinterlace.
Co-developed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
staging/bcm2835-v4l2_codec: Fix for encode selection API
Matches correct behaviour from DECODE and DEINTERLACE
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
staging: bcm2835-codec: Allow decode res changed before STREAMON(CAPTURE)
The V4L2 stateful video decoder API requires that you can STREAMON
on only the OUTPUT queue, feed in buffers, and wait for the
SOURCE_CHANGE event.
This requires that we enable the MMAL output port at the same time
as the input port, because the output port is the one that creates
the SOURCE_CHANGED event.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Do not send buffers to the VPU unless streaming
With video decode we now enable both input and output ports on
the component. This means that buffers will get passed to the VPU
earlier than desired if they are queued befoer STREAMON.
Check that the queue is streaming before sending buffers to the VPU.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: bcm2835-codec: Format changed should trigger drain
When a format changed event occurs, the spec says that it
triggers an implicit drain, and that needs to be signalled
via -EPIPE.
For BCM2835, the format changed event happens at the point
the format change occurs, so no further buffers exist from
before the resolution changed point. We therefore signal the
last buffer immediately.
We don't have a V4L2 available to us at this point, so set
the videobuf2 queue last_buffer_dequeued flag directly.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: bcm2835-codec: Signal the firmware to stop on all changes
The firmware defaults to not stopping video decode if only the
pixel aspect ratio or colourspace change. V4L2 requires us
to stop decoding on any change, therefore tell the firmware
of the desire for this alternate behaviour.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: bcm2835-codec: Queue flushed buffers instead of completing
When a buffer is returned on a port that is disabled, return it
to the videobuf2 QUEUED state instead of DONE which returns it
to the client.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: bcm2835_codec: Correct flushing code for refcounting
Completions don't reference count, so setting the completion
on the first buffer returned and then not reinitialising it
means that the flush function doesn't behave as intended.
Signal the completion when the last buffer is returned.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: bcm2835-codec: Ensure all ctrls are set on streamon
Currently the code was only setting some controls from
bcm2835_codec_set_ctrls, but it's simpler to use
v4l2_ctrl_handler_setup to avoid forgetting to adding new
controls to the list.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: bcm2835-codec: Add support for H&V Flips to ISP
The ISP can do H & V flips whilst resizing or converting
the image, so expose that via V4L2_CID_[H|V]FLIP.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
bcm2835-v4l2-codec: Remove advertised support of VP8
The support for this format by firmware is very limited
and won't be faster than the arm.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Pass V4L2_CID_MPEG_VIDEO_H264_MIN_QP/MAX_QP to bcm2835-v4l2-codec
Following raspberrypi/linux#4704. This is necessary to set up
quantization for variable bitrate to avoid video flickering.
staging/bcm2835-codec: bytesperline for YUV420/YVU420 needs to be 64
Matching https://github.com/raspberrypi/linux/pull/4419, the ISP
block (which is also used on the input of the encoder, and output
of the decoder) needs the base address of all planes to be aligned
to multiples of 32. This includes the chroma planes of YUV420 and
YVU420.
If the height is only a multiple of 2 (not 4), then you get an odd
number of lines in the second plane, which means the 3rd plane
starts at a multiple of bytesperline/2.
Set the minimum bytesperline alignment to 64 for those formats
so that the plane alignment is always right.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/bcm2835-codec: Allow a different stride alignment per role
Deinterlace and decode aren't affected in the same way as encode
and ISP by the alignment requirement on 3 plane YUV420.
Decode would be affected, but it always aligns the height up to
a macroblock, and uses the selection API to reflect that.
Add in the facility to set the bytesperline alignment per role.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: vc04_services: codec: Add support for V4L2_PIX_FMT_RGBA32 format
We already support V4L2_PIX_FMT_BGR32 which is the same thing with red
and blue swapped, so it makes sense to include this variant as well.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
bcm2835-codec: Return empty buffers to the VPU instead of queueing to vbuf2
The encoder can skip frames totally should rate control overshoot
the target bitrate too far. In this situation it generates an
output buffer of length 0.
V4L2 treats a buffer of length 0 as an end of stream flag, which is
not appropriate in this case, therefore we can not return that buffer
to the client.
The driver was returning the buffer to videobuf2 in the QUEUED state,
however that buffer was then not dequeued again, so the number of
buffers was reduced each time this happened. In the pathological
case of using GStreamer's videotestsrc in mode 1 for noise, this happens
sufficiently frequently to totally stall the pipeline.
If the port is still enabled then return the buffer straight back to
the VPU rather than to videobuf2.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc04_services: bcm2835-codec: Add support for V4L2_PIX_FMT_NV12_COL128
V4L2_PIX_FMT_NV12_COL128 is supported by the ISP and the input of
video_encode, output of video_decode, and both input and output
of the ISP.
Add in the plumbing to support the format on those ports.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc04_services: bcm2835-codec: Set crop_height for compressed formats
In particular for the encoder where the CAPTURE format dictates
the parameters given to the codec we need to be able to set the
value passed as the crop_height for the compressed format.
There's no crop available for cropped modes, so always set
crop_height to the requested height.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc04_services: bcm2835-codec: Set port format from s_selection
s_selection allows the crop region of an uncompressed pixel
format to be specified, but it wasn't passing the setting on to
the firmware. Depending on call order this would potentially
mean that the crop wasn't actioned.
Set the port format on s_selection if we have a component created.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
bcm2835-codec: /dev/video31 as interface to image_encode JPEG encoder
Signed-off-by: Maxim Devaev <mdevaev@gmail.com>
bcm2835-v4l2-codec: support H.264 5.0 and 5.1 levels
vc04_services: bcm2835-codec: Remove redundant role check
vidioc_try_encoder_cmd checks the role, but the ioctl is disabled
for any roles in which it is invalid.
Remove the redundant check.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc04_services: bcm2835-codec: Allow encoder_cmd on ISP and deinterlace
ISP and deinterlace also need a mechanism for passing effectively
an EOS through the pipeline to signal when all buffers have been
processed.
VIDIOC_ENCODER_CMD does exactly this for encoders, so reuse the same
function for ISP and deinterlace.
(VIDIOC_DECODER_CMD is slightly different in that it also passes
details of when and how to stop, so is not as relevant).
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
With the vc-sm-cma driver we can support zero copy of buffers between
the kernel and VPU. Add this support to mmal-vchiq.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add Broadcom VideoCore Shared Memory support.
This new driver allows contiguous memory blocks to be imported
into the VideoCore VPU memory map, and manages the lifetime of
those objects, only releasing the source dmabuf once the VPU has
confirmed it has finished with it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging: vcsm-cma: Fix memory leak from not detaching dmabuf
When importing there was a missing call to detach the buffer,
so each import leaked the sg table entry.
Actually the release process for both locally allocated and
imported buffers is identical, so fix them to both use the same
function.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
staging/vc-sm-cma: Avoid log spamming on Pi0/1 over cache alias.
Pi 0/1 use the 0x80000000 cache alias as the ARM also sees the world
through the VPU L2 cache.
vc-sm-cma was trying to ensure it was in an uncached alias (0xc), and
complaining on every allocation if it weren't. Reduce this logging.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc-sm-cma: Restore correct cache maintainance operations
We have been using the more expensive flush operations rather than
invalidate and clean since kernel rpi-5.9.y
These are exposed with:
52f1453513 Re-expose some dmi APIs for use in VCSM
But I believe that commit was dropped when (non-cma) vc-sm was dropped,
and didn't get updated when the commit was restored
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
staging: vc04_services: Fix clang14 warning
Insert a break to fix a fallthrough warning from clang14. Since the
fallthrough was to another break, this is a cosmetic change.
See: https://github.com/raspberrypi/linux/issues/5078
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
vc04_services/vc-sm-cma: Handle upstream require vchiq_instance to be passed around
If the RBUF logic is not reset when the kernel starts then there
may be some data left over from any network boot loader. If the
64-byte packet headers are enabled then this can be fatal.
Extend bcmgenet_dma_disable to do perform the reset, but not when
called from bcmgenet_resume in order to preserve a wake packet.
N.B. This different handling of resume is just based on a hunch -
why else wouldn't one reset the RBUF as well as the TBUF? If this
isn't the case then it's easy to change the patch to make the RBUF
reset unconditional.
See: https://github.com/raspberrypi/linux/issues/3850
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Increase the delay before entering the lower power state to 2 seconds
(the maximum allowed) in order to reduce the packet latencies,
particularly for inbound packets.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Display variants are intended as a replacement for the now-deleted
fbtft_device drivers. Drivers can register additional compatible
strings with a custom callback that can make the required changes
to the fbtft_display structure.
Start the ball rolling by adding adafruit18, adafruit18_green and
sainsmart18 displays.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Since the unicam driver was modified to write to a dummy buffer when no
user-supplied buffer is available, it can now write to and return a
buffer even when there's only a single one. Enable this by changing the
min_buffers_needed in the vb2_queue; it will be useful for enabling
still captures without allocating more memory than absolutely necessary.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
The change to retrieve the pixel format always on g_fmt didn't
check whether the native or unpacked version of the format
had been requested, and always returned the packed one.
Correct this so that the packing setting is retained whereever
possible.
Fixes "9d59e89 media: bcm2835-unicam: Re-fetch mbus code from subdev
on a g_fmt call"
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
From when bringing up the driver, there was a check in the isr
to ignore interrupts (claiming them handled) should the driver
not be streaming.
The VPU now will not register a camera driver if it finds a
CSI2 node enabled in device tree, therefore this flawed check is
redundant.
https://github.com/raspberrypi/linux/issues/3602
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Fix commit "media: tc358743: Return an appropriate colorspace from
tc358743_set_fmt" to ensure that the format passed in to set_fmt
is checked to be valid, and reset to the current format if not.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Pi 0&1 pass all ARM accesses through the VPU L2 cache, therefore
the dma-ranges property sets the cache alias bits to other
than the direct alias, hence this WARN was firing.
It was overprotective coding, so assume that everything is OK
with the dma-ranges, and remove the WARN.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The actpwr trigger is a meta trigger that cycles between an inverted
mmc0 and default-on. It is written in a way that could fairly easily
be generalised to support alternative sets of source triggers.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
When streaming with Unicam, the VPU must have a clock frequency of at
least 250Mhz. Otherwise, the input fifos could overrun, causing
image corruption.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
[g|s]_selection pass in a buffer type that needs to be validated
before passing on to the sensor subdev.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
v4l2-compliance throws a failure if the device doesn't advertise
V4L2_CAP_READWRITE but allows read or write operations.
We do support read, so reinstate the flag.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Use bit 27 of the dreq value (the second cell of the DT DMA descriptor)
to request that the WAIT_RESP bit is not set.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Now that the 14bit non-packed Bayer formats are defined, add them
into the supported formats lookup table.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Now that V4L2_PIX_FMT_Y14 and V4L2_PIX_FMT_Y14P are defined,
allow passing 14bit mono data through the peripheral.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Now that V4L2_PIX_FMT_Y12P is defined, allow passing raw 12bit
mono packed data through the peripheral.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Older gcc versions object to = { 0 } initialisation if the first
elemtn in the structure is a substructure.
Use = { } to avoid this compiler warning.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Use the get_mbus_config pad subdev call to allow a source to use
fewer than the number of CSI2 lanes defined in device tree.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add a driver for the Unicam camera receiver block on BCM283x processors.
Compared to the bcm2835-camera driver present in staging, this driver
handles the Unicam block only (CSI-2 receiver), and doesn't depend on
the VC4 firmware running on the VPU.
The commit is made up of a series of changes cherry-picked from the
rpi-5.4.y branch of https://github.com/raspberrypi/linux/ with
additional enhancements, forward-ported to the mainline kernel.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reported-by: kbuild test robot <lkp@intel.com>
When closing the video device, the irs1125 is put in power down state.
To keep V4L2 ctrls and the HW in sync, v4l2_ctrl_handler_setup is
called after power up.
The compound ctrl IRS1125_CID_MOD_PLL however has a default value
of all zeros, which puts the imager into a non responding state.
Thus, this ctrl is not written by the driver into HW after power up.
The userspace has to take care to write senseful data.
Signed-off-by: Markus Proeller <markus.proeller@pieye.org>
Instead of changing the exposure and framerate settings for all sequences,
they can be changed for every sequence individually now. Therefore the
IRS1125_CID_SAFE_RECONFIG ctrl has been removed and replaced by
IRS1125_CID_SAFE_RECONFIG_S<seq_num>_EXPO and *_FRAME ctrls.
The consistency check in the sequence ctrl IRS1125_CID_SEQ_CONFIG
is removed.
Signed-off-by: Markus Proeller <markus.proeller@pieye.org>
Reading data over i2c is done by using i2c_transfer to ensure that this
operation can't be interrupted.
Signed-off-by: Markus Proeller <markus.proeller@pieye.org>
The BRCM PCIe block has controls to enable control of the CLKREQ#
signal by the L1SS, and to gate the refclk with the CLKREQ# input.
These controls are mutually exclusive - the upstream code sets the
latter, but some use cases require the former.
Add a Device Tree property - brcm,enable-l1ss - to switch to the
L1SS configuration.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Upstream Linux deems using output GPIOs to generate IRQs as a bogus
use case, even though the BCM2835 GPIO controller is capable of doing
so. A number of users would like to make use of this facility, so
disable the checks.
See: https://github.com/raspberrypi/linux/issues/2527
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
V4L2 wishes to have the codec header bytes in the same buffer as the
first encoded frame, so it does become 1-in 1-out for encoding.
The firmware now has an option to do this, so enable it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
If the firmware hasn't detected a display, the driver would assume
one display was available, but because it had failed to retrieve the
display size it would try to allocate a zero-sized buffer.
Avoid the allocation failure by bailing out early if no display is
found.
See: https://github.com/raspberrypi/linux/issues/3598
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
BCM2711 has 4 DMA channels with a 40-bit address range, allowing them
to access the full 4GB of memory on a Pi 4.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcmn2835_isp is a platform driver dependent on vchiq,
therefore add the load/unload functions for it to vchiq.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Add V4L2_META_FMT_BCM2835_ISP_STATS V4L2 format type.
This new format will be used by the BCM2835 ISP device to return
out ISP statistics for 3A.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
This patch adds MEDIA_BUS_FMT_SENSOR_DATA used by the bcm2835-unicam
driver to support CSI-2 embedded data streams from camera sensors.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Add V4L2_META_FMT_SENSOR_DATA format 4CC.
This new format will be used by the BCM2835 Unicam device to return
out camera sensor embedded data.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Commit f3186dd876 ("spi: Optionally use GPIO descriptors for CS GPIOs")
amended of_spi_parse_dt() to always set SPI_CS_HIGH for SPI slaves whose
Chip Select is defined by a "cs-gpios" devicetree property.
This change breaks drivers whose probe functions set the mode field of
the spi_device because in doing so they clear the SPI_CS_HIGH flag.
Fix by setting SPI_CS_HIGH in spi_setup (under the same conditions as
in of_spi_parse_dt()).
See also: 83b2a8fe43 ("spi: spidev: Fix CS polarity if GPIO descriptors are used")
Fixes: f3186dd876 ("spi: Optionally use GPIO descriptors for CS GPIOs")
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
SQUASH: spi: Demote SPI_CS_HIGH warning to KERN_DEBUG
This warning is unavoidable from a client's perspective and
doesn't indicate anything wrong (just surprising).
SQUASH with "spi: use_gpio_descriptor fixup moved to spi_setup"
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This driver is for the HEVC/H265 decoder block on the Raspberry
Pi 4, and conforms to the V4L2 stateless decoder API.
Signed-off-by: John Cox <jc@kynesim.co.uk>
staging: media: rpivid: Select MEDIA_CONTROLLER and MEDIA_CONTROLLER_REQUEST_API
MEDIA_CONTROLLER_REQUEST_API is a hidden option. If rpivid depends on it,
the user would need to first enable another driver that selects
MEDIA_CONTROLLER_REQUEST_API, and only then rpivid would become available.
By selecting it instead of depending on it, it becomes possible to enable
rpivid without having to enable other potentially unnecessary drivers.
Signed-off-by: Hristo Venev <hristo@venev.name>
rpivid_h265: Fix width/height typo
Signed-off-by: popcornmix <popcornmix@gmail.com>
rpivid_h625: Fix build warnings
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
staging: rpivid: Fix crash when CMA alloc fails
If realloc to increase coeff size fails then attempt to re-allocate
the original size. If that also fails then flag a fatal error to abort
all further decode.
Signed-off-by: John Cox <jc@kynesim.co.uk>
rpivid: Request maximum hevc clock
Query maximum and minimum clock from driver
and use those
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
rpivid: Switch to new clock api
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
rpivid: Only clk_request_done once
Fixes: 25486f49bf
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
media: rpivid: Remove the need to have num_entry_points set
VAAPI H265 has num entry points but never sets it. Allow a VAAPI
shim to work without requiring rewriting the VAAPI driver.
num_entry_points can be calculated from the slice_segment_addr
of the next slice so delay processing until we have that.
Also includes some minor cosmetics.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Convert to MPLANE
Use multi-planar interface rather than single plane interface. This
allows dmabufs holding compressed data to be resized.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Add an enable count to irq claim Qs
Add an enable count to the irq Q structures to allow the irq logic to
block further callbacks if resources associated with the irq are not
yet available.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Add a Pass0 to accumulate slices and rework job finish
Due to overheads in assembling controls and requests it is worth having
the slice assembly phase separate from the h/w pass1 processing. Create
a queue to service pass1 rather than have the pass1 finished callback
trigger the next slice job.
This requires a rework of the logic that splits up the buffer and
request done events. This code contains two ways of doing that, we use
Ezequiel Garcias <ezequiel@collabora.com> solution, but expect that
in the future this will be handled by the framework in a cleaner manner.
Fix up the handling of some of the memory exhaustion crashes uncovered
in the process of writing this code.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Map cmd buffer directly
It is unnecessary to have a separate dmabuf to hold the cmd buffer.
Map it directly from the kmalloc.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Improve values returned when setting output format
Guess a better value for the compressed bitstream buffer size
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Improve stream_on/off conformance & clock setup
Fix stream on & off such that failures leave the driver in the correct
state. Ensure that the clock is on when we are streaming and off when
all contexts attached to this device have stopped streaming.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Improve SPS/PPS error handling/validation
Move size and width checking from bitstream processing to control
validation
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Fix H265 aux ent reuse of the same slot
It is legitimate, though unusual, for an aux ent associated with a slot
to be selected in phase 0 before a previous selection has been used and
released in phase 2. Fix such that if the slot is found to be in use
that the aux ent associated with it is reused rather than an new aux
ent being created. This fixes a problem where when the first aux ent
was released the second was lost track of.
This bug spotted in Nick's testing. It may explain some other occasional,
unreliable decode error reports where dmesg included "Missing DPB AUX
ent" logging.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Update to compile with new hevc decode params
DPB entries have moved from slice params to the new decode params
attribute - update to deal with this. Also fixes fallthrough
warnings which seem to be new in 5.14.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Make slice ctrl dynamic
Allows the user to submit a whole frames worth of slice headers in
one lump along with a single bitstream dmabuf for the whole lot.
This saves potentially a lot of bitstream copying.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Only create aux entries for H265 if needed
Only create aux entries of mv info for frames where that info might
be used by a later frame. This saves some memory bandwidth and
potentially some memory.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Avoid returning EINVAL to a G_FMT ioctl
V4L2 spec says that G/S/TRY_FMT IOCTLs should never return errors for
anything other than wrong buffer types. Improve the capture format
function such that this is so and unsupported values get converted
to supported ones properly.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Remove unused ctx state variable and defines
Remove unused ctx state tracking variable and associated defines.
Their presence implies they might be used, but they aren't.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Ensure IRQs have completed before uniniting context
Before uniniting the decode context sync with the IRQ queues to ensure
that decode no longer has any buffers in use. This fixes a problem that
manifested as ffmpeg leaking CMA buffers when it did a stream off on
OUTPUT before CAPTURE, though in reality it was probably much more
dangerous than that.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: remove min_buffers_needed from src queue
Remove min_buffers_needed=1 from src queue init. Src buffers are bound
to media requests therefore this setting is not needed and generates
a WARN in kernel 5.16.
Signed-off-by: John Cox <jc@kynesim.co.uk>
rpivid: Use clk_get_max_rate()
The driver was using clk_round_rate() to figure out the maximum clock
rate that was allowed for the HEVC clock.
Since we have a function to return it directly now, let's use it.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
media: rpivid: Apply V4L2 stateless API changes
media: rpivid: Fix fallthrough warning
Replace old-style /* FALLTHRU */ with fallthrough;
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Set min value as well as max for HEVC_DECODE_MODE
As only one value can be accepted set both min and max to that value.
Signed-off-by: John Cox <jc@kynesim.co.uk>
media: rpivid: Accept ANNEX_B start codes
Allow the START_CODE control to take ANNEX_B as a value. This makes no
difference to any part of the decode process as the added bytes are in
data that we ignore. This helps my testing and may help userland code
that expects to send those bytes.
Signed-off-by: John Cox <jc@kynesim.co.uk>
rpivid: Convert to new clock rate API
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
This is probably not the API we will want to add, but it
should show what semantics are needed by drivers.
The goal is to allow the OUTPUT (aka source) buffer and the
controls associated to a request to be released from the request,
and in particular return the OUTPUT buffer back to userspace,
without signalling the media request fd.
This is useful for devices that are able to pre-process
the OUTPUT buffer, therefore able to release it before
the decoding is finished. These drivers should signal
the media request fd only after the CAPTURE buffer is done.
Tested-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Some of the Broadcom codec blocks use a column based YUV4:2:0 image
format, so add the documentation and defines for both 8 and 10 bit
versions.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The DT bindings description of the Brcmstb PCIe device is described. This
node can be used by almost all Broadcom settop box chips, using
ARM, ARM64, or MIPS CPU architectures.
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
When symbols from overlays are added to the live tree their paths must
be rebased. The translated symbol is normally the result of joining
the fragment-relative path (with a leading "/") to the target path
(either copied directly from the "target-path" property or resolved
from the phandle). This translation fails when the target is the root
node (a common case for Raspberry Pi overlays) because the resulting
path starts with a double slash. For example, if target-path is "/" and
the fragment adds a node called "newnode", the label associated with
that node will be assigned the path "//newnode", which can't be found
in the tree.
Fix the failure case by explicitly replacing a target path of "/" with
an empty string.
Fixes: d1651b03c2 ("of: overlay: add overlay symbols to live device tree")
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The definition of compat_ptr is now common for most platforms, but
requires the inclusion of <linux/compat.h>.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
A failure in gpiochip_irqchip_add leads to a leak of a gpiochip. Fix
the leak with the use of devm_gpiochip_add_data.
Fixes: 85ae9e512f ("pinctrl: bcm2835: switch to GPIOLIB_IRQCHIP")
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
vchiq kernel clients are now instantiated as platform drivers rather
than using DT, but the children of the vchiq interface may still
benefit from access to DT properties. Give them the option of a
a sub-node of the vchiq parent for configuration and to allow
them to be disabled.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The IMA (Integrity Measurement Architecture) looks for a TPM (Trusted
Platform Module) having been registered when it initialises; otherwise
it assumes there is no TPM. It has been observed on BCM2835 that IMA
is initialised before TPM, and that initialising the BCM2835 clock
driver before the firmware driver has the effect of reversing this
order.
Change the firmware driver to initialise at core_initcall, delaying the
BCM2835 clock driver to postcore_initcall.
See: https://github.com/raspberrypi/linux/issues/3291https://github.com/raspberrypi/linux/pull/3297
Signed-off-by: Luke Hinds <lhinds@redhat.com>
Co-authored-by: Phil Elwell <phil@raspberrypi.org>
vchiq on Pi4 is no longer under the soc node, therefore it
doesn't get the dma-ranges for the VPU.
Switch to using the configuration of the old dma controller as
that will set the dma-ranges correctly.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The VCHIQ driver now loads the audio, camera, codec, and vc-sm
drivers as platform drivers. However they were not being given
the correct DMA configuration.
Call of_dma_configure with the parent (VCHIQ) parameters to be
inherited by the child.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
For performance/power it is beneficial to adjust gpu clocks with arm clock.
This is how the downstream cpufreq driver works
Signed-off-by: popcornmix <popcornmix@gmail.com>
Setting the v3d clock to low value allows firmware to handle dvfs in case
where v3d hardware is not being actively used (e.g. console use).
Signed-off-by: popcornmix <popcornmix@gmail.com>
Add device tree entries and code to allow the specification of
the lighting modes for the LED's on the ethernet connector.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
net:phy:2711 Change the default ethernet LED actions
This should return default behaviour back to that of previous
releases.
Following the same pattern as bcm2835-camera and bcm2835-audio,
register the V4L2 codec driver as a platform driver
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Following the same pattern as bcm2835-camera and bcm2835-audio,
register the vcsm-cma driver as a platform driver
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The irq_fence and done_fence are given a reference that is never
released. The necessary dma_fence_put()s seem to have been
deleted in error in an earlier commit.
Fixes: 0b73676836b2 ("drm/v3d: Clock V3D down when not in use.")
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The v3d driver currently encounters a lot of MMU PTE exceptions, so
only log the first to avoid swamping the kernel log.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The Infineon IRS1125 is a time of flight depth sensor that
has a CSI-2 interface.
Add a V4L2 subdevice driver for this device.
Signed-off-by: Markus Proeller <markus.proeller@pieye.org>
After the decision to use bcm2711 compatible for upstream, we should
switch all accepted compatibles to bcm2711. So we can boot with
one DTB the down- and the upstream kernel.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
The cherry-pick of the patch that added the greyworld AWB mode
was incomplete. Fix it up.
Fixes: b3ef481fe2 "staging: bcm2835-camera: Add greyworld AWB mode"
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add name for greyworld to white_balance preset names.
This patch previously applied to v4l2-ctrl.c but that was split
and deleted.
Signed-off-by: John Cox <jc@kynesim.co.uk>
This is mainly used for the NoIR camera which has no IR
filter and can completely confuse normal AWB presets.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Adds a simple greyworld white balance preset, mainly for use
with cameras without an IR filter (eg Raspberry Pi NoIR)
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
dt-bindings: media: i2c: Add IMX519 CMOS sensor binding
Add YAML device tree binding for IMX519 CMOS image sensor, and
the relevant MAINTAINERS entries.
Signed-off-by: Lee Jackson <info@arducam.com>
media: i2c: Add driver for IMX519 sensor
Adds a driver for the 16MPix IMX519 CSI2 sensor.
Whilst the sensor supports 2 or 4 CSI2 data lanes, this driver
currently only supports 2 lanes.
The following Bayer modes are currently available:
4656x3496 10-bit @ 10fps
3840x2160 10-bit (cropped) @ 21fps
2328x1748 10-bit (binned) @ 30fps
1920x1080 10-bit (cropped/binned) @ 60fps
1280x720 10-bit (cropped/binned) @ 120fps
Signed-off-by: Lee Jackson <info@arducam.com>
media: i2c: imx519: Advertise embedded data node on media pad 1
This commit updates the imx519 driver to adverise support for embedded
data streams.
The imx519 sensor subdevice overloads the media pad to differentiate
between image stream (pad 0) and embedded data stream (pad 1) when
performing the v4l2_subdev_pad_ops functions.
Signed-off-by: Lee Jackson <info@arducam.com>
media: i2c: imx519: Sensor should report RAW color space
Tested on Raspberry Pi running libcamera.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
media: i2c: Update imx519 Kconfig entry
Bring the IMX519 Kconfig declaration in line with the upstream entries.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
dt-bindings: media: i2c: Add IMX477 CMOS sensor binding
Add YAML device tree binding for IMX477 CMOS image sensor.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: Add driver for Sony IMX477 sensor
Adds a driver for the 12MPix Sony IMX477 CSI2 sensor.
Whilst the sensor supports 2 or 4 CSI2 data lanes, this driver
currently only supports 2 lanes.
The following Bayer modes are currently available:
4056x3040 12-bit @ 10fps
2028x1520 12-bit (binned) @ 40fps
2028x1050 12-bit (cropped/binned) @ 50fps
1012x760 10-bit (scaled) @ 120 fps
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Add support for adaptive frame control
Use V4L2_CID_EXPOSURE_AUTO_PRIORITY to control if the driver should
automatically adjust the sensor frame length based on exposure time,
allowing variable frame rates and longer exposures.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Return correct result on sensor id verification
The test should return -EIO if the register read id does not match
the expected sensor id.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Parse and register properties
Parse device properties and register controls for them using the V4L2
fwnode properties helpers.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
media: i2c: imx477: Selection compliance fixes
To comply with the intended usage of the V4L2 selection target when
used to retrieve a sensor image properties, adjust the rectangles
returned by the imx477 driver.
The top/left crop coordinates of the TGT_CROP rectangle were set to
(0, 0) instead of (8, 16) which is the offset from the larger physical
pixel array rectangle. This was also a mismatch with the default values
crop rectangle value, so this is corrected. Found with v4l2-compliance.
While at it, add V4L2_SEL_TGT_CROP_BOUNDS support: CROP_DEFAULT and
CROP_BOUNDS have the same size as the non-active pixels are not readable
using the selection API. Found with v4l2-compliance.
This commit mirrors 543790f777 done for
the imx219 sensor.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Remove auto frame length adjusting
The V4L2_CID_EXPOSURE_AUTO_PRIORITY was used to let the sensor control
frame length (effectively framerate) based on the requested exposure
time requested. Remove this feature as it is never used, and goes
against how V4L2 likes to handle exposure and vblank controls.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Add very long exposure control to the driver
Add support for very long exposures by using the exposure multiplier
register. Userland does not need to pass any additional controls to
enable long exposures, it simply requests a larger vblank to extend the
exposure control range appropriately.
Currently, since hblank is fixed, a maximum of approximately 124 seconds
of exposure time can be used. In a future change, hblank could also be
controlled in userland to give over 200 seconds of exposure time.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Fix crop height for 2028x1080 mode
The crop height for this mode was set at 2600 lines, it should be 2160
lines instead.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Replace existing 1012x760 mode
The existing 1012x760 120 fps mode has significant IQ problem using
the internal sensor scaler. Replace this mode with a 1332x990 120 fps
mode instead. This new mode has a smaller field of view, but does not
suffer from the bad IQ of the original mode.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Remove internal v4l2_mbus_framefmt from the state
The only field in this struct that is used is the format code, so
replace the struct with this single field.
Save the format code in imx477_set_pad_format() when setting up a new
mode so that imx477_get_pad_format() performs the right lookup.
Otherwise, this caused a bug where the mode lookup occurred on the
12-bit table rather than the 10-bit table.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Remove unused function parameter
The struct imx477 *ctrl parameter is not used in the function
imx477_adjust_exposure_range(), so remove it.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Fix for long exposure limit calculations
Do not scale IMX477_EXPOSURE_OFFSET with the long exposure factor during
the limit calculations. This allows larger exposure times, and does seem to be
what the sensor is doing internally.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Extend driver to support imx378 sensor
The imx378 sensor is almost identical to the imx477 and can be
supported as a "compatible" sensor with just a few extra register
writes.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
media: i2c: imx477: Fix framerates for 1332x990 mode
The imx477 driver's line length for this mode had not been updated to
the value supplied to us by the sensor manufacturer. With this
correction the sensor delivers the framerates that are expected.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
media: i2c: imx477: Allow control of on-sensor DPC
A module parameter "dpc_enable" is added to allow the control of the
sensor's on-board DPC (Defective Pixel Correction) function.
This is a global setting to be configured before using the sensor;
there is no intention that this would ever be changed on-the-fly.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
media: i2c: imx477: Sensor should report RAW color space
Tested on Raspberry Pi running libcamera.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
media: i2c: imx477: Add vsync trigger_mode parameter
trigger_mode == 0 (default) => no effect / no registers written
trigger_mode == 1 => source
trigger_mode == 2 => sink
This can be set e.g. in /boot/cmdline.txt as imx477.trigger_mode=N
Signed-off-by: Jonas Jacob <jonas.jacob@neocortexvision.com>
media: i2c: Update imx477 Kconfig entry
Bring the IMX477 Kconfig declaration in line with upstream entries.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
media: i2c: imx477: Correct minimum exposure lines
The minimum number of exposure lines value (IMX477_EXPOSURE_MIN) was
previously 20 but this is not correct. The datasheet is not completely
explicit, however the new value of 4 has been tested with all the
sensor modes supported by this driver, and matches the lowest exposure
value of 114us that could be achieved wtih Raspberry Pi's legacy
firmware driver.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
media: i2c: imx477: Allow dynamic horizontal blanking control
Currently, the V4L2_CID_HBLANK control is marked as read-only. Remove this
restriction and allow userland to modify the control if needed.
Set the maximum limit of the line length to 0xfff0.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Reset hblank on mode switch
Reset the hblank control to the minimum value on every mode switch. This is to
account for userland instances that do not yet control hblank, otherwise it
gets set to a non-optimal value.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
media: i2c: imx477: Do not unconditionally adjust hblank and vblank limits
On a mode change, only call imx477_set_framing_limits() to adjust the hblank
and vblank limits if the new mode is different from the existing mode. This
preserves any manual control values the user might have set.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Users have reported log spam created by "Event Ring Full" xHC event
TRBs. These are caused by interrupt latency in conjunction with a very
busy set of devices on the bus. The errors are benign, but throughput
will suffer as the xHC will pause processing of transfers until the
event ring is drained by the kernel. Expand the number of event TRB slots
available by increasing the number of event ring segments in the ERST.
Controllers have a hardware-defined limit as to the number of ERST
entries they can process, so make the actual number in use
min(ERST_MAX_SEGS, hw_max).
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
Some combinations of Pi 4Bs and Ethernet switches don't reliably get a
DCHP-assigned IP address, leaving the unit with a self=assigned 169.254
address. In the failure case, the Pi is left able to receive packets
but not send them, suggesting that the MAC<->PHY link is getting into
a bad state.
It has been found empirically that skipping a reset step by the genet
driver prevents the failures. No downsides have been discovered yet,
and unlike the forced renegotiation it doesn't increase the time to
get an IP address, so the workaround is enabled by default; add
genet.skip_umac_reset=n
to the command line to disable it.
See: https://github.com/raspberrypi/linux/issues/3108
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
These wireless mouse/keyboard combo remote control devices specify
multiple "wheel" events in their report descriptors. The wheel events
are incorrectly defined and apparently map to accelerometer data, leading
to spurious mouse scroll events being generated at an extreme rate when
the device is moved.
As a workaround, use HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE to mask
feeding the extra wheel events to the input subsystem.
See: https://github.com/raspberrypi/firmware/issues/1189
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
Based on the gpiomem driver, allow mapping of the decoder register
spaces such that userspace can access control/status registers.
This driver is intended for use with a custom ffmpeg backend accelerator
prior to a v4l2 driver being written.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
driver: char: rpivid: Destroy the legacy device on remove
The legacy name support created a new device that was never destroyed.
If the driver was unloaded and reloaded, it failed due to the
device already existing.
Fixes: "75f1d14 driver: char: rpivid - also support legacy name"
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Clean up error handling use of ERR_PTR/IS_ERR
The driver used an unnecessary intermediate void* variable so it
only called ERR_PTR once to convert to the error value.
Switch to converting as the error arises to remove these intermediate
variables.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Add error handling to the legacy device load
The return value from device_create for the legacy device was never
checked or handled. Add the required error handling.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Fix coding style whitespace issues.
Makes checkpatch happier.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpimem: Add SPDX licence header.
Stops checkpatch complaining.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Fix access to freed memory
The error path during probe frees the private memory block, and
then promptly dereferences it to log an error message.
Use the base device instead of the pointer to it in the private
structure.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Remove legacy name support
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
driver: char: rpivid: Don't map more than wanted
Limit mappings to the permitted range, but don't map more than asked
for otherwise we walk off the end of the allocated VMA.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
My various attempts at re-enabling runtime PM have failed, so just
crank the clock down when V3D is idle to reduce power consumption.
Signed-off-by: Eric Anholt <eric@anholt.net>
The BCM2835 I2C blocks have a register to set the clock-stretch
timeout - how long the device is allowed to hold SCL low - in bus
cycles. The current driver doesn't write to the register, therefore
the default value of 64 cycles is being used for all devices.
Set the timeout to the value recommended for SMBus - 35ms.
See: https://github.com/raspberrypi/linux/issues/3064
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Must be called in a non-atomic context, after the endpoint
has been registered with the hardware via xhci_add_endpoint
and before the first URB is submitted for the endpoint.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
xHCI caches device and endpoint data after the interface is configured,
so an explicit command needs to be issued for any device driver wanting
to alter the polling interval of an endpoint.
Add usb_fixup_endpoint() to allow drivers to do this. The fixup must be
called after calculating endpoint bandwidth requirements but before any
URBs are submitted.
If polling intervals are shortened, any bandwidth reservations are no
longer valid but in practice polling intervals are only ever relaxed.
Limit the scope to interrupt transfers for now.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
There are several warts surrounding bcmgenet_mii_probe() as this
function is called from ndo_open, but it's calling registration-type
functions. The probe should be called at probe time and refactored
such that the PHY device data can be extracted to limit the scope
of this flag to Broadcom PHYs.
For now, pass this flag in as it puts our attached PHY into a low-power
state when disconnected.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
Set defaults for TX and RX packet coalescing to be equivalent to:
# ethtool -C eth0 tx-frames 10
# ethtool -C eth0 rx-usecs 50
This may be something we want to set via DT parameters in the
future.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The HWRNG on the BCM2838 is compatible to iproc-rng200, so add the
support to this driver instead of bcm2835-rng.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
hwrng: iproc-rng200: Correct SoC name
The Pi 4 SoC is called BCM2711, not BCM2838.
Fixes: "hwrng: iproc-rng200: Add BCM2838 support"
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The legacy peripherals can only address the first gigabyte of RAM, so
ensure that DMA allocations are restricted to that region.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The ioremapping creates mappings within the vmalloc area. The
equivalent early function, create_mapping, now checks that the
requested explicit virtual address is between VMALLOC_START and
VMALLOC_END. As there is no reason to have any correlation between
the physical and virtual addresses, put the required mappings at
VMALLOC_START and above.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
On error, vchiq_mmal_component_init could leave the
event context allocated for ports.
Clean them up in the error path.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
vchiq_mmal_component_init calls init_event_context for the
control port, but vchiq_mmal_component_finalise didn't free
it, causing a memory leak..
Add the free call.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
mmal_parameters.h hasn't been updated to reflect additions made
over the last few years. Update it to reflect the currently
supported parameters.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The list of formats was copied before Bayer support was added.
The ISP supports Bayer and is being supported by the bcm2835_codec
driver, so add in the encodings for them.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The MMAL client_component field is used with the event
mechanism to allow the client to identify the component for
which the event is generated.
The field is only 32bits in size, therefore we can't use a
pointer to the component in a 64 bit kernel.
Component handles are already held in an array per VCHI
instance, so use the array index as the client_component handle
to avoid having to create a new IDR for this purpose.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
videobuf2 only allowed exporting a dmabuf as a file descriptor,
but there are instances where having the struct dma_buf is
useful within the kernel.
Split the current implementation into two, one step which
exports a struct dma_buf, and the second which converts that
into an fd.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Add the ability to send data to ports. This only supports
zero copy mode as the required bulk transfer setup calls
are not done.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
(Preparation for the codec driver).
The codec uses the event mechanism to report things such as
resolution changes. It is signalled by the cmd field of the buffer
being non-zero.
Add support for passing this information out to the client.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Fixes up a checkpatch error "Avoid using bool structure members
because of possible alignment issues".
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
When calling tc358743_set_fmt, the code was calling tc358743_get_fmt
to choose a valid format. However that sets the colorspace
based on what was read back from the chip. When you set the format,
then the driver would choose and program the colorspace based
on the format code.
The result was that if you called try or set format for UYVY
when the current format was RGB3 then you would get told sRGB,
and try RGB3 when current was UYVY and you would get told
SMPTE170M.
The value programmed into the chip is determined by this driver,
therefore there is no need to read back the value. Return the
colorspace based on the format set/tried instead.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Document the DT bindings for the CSI2/CCP2 receiver peripheral
(known as Unicam) on BCM283x SoCs.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Acked-by: Rob Herring <robh@kernel.org>
New helper defines that allow printing of a FOURCC using
printf(V4L2_FOURCC_CONV, V4L2_FOURCC_CONV_ARGS(fourcc));
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The ADV7282M can support YPbPr on AIN1-3, but this was
not selectable from the driver. Add it to the list of
supported input modes.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The hardware default is differential CVBS on AIN1 & 2, which
isn't very useful.
Select the first input that is defined as valid for the
chip variant (typically CVBS_AIN1).
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The probe for the TC358743 reads the CHIPID register from
the device and compares it to the expected value of 0.
If the I2C request fails then that also returns 0, so
the driver loads thinking that the device is there.
Generally I2C communications are reliable so there is
limited need to check the return value on every transfer,
therefore only amend the one read during probe to check
for I2C errors.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Adds register setups for running the CSI lanes at 972Mbit/s,
which allows 1080P50 UYVY down 2 lanes.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The existing fixed value of 16 worked for UYVY 720P60 over
2 lanes at 594MHz, or UYVY 1080P60 over 4 lanes. (RGB888
1080P60 needs 6 lanes at 594MHz).
It doesn't allow for lower resolutions to work as the FIFO
underflows.
374 is required for 1080P24-30 UYVY over 2 lanes @ 972Mbit/s, but
>374 means that the FIFO underflows on 1080P50 UYVY over 2 lanes
@ 972Mbit/s.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Add the ability to interpret the high bits of the dreq specifier as
flags to be included in the DMA_CS register. The motivation for this
change is the ability to set the DISDEBUG flag for SD card transfers
to avoid corruption when using the VPU debugger.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit
that the driver pays attention to is "link was reset". If there's a
flapping status bit in that endpoint data, (such as if PHY negotiation
needs a few tries to get a stable link) then polling at a slower rate
would act as a de-bounce.
See: https://github.com/raspberrypi/linux/issues/2447
The driver already reported the firmware build date during probe.
The mailbox calls have been extended to also report the variant
1 = standard start.elf
2 = start_x.elf (includes camera stack)
3 = start_db.elf (includes assert logging)
4 = start_cd.elf (cutdown version for smallest memory footprint).
Log the variant during probe.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
firmware: raspberrypi: Report the fw git hash during probe
The firmware can now report the git hash from which it was built
via the mailbox, so report it during probe.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Ethernet cables with faulty or missing pairs (specifically pairs C and
D) allow auto-negotiation to 1000Mbs, but do not support the successful
establishment of a link. Add a DT property, "microchip,downshift-after",
to configure the number of auto-negotiation failures after which it
falls back to 100Mbs. Valid values are 2, 3, 4, 5 and 0, where 0 means
never downshift.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Step wise governor increases the mitigation level when the temperature
goes above a threshold and will decrease the mitigation when the
temperature falls below the threshold. If it were a case, where the
temperature hovers around a threshold, the mitigation will be applied
and removed at every iteration. This reaction to the temperature is
inefficient for performance.
The use of hysteresis temperature could avoid this ping-pong of
mitigation by relaxing the mitigation to happen only when the
temperature goes below this lower hysteresis value.
Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Avoid a hard userspace ABI change by adding a compatible get_throttled
sysfs entry. Its value is now feed by the GET_THROTTLED requests of the
new hwmon driver. The first access to get_throttled will generate
a warning.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Enable EEE mode as soon as possible after connecting to the PHY, and
before phy_start. This avoids a second link negotiation, which speeds
up booting and stops the interface failing to become ready.
See: https://github.com/raspberrypi/linux/issues/2437
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
As of 4.18, a firmware that implements the update_connect_params
method but doesn't claim to support roaming causes an error. We
disabled firmware roaming in 4.4 [1] because it appeared to
prevent disconnects, but let's try with the current firmware to see
if things have improved.
[1] dd91880117
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
ad83c7cb2f ("irqchip/irq-bcm2836: Add support for DT interrupt polarity")
changed the way that the BCM2836/7 local interrupts are mapped; instead
of being pre-mapped they are now mapped on-demand. A side effect of this
change is that the call to irq_of_parse_and_map from armctrl_of_init
creates a new mapping, forming a gap between the IRQs and the FIQs. This
gap breaks the FIQ<->IRQ mapping which up to now has been done by assuming:
1) that the value of FIQ_START is the same as the number of normal IRQs
that will be mapped (still true), and
2) that this value is also the offset between an IRQ and its equivalent
FIQ (which is no longer the case).
Remove both assumptions by measuring the interval between the last IRQ
and the last FIQ, passing it as the parameter to init_FIQ().
Fixes: https://github.com/raspberrypi/linux/issues/2432
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Register for reboot notifications, sending RPI_FIRMWARE_NOTIFY_REBOOT
over the mailbox interface on reception.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Add two new DT properties:
* microchip,eee-enabled - a boolean to enable EEE
* microchip,tx-lpi-timer - time in microseconds to wait before entering
low power state
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
I2C busses can be assigned specific bus numbers using aliases in
Device Tree - string properties where the name is the alias and the
value is the path to the node. The current DT parameter mechanism
does not allow property names to be derived from a parameter value
in any way, so it isn't possible to generate unique or matching
aliases for nodes from an overlay that can generate multiple
instances, e.g. i2c-gpio.
Work around this limitation (at least temporarily) by allowing
the i2c adapter number to be initialised from the "reg" property
if present.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
There is a new test in __irq_startup that the IRQ is activated, which
hasn't been the case for FIQs since they bypass some of the usual setup.
Augment enable_fiq to include a call to irq_activate to avoid the
warning.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Create a semi-static mapping for the USB registers early in the boot
process, before additional kernel threads are started, so all threads
will have the mappings from the start. This avoids the need for
data aborts to lazily update them.
See: https://github.com/raspberrypi/linux/issues/2450
Signed-off-by: Floris Bos <bos@je-eigen-domein.nl>
The VideoCore bootloader passes in Serial number and
Revision number through Device Tree. Make these available to
userspace through /proc/cpuinfo.
Mainline status:
There is a commit in linux-next that standardize passing the serial
number through Device Tree (string: /serial-number):
ARM: 8355/1: arch: Show the serial number from devicetree in cpuinfo
There was an attempt to do the same with the revision number, but it
didn't get in:
[PATCH v2 1/2] arm: devtree: Set system_rev from DT revision
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Uses the debugfs I/F to provide access to the AXI
bus performance monitors.
Requires the new mailbox peripheral access for access
to the VPU performance registers, system bus access
is done using direct register reads.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
raspberrypi_axi_monitor: suppress warning
Suppress the following warning by casting the pointer to and uintptr_t
before to u32:
Signed-off-by: Matteo Croce <mcroce@redhat.com>
IRQ-CPU mapping is round robined on ARM64 to increase
concurrency and allow multiple interrupts to be serviced
at a time. This reduces the need for FIQ.
Signed-off-by: Michael Zoran <mzoran@crowfest.net>
brcmfmac: Disable power management
Disable wireless power saving in the brcmfmac WLAN driver. This is a
temporary measure until the connectivity loss resulting from power
saving is resolved.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
brcmfmac: Use original country code as a fallback
Commit 73345fd212:
brcmfmac: Configure country code using device specific settings
prevents region codes from working on devices that lack a region code
translation table. In the event of an absent table, preserve the old
behaviour of using the provided code as-is.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
brcmfmac: Plug memory leak in brcmf_fill_bss_param
See: https://github.com/raspberrypi/linux/issues/1471
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
brcmfmac: do not use internal roaming engine by default
Some evidence of curing disconnects with this disabled, so make it a default.
Can be overridden with module parameter roamoff=0
See: http://projectable.me/optimize-my-pi-wi-fi/
brcmfmac: Change stop_ap sequence
Patch from Broadcom/Cypress to resolve a customer error
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Revert "brcmfmac: Disable power management"
Shortly after the release of the Pi 3B, a loss of SSH connectivity
over WiFi was traced to the power management handling, so power
management was disabled. And so it has remained ever since.
Enabling power management saves 55mA (~270mW) on a Pi 4B, so is very
much worth the minimal effort of reverting this patch, which was
squashed and rebased many times since then to the commit hash is
meaningless.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
This is a port of Pantelis Antoniou's v3 port that makes use of the
new upstreamed configfs support for binary attributes.
Original commit message:
Add a runtime interface to using configfs for generic device tree overlay
usage. With it its possible to use device tree overlays without having
to use a per-platform overlay manager.
Please see Documentation/devicetree/configfs-overlays.txt for more info.
Changes since v2:
- Removed ifdef CONFIG_OF_OVERLAY (since for now it's required)
- Created a documentation entry
- Slight rewording in Kconfig
Changes since v1:
- of_resolve() -> of_resolve_phandles().
Originally-signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
DT configfs: Fix build errors on other platforms
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
DT configfs: fix build error
There is an error when compiling rpi-4.6.y branch:
CC drivers/of/configfs.o
drivers/of/configfs.c:291:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
.default_groups = of_cfs_def_groups,
^
drivers/of/configfs.c:291:21: note: (near initialization for 'of_cfs_subsys.su_group.default_groups.next')
The .default_groups is linked list since commit
1ae1602de0.
This commit uses configfs_add_default_group to fix this problem.
Signed-off-by: Slawomir Stepien <sst@poczta.fm>
configfs: New of_overlay API
of: configfs: Use of_overlay_fdt_apply API call
The published API to the dynamic overlay application mechanism now
takes a Flattened Device Tree blob as input so that it can manage the
lifetime of the unflattened tree. Conveniently, the new API call -
of_overlay_fdt_apply - is virtually a drop-in replacement for
create_overlay, which can now be deleted.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Add a virtual GPIO driver that uses the firmware mailbox interface to
request that the VPU toggles LEDs.
gpio: bcm-virt: Fix the get() method
The get() method does not understand the on-the-wire encoding of the
remote GPIO states, thinking they are simple on/off bits when they are
really pairs of 16-bit counts. Rewrite the get() handler to return the
value last written, which will eventually match the actual GPIO state
if there are no other changes.
See: https://github.com/raspberrypi/linux/issues/4638
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add a mailbox-driven backlight controller for the Raspberry Pi DSI
touchscreen display. Requires updated GPU firmware to recognise the
mailbox request.
Signed-off-by: Gordon Hollingworth <gordon@raspberrypi.org>
Add Raspberry Pi firmware driver to the dependencies of backlight driver
Otherwise the backlight driver fails to build if the firmware
loading driver is not in the kernel
Signed-off-by: Alex Riesen <alexander.riesen@cetitec.com>
ASoC: Add support for Rpi-DAC
ASoC: Add prompt for ICS43432 codec
Without a prompt string, a config setting can't be included in a
defconfig. Give CONFIG_SND_SOC_ICS43432 a prompt so that Pi soundcards
can use the driver.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Add IQaudIO Sound Card support for Raspberry Pi
Set a limit of 0dB on Digital Volume Control
The main volume control in the PCM512x DAC has a range up to
+24dB. This is dangerously loud and can potentially cause massive
clipping in the output stages. Therefore this sets a sensible
limit of 0dB for this control.
Allow up to 24dB digital gain to be applied when using IQAudIO DAC+
24db_digital_gain DT param can be used to specify that PCM512x
codec "Digital" volume control should not be limited to 0dB gain,
and if specified will allow the full 24dB gain.
Modify IQAudIO DAC+ ASoC driver to set card/dai config from dt
Add the ability to set the card name, dai name and dai stream name, from
dt config.
Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
IQaudIO: auto-mute for AMP+ and DigiAMP+
IQAudIO amplifier mute via GPIO22. Add dt params for "one-shot" unmute
and auto mute.
Revision 2, auto mute implementing HiassofT suggestion to mute/unmute
using set_bias_level, rather than startup/shutdown....
"By default DAPM waits 5 seconds (pmdown_time) before shutting down
playback streams so a close/stop immediately followed by open/start
doesn't trigger an amp mute+unmute."
Tested on both AMP+ (via DAC+) and DigiAMP+, with both options...
dtoverlay=iqaudio-dacplus,unmute_amp
"one-shot" unmute when kernel module loads.
dtoverlay=iqaudio-dacplus,auto_mute_amp
Unmute amp when ALSA device opened by a client. Mute, with 5 second delay
when ALSA device closed. (Re-opening the device within the 5 second close
window, will cancel mute.)
Revision 4, using gpiod.
Revision 5, clean-up formatting before adding mute code.
- Convert tab plus 4 space formatting to 2x tab
- Remove '// NOT USED' commented code
Revision 6, don't attempt to "one-shot" unmute amp, unless card is
successfully registered.
Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
ASoC: iqaudio-dac: fix S24_LE format
Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: iqaudio-dac: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
Added support for HiFiBerry DAC+
The driver is based on the HiFiBerry DAC driver. However HiFiBerry DAC+ uses
a different codec chip (PCM5122), therefore a new driver is necessary.
Add support for the HiFiBerry DAC+ Pro.
The HiFiBerry DAC+ and DAC+ Pro products both use the existing bcm sound driver with the DAC+ Pro having a special clock device driver representing the two high precision oscillators.
An addition bug fix is included for the PCM512x codec where by the physical size of the sample frame is used in the calculation of the LRCK divisor as it was found to be wrong when using 24-bit depth sample contained in a little endian 4-byte sample frame.
Limit PCM512x "Digital" gain to 0dB by default with HiFiBerry DAC+
24db_digital_gain DT param can be used to specify that PCM512x
codec "Digital" volume control should not be limited to 0dB gain,
and if specified will allow the full 24dB gain.
Add dt param to force HiFiBerry DAC+ Pro into slave mode
"dtoverlay=hifiberry-dacplus,slave"
Add 'slave' param to use HiFiBerry DAC+ Pro in slave mode,
with Pi as master for bit and frame clock.
Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
Fixed a bug when using 352.8kHz sample rate
Signed-off-by: Daniel Matuschek <daniel@hifiberry.com>
ASoC: pcm512x: revert downstream changes
This partially reverts commit 185ea05465
which was added by https://github.com/raspberrypi/linux/pull/1152
The downstream pcm512x changes caused a regression, it broke normal
use of the 24bit format with the codec, eg when using simple-audio-card.
The actual bug with 24bit playback is the incorrect usage
of physical_width in various drivers in the downstream tree
which causes 24bit data to be transmitted with 32 clock
cycles. So it's not the pcm512x that needs fixing, it's the
soundcard drivers.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: hifiberry_dacplus: fix S24_LE format
Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: hifiberry_dacplus: transmit S24_LE with 64 BCLK cycles
Signed-off-by: Matthias Reichl <hias@horus.com>
hifiberry_dacplus: switch to snd_soc_dai_set_bclk_ratio
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: hifiberry_dacplus: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Add driver for rpi-proto
Forward port of 3.10.x driver from https://github.com/koalo
We are using a custom board and would like to use rpi 3.18.x
kernel. Patch works fine for our embedded system.
URL to the audio chip:
http://www.mikroe.com/add-on-boards/audio-voice/audio-codec-proto/
Playback tested with devicetree enabled.
Signed-off-by: Waldemar Brodkorb <wbrodkorb@conet.de>
ASoC: rpi-proto: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Add Support for JustBoom Audio boards
justboom-dac: Adjust for ALSA API change
As of 4.4, snd_soc_limit_volume now takes a struct snd_soc_card *
rather than a struct snd_soc_codec *.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
ASoC: justboom-dac: fix S24_LE format
Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.
Also remove hw_params as it's no longer needed.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: justboom-dac: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
New AudioInjector.net Pi soundcard with low jitter audio in and out.
Contains the sound/soc/bcm ALSA machine driver and necessary alterations to the Kconfig and Makefile.
Adds the dts overlay and updates the Makefile and README.
Updates the relevant defconfig files to enable building for the Raspberry Pi.
Thanks to Phil Elwell (pelwell) for the review, simple-card concepts and discussion. Thanks to Clive Messer for overlay naming suggestions.
Added support for headphones, microphone and bclk_ratio settings.
This patch adds headphone and microphone capability to the Audio Injector sound card. The patch also sets the bit clock ratio for use in the bcm2835-i2s driver. The bcm2835-i2s can't handle an 8 kHz sample rate when the bit clock is at 12 MHz because its register is only 10 bits wide which can't represent the ch2 offset of 1508. For that reason, the rate constraint is added.
ASoC: audioinjector-pi-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
New driver for RRA DigiDAC1 soundcard using WM8741 + WM8804
ASoC: digidac1-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Add support for Dion Audio LOCO DAC-AMP HAT
Using dedicated machine driver and pcm5102a codec driver.
Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
ASoC: dionaudio_loco: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Allo Piano DAC boards: Initial 2 channel (stereo) support (#1645)
Add initial 2 channel (stereo) support for Allo Piano DAC (2.0/2.1) boards,
using allo-piano-dac-pcm512x-audio overlay and allo-piano-dac ALSA ASoC
machine driver.
NB. The initial support is 2 channel (stereo) ONLY!
(The Piano DAC 2.1 will only support 2 channel (stereo) left/right output,
pending an update to the upstream pcm512x codec driver, which will have
to be submitted via upstream. With the initial downstream support,
provided by this patch, the Piano DAC 2.1 subwoofer outputs will
not function.)
Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Signed-off-by: Clive Messer <clive.messer@digitaldreamtime.co.uk>
Tested-by: Clive Messer <clive.messer@digitaldreamtime.co.uk>
ASoC: allo-piano-dac: fix S24_LE format
Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.
Also remove hw_params and ops as they are no longer needed.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: allo-piano-dac: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Add support for Allo Piano DAC 2.1 plus add-on board for Raspberry Pi.
The Piano DAC 2.1 has support for 4 channels with subwoofer.
Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Vijay Kumar B. <vijaykumar@zilogic.com>
Reviewed-by: Raashid Muhammed <raashidmuhammed@zilogic.com>
Add clock changes and mute gpios (#1938)
Also improve code style and adhere to ALSA coding conventions.
Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Vijay Kumar B. <vijaykumar@zilogic.com>
Reviewed-by: Raashid Muhammed <raashidmuhammed@zilogic.com>
PianoPlus: Dual Mono & Dual Stereo features added (#2069)
allo-piano-dac-plus: Master volume added + fixes
Master volume added, which controls both DACs volumes.
See: https://github.com/raspberrypi/linux/pull/2149
Also fix initial max volume, default mode value, and unmute.
Signed-off-by: allocom <sparky-dev@allo.com>
ASoC: allo-piano-dac-plus: fix S24_LE format
Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.
Signed-off-by: Matthias Reichl <hias@horus.com>
sound: bcm: Fix memset dereference warning
This warning appears with GCC 6.4.0 from toolchains.bootlin.com:
../sound/soc/bcm/allo-piano-dac-plus.c: In function ‘snd_allo_piano_dac_init’:
../sound/soc/bcm/allo-piano-dac-plus.c:711:30: warning: argument to ‘sizeof’ in ‘memset’ call is the same expression as the destination; did you mean to dereference it? [-Wsizeof-pointer-memaccess]
memset(glb_ptr, 0x00, sizeof(glb_ptr));
^
Suggested-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
ASoC: allo-piano-dac-plus: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Add support for Allo Boss DAC add-on board for Raspberry Pi. (#1924)
Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Deepak <deepak@zilogic.com>
Reviewed-by: BabuSubashChandar <babusubashchandar@zilogic.com>
Add support for new clock rate and mute gpios.
Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Deepak <deepak@zilogic.com>
Reviewed-by: BabuSubashChandar <babusubashchandar@zilogic.com>
ASoC: allo-boss-dac: fix S24_LE format
Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: allo-boss-dac: transmit S24_LE with 64 BCLK cycles
Signed-off-by: Matthias Reichl <hias@horus.com>
allo-boss-dac: switch to snd_soc_dai_set_bclk_ratio
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: allo-boss-dac: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Support for Blokas Labs pisound board
Pisound dynamic overlay (#1760)
Restructuring pisound-overlay.dts, so it can be loaded and unloaded dynamically using dtoverlay.
Print a logline when the kernel module is removed.
pisound improvements:
* Added a writable sysfs object to enable scripts / user space software
to blink MIDI activity LEDs for variable duration.
* Improved hw_param constraints setting.
* Added compatibility with S16_LE sample format.
* Exposed some simple placeholder volume controls, so the card appears
in volumealsa widget.
Add missing SND_PISOUND selects dependency to SND_RAWMIDI
Without it the Pisound module fails to compile.
See https://github.com/raspberrypi/linux/issues/2366
Updates for Pisound module code:
* Merged 'Fix a warning in DEBUG builds' (1c8b82b).
* Updating some strings and copyright information.
* Fix for handling high load of MIDI input and output.
* Use dual rate oversampling ratio for 96kHz instead of single
rate one.
Signed-off-by: Giedrius Trainavicius <giedrius@blokas.io>
Fixing memset call in pisound.c
Signed-off-by: Giedrius Trainavicius <giedrius@blokas.io>
Fix for Pisound's MIDI Input getting blocked for a while in rare cases.
There was a possible race condition which could lead to Input's FIFO queue
to be underflown, causing high amount of processing in the worker thread for
some period of time.
Signed-off-by: Giedrius Trainavicius <giedrius@blokas.io>
Fix for Pisound kernel module in Real Time kernel configuration.
When handler of data_available interrupt is fired, queue_work ends up
getting called and it can block on a spin lock which is not allowed in
interrupt context. The fix was to run the handler from a thread context
instead.
Pisound: Remove spinlock usage around spi_sync
ASoC: pisound: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
ASoC: pisound: fix the parameter for spi_device_match
Signed-off-by: Hui Wang <hui.wang@canonical.com>
ASoC: Add driver for Cirrus Logic Audio Card
Note: due to problems with deferred probing of regulators
the following softdep should be added to a modprobe.d file
softdep arizona-spi pre: arizona-ldo1
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: rpi-cirrus: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
sound: Support for Dion Audio LOCO-V2 DAC-AMP HAT
Signed-off-by: Miquel Blauw <info@dionaudio.nl>
ASoC: dionaudio_loco-v2: fix S24_LE format
Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.
Also remove hw_params and ops as they are no longer needed.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: dionaudio_loco-v2: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Add support for Fe-Pi audio sound card. (#1867)
Fe-Pi Audio Sound Card is based on NXP SGTL5000 codec.
Mechanical specification of the board is the same the Raspberry Pi Zero.
3.5mm jacks for Headphone/Mic, Line In, and Line Out.
Signed-off-by: Henry Kupis <fe-pi@cox.net>
ASoC: fe-pi-audio: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Add support for the AudioInjector.net Octo sound card
AudioInjector Octo: sample rates, regulators, reset
This patch adds new sample rates to the Audioinjector Octo sound card. The
new supported rates are (in kHz) :
96, 48, 32, 24, 16, 8, 88.2, 44.1, 29.4, 22.05, 14.7
Reference the bcm270x DT regulators in the overlay.
This patch adds a reset GPIO for the AudioInjector.net octo sound card.
Audioinjector octo : Make the playback and capture symmetric
This patch ensures that the sample rate and channel count of the audioinjector
octo sound card are symmetric.
audioinjector-octo: Add continuous clock feature
By user request, add a switch to prevent the clocks being stopped when
the stream is paused, stopped or shutdown. Provide access to the switch
by adding a 'non-stop-clocks' parameter to the audioinjector-addons
overlay.
See: https://github.com/raspberrypi/linux/issues/2409
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
sound: Fixes for audioinjector-octo under 4.19
1. Move the DT alias declaration to the I2C shim in the cases
where the shim is enabled. This works around a problem caused by a
4.19 commit [1] that generates DT/OF uevents for I2C drivers.
2. Fix the diagnostics in an error path of the soundcard driver to
correctly identify the reason for the failure to load.
3. Move the declaration of the clock node in the overlay outside
the I2C node to avoid warnings.
4. Sort the overlay nodes so that dependencies are only to earlier
fragments, in an attempt to get runtime dtoverlay application to
work (it still doesn't...)
See: https://github.com/Audio-Injector/Octo/issues/14
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
[1] af503716ac ("i2c: core: report OF style module alias for devices registered via OF")
ASoC: audioinjector-octo-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Driver support for Google voiceHAT soundcard.
ASoC: googlevoicehat-codec: Use correct device when grabbing GPIO
The fixup for the VoiceHAT in 4.18 incorrectly tried to find the
sdmode GPIO pin under the card device, not the codec device.
This failed, and therefore caused the device probe to fail.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
ASoC: googlevoicehat-codec: Reformat for kernel coding standards
Fix all whitespace, indentation, and bracing errors.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
ASoC: googlevoicehat-codec: Make driver function structure const
Make voicehat_component_driver a const structure.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
ASoC: googlevoicehat-codec: Only convert from ms to jiffies once
Minor optimisation and allows to become checkpatch clean.
A msec value is read out of DT or from a define, and convert once to
jiffies, rather than every time that it is used.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Driver and overlay for Allo Katana DAC
Allo Katana DAC: Updated default values
Signed-off-by: Jaikumar <jaikumar@cem-solutions.com>
Added mute stream func
Signed-off-by: Jaikumar <jaikumar@cem-solutions.net>
codecs: Correct Katana minimum volume
Update Katana minimum volume to get the exact 0.5 dB value in each step.
Signed-off-by: Sudeep Kumar <sudeepkumar@cem-solutions.net>
ASoC: Add generic RPI driver for simple soundcards.
The RPI simple sound card driver provides a generic ALSA SOC card driver
supporting a variety of Pi HAT soundcards. The intention is to avoid
the duplication of code for cards that can't be fully supported by
the soc simple/graph cards but are otherwise almost identical.
This initial commit adds support for the ADAU1977 ADC, Google VoiceHat,
HifiBerry AMP, HifiBerry DAC and RPI DAC.
Signed-off-by: Tim Gover <tim.gover@raspberrypi.org>
ASoC: Use correct card name in rpi-simple driver
Use the specific card name from drvdata instead of the snd_rpi_simple
rpi-simple-soundcard: Use nicer driver name "RPi-simple"
Rename the driver from "RPI simple soundcard" to "RPi-simple" so that
the driver name won't be mangled allowing to be used unaltered as the
card conf filename.
ASoC: rpi-simple-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
ASoC: Add Kconfig and Makefile for sound/soc/bcm
Signed-off-by: popcornmix <popcornmix@gmail.com>
ASoC: Create a generic Pi Hat WM8804 driver
Reduce the amount of duplicated code by creating a generic driver for
Pi Hat digi cards using the WM8804 codec.
This replaces the
Allo DigiOne, Hifiberry Digi/Pro, JustBoom Digi and IQAudIO Digi
dedicate soundcard drivers with a generic driver.
There are no significant changes to the runtime behavior of the drivers
and end users should not have to change any configuration settings
after upgrading.
Minor changes
* Check the return value of snd_soc_component_update_bits
* Added some pr_debug tracing
* Various checkpatch tidyups
* Updated allodigi-one to use use 128FS at > 96 Khz. This appears to
be an omission in the original driver code so followed the Hifiberry
DAC driver approach.
ASoC: rpi-wm8804-soundcard: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
rpi-wm8804-soundcard: drop PWRDN register writes
Since kernel 4.0 the PWRDN register bits are under DAPM
control from the wm8804 driver.
Drop code that modifies that register to avoid interfering
with DAPM.
Signed-off-by: Matthias Reichl <hias@horus.com>
rpi-wm8804-soundcard: configure wm8804 clocks only on rate change
This should avoid clicks when stopping and immediately afterwards
starting a stream with the same samplerate as before.
Signed-off-by: Matthias Reichl <hias@horus.com>
rpi-wm8804-soundcard: Fixed MCLKDIV for Allo Digione
The Allo Digione board wants a fixed MCLKDIV of 256.
See: https://github.com/raspberrypi/linux/issues/3296
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
ASoC: Add support for AudioSense-Pi add-on soundcard
AudioSense-Pi is a RPi HAT based on a TI's TLV320AIC32x4 stereo codec
This hardware provides multiple audio I/O capabilities to the RPi.
The codec connects to the RPi's SoC through the I2S Bus.
The following devices can be connected through a 3.5mm jack
1. Line-In: Plain old audio in from mobile phones, PCs, etc.,
2. Mic-In: Connect a microphone
3. Line-Out: Connect the output to a speaker
4. Headphones: Connect a Headphone w or w/o microphones
Multiple Inputs:
It supports the following combinations
1. Two stereo Line-Inputs and a microphone
2. One stereo Line-Input and two microphones
3. Two stereo Line-Inputs, a microphone and
one mono line-input (with h/w hack)
4. One stereo Line-Input, two microphones and
one mono line-input (with h/w hack)
Multiple Outputs:
Audio output can be routed to the headphones or
speakers (with additional hardware)
Signed-off-by: b-ak <anur.bhargav@gmail.com>
ASoC: audiosense-pi: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Added driver for the HiFiBerry DAC+ ADC (#2694)
Signed-off-by: Daniel Matuschek <daniel@hifiberry.com>
hifiberry_dacplusadc: switch to snd_soc_dai_set_bclk_ratio
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: hifiberry_dacplusadc: fix DAI link setup
The driver only defines a single DAI link and the code that tries
to setup the second (non-existent) DAI link looks wrong - using dmic
as a CPU/platform driver doesn't make any sense.
The DT overlay doesn't define a dmic property, so the code was never
executed (otherwise it would have resulted in a memory corruption).
So drop the offending code to prevent issues if a dmic property
should be added to the DT overlay.
Signed-off-by: Matthias Reichl <hias@horus.com>
ASoC: hifiberry_dacplusadc: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
Audiophonics I-Sabre 9038Q2M DAC driver
Signed-off-by: Audiophonics <contact@audiophonics.fr>
ASoC: i-sabre-q2m: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Added IQaudIO Pi-Codec board support (#2969)
Add support for the IQaudIO Pi-Codec board.
Signed-off-by: Gordon <gordon@iqaudio.com>
Fixed 48k timing issue
ASoC: iqaudio-codec: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
adds the Hifiberry DAC+ADC PRO version
This adds the driver for the DAC+ADC PRO version of the Hifiberry soundcard with software controlled PCM1863 ADC
Signed-off-by: Joerg Schambacher joerg@i2audio.com
Add Hifiberry DAC+DSP soundcard driver (#3224)
Adds the driver for the Hifiberry DAC+DSP. It supports capture and
playback depending on the DSP firmware.
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
Allow simultaneous use of JustBoom DAC and Digi
Signed-off-by: Johannes Krude <johannes@krude.de>
Pisound: MIDI communication fixes for scaled down CPU.
* Increased maximum SPI communication speed to avoid running too slow
when the CPU is scaled down and losing MIDI data.
* Keep track of buffer usage in millibytes for higher precision.
Signed-off-by: Giedrius Trainavičius <giedrius@blokas.io>
sound: Add the HiFiBerry DAC+HD version
This adds the driver for the DAC+HD version supporting HiFiBerry's
PCM179x based DACs. It also adds PLL control for clock generation.
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
Fix master mode settings of HiFiBerry DAC+ADC PRO card (#3424)
This patch fixes the board DAI setting when in master-mode.
Wrong setting could have caused random pop noise.
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
adds LED OFF feature to HiFiBerry DAC+ADC PRO sound card
This adds a DT overlay parameter 'leds_off' which allows
to switch off the onboard activity LEDs at all times
which has been requested by some users.
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
adds LED OFF feature to HiFiBerry DAC+ADC sound card
This adds a DT overlay parameter 'leds_off' which allows
to switch off the onboard activity LEDs at all times
which has been requested by some users.
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
adds LED OFF feature to HiFiBerry DAC+/DAC+PRO sound cards
This adds a DT overlay parameter 'leds_off' which allows
to switch off the onboard activity LEDs at all times
which has been requested by some users.
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
pisound: Added reading Pisound board hardware revision and exposing it (#3425)
pisound: Added reading Pisound board hardware revision and exposing it in kernel log and sysfs file:
/sys/kernel/pisound/hw_version
Signed-off-by: Giedrius <giedrius@blokas.io>
Added driver for HiFiBerry Amp amplifier add-on board
The driver contains a low-level hardware driver for the TAS5713 and the
drivers for the Raspberry Pi I2S subsystem.
TAS5713: return error if initialisation fails
Existing TAS5713 driver logs errors during initialisation, but does not return
an error code. Therefore even if initialisation fails, the driver will still be
loaded, but won't work. This patch fixes this. I2C communication error will now
reported correctly by a non-zero return code.
HiFiBerry Amp: fix device-tree problems
Some code to load the driver based on device-tree-overlays was missing. This is added by this patch.
According to 5713 pdf doc CLOCK_CTRL is a readonly status register, and it behaves so. Remove useless setting
sound: pcm512x-codec: Adding 352.8kHz samplerate support
sound/soc: only first codec is master in multicodec setup
When using multiple codecs, at most one codec should generate the master
clock. All codecs except the first are therefore configured for slave
mode.
Signed-off-by: Johannes Krude <johannes@krude.de>
ASoC: Fix snd_soc_get_pcm_runtime usage
Commit [1] changed the snd_soc_get_pcm_runtime to take a dai_link
pointer instead of a string. Patch up the downstream drivers to use
the modified API.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
[1] 4468189ff3 ("ASoC: soc-core: find rtd via dai_link pointer at snd_soc_get_pcm_runtime()")
Add support for the AudioInjector.net Isolated sound card
This patch adds support for the Audio Injector Isolated sound card.
Signed-off-by: Matt Flax <flatmax@flatmax.org>
Add support for merus-amp soundcard and ma120x0p codec
Add 96KHz rate support to MA120X0P codec and make enable and mute gpio
pins optional.
Signed-off-by: AMuszkat <ariel.muszkat@gmail.com>
Fixes a problem with clock settings of HiFiBerry DAC+ADC PRO (#3545)
This patch fixes a problem of the re-calculation of
i2s-clock and -parameter settings when only the ADC is activated.
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
configs: Enable the AD193x codecs
See: https://github.com/raspberrypi/linux/issues/2850
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Switch to snd_soc_dai_set_bclk_ratio
Replaces obsolete function snd_soc_dai_set_tdm_slot
Signed-off-by: Joerg Schambacher <joerg@i2audio.com>
Enhances the DAC+ driver to control the optional headphone amplifier
Probes on the I2C bus for TPA6130A2, if successful, it sets DT-parameter
'status' from 'disabled' to 'okay' using change_sets to enable
the headphone control.
Signed-off-by: Joerg Schambacher joerg@i2audio.com
Update Allo Piano Dac Driver
Add unique names to the individual dac coded drivers
Remove some of the codec controls that are not used.
Signed-off-by: Paul Hermann <paul@picoreplayer.org>
Fixes an onboard clock detection problem of the PRO versions
Increasing the sleep time after clock selection to 3-4ms
allows the correct detection of all combinations of DAC+ Pro
and DAC+ADC Pro sound cards and the various PI revisions.
Signed-off-by: Joerg Schambacher <joerg@hifiberry.com>
ASoC:ma120x0p: Increase maximum sample rate to 192KHz
Change the maximum sample rate for the amplifier to
192KHz as given in the Infineon specification.
Signed-off-by: Joerg Schambacher <joerg@hifiberry.com>
ASoC: ma120x0p: Remove unnecessary const specifier
Clang warns:
sound/soc/codecs/ma120x0p.c:891:14: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
static const SOC_VALUE_ENUM_SINGLE_DECL(pwr_mode_ctrl,
^
./include/sound/soc.h:362:2: note: expanded from macro 'SOC_VALUE_ENUM_SINGLE_DECL'
SOC_VALUE_ENUM_DOUBLE_DECL(name, xreg, xshift, xshift, xmask, xtexts, xvalues)
^
./include/sound/soc.h:359:2: note: expanded from macro 'SOC_VALUE_ENUM_DOUBLE_DECL'
const struct soc_enum name = SOC_VALUE_ENUM_DOUBLE(xreg, xshift_l, xshift_r, xmask, \
^
1 warning generated.
SOC_VALUE_ENUM_DOUBLE_DECL already has a const specifier. Remove the duplicate
const to clean up the warning.
Fixes: 42444979e7 ("Add support for all the downstream rpi sound card drivers")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
ASoC: bcm: allo-piano-dac-plus: Remove unnecessary const specifiers
Clang warns:
sound/soc/bcm/allo-piano-dac-plus.c:66:14: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
static const SOC_ENUM_SINGLE_DECL(allo_piano_mode_enum,
^
./include/sound/soc.h:355:2: note: expanded from macro 'SOC_ENUM_SINGLE_DECL'
SOC_ENUM_DOUBLE_DECL(name, xreg, xshift, xshift, xtexts)
^
./include/sound/soc.h:352:2: note: expanded from macro 'SOC_ENUM_DOUBLE_DECL'
const struct soc_enum name = SOC_ENUM_DOUBLE(xreg, xshift_l, xshift_r, \
^
sound/soc/bcm/allo-piano-dac-plus.c:75:14: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
static const SOC_ENUM_SINGLE_DECL(allo_piano_dual_mode_enum,
^
./include/sound/soc.h:355:2: note: expanded from macro 'SOC_ENUM_SINGLE_DECL'
SOC_ENUM_DOUBLE_DECL(name, xreg, xshift, xshift, xtexts)
^
./include/sound/soc.h:352:2: note: expanded from macro 'SOC_ENUM_DOUBLE_DECL'
const struct soc_enum name = SOC_ENUM_DOUBLE(xreg, xshift_l, xshift_r, \
^
sound/soc/bcm/allo-piano-dac-plus.c:96:14: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
static const SOC_ENUM_SINGLE_DECL(allo_piano_enum,
^
./include/sound/soc.h:355:2: note: expanded from macro 'SOC_ENUM_SINGLE_DECL'
SOC_ENUM_DOUBLE_DECL(name, xreg, xshift, xshift, xtexts)
^
./include/sound/soc.h:352:2: note: expanded from macro 'SOC_ENUM_DOUBLE_DECL'
const struct soc_enum name = SOC_ENUM_DOUBLE(xreg, xshift_l, xshift_r, \
^
3 warnings generated.
SOC_VALUE_ENUM_DOUBLE_DECL already has a const specifier. Remove the duplicate
const specifiers to clean up the warnings.
Fixes: 42444979e7 ("Add support for all the downstream rpi sound card drivers")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
rpi-simple-soundcard: Add Dion Audio KIWI streamer
Signed-off-by: Miquel Blauw <miquelblauw@hotmail.com>
rpi-simple-soundcard: adds definitions for the HiFiBerry AMP3 card
Uses Infineon MA120x0 amplifier and supports full sample rate of 192ksps.
Signed-off-by: Joerg Schambacher <joerg@hifiberry.com>
sound: soc: bcm: Added Sound card driver for Dacberry400 Audio card for Raspberry Pi 400
Added Sound card driver for DACberry400 Audio card.
Signed-off-by: Ashish Vara <ashishhvara@gmail.com>
ASoC:ma120x0p: Corrects the volume level display
Fixes the wrongly changed 'limiter volume' display back to -50dB minimum
and sets the correct minimum volume level to -144dB to be aligned with
the controls and display in alsamixer etc.
Signed-off-by: Joerg Schambacher <joerg@hifiberry.com>
ASoC: bcm: Fix Rpi-PROTO and audioinjector.net Pi
As of kernel 5.19 the WM8731 driver has separate I2C and SPI support
modules. Change the Kconfig definitions for the audioinjector.net Pi
and Rpi-PROTO soundcards to select SND_SOC_WM8731_I2C.
See: https://github.com/raspberrypi/linux/issues/5364
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
ASoC: adau1977: Add correct compatible strings
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The Raspberry Pi firmware manages the power-down and reboot
process. To do this it installs a pm_power_off handler, causing
the gpio-poweroff module to abort the probe function.
This patch introduces a "force" DT property that overrides that
behaviour, and also adds a DT overlay to enable and control it.
Note that running in an active-low configuration (DT parameter
"active_low") requires a custom dt-blob.bin and probably won't
allow a reboot without switching off, so an external inversion
of the trigger signal may be preferable.
Provide a __copy_from_user that uses memcpy. On BCM2708, use
optimised memcpy/memmove/memcmp/memset implementations.
arch/arm: Add mmiocpy/set aliases for memcpy/set
See: https://github.com/raspberrypi/linux/issues/1082
copy_from_user: CPU_SW_DOMAIN_PAN compatibility
The downstream copy_from_user acceleration must also play nice with
CONFIG_CPU_SW_DOMAIN_PAN.
See: https://github.com/raspberrypi/linux/issues/1381
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Fix copy_from_user if BCM2835_FAST_MEMCPY=n
The change which introduced CONFIG_BCM2835_FAST_MEMCPY unconditionally
changed the behaviour of arm_copy_from_user. The page pinning code
is not safe on ARMv7 if LPAE & high memory is enabled and causes
crashes which look like PTE corruption.
Make __copy_from_user_memcpy conditional on CONFIG_2835_FAST_MEMCPY=y
which is really an ARMv6 / Pi1 optimization and not necessary on newer
ARM processors.
arm: fix mmap unlocks in uaccess_with_memcpy.c
This is a regression that was added with the commit 192a4e923e as of rpi-5.8.y, since that is when the move to the mmap locking API was introduced - d8ed45c5dc
The issue is that when the patch to improve performance for the __copy_to_user and __copy_from_user functions were added for the Raspberry Pi, some of the mmaps were incorrectly mapped to write instead of read. This would cause a verity of issues, and in my case, prevent the booting of a squashfs filesystem on rpi-5.8-y and above. An example of the panic you would see from this can be seen at https://pastebin.com/raw/jBz5xCzL
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Christopher Blake <chrisrblake93@gmail.com>
arch/arm: Add __memset alias to memset_rpi.S
memset_rpi.S is an optimised memset implementation, but doesn't define
__memset (which was just added to memset.S). As a result, building
for the BCM2835 platform causes a link failure.
Add __memset as yet another alias to our common implementation.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
arm: Fix custom rpi __memset32 and __memset64
See: https://github.com/raspberrypi/linux/issues/4798
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
arm: Fix annoying .eh_frame section warnings
Replace the cfi directives with the UNWIND equivalents. This prevents
the .eh_frame section from being created, eliminating the warnings.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The "input" trigger makes the associated GPIO an input. This is to support
the Raspberry Pi PWR LED, which is driven by external hardware in normal use.
N.B. pwr_led is not available on Model A or B boards.
leds-gpio: Implement the brightness_get method
The power LED uses some clever logic that means it is driven
by a voltage measuring circuit when configured as input, otherwise
it is driven by the GPIO output value. This patch wires up the
brightness_get method for leds-gpio so that user-space can monitor
the LED value via /sys/class/gpio/led1/brightness. Using the input
trigger this returns an indication of the system power health,
otherwise it is just whatever value the trigger has written most
recently.
See: https://github.com/raspberrypi/linux/issues/1064
Support booting without Device Tree.
Turn on USB power.
Load driver early because of lacking support for deferred probing
in many drivers.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
firmware: bcm2835: Don't turn on USB power
The raspberrypi-power driver is now used to turn on USB power.
This partly reverts commit:
firmware: bcm2835: Support ARCH_BCM270x
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Add module for accessing the mailbox property channel through
/dev/vcio. Was previously in bcm2708-vcio.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
char: vcio: Add compat ioctl handling
There was no compat ioctl handler, so 32 bit userspace on a
64 bit kernel failed as IOCTL_MBOX_PROPERTY used the size
of char*.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
char: vcio: Fail probe if rpi_firmware is not found.
Device Tree is now the only supported config mechanism, therefore
uncomment the block of code that fails the probe if the
firmware node can't be found.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
drivers: char: vcio: Use common compat header
The definition of compat_ptr is now common for most platforms, but
requires the inclusion of <linux/compat.h>.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
char: vcio: Rewrite as a firmware node child
The old vcio driver is a simple character device that manually locates
the firmware driver. Initialising it before the firmware driver causes
a failure, and no retries are attempted.
Rewrite vcio as a platform driver that depends on a DT node for its
instantiation and the location of the firmware driver, making use of
the miscdevice framework to reduce the code size.
N.B. Using miscdevice changes the udev SUBSYSTEM string, so a change
to the companion udev rule is required in order to continue to set
the correct device permissions, e.g.:
KERNEL="vcio", GROUP="video", MODE="0660"
See: https://github.com/raspberrypi/linux/issues/4620
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
i2c-bcm2708: fixed baudrate
Fixed issue where the wrong CDIV value was set for baudrates below 3815 Hz (for 250MHz bus clock).
In that case the computed CDIV value was more than 0xffff. However the CDIV register width is only 16 bits.
This resulted in incorrect setting of CDIV and higher baudrate than intended.
Example: 3500Hz -> CDIV=0x11704 -> CDIV(16bit)=0x1704 -> 42430Hz
After correction: 3500Hz -> CDIV=0x11704 -> CDIV(16bit)=0xffff -> 3815Hz
The correct baudrate is shown in the log after the cdiv > 0xffff correction.
Perform I2C combined transactions when possible
Perform I2C combined transactions whenever possible, within the
restrictions of the Broadcomm Serial Controller.
Disable DONE interrupt during TA poll
Prevent interrupt from being triggered if poll is missed and transfer
starts and finishes.
i2c: Make combined transactions optional and disabled by default
i2c: bcm2708: add device tree support
Add DT support to driver and add to .dtsi file.
Setup pins in .dts file.
i2c is disabled by default.
Signed-off-by: Noralf Tronnes <notro@tronnes.org>
bcm2708: don't register i2c controllers when using DT
The devices for the i2c controllers are in the Device Tree.
Only register devices when not using DT.
Signed-off-by: Noralf Tronnes <notro@tronnes.org>
I2C: Only register the I2C device for the current board revision
i2c_bcm2708: Fix clock reference counting
Fix grabbing lock from atomic context in i2c driver
2 main changes:
- check for timeouts in the bcm2708_bsc_setup function as indicated by this comment:
/* poll for transfer start bit (should only take 1-20 polls) */
This implies that the setup function can now fail so account for this everywhere it's called
- Removed the clk_get_rate call from inside the setup function as it locks a mutex and that's not ok since we call it from under a spin lock.
i2c-bcm2708: When using DT, leave the GPIO setup to pinctrl
i2c-bcm2708: Increase timeouts to allow larger transfers
Use the timeout value provided by the I2C_TIMEOUT ioctl when waiting
for completion. The default timeout is 1 second.
See: https://github.com/raspberrypi/linux/issues/260
i2c-bcm2708/BCM270X_DT: Add support for I2C2
The third I2C bus (I2C2) is normally reserved for HDMI use. Careless
use of this bus can break an attached display - use with caution.
It is recommended to disable accesses by VideoCore by setting
hdmi_ignore_edid=1 or hdmi_edid_file=1 in config.txt.
The interface is disabled by default - enable using the
i2c2_iknowwhatimdoing DT parameter.
bcm2708-spi: Don't use static pin configuration with DT
Also remove superfluous error checking - the SPI framework ensures the
validity of the chip_select value.
i2c-bcm2708: Remove non-DT support
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Set the BSC_CLKT clock streching timeout to 35ms as per SMBus specs.
Fixes i2c_bcm2708: Write to FIFO correctly - v2 (#1574)
* i2c: fix i2c_bcm2708: Clear FIFO before sending data
Make sure FIFO gets cleared before trying to send
data in case of a repeated start (COMBINED=Y).
* i2c: fix i2c_bcm2708: Only write to FIFO when not full
Check if FIFO can accept data before writing.
To avoid a peripheral read on the last iteration of a loop,
both bcm2708_bsc_fifo_fill and ~drain are changed as well.
Signed-off-by: Luke Wren <wren6991@gmail.com>
MISC: bcm2835: smi: use clock manager and fix reload issues
Use clock manager instead of self-made clockmanager.
Also fix some error paths that showd up during development
(especially missing release of dma resources on rmmod)
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
bcm2835_smi: re-add dereference to fix DMA transfers
bcm2835_smi_dev: Fix handling of word-odd lengths
The read and write functions did not use the correct pointer offset
when dealing with an odd number of bytes after a DMA transfer. Also,
only handle the remaining odd bytes if the DMA transfer completed
successfully.
Submitted-by: @madimario (GitHub)
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
misc: bcm2835_smi: Use proper enum types for dma_{,un}map_single()
Clang warns:
drivers/misc/bcm2835_smi.c:692:4: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion]
DMA_MEM_TO_DEV);
^~~~~~~~~~~~~~~
./include/linux/dma-mapping.h:406:66: note: expanded from macro 'dma_map_single'
#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
~~~~~~~~~~~~~~~~~~~~ ^
drivers/misc/bcm2835_smi.c:705:35: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion]
(inst->dev, phy_addr, n_bytes, DMA_MEM_TO_DEV);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
./include/linux/dma-mapping.h:407:70: note: expanded from macro 'dma_unmap_single'
#define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
~~~~~~~~~~~~~~~~~~~~~~ ^
drivers/misc/bcm2835_smi.c:751:12: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion]
DMA_DEV_TO_MEM);
^~~~~~~~~~~~~~~
./include/linux/dma-mapping.h:406:66: note: expanded from macro 'dma_map_single'
#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
~~~~~~~~~~~~~~~~~~~~ ^
drivers/misc/bcm2835_smi.c:761:50: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion]
dma_unmap_single(inst->dev, phy_addr, n_bytes, DMA_DEV_TO_MEM);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
./include/linux/dma-mapping.h:407:70: note: expanded from macro 'dma_unmap_single'
#define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
~~~~~~~~~~~~~~~~~~~~~~ ^
4 warnings generated.
Use the proper enumerated type to clear up the warning. There is not
actually a bug here because the enumerated types have the same integer
value:
DMA_MEM_TO_DEV = DMA_TO_DEVICE = 1
DMA_DEV_TO_MEM = DMA_FROM_DEVICE = 2
Fixes: 93254d0f7b ("Add SMI driver")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: popcornmix <popcornmix@gmail.com>
BCM270x: Move vc_mem
Make the vc_mem module available for ARCH_BCM2835 by moving it.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
char: vc_mem: Fix up compat ioctls for 64bit kernel
compat_ioctl wasn't defined, so 32bit user/64bit kernel
always failed.
VC_MEM_IOC_MEM_PHYS_ADDR was defined with parameter size
unsigned long, so the ioctl cmd changes between sizes.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
char: vc_mem: Fix all coding style issues.
Cleans up all checkpatch errors in vc_mem.c and vc_mem.h
No functional change to the code.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
char: vc_mem: Delete dead code
There are no error exists once device_create has succeeded, and
therefore no need to call device_destroy from vc_mem_init.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
BCM2835 has two SD card interfaces. This driver uses the other one.
bcm2835-sdhost: Error handling fix, and code clarification
bcm2835-sdhost: Adding overclocking option
Allow a different clock speed to be substitued for a requested 50MHz.
This option is exposed using the "overclock_50" DT parameter.
Note that the sdhost interface is restricted to integer divisions of
core_freq, and the highest sensible option for a core_freq of 250MHz
is 84 (250/3 = 83.3MHz), the next being 125 (250/2) which is much too
high.
Use at your own risk.
bcm2835-sdhost: Round up the overclock, so 62 works for 62.5Mhz
Also only warn once for each overclock setting.
bcm2835-sdhost: Improve error handling and recovery
1) Expose the hw_reset method to the MMC framework, removing many
internal calls by the driver.
2) Reduce overclock setting on error.
3) Increase timeout to cope with high capacity cards.
4) Add properties and parameters to control pio_limit and debug.
5) Reduce messages at probe time.
bcm2835-sdhost: Further improve overclock back-off
bcm2835-sdhost: Clear HBLC for PIO mode
Also update pio_limit default in overlay README.
bcm2835-sdhost: Add the ERASE capability
See: https://github.com/raspberrypi/linux/issues/1076
bcm2835-sdhost: Ignore CRC7 for MMC CMD1
It seems that the sdhost interface returns CRC7 errors for CMD1,
which is the MMC-specific SEND_OP_COND. Returning these errors to
the MMC layer causes a downward spiral, but ignoring them seems
to be harmless.
bcm2835-mmc/sdhost: Remove ARCH_BCM2835 differences
The bcm2835-mmc driver (and -sdhost driver that copied from it)
contains code to handle SDIO interrupts in a threaded interrupt
handler rather than waking the MMC framework thread. The change
follows a patch from Russell King that adds the facility as the
preferred way of working.
However, the new code path is only present in ARCH_BCM2835
builds, which I have taken to be a way of testing the waters
rather than making the change across the board; I can't see
any technical reason why it wouldn't be enabled for MACH_BCM270X
builds. So this patch standardises on the ARCH_BCM2835 code,
removing the old code paths.
bcm2835-sdhost: Don't log timeout errors unless debug=1
The MMC card-discovery process generates timeouts. This is
expected behaviour, so reporting it to the user serves no purpose.
Suppress the reporting of timeout errors unless the debug flag
is on.
bcm2835-sdhost: Add workaround for odd behaviour on some cards
For reasons not understood, the sdhost driver fails when reading
sectors very near the end of some SD cards. The problem could
be related to the similar issue that reading the final sector
of any card as part of a multiple read never completes, and the
workaround is an extension of the mechanism introduced to solve
that problem which ensures those sectors are always read singly.
bcm2835-sdhost: Major revision
This is a significant revision of the bcm2835-sdhost driver. It
improves on the original in a number of ways:
1) Through the use of CMD23 for reads it appears to avoid problems
reading some sectors on certain high speed cards.
2) Better atomicity to prevent crashes.
3) Higher performance.
4) Activity logging included, for easier diagnosis in the event
of a problem.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Restore ATOMIC flag to PIO sg mapping
Allocation problems have been seen in a wireless driver, and
this is the only change which might have been responsible.
SQUASH: bcm2835-sdhost: Only claim one DMA channel
With both MMC controllers enabled there are few DMA channels left. The
bcm2835-sdhost driver only uses DMA in one direction at a time, so it
doesn't need to claim two channels.
See: https://github.com/raspberrypi/linux/issues/1327
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Workaround for "slow" sectors
Some cards have been seen to cause timeouts after certain sectors are
read. This workaround enforces a minimum delay between the stop after
reading one of those sectors and a subsequent data command.
Using CMD23 (SET_BLOCK_COUNT) avoids this problem, so good cards will
not be penalised by this workaround.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Firmware manages the clock divisor
The bcm2835-sdhost driver hands control of the CDIV clock divisor
register to matching firmware, allowing it to adjust to a changing
core clock. This removes the need to use the performance governor or
to enable io_is_busy on the on-demand governor in order to get the
best SD performance.
N.B. As SD clocks must be an integer divisor of the core clock, it is
possible that the SD clock for "turbo" mode can be different (even
lower) than "normal" mode.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Reset the clock in task context
Since reprogramming the clock can now involve a round-trip to the
firmware it must not be done at atomic context, and a tasklet
is not a task.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Don't exit cmd wait loop on error
The FAIL flag can be set in the CMD register before command processing
is complete, leading to spurious "failed to complete" errors. This has
the effect of promoting harmless CRC7 errors during CMD1 processing
into errors that can delay and even prevent booting.
Also:
1) Convert the last KERN_ERROR message in the register dumping to
KERN_INFO.
2) Remove an unnecessary reset call from bcm2835_sdhost_add_host.
See: https://github.com/raspberrypi/linux/pull/1492
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: mmc_card_blockaddr fix
Get the definition of mmc_card_blockaddr from drivers/mmc/core/card.h.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: New timer API
mmc: bcm2835-sdhost: Support underclocking
Support underclocking of the SD bus in two ways:
1. using the max-frequency DT property (which currently has no DT
parameter), and
2. using the exiting sd_overclock parameter.
The two methods differ slightly - in the former the MMC subsystem is
aware of the underclocking, while in the latter it isn't - but the
end results should be the same.
See: https://github.com/raspberrypi/linux/issues/2350
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
mmc: bcm2835-sdhost: Add include
highmem.h (needed for kmap_atomic) is pulled in by one of the other
include files, but only with some CONFIG settings. Make the inclusion
explicit to cater for cases where the CONFIG setting is absent.
See: https://github.com/raspberrypi/linux/issues/2366
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
mmc/bcm2835-sdhost: Recover from MMC_SEND_EXT_CSD
If the user issues an "mmc extcsd read", the SD controller receives
what it thinks is a SEND_IF_COND command with an unexpected data block.
The resulting operations leave the FSM stuck in READWAIT, a state which
persists until the MMC framework resets the controller, by which point
the root filesystem is likely to have been unmounted.
A less heavyweight solution is to detect the condition and nudge the
FSM by asserting the (self-clearing) FORCE_DATA_MODE bit.
N.B. This workaround was essentially discovered by accident and without
a full understanding the inner workings of the controller, so it is
fortunate that the "fix" only modifies error paths.
See: https://github.com/raspberrypi/linux/issues/2728
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
mmc: bcm2835-sdhost: Fix warnings on arm64
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Allow for sg entries that cross pages
The dma_complete handling code calculates a virtual address for a page
then adds an offset, but if the offset is more than a page and HIGHMEM
is in use then the summed address could be in an unmapped (or just
incorrect) page.
The upstream SDHOST driver allows for this possibility - copy the code
that does so.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Fix DMA channel leak on error/remove
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
mmc: bcm2835-sdhost: Support 64-bit physical addresses
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-sdhost: Replace obsolete struct timeval
struct timeval has been retired due to the impending linux 32-bit tv_sec
rollover (only 18 years to go) - timespec64 is the obvious replacement.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
mmc: sdhost: Pass DT pointer to rpi_firmware_get
Using the rpi_firmware API as intended allows proper reference counting
of the firmware device and means we can remove a downstream patch to
the firmware driver.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
See https://github.com/raspberrypi/linux/issues/5019
If an SD card has degraded performance such that IO operations time out
then the MMC block layer will leak SG DMA mappings in the swiotlb during
recovery. It retries the same SG and this causes the leak, as it is
mapped twice - once in sdhci_pre_req() and again during single-block
reads in sdhci_prepare_data().
Resetting the card (including power-cycling if a regulator for vmmc is
present) ought to be enough to recover a stuck state, so for now don't
try single-block reads in the recovery path.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
mmc: Disable CMD23 transfers on all cards
Pending wire-level investigation of these types of transfers
and associated errors on bcm2835-mmc, disable for now. Fallback of
CMD18/CMD25 transfers will be used automatically by the MMC layer.
Reported/Tested-by: Gellert Weisz <gellert@raspberrypi.org>
mmc: bcm2835-mmc: enable DT support for all architectures
Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now.
Enable Device Tree support for all architectures.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
mmc: bcm2835-mmc: fix probe error handling
Probe error handling is broken in several places.
Simplify error handling by using device managed functions.
Replace pr_{err,info} with dev_{err,info}.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2835-mmc: Add locks when accessing sdhost registers
bcm2835-mmc: Add range of debug options for slowing things down
bcm2835-mmc: Add option to disable some delays
bcm2835-mmc: Add option to disable MMC_QUIRK_BLK_NO_CMD23
bcm2835-mmc: Default to disabling MMC_QUIRK_BLK_NO_CMD23
bcm2835-mmc: Adding overclocking option
Allow a different clock speed to be substitued for a requested 50MHz.
This option is exposed using the "overclock_50" DT parameter.
Note that the mmc interface is restricted to EVEN integer divisions of
250MHz, and the highest sensible option is 63 (250/4 = 62.5), the
next being 125 (250/2) which is much too high.
Use at your own risk.
bcm2835-mmc: Round up the overclock, so 62 works for 62.5Mhz
Also only warn once for each overclock setting.
mmc: bcm2835-mmc: Make available on ARCH_BCM2835
Make the bcm2835-mmc driver available for use on ARCH_BCM2835.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
BCM270x_DT: add bcm2835-mmc entry
Add Device Tree entry for bcm2835-mmc.
In non-DT mode, don't add the device in the board file.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2835-mmc: Don't overwrite MMC capabilities from DT
bcm2835-mmc: Don't override bus width capabilities from devicetree
Take out the force setting of the MMC_CAP_4_BIT_DATA host capability
so that the result read from devicetree via mmc_of_parse() is
preserved.
bcm2835-mmc: Only claim one DMA channel
With both MMC controllers enabled there are few DMA channels left. The
bcm2835-mmc driver only uses DMA in one direction at a time, so it
doesn't need to claim two channels.
See: https://github.com/raspberrypi/linux/issues/1327
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-mmc: New timer API
mmc: bcm2835-mmc: Support underclocking
Support underclocking of the SD bus using the max-frequency DT property
(which currently has no DT parameter). The sd_overclock parameter
already provides another way to achieve the same thing which should be
equivalent in end result, but it is a bug not to support max-frequency
as well.
See: https://github.com/raspberrypi/linux/issues/2350
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
mmc/bcm2835: Recover from MMC_SEND_EXT_CSD
If the user issues an "mmc extcsd read", the SD controller receives
what it thinks is a SEND_IF_COND command with an unexpected data block.
The resulting operations leave the FSM stuck in READWAIT, a state which
persists until the MMC framework resets the controller, by which point
the root filesystem is likely to have been unmounted.
A less heavyweight solution is to detect the condition and nudge the
FSM by asserting the (self-clearing) FORCE_DATA_MODE bit.
N.B. This workaround was essentially discovered by accident and without
a full understanding the inner workings of the controller, so it is
fortunate that the "fix" only modifies error paths.
See: https://github.com/raspberrypi/linux/issues/2728
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-mmc: Fix DMA channel leak
The BCM2835 MMC host driver requests a DMA channel on probe but neglects
to release the channel in the probe error path and on driver unbind.
I'm seeing this happen on every boot of the Compute Module 3: On first
driver probe, DMA channel 2 is allocated and then leaked with a "could
not get clk, deferring probe" message. On second driver probe, channel 4
is allocated.
Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Fix struct mmc_host leak on probe
The BCM2835 MMC host driver requests the bus address of the host's
register map on probe. If that fails, the driver leaks the struct
mmc_host allocated earlier.
Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Fix duplicate free_irq() on remove
The BCM2835 MMC host driver requests its interrupt as a device-managed
resource, so the interrupt is automatically freed after the driver is
unbound.
However on driver unbind, bcm2835_mmc_remove() frees the interrupt
explicitly to avoid invocation of the interrupt handler after driver
structures have been torn down.
The interrupt is thus freed twice, leading to a WARN splat in
__free_irq(). Fix by not requesting the interrupt as a device-managed
resource.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Handle mmc_add_host() errors
The BCM2835 MMC host driver calls mmc_add_host() but doesn't check its
return value. Errors occurring in that function are therefore not
handled. Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Deduplicate reset of driver data on remove
The BCM2835 MMC host driver sets the device's driver data pointer to
NULL on ->remove() even though the driver core subsequently does the
same in __device_release_driver(). Drop the duplicate assignment.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835_mmc: Remove vestigial threaded IRQ
With SDIO processing now managed by the MMC framework with a
workqueue, the bcm2835_mmc driver no longer needs a threaded
IRQ.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Add missing dma_unmap_sg calls to free relevant swiotlb bounce buffers.
This prevents DMA leaks.
Signed-off-by: Yaroslav Rosomakho <yaroslavros@gmail.com>
Limit max_req_size under arm64 (or any other platform that uses swiotlb) to prevent potential buffer overflow due to bouncing.
Signed-off-by: Yaroslav Rosomakho <yaroslavros@gmail.com>
mmc: sdhci: Silence MMC warnings
When the MMC isn't plugged in, the driver will spam the console which is
pretty annoying when using NFS.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
mmc: sdhci-iproc: Fix vmmc regulators on iProc
The Linux support for controlling card power via regulators appears to
be contentious. I would argue that the default behaviour is contrary to
the SDHCI spec - turning off the power writes a reserved value to the
SD Bus Voltage Select field of the Power Control Register, which
seems to kill the Arasan/iProc controller - but fortunately there is a
hook in sdhci_ops to override the behaviour. Borrow the implementation
from sdhci_arasan_set_power.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-mmc: uninitialized_var is no more
Revert "mmc: sdhci-iproc: Fix vmmc regulators on iProc"
This reverts commit aed19399a0.
Commit 6c92ae1e45 ("mmc: sdhci: Introduce sdhci_set_power_and_bus_voltage()")
introduced a generic helper that does the same thing so use that instead in
the following commit.
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
mmc: sdhci-iproc: Fix vmmc regulators (pre-bcm2711)
The Linux support for controlling card power via regulators appears to
be contentious. I would argue that the default behaviour is contrary to
the SDHCI spec - turning off the power writes a reserved value to the
SD Bus Voltage Select field of the Power Control Register, which
seems to kill the Arasan/iProc controller - but fortunately there is a
hook in sdhci_ops to override the behaviour.
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
bcm2835-mmc: Honor return value of mmc_of_parse()
bcm2835_mmc_probe() ignores errors returned by mmc_of_parse() and in
particular ignores -EPROBE_DEFER, which may be returned if the power
sequencing driver configured in the devicetree is compiled as a module.
The user-visible result is that access to the SDIO device fails because
its power sequencing requirements have not been observed. Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Add support for DMA controller of BCM2708 as used in the Raspberry Pi.
Currently it only supports cyclic DMA.
Signed-off-by: Florian Meier <florian.meier@koalo.de>
dmaengine: expand functionality by supporting scatter/gather transfers sdhci-bcm2708 and dma.c: fix for LITE channels
DMA: fix cyclic LITE length overflow bug
dmaengine: bcm2708: Remove chancnt affectations
Mirror bcm2835-dma.c commit 9eba5536a7:
chancnt is already filled by dma_async_device_register, which uses the channel
list to know how much channels there is.
Since it's already filled, we can safely remove it from the drivers' probe
function.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: overwrite dreq only if it is not set
dreq is set when the DMA channel is fetched from Device Tree.
slave_id is set using dmaengine_slave_config().
Only overwrite dreq with slave_id if it is not set.
dreq/slave_id in the cyclic DMA case is not touched, because I don't
have hardware to test with.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: do device registration in the board file
Don't register the device in the driver. Do it in the board file.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: don't restrict DT support to ARCH_BCM2835
Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now.
Add Device Tree support to the non ARCH_BCM2835 case.
Use the same driver name regardless of architecture.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
BCM270x_DT: add bcm2835-dma entry
Add Device Tree entry for bcm2835-dma.
The entry doesn't contain any resources since they are handled
by the arch/arm/mach-bcm270x/dma.c driver.
In non-DT mode, don't add the device in the board file.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2708-dmaengine: Add debug options
BCM270x: Add memory and irq resources to dmaengine device and DT
Prepare for merging of the legacy DMA API arch driver dma.c
with bcm2708-dmaengine by adding memory and irq resources both
to platform file device and Device Tree node.
Don't use BCM_DMAMAN_DRIVER_NAME so we don't have to include mach/dma.h
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: Merge with arch dma.c driver and disable dma.c
Merge the legacy DMA API driver with bcm2708-dmaengine.
This is done so we can use bcm2708_fb on ARCH_BCM2835 (mailbox
driver is also needed).
Changes to the dma.c code:
- Use BIT() macro.
- Cutdown some comments to one line.
- Add mutex to vc_dmaman and use this, since the dev lock is locked
during probing of the engine part.
- Add global g_dmaman variable since drvdata is used by the engine part.
- Restructure for readability:
vc_dmaman_chan_alloc()
vc_dmaman_chan_free()
bcm_dma_chan_free()
- Restructure bcm_dma_chan_alloc() to simplify error handling.
- Use device irq resources instead of hardcoded bcm_dma_irqs table.
- Remove dev_dmaman_register() and code it directly.
- Remove dev_dmaman_deregister() and code it directly.
- Simplify bcm_dmaman_probe() using devm_* functions.
- Get dmachans from DT if available.
- Keep 'dma.dmachans' module argument name for backwards compatibility.
Make it available on ARCH_BCM2835 as well.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: set residue_granularity field
bcm2708-dmaengine supports residue reporting at burst level
but didn't report this via the residue_granularity field.
Without this field set properly we get playback issues with I2S cards.
dmaengine: bcm2708-dmaengine: Fix memory leak when stopping a running transfer
bcm2708-dmaengine: Use more DMA channels (but not 12)
1) Only the bcm2708_fb drivers uses the legacy DMA API, and
it requires a BULK-capable channel, so all other types
(FAST, NORMAL and LITE) can be made available to the regular
DMA API.
2) DMA channels 11-14 share an interrupt. The driver can't
handle this, so don't use channels 12-14 (12 was used, probably
because it appears to have an interrupt, but in reality that
interrupt is for activity on ANY channel). This may explain
a lockup encountered when running out of DMA channels.
The combined effect of this patch is to leave 7 DMA channels
available + channel 0 for bcm2708_fb via the legacy API.
See: https://github.com/raspberrypi/linux/issues/1110https://github.com/raspberrypi/linux/issues/1108
dmaengine: bcm2708: Make legacy API available for bcm2835-dma
bcm2708_fb uses the legacy DMA API, so in order to start using
bcm2835-dma, bcm2835-dma has to support the legacy API. Make this
possible by exporting bcm_dmaman_probe() and bcm_dmaman_remove().
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: Change DT compatible string
Both bcm2835-dma and bcm2708-dmaengine have the same compatible string.
So change compatible to "brcm,bcm2708-dma".
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: Remove driver but keep legacy API
Dropping non-DT support means we don't need this driver,
but we still need the legacy DMA API.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2708-dmaengine - Fix arm64 portability/build issues
dma-bcm2708: Fix module compilation of CONFIG_DMA_BCM2708
bcm2708-dmaengine.c defines functions like bcm_dma_start which are
defined as well in dma-bcm2708.h as inline versions when
CONFIG_DMA_BCM2708 is not defined. This works fine when
CONFIG_DMA_BCM2708 is built in, but when it is selected as module build
fails with redefinition errors because in the build system when
CONFIG_DMA_BCM2708 is selected as module, the macro becomes
CONFIG_DMA_BCM2708_MODULE.
This patch makes the header use CONFIG_DMA_BCM2708_MODULE too when
available.
Fixes https://github.com/raspberrypi/linux/issues/2056
Signed-off-by: Andrei Gherzan <andrei@gherzan.com>
bcm2708-dmaengine: Use platform_get_irq
The platform driver framework no longer creates IRQ resources for
platform devices because they are expected to use platform_get_irq.
This causes the bcm2808_fb acceleration to fail.
Fix the problem by calling platform_get_irq as intended.
See: https://github.com/raspberrypi/linux/issues/5131
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Based on the patch authored by Ali Gholami Rudi at
https://lkml.org/lkml/2009/7/13/153
Provide an ioctl for userspace applications, but only if this operation
is hardware accelerated (otherwide it does not make any sense).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
bcm2708_fb: Add ioctl for reading gpu memory through dma
video: bcm2708_fb: Add compat_ioctl support.
When using a 64 bit kernel with 32 bit userspace we need
compat ioctl handling for FBIODMACOPY as one of the
parameters is a pointer.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Signed-off-by: popcornmix <popcornmix@gmail.com>
bcm2708_fb : Implement blanking support using the mailbox property interface
bcm2708_fb: Add pan and vsync controls
bcm2708_fb: DMA acceleration for fb_copyarea
Based on http://www.raspberrypi.org/phpBB3/viewtopic.php?p=62425#p62425
Also used Simon's dmaer_master module as a reference for tweaking DMA
settings for better performance.
For now busylooping only. IRQ support might be added later.
With non-overclocked Raspberry Pi, the performance is ~360 MB/s
for simple copy or ~260 MB/s for two-pass copy (used when dragging
windows to the right).
In the case of using DMA channel 0, the performance improves
to ~440 MB/s.
For comparison, VFP optimized CPU copy can only do ~114 MB/s in
the same conditions (hindered by reading uncached source buffer).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
bcm2708_fb: report number of dma copies
Add a counter (exported via debugfs) reporting the
number of dma copies that the framebuffer driver
has done, in order to help evaluate different
optimization strategies.
Signed-off-by: Luke Diamand <luked@broadcom.com>
bcm2708_fb: use IRQ for DMA copies
The copyarea ioctl() uses DMA to speed things along. This
was busy-waiting for completion. This change supports using
an interrupt instead for larger transfers. For small
transfers, busy-waiting is still likely to be faster.
Signed-off-by: Luke Diamand <luke@diamand.org>
bcm2708: Make ioctl logging quieter
video: fbdev: bcm2708_fb: Don't panic on error
No need to panic the kernel if the video driver fails.
Just print a message and return an error.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
fbdev: bcm2708_fb: Add ARCH_BCM2835 support
Add Device Tree support.
Pass the device to dma_alloc_coherent() in order to get the
correct bus address on ARCH_BCM2835.
Use the new DMA legacy API header file.
Including <mach/platform.h> is not necessary.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
BCM270x_DT: Add bcm2708-fb device
Add bcm2708-fb to Device Tree and don't add the
platform device when booting in DT mode.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Cleanup of bcm2708_fb file to kernel coding standards
Some minor change to function - remove a use of
in_atomic, plus replacing various debug messages
that manually specify the function name with
("%s",.__func__)
Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
video: bcm2708_fb: Try allocating on the ARM and passing to VPU
Currently the VPU allocates the contiguous buffer for the
framebuffer.
Try an alternate path first where we use dma_alloc_coherent
and pass the buffer to the VPU. Should the VPU firmware not
support that path, then free the buffer and revert to the
old behaviour of using the VPU allocation.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Signed-off-by: popcornmix <popcornmix@gmail.com>
usb: dwc: fix lockdep false positive
Signed-off-by: Kari Suvanto <karis79@gmail.com>
usb: dwc: fix inconsistent lock state
Signed-off-by: Kari Suvanto <karis79@gmail.com>
Add FIQ patch to dwc_otg driver. Enable with dwc_otg.fiq_fix_enable=1. Should give about 10% more ARM performance.
Thanks to Gordon and Costas
Avoid dynamic memory allocation for channel lock in USB driver. Thanks ddv2005.
Add NAK holdoff scheme. Enabled by default, disable with dwc_otg.nak_holdoff_enable=0. Thanks gsh
Make sure we wait for the reset to finish
dwc_otg: fix bug in dwc_otg_hcd.c resulting in silent kernel
memory corruption, escalating to OOPS under high USB load.
dwc_otg: Fix unsafe access of QTD during URB enqueue
In dwc_otg_hcd_urb_enqueue during qtd creation, it was possible that the
transaction could complete almost immediately after the qtd was assigned
to a host channel during URB enqueue, which meant the qtd pointer was no
longer valid having been completed and removed. Usually, this resulted in
an OOPS during URB submission. By predetermining whether transactions
need to be queued or not, this unsafe pointer access is avoided.
This bug was only evident on the Pi model A where a device was attached
that had no periodic endpoints (e.g. USB pendrive or some wlan devices).
dwc_otg: Fix incorrect URB allocation error handling
If the memory allocation for a dwc_otg_urb failed, the kernel would OOPS
because for some reason a member of the *unallocated* struct was set to
zero. Error handling changed to fail correctly.
dwc_otg: fix potential use-after-free case in interrupt handler
If a transaction had previously aborted, certain interrupts are
enabled to track error counts and reset where necessary. On IN
endpoints the host generates an ACK interrupt near-simultaneously
with completion of transfer. In the case where this transfer had
previously had an error, this results in a use-after-free on
the QTD memory space with a 1-byte length being overwritten to
0x00.
dwc_otg: add handling of SPLIT transaction data toggle errors
Previously a data toggle error on packets from a USB1.1 device behind
a TT would result in the Pi locking up as the driver never handled
the associated interrupt. Patch adds basic retry mechanism and
interrupt acknowledgement to cater for either a chance toggle error or
for devices that have a broken initial toggle state (FT8U232/FT232BM).
dwc_otg: implement tasklet for returning URBs to usbcore hcd layer
The dwc_otg driver interrupt handler for transfer completion will spend
a very long time with interrupts disabled when a URB is completed -
this is because usb_hcd_giveback_urb is called from within the handler
which for a USB device driver with complicated processing (e.g. webcam)
will take an exorbitant amount of time to complete. This results in
missed completion interrupts for other USB packets which lead to them
being dropped due to microframe overruns.
This patch splits returning the URB to the usb hcd layer into a
high-priority tasklet. This will have most benefit for isochronous IN
transfers but will also have incidental benefit where multiple periodic
devices are active at once.
dwc_otg: fix NAK holdoff and allow on split transactions only
This corrects a bug where if a single active non-periodic endpoint
had at least one transaction in its qh, on frnum == MAX_FRNUM the qh
would get skipped and never get queued again. This would result in
a silent device until error detection (automatic or otherwise) would
either reset the device or flush and requeue the URBs.
Additionally the NAK holdoff was enabled for all transactions - this
would potentially stall a HS endpoint for 1ms if a previous error state
enabled this interrupt and the next response was a NAK. Fix so that
only split transactions get held off.
dwc_otg: Call usb_hcd_unlink_urb_from_ep with lock held in completion handler
usb_hcd_unlink_urb_from_ep must be called with the HCD lock held. Calling it
asynchronously in the tasklet was not safe (regression in
c4564d4a1a).
This change unlinks it from the endpoint prior to queueing it for handling in
the tasklet, and also adds a check to ensure the urb is OK to be unlinked
before doing so.
NULL pointer dereference kernel oopses had been observed in usb_hcd_giveback_urb
when a USB device was unplugged/replugged during data transfer. This effect
was reproduced using automated USB port power control, hundreds of replug
events were performed during active transfers to confirm that the problem was
eliminated.
USB fix using a FIQ to implement split transactions
This commit adds a FIQ implementaion that schedules
the split transactions using a FIQ so we don't get
held off by the interrupt latency of Linux
dwc_otg: fix device attributes and avoid kernel warnings on boot
dcw_otg: avoid logging function that can cause panics
See: https://github.com/raspberrypi/firmware/issues/21
Thanks to cleverca22 for fix
dwc_otg: mask correct interrupts after transaction error recovery
The dwc_otg driver will unmask certain interrupts on a transaction
that previously halted in the error state in order to reset the
QTD error count. The various fine-grained interrupt handlers do not
consider that other interrupts besides themselves were unmasked.
By disabling the two other interrupts only ever enabled in DMA mode
for this purpose, we can avoid unnecessary function calls in the
IRQ handler. This will also prevent an unneccesary FIQ interrupt
from being generated if the FIQ is enabled.
dwc_otg: fiq: prevent FIQ thrash and incorrect state passing to IRQ
In the case of a transaction to a device that had previously aborted
due to an error, several interrupts are enabled to reset the error
count when a device responds. This has the side-effect of making the
FIQ thrash because the hardware will generate multiple instances of
a NAK on an IN bulk/interrupt endpoint and multiple instances of ACK
on an OUT bulk/interrupt endpoint. Make the FIQ mask and clear the
associated interrupts.
Additionally, on non-split transactions make sure that only unmasked
interrupts are cleared. This caused a hard-to-trigger but serious
race condition when you had the combination of an endpoint awaiting
error recovery and a transaction completed on an endpoint - due to
the sequencing and timing of interrupts generated by the dwc_otg core,
it was possible to confuse the IRQ handler.
Fix function tracing
dwc_otg: whitespace cleanup in dwc_otg_urb_enqueue
dwc_otg: prevent OOPSes during device disconnects
The dwc_otg_urb_enqueue function is thread-unsafe. In particular the
access of urb->hcpriv, usb_hcd_link_urb_to_ep, dwc_otg_urb->qtd and
friends does not occur within a critical section and so if a device
was unplugged during activity there was a high chance that the
usbcore hub_thread would try to disable the endpoint with partially-
formed entries in the URB queue. This would result in BUG() or null
pointer dereferences.
Fix so that access of urb->hcpriv, enqueuing to the hardware and
adding to usbcore endpoint URB lists is contained within a single
critical section.
dwc_otg: prevent BUG() in TT allocation if hub address is > 16
A fixed-size array is used to track TT allocation. This was
previously set to 16 which caused a crash because
dwc_otg_hcd_allocate_port would read past the end of the array.
This was hit if a hub was plugged in which enumerated as addr > 16,
due to previous device resets or unplugs.
Also add #ifdef FIQ_DEBUG around hcd->hub_port_alloc[], which grows
to a large size if 128 hub addresses are supported. This field is
for debug only for tracking which frame an allocate happened in.
dwc_otg: make channel halts with unknown state less damaging
If the IRQ received a channel halt interrupt through the FIQ
with no other bits set, the IRQ would not release the host
channel and never complete the URB.
Add catchall handling to treat as a transaction error and retry.
dwc_otg: fiq_split: use TTs with more granularity
This fixes certain issues with split transaction scheduling.
- Isochronous multi-packet OUT transactions now hog the TT until
they are completed - this prevents hubs aborting transactions
if they get a periodic start-split out-of-order
- Don't perform TT allocation on non-periodic endpoints - this
allows simultaneous use of the TT's bulk/control and periodic
transaction buffers
This commit will mainly affect USB audio playback.
dwc_otg: fix potential sleep while atomic during urb enqueue
Fixes a regression introduced with eb1b482a. Kmalloc called from
dwc_otg_hcd_qtd_add / dwc_otg_hcd_qtd_create did not always have
the GPF_ATOMIC flag set. Force this flag when inside the larger
critical section.
dwc_otg: make fiq_split_enable imply fiq_fix_enable
Failing to set up the FIQ correctly would result in
"IRQ 32: nobody cared" errors in dmesg.
dwc_otg: prevent crashes on host port disconnects
Fix several issues resulting in crashes or inconsistent state
if a Model A root port was disconnected.
- Clean up queue heads properly in kill_urbs_in_qh_list by
removing the empty QHs from the schedule lists
- Set the halt status properly to prevent IRQ handlers from
using freed memory
- Add fiq_split related cleanup for saved registers
- Make microframe scheduling reclaim host channels if
active during a disconnect
- Abort URBs with -ESHUTDOWN status response, informing
device drivers so they respond in a more correct fashion
and don't try to resubmit URBs
- Prevent IRQ handlers from attempting to handle channel
interrupts if the associated URB was dequeued (and the
driver state was cleared)
dwc_otg: prevent leaking URBs during enqueue
A dwc_otg_urb would get leaked if the HCD enqueue function
failed for any reason. Free the URB at the appropriate points.
dwc_otg: Enable NAK holdoff for control split transactions
Certain low-speed devices take a very long time to complete a
data or status stage of a control transaction, producing NAK
responses until they complete internal processing - the USB2.0
spec limit is up to 500mS. This causes the same type of interrupt
storm as seen with USB-serial dongles prior to c8edb238.
In certain circumstances, usually while booting, this interrupt
storm could cause SD card timeouts.
dwc_otg: Fix for occasional lockup on boot when doing a USB reset
dwc_otg: Don't issue traffic to LS devices in FS mode
Issuing low-speed packets when the root port is in full-speed mode
causes the root port to stop responding. Explicitly fail when
enqueuing URBs to a LS endpoint on a FS bus.
Fix ARM architecture issue with local_irq_restore()
If local_fiq_enable() is called before a local_irq_restore(flags) where
the flags variable has the F bit set, the FIQ will be erroneously disabled.
Fixup arch_local_irq_restore to avoid trampling the F bit in CPSR.
Also fix some of the hacks previously implemented for previous dwc_otg
incarnations.
dwc_otg: fiq_fsm: Base commit for driver rewrite
This commit removes the previous FIQ fixes entirely and adds fiq_fsm.
This rewrite features much more complete support for split transactions
and takes into account several OTG hardware bugs. High-speed
isochronous transactions are also capable of being performed by fiq_fsm.
All driver options have been removed and replaced with:
- dwc_otg.fiq_enable (bool)
- dwc_otg.fiq_fsm_enable (bool)
- dwc_otg.fiq_fsm_mask (bitmask)
- dwc_otg.nak_holdoff (unsigned int)
Defaults are specified such that fiq_fsm behaves similarly to the
previously implemented FIQ fixes.
fiq_fsm: Push error recovery into the FIQ when fiq_fsm is used
If the transfer associated with a QTD failed due to a bus error, the HCD
would retry the transfer up to 3 times (implementing the USB2.0
three-strikes retry in software).
Due to the masking mechanism used by fiq_fsm, it is only possible to pass
a single interrupt through to the HCD per-transfer.
In this instance host channels would fall off the radar because the error
reset would function, but the subsequent channel halt would be lost.
Push the error count reset into the FIQ handler.
fiq_fsm: Implement timeout mechanism
For full-speed endpoints with a large packet size, interrupt latency
runs the risk of the FIQ starting a transaction too late in a full-speed
frame. If the device is still transmitting data when EOF2 for the
downstream frame occurs, the hub will disable the port. This change is
not reflected in the hub status endpoint and the device becomes
unresponsive.
Prevent high-bandwidth transactions from being started too late in a
frame. The mechanism is not guaranteed: a combination of bit stuffing
and hub latency may still result in a device overrunning.
fiq_fsm: fix bounce buffer utilisation for Isochronous OUT
Multi-packet isochronous OUT transactions were subject to a few bounday
bugs. Fix them.
Audio playback is now much more robust: however, an issue stands with
devices that have adaptive sinks - ALSA plays samples too fast.
dwc_otg: Return full-speed frame numbers in HS mode
The frame counter increments on every *microframe* in high-speed mode.
Most device drivers expect this number to be in full-speed frames - this
caused considerable confusion to e.g. snd_usb_audio which uses the
frame counter to estimate the number of samples played.
fiq_fsm: save PID on completion of interrupt OUT transfers
Also add edge case handling for interrupt transports.
Note that for periodic split IN, data toggles are unimplemented in the
OTG host hardware - it unconditionally accepts any PID.
fiq_fsm: add missing case for fiq_fsm_tt_in_use()
Certain combinations of bitrate and endpoint activity could
result in a periodic transaction erroneously getting started
while the previous Isochronous OUT was still active.
fiq_fsm: clear hcintmsk for aborted transactions
Prevents the FIQ from erroneously handling interrupts
on a timed out channel.
fiq_fsm: enable by default
fiq_fsm: fix dequeues for non-periodic split transactions
If a dequeue happened between the SSPLIT and CSPLIT phases of the
transaction, the HCD would never receive an interrupt.
fiq_fsm: Disable by default
fiq_fsm: Handle HC babble errors
The HCTSIZ transfer size field raises a babble interrupt if
the counter wraps. Handle the resulting interrupt in this case.
dwc_otg: fix interrupt registration for fiq_enable=0
Additionally make the module parameter conditional for wherever
hcd->fiq_state is touched.
fiq_fsm: Enable by default
dwc_otg: Fix various issues with root port and transaction errors
Process the host port interrupts correctly (and don't trample them).
Root port hotplug now functional again.
Fix a few thinkos with the transaction error passthrough for fiq_fsm.
fiq_fsm: Implement hack for Split Interrupt transactions
Hubs aren't too picky about which endpoint we send Control type split
transactions to. By treating Interrupt transfers as Control, it is
possible to use the non-periodic queue in the OTG core as well as the
non-periodic FIFOs in the hub itself. This massively reduces the
microframe exclusivity/contention that periodic split transactions
otherwise have to enforce.
It goes without saying that this is a fairly egregious USB specification
violation, but it works.
Original idea by Hans Petter Selasky @ FreeBSD.org.
dwc_otg: FIQ support on SMP. Set up FIQ stack and handler on Core 0 only.
dwc_otg: introduce fiq_fsm_spin(un|)lock()
SMP safety for the FIQ relies on register read-modify write cycles being
completed in the correct order. Several places in the DWC code modify
registers also touched by the FIQ. Protect these by a bare-bones lock
mechanism.
This also makes it possible to run the FIQ and IRQ handlers on different
cores.
fiq_fsm: fix build on bcm2708 and bcm2709 platforms
dwc_otg: put some barriers back where they should be for UP
bcm2709/dwc_otg: Setup FIQ on core 1 if >1 core active
dwc_otg: fixup read-modify-write in critical paths
Be more careful about read-modify-write on registers that the FIQ
also touches.
Guard fiq_fsm_spin_lock with fiq_enable check
fiq_fsm: Falling out of the state machine isn't fatal
This edge case can be hit if the port is disabled while the FIQ is
in the middle of a transaction. Make the effects less severe.
Also get rid of the useless return value.
squash: dwc_otg: Allow to build without SMP
usb: core: make overcurrent messages more prominent
Hub overcurrent messages are more serious than "debug". Increase loglevel.
usb: dwc_otg: Don't use dma_to_virt()
Commit 6ce0d20 changes dma_to_virt() which breaks this driver.
Open code the old dma_to_virt() implementation to work around this.
Limit the use of __bus_to_virt() to cases where transfer_buffer_length
is set and transfer_buffer is not set. This is done to increase the
chance that this driver will also work on ARCH_BCM2835.
transfer_buffer should not be NULL if the length is set, but the
comment in the code indicates that there are situations where this
might happen. drivers/usb/isp1760/isp1760-hcd.c also has a similar
comment pointing to a possible: 'usb storage / SCSI bug'.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dwc_otg: Fix crash when fiq_enable=0
dwc_otg: fiq_fsm: Make high-speed isochronous strided transfers work properly
Certain low-bandwidth high-speed USB devices (specialist audio devices,
compressed-frame webcams) have packet intervals > 1 microframe.
Stride these transfers in the FIQ by using the start-of-frame interrupt
to restart the channel at the right time.
dwc_otg: Force host mode to fix incorrect compute module boards
dwc_otg: Add ARCH_BCM2835 support
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dwc_otg: Simplify FIQ irq number code
Dropping ATAGS means we can simplify the FIQ irq number code.
Also add error checking on the returned irq number.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dwc_otg: Remove duplicate gadget probe/unregister function
dwc_otg: Properly set the HFIR
Douglas Anderson reported:
According to the most up to date version of the dwc2 databook, the FRINT
field of the HFIR register should be programmed to:
* 125 us * (PHY clock freq for HS) - 1
* 1000 us * (PHY clock freq for FS/LS) - 1
This is opposed to older versions of the doc that claimed it should be:
* 125 us * (PHY clock freq for HS)
* 1000 us * (PHY clock freq for FS/LS)
and reported lower timing jitter on a USB analyser
dcw_otg: trim xfer length when buffer larger than allocated size is received
dwc_otg: Don't free qh align buffers in atomic context
dwc_otg: Enable the hack for Split Interrupt transactions by default
dwc_otg.fiq_fsm_mask=0xF has long been a suggestion for users with audio stutters or other USB bandwidth issues.
So far we are aware of many success stories but no failure caused by this setting.
Make it a default to learn more.
See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=70437
Signed-off-by: popcornmix <popcornmix@gmail.com>
dwc_otg: Use kzalloc when suitable
dwc_otg: Pass struct device to dma_alloc*()
This makes it possible to get the bus address from Device Tree.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dwc_otg: fix summarize urb->actual_length for isochronous transfers
Kernel does not copy input data of ISO transfers to userspace
if actual_length is set only in ISO transfers and not summarized
in urb->actual_length. Fixesraspberrypi/linux#903
fiq_fsm: Use correct states when starting isoc OUT transfers
In fiq_fsm_start_next_periodic() if an isochronous OUT transfer
was selected, no regard was given as to whether this was a single-packet
transfer or a multi-packet staged transfer.
For single-packet transfers, this had the effect of repeatedly sending
OUT packets with bogus data and lengths.
Eventually if the channel was repeatedly enabled enough times, this
would lock up the OTG core and no further bus transfers would happen.
Set the FSM state up properly if we select a single-packet transfer.
Fixes https://github.com/raspberrypi/linux/issues/1842
dwc_otg: make nak_holdoff work as intended with empty queues
If URBs reading from non-periodic split endpoints were dequeued and
the last transfer from the endpoint was a NAK handshake, the resulting
qh->nak_frame value was stale which would result in unnecessarily long
polling intervals for the first subsequent transfer with a fresh URB.
Fixup qh->nak_frame in dwc_otg_hcd_urb_dequeue and also guard against
a case where a single URB is submitted to the endpoint, a NAK was
received on the transfer immediately prior to receiving data and the
device subsequently resubmits another URB past the qh->nak_frame interval.
Fixes https://github.com/raspberrypi/linux/issues/1709
dwc_otg: fix split transaction data toggle handling around dequeues
See https://github.com/raspberrypi/linux/issues/1709
Fix several issues regarding endpoint state when URBs are dequeued
- If the HCD is disconnected, flush FIQ-enabled channels properly
- Save the data toggle state for bulk endpoints if the last transfer
from an endpoint where URBs were dequeued returned a data packet
- Reset hc->start_pkt_count properly in assign_and_init_hc()
dwc_otg: fix several potential crash sources
On root port disconnect events, the host driver state is cleared and
in-progress host channels are forcibly stopped. This doesn't play
well with the FIQ running in the background, so:
- Guard the disconnect callback with both the host spinlock and FIQ
spinlock
- Move qtd dereference in dwc_otg_handle_hc_fsm() after the early-out
so we don't dereference a qtd that has gone away
- Turn catch-all BUG()s in dwc_otg_handle_hc_fsm() into warnings.
dwc_otg: delete hcd->channel_lock
The lock serves no purpose as it is only held while the HCD spinlock
is already being held.
dwc_otg: remove unnecessary dma-mode channel halts on disconnect interrupt
Host channels are already halted in kill_urbs_in_qh_list() with the
subsequent interrupt processing behaving as if the URB was dequeued
via HCD callback.
There's no need to clobber the host channel registers a second time
as this exposes races between the driver and host channel resulting
in hcd->free_hc_list becoming corrupted.
dwcotg: Allow to build without FIQ on ARM64
Signed-off-by: popcornmix <popcornmix@gmail.com>
dwc_otg: make periodic scheduling behave properly for FS buses
If the root port is in full-speed mode, transfer times at 12mbit/s
would be calculated but matched against high-speed quotas.
Reinitialise hcd->frame_usecs[i] on each port enable event so that
full-speed bandwidth can be tracked sensibly.
Also, don't bother using the FIQ for transfers when in full-speed
mode - at the slower bus speed, interrupt frequency is reduced by
an order of magnitude.
Related issue: https://github.com/raspberrypi/linux/issues/2020
dwc_otg: fiq_fsm: Make isochronous compatibility checks work properly
Get rid of the spammy printk and local pointer mangling.
Also, there is a nominal benefit for using fiq_fsm for isochronous
transfers in FS mode (~1.1k IRQs per second vs 2.1k IRQs per second)
so remove the root port speed check.
dwc_otg: add module parameter int_ep_interval_min
Add a module parameter (defaulting to ignored) that clamps the polling rate
of high-speed Interrupt endpoints to a minimum microframe interval.
The parameter is modifiable at runtime as it is used when activating new
endpoints (such as on device connect).
dwc_otg: fiq_fsm: Add non-periodic TT exclusivity constraints
Certain hub types do not discriminate between pipe direction (IN or OUT)
when considering non-periodic transfers. Therefore these hubs get confused
if multiple transfers are issued in different directions with the same
device address and endpoint number.
Constrain queuing non-periodic split transactions so they are performed
serially in such cases.
Related: https://github.com/raspberrypi/linux/issues/2024
dwc_otg: Fixup change to DRIVER_ATTR interface
dwc_otg: Fix compilation warnings
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
USB_DWCOTG: Disable building dwc_otg as a module (#2265)
When dwc_otg is built as a module, build will fail with the following
error:
ERROR: "DWC_TASK_HI_SCHEDULE" [drivers/usb/host/dwc_otg/dwc_otg.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1199: recipe for target 'modules' failed
make: *** [modules] Error 2
Even if the error is solved by including the missing
DWC_TASK_HI_SCHEDULE function, the kernel will panic when loading
dwc_otg.
As a workaround, simply prevent user from building dwc_otg as a module
as the current kernel does not support it.
See: https://github.com/raspberrypi/linux/issues/2258
Signed-off-by: Malik Olivier Boussejra <malik@boussejra.com>
dwc_otg: New timer API
dwc_otg: Fix removed ACCESS_ONCE->READ_ONCE
dwc_otg: don't unconditionally force host mode in dwc_otg_cil_init()
Add the ability to disable force_host_mode for those that want to use
dwc_otg in both device and host modes.
dwc_otg: Fix a regression when dequeueing isochronous transfers
In 282bed95 (dwc_otg: make nak_holdoff work as intended with empty queues)
the dequeue mechanism was changed to leave FIQ-enabled transfers to run
to completion - to avoid leaving hub TT buffers with stale packets lying
around.
This broke FIQ-accelerated isochronous transfers, as this then meant that
dozens of transfers were performed after the dequeue function returned.
Restore the state machine fence for isochronous transfers.
fiq_fsm: rewind DMA pointer for OUT transactions that fail (#2288)
See: https://github.com/raspberrypi/linux/issues/2140
dwc_otg: add smp_mb() to prevent driver state corruption on boot
Occasional crashes have been seen where the FIQ code dereferences
invalid/random pointers immediately after being set up, leading to
panic on boot.
The crash occurs as the FIQ code races against hcd_init_fiq() and
the hcd_init_fiq() code races against the outstanding memory stores
from dwc_otg_hcd_init(). Use explicit barriers after touching
driver state.
usb: dwc_otg: fix memory corruption in dwc_otg driver
[Upstream commit 51b1b64917]
The move from the staging tree to the main tree exposed a
longstanding memory corruption bug in the dwc2 driver. The
reordering of the driver initialization caused the dwc2 driver
to corrupt the initialization data of the sdhci driver on the
Raspberry Pi platform, which made the bug show up.
The error is in calling to_usb_device(hsotg->dev), since ->dev
is not a member of struct usb_device. The easiest fix is to
just remove the offending code, since it is not really needed.
Thanks to Stephen Warren for tracking down the cause of this.
Reported-by: Andre Heider <a.heider@gmail.com>
Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Paul Zimmerman <paulz@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[lukas: port from upstream dwc2 to out-of-tree dwc_otg driver]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
usb: dwb_otg: Fix unreachable switch statement warning
This warning appears with GCC 7.3.0 from toolchains.bootlin.com:
../drivers/usb/host/dwc_otg/dwc_otg_fiq_fsm.c: In function ‘fiq_fsm_update_hs_isoc’:
../drivers/usb/host/dwc_otg/dwc_otg_fiq_fsm.c:595:61: warning: statement will never be executed [-Wswitch-unreachable]
st->hctsiz_copy.b.xfersize = nrpackets * st->hcchar_copy.b.mps;
~~~~~~~~~~~~~~~~~^~~~
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
dwc_otg: fiq_fsm: fix incorrect DMA register offset calculation
Rationalise the offset and update all call sites.
Fixes https://github.com/raspberrypi/linux/issues/2408
dwc_otg: fix bug with port_addr assignment for single-TT hubs
See https://github.com/raspberrypi/linux/issues/2734
The "Hub Port" field in the split transaction packet was always set
to 1 for single-TT hubs. The majority of single-TT hub products
apparently ignore this field and broadcast to all downstream enabled
ports, which masked the issue. A subset of hub devices apparently
need the port number to be exact or split transactions will fail.
usb: dwc_otg: Clean up build warnings on 64bit kernels
No functional changes. Almost all are changes to logging lines.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
usb: dwc_otg: Use dma allocation for mphi dummy_send buffer
The FIQ driver used a kzalloc'ed buffer for dummy_send,
passing a kernel virtual address to the hardware block.
The buffer is only ever used for a dummy read, so it
should be harmless, but there is the chance that it will
cause exceptions.
Use a dma allocation so that we have a genuine bus address,
and read from that.
Free the allocation when done for good measure.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
dwc_otg: only do_split when we actually need to do a split
The previous test would fail if the root port was in fullspeed mode
and there was a hub between the FS device and the root port. While
the transfer worked, the schedule mangling performed for high-speed
split transfers would break leading to an 8ms polling interval.
dwc_otg: fix locking around dequeueing and killing URBs
kill_urbs_in_qh_list() is practically only ever called with the fiq lock
already held, so don't spinlock twice in the case where we need to cancel
an isochronous transfer.
Also fix up a case where the global interrupt register could be read with
the fiq lock not held.
Fixes the deadlock seen in https://github.com/raspberrypi/linux/issues/2907
ARM64/DWC_OTG: Port dwc_otg driver to ARM64
In ARM64, the FIQ mechanism used by this driver is not current
implemented. As a workaround, reqular IRQ is used instead
of FIQ.
In a separate change, the IRQ-CPU mapping is round robined
on ARM64 to increase concurrency and allow multiple interrupts
to be serviced at a time. This reduces the need for FIQ.
Tests Run:
This mechanism is most likely to break when multiple USB devices
are attached at the same time. So the system was tested under
stress.
Devices:
1. USB Speakers playing back a FLAC audio through VLC
at 96KHz.(Higher then typically, but supported on my speakers).
2. sftp transferring large files through the buildin ethernet
connection which is connected through USB.
3. Keyboard and mouse attached and being used.
Although I do occasionally hear some glitches, the music seems to
play quite well.
Signed-off-by: Michael Zoran <mzoran@crowfest.net>
usb: dwc_otg: Clean up interrupt claiming code
The FIQ/IRQ interrupt number identification code is scattered through
the dwc_otg driver. Rationalise it, simplifying the code and solving
an existing issue.
See: https://github.com/raspberrypi/linux/issues/2612
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
dwc_otg: Choose appropriate IRQ handover strategy
2711 has no MPHI peripheral, but the ARM Control block can fake
interrupts. Use the size of the DTB "mphi" reg block to determine
which is required.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
usb: host: dwc_otg: fix compiling in separate directory
The dwc_otg Makefile does not respect the O=path argument correctly:
include paths in CFLAGS are given relatively to object path, not source
path. Compiling in a separate directory yields #include errors.
Signed-off-by: Marek Behún <marek.behun@nic.cz>
dwc_otg: use align_buf for small IN control transfers (#3150)
The hardware will do a 4-byte write to memory on any IN packet received
that is between 1 and 3 bytes long. This tramples memory in the uvcvideo
driver, as it uses a sequence of 1- and 2-byte control transfers to
query the min/max/range/step of each individual camera control and
gives us buffers that are offsets into a struct.
Catch small control transfers in the data phase and use the align_buf
to bounce the correct number of bytes into the URB's buffer.
In general, short packets on non-control endpoints should be OK as URBs
should have enough buffer space for a wMaxPacket size transfer.
See: https://github.com/raspberrypi/linux/issues/3148
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
dwc_otg: Declare DMA capability with HCD_DMA flag
Following [1], USB controllers have to declare DMA capabilities in
order for them to be used by adding the HCD_DMA flag to their hc_driver
struct.
[1] 7b81cb6bdd ("usb: add a HCD_DMA flag instead of guestimating DMA capabilities")
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
dwc_otg: checking the urb->transfer_buffer too early (#3332)
After enable the HIGHMEM and VMSPLIT_3G, the dwc_otg driver doesn't
work well on Pi2/3 boards with 1G physical ram. Users experience
the failure when copying a file of 600M size to the USB stick. And
at the same time, the dmesg shows:
usb 1-1.1.2: reset high-speed USB device number 8 using dwc_otg
sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
blk_update_request: I/O error, dev sda, sector 3024048 op 0x1:(WRITE) flags 0x4000 phys_seg 15 prio class 0
When this happens, the sg_buf sent to the driver is located in the
highmem region, the usb_sg_init() in the core/message.c will leave
transfer_buffer to NULL if the sg_buf is in highmem, but in the
dwc_otg driver, it returns -EINVAL unconditionally if transfer_buffer
is NULL.
The driver can handle the situation of buffer to be NULL, if it is in
DMA mode, it will convert an address from transfer_dma.
But if the conversion fails or it is in the PIO mode, we should check
buffer and return -EINVAL if it is NULL.
BugLink: https://bugs.launchpad.net/bugs/1852510
Signed-off-by: Hui Wang <hui.wang@canonical.com>
dwc_otg: constrain endpoint max packet and transfer size on split IN
The hcd would unconditionally set the transfer length to the endpoint
packet size for non-isoc IN transfers. If the remaining buffer length
was less than the length of returned data, random memory would get
scribbled over, with bad effects if it crossed a page boundary.
Force a babble error if this happens by limiting the max transfer size
to the available buffer space. DMA will stop writing to memory on a
babble condition.
The hardware expects xfersize to be an integer multiple of maxpacket
size, so override hcchar.b.mps as well.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
dwc_otg: fiq_fsm: pause when cancelling split transactions
Non-periodic splits will DMA to/from the driver-provided transfer_buffer,
which may be freed immediately after the dequeue call returns. Block until
we know the transfer is complete.
A similar delay is needed when cleaning up disconnects, as the FIQ could
have started a periodic transfer in the previous microframe to the one
that triggered a disconnect.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
dwc_otg: fiq_fsm: add a barrier on entry into FIQ handler(s)
On BCM2835, there is no hardware guarantee that multiple outstanding
reads to different peripherals will complete in-order. The FIQ code
uses peripheral reads without barriers for performance, so in the case
where a read to a slow peripheral was issued immediately prior to FIQ
entry, the first peripheral read that the FIQ did could end up with
wrong read data returned.
Add dsb(sy) on entry so that all outstanding reads are retired.
The FIQ only issues reads to the dwc_otg core, so per-read barriers
in the handler itself are not required.
On BCM2836 and BCM2837 the barrier is not strictly required due to
differences in how the peripheral bus is implemented, but having
arch-specific handlers that introduce different latencies is risky.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
dwc_otg: whitelist_table is now productlist_table
dwc_otg: initialise sched_frame for periodic QHs that were parked
If a periodic QH has no remaining QTDs, then it is removed from all
periodic schedules. When re-adding, initialise the sched_frame and
start_split_frame from the current value of the frame counter.
See https://bugs.launchpad.net/raspbian/+bug/1819560
and
https://github.com/raspberrypi/linux/issues/3883
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
dwc_otg: Minimise header and fix build warnings
Delete a large amount of unused declaration from "usb.h", some of which
were causing build warnings, and get the module building cleanly.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
dwc-otg: fix clang -Wignored-attributes warning
warning: attribute declaration must precede definition
dwc-otg: fix clang -Wsometimes-uninitialized warning
warning: variable 'retval' is used uninitialized whenever 'if' condition is false
dwc-otg: fix clang -Wpointer-bool-conversion warning
warning: address of array 'desc->wMaxPacketSize' will always evaluate to 'true'
The wMaxPacketSize field is actually a two element array which content should
be accessed via the UGETW macro.
dwc_otg: fix an undeclared variable
Replace an undeclared variable used by DWC_DEBUGPL with the real endpoint address. DWC_DEBUGPL does nothing with DEBUG undefined so it did not go wrong before.
Signed-off-by: Zixuan Wang <wangzixuan@sjtu.edu.cn>
dwc_otg: Update NetBSD usb.h header licence
NetBSD have changed their licensing requirements such that the 2-clause
licence is preferred. Update usb.h in the downstream dwc_otg code
accordingly.
See https://www.netbsd.org/about/redistribution.html for more
information.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
dwc_otg: pay attention to qh->interval when rescheduling periodic queues
A regression introduced in https://github.com/raspberrypi/linux/pull/3887
meant that if the newly scheduled transfer immediately returned data, and
the driver resubmitted a single URB after every transfer, then the effective
polling interval would end up being approx 1ms.
Use the larger of SCHEDULE_SLOP or the configured endpoint interval.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Signed-off-by: popcornmix <popcornmix@gmail.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2709: Drop platform smp and timer init code
irq-bcm2836 handles this through these functions:
bcm2835_init_local_timer_frequency()
bcm2836_arm_irqchip_smp_init()
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm270x: Use watchdog for reboot/poweroff
The watchdog driver already has support for reboot/poweroff.
Make use of this and remove the code from the platform files.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
board_bcm2835: Remove coherent dma pool increase - API has gone
Signed-off-by: Noralf Tronnes <notro@tronnes.org>
SQUASH: pinctrl: bcm2835: Set base for bcm2711 GPIO to 0
Without this patch GPIOs don't seem to work properly, primarily
noticeable as broken LEDs.
Squash with "pinctrl-bcm2835: Set base to 0 give expected gpio numbering"
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Under some circumstances on BCM283x processors data loss can be
observed - a single byte missing from the TX output stream. These bytes
are always the last byte of a batch of 8 written from pl011_tx_chars
when from_irq is true, meaning that the FIFO full flag is not checked
before writing.
The transmit optimisation relies on the FIFO being half-empty when the
TX interrupt is raised. Instrumenting the driver further showed that
the failure case correlated with the TX FIFO full flag being set at the
point where the last byte was written to the data register, which
explains the data loss but not how the FIFO appeared to be prematurely
full. A possible explanation is that a FIFO write was in flight at the
time the interrupt was raised, but as yet there is no hypothesis as to
how this might occur.
In the absence of a clear understanding of the failure mechanism, avoid
the problem by checking the FIFO levels before writing the last byte of
the group, which will have minimal performance impact.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The BCM2835 PL011 implementation seems to have a bug that can lead to a
transmission lockup if CTS changes frequently. A workaround was added to
the driver with a vendor-specific flag to enable it, but this flag is
currently not set for ARM implementations.
Add a "cts-event-workaround" property to Pi DTBs and use the presence
of that property to force the flag to be enabled in the driver.
See: https://github.com/raspberrypi/linux/issues/1280
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The pl011 register accessor functions use the _relaxed versions of the
standard readl() and writel() functions, meaning that there are no
automatic memory barriers. When polling a FIFO status register to check
for fullness, it is necessary to ensure that any outstanding writes have
completed; otherwise the flags are effectively stale, making it possible
that the next write is to a full FIFO.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
For applications of the LAN78xx that don't have valid programmed
EEPROMs or OTPs, enabling both LEDs and auto-negotiation by default
seems reasonable.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The syscon node defines a register range that duplicates that used by
the local_intc node on bcm2836/7. Since irq-bcm2835 and irq-bcm2836 are
built in and always present together (both drivers are enabled by
CONFIG_ARCH_BCM2835), it is possible to replace the syscon usage with a
global variable that simplifies the code. Doing so does lose the
locking provided by regmap, but as only one side is using the regmap
interface (irq-bcm2835 uses readl and write) there is no loss of
atomicity.
See: https://github.com/raspberrypi/firmware/issues/926
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
This adds a debug module parameter to aid in debugging transfer issues
by printing info to the kernel log. When enabled, status values are
collected in the interrupt routine and msg info in
bcm2835_i2c_start_transfer(). This is done in a way that tries to avoid
affecting timing. Having printk in the isr can mask issues.
debug values (additive):
1: Print info on error
2: Print info on all transfers
3: Print messages before transfer is started
The value can be changed at runtime:
/sys/module/i2c_bcm2835/parameters/debug
Example output, debug=3:
[ 747.114448] bcm2835_i2c_xfer: msg(1/2) write addr=0x54, len=2 flags= [i2c1]
[ 747.114463] bcm2835_i2c_xfer: msg(2/2) read addr=0x54, len=32 flags= [i2c1]
[ 747.117809] start_transfer: msg(1/2) write addr=0x54, len=2 flags= [i2c1]
[ 747.117825] isr: remain=2, status=0x30000055 : TA TXW TXD TXE [i2c1]
[ 747.117839] start_transfer: msg(2/2) read addr=0x54, len=32 flags= [i2c1]
[ 747.117849] isr: remain=32, status=0xd0000039 : TA RXR TXD RXD [i2c1]
[ 747.117861] isr: remain=20, status=0xd0000039 : TA RXR TXD RXD [i2c1]
[ 747.117870] isr: remain=8, status=0x32 : DONE TXD RXD [i2c1]
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Christopher Alexander Tobias Schulze - May 2, 2015, 11:57 a.m.
This patch fixes a problem with VFP state save and restore related
to exception handling (panic with message "BUG: unsupported FP
instruction in kernel mode") present on VFP11 floating point units
(as used with ARM1176JZF-S CPUs, e.g. on first generation Raspberry
Pi boards). This patch was developed and discussed on
https://github.com/raspberrypi/linux/issues/859
A precondition to see the crashes is that floating point exception
traps are enabled. In this case, the VFP11 might determine that a FPU
operation needs to trap at a point in time when it is not possible to
signal this to the ARM11 core any more. The VFP11 will then set the
FPEXC.EX bit and store the trapped opcode in FPINST. (In some cases,
a second opcode might have been accepted by the VFP11 before the
exception was detected and could be reported to the ARM11 - in this
case, the VFP11 also sets FPEXC.FP2V and stores the second opcode in
FPINST2.)
If FPEXC.EX is set, the VFP11 will "bounce" the next FPU opcode issued
by the ARM11 CPU, which will be seen by the ARM11 as an undefined opcode
trap. The VFP support code examines the FPEXC.EX and FPEXC.FP2V bits
to decide what actions to take, i.e., whether to emulate the opcodes
found in FPINST and FPINST2, and whether to retry the bounced instruction.
If a user space application has left the VFP11 in this "pending trap"
state, the next FPU opcode issued to the VFP11 might actually be the
VSTMIA operation vfp_save_state() uses to store the FPU registers
to memory (in our test cases, when building the signal stack frame).
In this case, the kernel crashes as described above.
This patch fixes the problem by making sure that vfp_save_state() is
always entered with FPEXC.EX cleared. (The current value of FPEXC has
already been saved, so this does not corrupt the context. Clearing
FPEXC.EX has no effects on FPINST or FPINST2. Also note that many
callers already modify FPEXC by setting FPEXC.EN before invoking
vfp_save_state().)
This patch also addresses a second problem related to FPEXC.EX: After
returning from signal handling, the kernel reloads the VFP context
from the user mode stack. However, the current code explicitly clears
both FPEXC.EX and FPEXC.FP2V during reload. As VFP11 requires these
bits to be preserved, this patch disables clearing them for VFP
implementations belonging to architecture 1. There should be no
negative side effects: the user can set both bits by executing FPU
opcodes anyway, and while user code may now place arbitrary values
into FPINST and FPINST2 (e.g., non-VFP ARM opcodes) the VFP support
code knows which instructions can be emulated, and rejects other
opcodes with "unhandled bounce" messages, so there should be no
security impact from allowing reloading FPEXC.EX and FPEXC.FP2V.
Signed-off-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
At present there is no mechanism to specify driver load order,
which can lead to deferrals and repeated retries until successful.
Since this situation is expected, reduce the dmesg level to
INFO and mention that the operation will be retried.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The Raspberry Pi firmware looks at the RSTS register to know which
partition to boot from. The reboot syscall command
LINUX_REBOOT_CMD_RESTART2 supports passing in a string argument.
Add support for passing in a partition number 0..63 to boot from.
Partition 63 is a special partiton indicating halt.
If the partition doesn't exist, the firmware falls back to partition 0.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Load driver early since at least bcm2708_fb doesn't support deferred
probing and even if it did, we don't want the video driver deferred.
Support the legacy DMA API which is needed by bcm2708_fb.
Don't mask out channel 2.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
An alternative strategy would be to use "rpi,spidev" instead, but that
would require many Raspberry Pi Device Tree changes.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Add a duplicate irq range with an offset on the hwirq's so the
driver can detect that enable_fiq() is used.
Tested with downstream dwc_otg USB controller driver.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: popcornmix <popcornmix@gmail.com>
SQUASH: smsc95xx: Use dev_mod_addr to set MAC addr
Since adeef3e321 ("net: constify netdev->dev_addr") it has been
illegal to write to the dev_addr MAC address field. Later commits
have added explicit checks that it hasn't been modified by nefarious
means. The dev_addr_mod helper function is the accepted way to change
the dev_addr field, so use it.
Squash with 96c1def63ee1 ("Allow mac address to be set in smsc95xx").
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
smsc95xx is adjusting truesize when it shouldn't, and following a recent patch from Eric this is now triggering warnings.
This patch stops smsc95xx from changing truesize.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
This reverts commit 92516cd97f.
Thi commit "Bluetooth: Always request for user confirmation for Just
Works" prevents BLE devices pairing in (at least) the Raspberry Pi OS
GUI. After reverting it, pairing works again.
If another solution to the problem is found then this reversion will
be removed.
See: https://github.com/raspberrypi/linux/issues/4139
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This reverts commit ffee202a78.
The commit "Bluetooth: Always request for user confirmation for Just
Works" prevents BLE devices pairing in (at least) the Raspberry Pi OS
GUI. After reverting it, pairing works again. Although this companion
commit ("... (LE SC)") has not been demonstrated to be problematic,
it follows the same logic and therefore could affect some use cases.
If another solution to the problem is found then this reversion will
be removed.
See: https://github.com/raspberrypi/linux/issues/4139
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The fw_node pointer has already been retrieved, and using it allows
us to remove a downstream patch to the firmware driver.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This falls under the same "we can reprogram glitch-free as long as we
pause generation" rule as updating the div/frac fields. This can be
used for runtime reclocking of V3D to manage power leakage.
Signed-off-by: Eric Anholt <eric@anholt.net>
The debug text for how many clocks have been registered
uses "%d" with a size_t. Correct it to "%zd".
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
The VPU is responsible for managing the core clock, usually under
direction from the bcm2835-cpufreq driver but not via the clk-bcm2835
driver. Since the core frequency can change without warning, it is
safer to report the maximum clock rate to users of the core clock -
I2C, SPI and the mini UART - to err on the safe side when calculating
clock divisors.
If the DT node for the clock driver includes a reference to the
firmware node, use the firmware API to query the maximum core clock
instead of reading the divider registers.
Prior to this patch, a "100KHz" I2C bus was sometimes clocked at about
160KHz. In particular, switching to the 4.9 kernel was likely to break
SenseHAT usage on a Pi3.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The claim-clocks property can be used to prevent PLLs and dividers
from being marked as critical. It contains a vector of clock IDs,
as defined by dt-bindings/clock/bcm2835.h.
Use this mechanism to claim PLLD_DSI0, PLLD_DSI1, PLLH_AUX and
PLLH_PIX for the vc4_kms_v3d driver.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The VPU configures and relies on several PLLs and dividers. Mark all
enabled dividers and their PLLs as CRITICAL to prevent the kernel from
switching them off.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Add the bare minimum needed to boot BCM2708 from a Device Tree.
Signed-off-by: Noralf Tronnes <notro@tronnes.org>
BCM2708: DT: change 'axi' nodename to 'soc'
Change DT node named 'axi' to 'soc' so it matches ARCH_BCM2835.
The VC4 bootloader fills in certain properties in the 'axi' subtree,
but since this is part of an upstreaming effort, the name is changed.
Signed-off-by: Noralf Tronnes notro@tronnes.org
BCM2708_DT: Correct length of the peripheral space
Use dts-dirs feature for overlays.
The kernel makefiles have a dts-dirs target that is for vendor subdirectories.
Using this fixes the install_dtbs target, which previously did not install the overlays.
BCM270X_DT: configure I2S DMA channels
Signed-off-by: Matthias Reichl <hias@horus.com>
BCM270X_DT: switch to bcm2835-i2s
I2S soundcard drivers with proper devicetree support (i.e. not linking
to the cpu_dai/platform via name but to cpu/platform via of_node)
will work out of the box without any modifications.
When the kernel is compiled without devicetree support the platform
code will instantiate the bcm2708-i2s driver and I2S soundcard drivers
will link to it via name, as before.
Signed-off-by: Matthias Reichl <hias@horus.com>
SDIO-overlay: add poll_once-boolean parameter
Add paramter to toggle sdio-device-polling
done every second or once at boot-time.
Signed-off-by: Patrick Boettcher <patrick.boettcher@posteo.de>
BCM270X_DT: Make mmc overlay compatible with current firmware
The original DT overlay logic followed a merge-then-patch procedure,
i.e. parameters are applied to the loaded overlay before the overlay
is merged into the base DTB. This sequence has been changed to
patch-then-merge, in order to support parameterised node names, and
to protect against bad overlays. As a result, overrides (parameters)
must only target labels in the overlay, but the overlay can obviously target nodes in the base DTB.
mmc-overlay.dts (that switches back to the original mmc sdcard
driver) is the only overlay violating that rule, and this patch
fixes it.
bcm270x_dt: Use the sdhost MMC controller by default
The "mmc" overlay reverts to using the other controller.
squash: Add cprman to dt
BCM270X_DT: Use clk_core for I2C interfaces
BCM270X_DT: Use bcm283x.dtsi, bcm2835.dtsi and bcm2836.dtsi
The mainline Device Tree files are quite close to downstream now.
Let's use bcm283x.dtsi, bcm2835.dtsi and bcm2836.dtsi as base files
for our dts files.
Mainline dts files are based on these files:
bcm2835-rpi.dtsi
bcm2835.dtsi bcm2836.dtsi
bcm283x.dtsi
Current downstream are based on these:
bcm2708.dtsi bcm2709.dtsi bcm2710.dtsi
bcm2708_common.dtsi
This patch introduces this dependency:
bcm2708.dtsi bcm2709.dtsi
bcm2708-rpi.dtsi
bcm270x.dtsi
bcm2835.dtsi bcm2836.dtsi
bcm283x.dtsi
And:
bcm2710.dtsi
bcm2708-rpi.dtsi
bcm270x.dtsi
bcm283x.dtsi
bcm270x.dtsi contains the downstream bcm283x.dtsi diff.
bcm2708-rpi.dtsi is the downstream version of bcm2835-rpi.dtsi.
Other changes:
- The led node has moved from /soc/leds to /leds. This is not a problem
since the label is used to reference it.
- The clk_osc reg property changes from 6 to 3.
- The gpu nodes has their interrupt property set in the base file.
- the clocks label does not point to the /clocks node anymore, but
points to the cprman node. This is not a problem since the overlays
that use the clock node refer to it directly: target-path = "/clocks";
- some nodes now have 2 labels since mainline and downstream differs in
this respect: cprman/clocks, spi0/spi, gpu/vc4.
- some nodes doesn't have an explicit status = "okay" since they're not
disabled in the base file: watchdog and random.
- gpiomem doesn't need an explicit status = "okay".
- bcm2708-rpi-cm.dts got the hpd-gpios property from bcm2708_common.dtsi,
it's now set directly in that file.
- bcm2709-rpi-2-b.dts has the timer node moved from /soc/timer to /timer.
- Removed clock-frequency property on the bcm{2709,2710}.dtsi timer nodes.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
BCM270X_DT: Use raspberrypi-power to turn on USB power
Use the raspberrypi-power driver to turn on USB power.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
BCM270X_DT: Add a .dtbo target, use for overlays
Change the filenames and extensions to keep the pre-DDT style of
overlay (<name>-overlay.dtb) distinct from new ones that use a
different style of local fixups (<name>.dtbo), and to match other
platforms.
The RPi firmware uses the DDTK trailer atom to choose which type of
overlay to use for each kernel.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
BCM270X_DT: Don't generate "linux,phandle" props
The EPAPR standard says to use "phandle" properties to store phandles,
rather than the deprecated "linux,phandle" version. By default, dtc
generates both, but adding "-H epapr" causes it to only generate
"phandle"s, saving some space and clutter.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
BCM270X_DT: Add overlay for enc28j60 on SPI2
Works on SPI2 for compute module
BCM270X_DT: Add midi-uart0 overlay
MIDI requires 31.25kbaud, a baudrate unsupported by Linux. The
midi-uart0 overlay configures uart0 (ttyAMA0) to use a fake clock
so that requesting 38.4kbaud actually gets 31.25kbaud.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
BCM270X_DT: Add i2c-sensor overlay
The i2c-sensor overlay is a container for various pressure and
temperature sensors, currently bmp085 and bmp280. The standalone
bmp085_i2c-sensor overlay is now deprecated.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
BCM270X_DT: overlays/*-overlay.dtb -> overlays/*.dtbo (#1752)
We now create overlays as .dtbo files.
build: support for .dtbo files for dtb overlays
Kernel 4.4.6+ on RaspberryPi support .dtbo files for overlays, instead of .dtb.
Patch the kernel, which has faulty rules to generate .dtbo the way yocto does
Signed-off-by: Herve Jourdain <herve.jourdain@neuf.fr>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
BCM270X: Drop position requirement for CMA in VC4 overlay.
No longer necessary since 2aefcd5761,
and will probably let peeople that want to choose a larger CMA
allocation (particularly on pi0/1).
Signed-off-by: Eric Anholt <eric@anholt.net>
BCM270X_DT: RPi Device Tree tidy
Use the upstream sdhost node, add thermal-zones, and factor out some
common elements.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
kbuild: Silence unhelpful DTC warnings
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
BCM270X_DT: DT build rules no longer arch-specific
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Pi 0-3 have no deep colour support and only 24bpp output,
so max_bpc should remain as 8.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
RGB is a mandatory format for all DVI and HDMI monitors so there's
no need to check for presence of the DRM_COLOR_FORMAT_RGB444 bit in
color_formats.
More importantly this checks breaks working around EDID issues with
eg video=HDMI-A-1:1024x768D or drm.edid_firmware=edid/1024x768.bin
as the RGB444 bit is only set when a valid EDID with digital bit set in
the input byte is present - which isn't the case when no EDID can be
read from the display device at all or with the in-built kernel EDIDs,
which mimic analog (VGA) displays without the digital bit set.
So drop the check, if we output video on the HDMI connector we can
assume that the display can accept 8bit RGB.
Signed-off-by: Matthias Reichl <hias@horus.com>
The HVS can change AXI request mode based on how full the COB
FIFOs are.
Until now the vc4 driver has been relying on the firmware to
have set these to sensible values.
With HVS channel 2 now being used for live video, change the
panic mode for all channels to be explicitly set by the driver,
and the same for all channels.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
With vc4-drv node not being under /soc on Pi4, we need to
adopt the correct DMA parameters from a suitable sub-component.
Add "brcm,bcm2711-hvs" to that list of components.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
This aligns vc4 with Intel, AMD and Synopsis drivers and fixes max bpc
connector property not working as expected on monitors with YCbCr 4:2:2
support but not deep color support.
max_bpc in connector state is clamped at max_bpc from display info and
the latter only takes deep color modes into account so it will always
be 8, even if the display can do 4:2:2 12-bit output.
Signed-off-by: Matthias Reichl <hias@horus.com>
Support displaying DRM_FORMAT_YUV444 and DRM_FORMAT_YVU444 formats.
Tested with kmstest and kodi. e.g.
kmstest -r 1920x1080@60 -f 400x300-YU24
Note: without the shift of width, only half the chroma is fetched,
resulting in correct left half of image and corrupt colours on right half.
The increase in width shouldn't affect fetching of Y data,
as the hardware will clamp at dest width.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
When running a kunit test, the driver runs with a mock device. As such,
any attempt to read or write to a hardware register will fail the
current test immediately.
The dlist allocation management recently introduced will read the
current frame count from the HVS to defer its destruction until a
subsequent frame has been output. This obviously involves a register
read that fails the Kunit tests.
Change the destruction deferral function to destroy the allocation
immediately if we run under kunit. This is essentially harmless since
the main reason for that deferall is to prevent any access to the
hardware dlist while a frame described by that list is rendered. On our
mock driver, we have neither a hardware dlist nor a rendering, so it
doesn't matter.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
We'll need to destroy a dlist allocation in multiple code paths, so
let's move it to a separate function.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
The vc4_hvs_dlist_allocation structure has a list that we don't
initialize when we allocate a new instance.
This makes any call reading the list structure (such as list_empty) fail
with a NULL pointer dereference.
Let's make sure our list is always initialized.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
During normal operations, the cursor position update is done through an
asynchronous plane update, which on the vc4 driver basically just
modifies the right dlist word to move the plane to the new coordinates.
However, when we have the overscan margins setup, we fall back to a
regular commit when we are next to the edges. And since that commit
happens to be on a cursor plane, it's considered a legacy cursor update
by KMS.
The main difference it makes is that it won't wait for its completion
(ie, next vblank) before returning. This means if we have multiple
commits happening in rapid succession, we can have several of them
happening before the next vblank.
In parallel, our dlist allocation is tied to a CRTC state, and each time
we do a commit we end up with a new CRTC state, with the previous one
being freed. This means that we free our previous dlist entry (but don't
clear it though) every time a new one is being committed.
Now, if we were to have two commits happening before the next vblank, we
could end up freeing reusing the same dlist entries before the next
vblank.
Indeed, we would start from an initial state taking, for example, the
dlist entries 10 to 20, then start a commit taking the entries 20 to 30
and setting the dlist pointer to 20, and freeing the dlist entries 10 to
20. However, since we haven't reach vblank yet, the HVS is still using
the entries 10 to 20.
If we were to make a new commit now, chances are the allocator are going
to give the 10 to 20 entries back, and we would change their content to
match the new state. If vblank hasn't happened yet, we just corrupted
the active dlist entries.
A first attempt to solve this was made by creating an intermediate dlist
buffer to store the current (ie, as of the last commit) dlist content,
that we would update each time the HVS is done with a frame. However, if
the interrupt handler missed the vblank window, we would end up copying
our intermediate dlist to the hardware one during the composition,
essentially creating the same issue.
Since making sure that our interrupt handler runs within a fixed,
constrained, time window would require to make Linux a real-time kernel,
this seems a bit out of scope.
Instead, we can work around our original issue by keeping the dlist
slots allocation longer. That way, we won't reuse a dlist slot while
it's still in flight. In order to achieve this, instead of freeing the
dlist slot when its associated CRTC state is destroyed, we'll queue it
in a list.
A naive implementation would free the buffers in that queue when we get
our end of frame interrupt. However, there's still a race since, just
like in the shadow dlist case, we don't control when the handler for
that interrupt is going to run. Thus, we can end up with a commit adding
an old dlist allocation to our queue during the window between our
actual interrupt and when our handler will run. And since that buffer is
still being used for the composition of the current frame, we can't free
it right away, exposing us to the original bug.
Fortunately for us, the hardware provides a frame counter that is
increased each time the first line of a frame is being generated.
Associating the frame counter the image is supposed to go away to the
allocation, and then only deallocate buffers that have a counter below
or equal to the one we see when the deallocation code should prevent the
above race from occuring.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Add a custom property "Output format" that allows the overriding
of the default colourspace choice in the way that the old
firmware hdmi_pixel_encoding property did. If the chosen format is not
supported, then it will still drop back to the older behaviour.
This won't be acceptable to upstream, but it adds back the missing
functionality of hdmi_pixel_encoding.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The CSC requires the coefficients to be swapped around for
4:4:4 mode, but the swap was incorrectly defined.
Correct the swap.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
We regularly get dmesg error reports of:
[ 18.184066] hdmi-audio-codec hdmi-audio-codec.3.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[ 18.184098] MAI: soc_pcm_open() failed (-19)
Currently I get 30 of these when booting to desktop.
We always say, ignore they are harmless, but removing them would be good.
A bit of investigation shows, for me, the errors are all generated by second, unused hdmi interface.
It shows as an alsa device, and pulseaudio attempts to open it (numerous times), generating a kernel
error message each time.
systemctl --user restart pulseaudio.service generates 6 additional error messages.
The error messages all come through:
a009a9c0d7/sound/soc/soc-pcm.c (L39)
which suggests returning ENOTSUPP, rather that ENODEV will be quiet. And indeed it is.
Note: earlier kernels do not have the quiet ENOTSUPP, so additional cherry-picks will be needed to backport
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
See: https://forum.libreelec.tv/thread/24783-tv-avr-turns-back-on-right-after-turning-them-off
While the kernel provides a :D flag for assuming device is connected,
it doesn't stop this function from being called and generating a cec_phys_addr_invalidate
message when hotplug is deasserted.
That message provokes a flurry of CEC messages which for many users results in the TV
switching back on again and it's very hard to get it to stay switched off.
It seems to only occur with an AVR and TV connected but has been observed across a
number of manufacturers.
The issue started with https://github.com/raspberrypi/linux/pull/4371
and this provides an optional way of getting back the old behaviour
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
The missing 32-bit per pixel ABGR and various "RGB with an X value"
formats are added. Change sent by Dave Stevenson.
Signed-off-by: David Plowman <david.plowman@raspberrypi.com>
FKMS doesn't have an HVS and it's expected. Return from the debugfs init
function immediately if we're running with fkms.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Now that cursors are implemented as regular planes, all cursor
movements result in atomic updates. As the firmware-kms driver
doesn't support asynchronous updates, these are synchronous, which
limits the update rate to the screen refresh rate. Xorg seems unaware
of this (or at least of the effect of this), because if the mouse is
configured with a higher update rate than the screen then continuous
mouse movement results in an increasing backlog of mouse events -
cue extreme lag.
Add minimal support for asynchronous updates - limited to cursor
planes - to eliminate the lag.
See: https://github.com/raspberrypi/linux/pull/4971https://github.com/raspberrypi/linux/issues/4988
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Margins may be implemented by scaling the planes, but as there
is no way of intercepting the set_property for a standard property,
and all planes are checked in drm_atomic_check_only before the
connectors, there's now way to add the planes into the state
from the driver.
If the margin properties change, add all corresponding planes to
the state.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The stuff never really worked, and leads to lots of fun because it
out-of-order frees atomic states. Which upsets KASAN, among other
things.
For async updates we now have a more solid solution with the
->atomic_async_check and ->atomic_async_commit hooks. Support for that
for msm and vc4 landed. nouveau and i915 have their own commit
routines, doing something similar.
For everyone else it's probably better to remove the use-after-free
bug, and encourage folks to use the async support instead. The
affected drivers which register a legacy cursor plane and don't either
use the new async stuff or their own commit routine are: amdgpu,
atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use
atomic_async_check/commit done in
commit 674e78acae
Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting
solid legacy cursor support _and_ they are using the atomic helpers in
their fully glory. So we can retire this.
v3: Paper over msm and i915 regression. The complete_all is the only
thing missing afaict.
v4: Rebased on recent kernel, added extra link for vc4 bug.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199425
Link: https://lore.kernel.org/all/20220221134155.125447-9-maxime@cerno.tech/
Cc: mikita.lipski@amd.com
Cc: Michel Dänzer <michel@daenzer.net>
Cc: harry.wentland@amd.com
Cc: Rob Clark <robdclark@gmail.com>
Cc: "Kazlauskas, Nicholas" <nicholas.kazlauskas@amd.com>
Tested-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
When the margins are changed, the dlist needs to be regenerated
with the changed updated dest regions for each of the planes.
Setting the zpos_changed flag is sufficient to trigger that
without doing a full modeset, therefore set it should the
margins be changed.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The HVS always composes in the RGB domain, but there is a colourspace
conversion block on the output to allow for sending YCbCr over the
HDMI interface.
The colourspace on that link is configurable via the "Colorspace"
property on the connector, and that updates the infoframes. There
is also selection of limited or full range based on the mode selected
or an override.
Add code to update the CSC as well so that the metadata matches the
image data.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
mipi_dsi_attach can fail due to resources not being available
yet, therefore do not log error messages should they occur.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The MIPI_DSI_MODE_* flags have fairly terse descriptions and no reference
to the DSI specification as to their exact meaning. Usage has therefore
been rather fluid.
Extend the descriptions and provide references to the part of the
MIPI DSI specification regarding what they mean.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Copy Intel's "Broadcast RGB" property semantics to add manual override
of the HDMI pixel range for monitors that don't abide by the content
of the AVI Infoframe.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Still under investigation, but the conditions under which the HVS
will accept values written to the gamma PWL are not straightforward.
Disable gamma on HVS5 again until it can be resolved to avoid
gamma being enabled with an incorrect table.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add a check to vc4_hvs_gamma_check to ensure a new non-empty
gamma LUT is of the correct length before accepting it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Two calls were made to drm_crtc_enable_color_mgmt to add gamma
and CTM, however they were both set to add the gamma properties,
so they ended up added twice.
Fixes: 766cc6b1f7 "drm/vc4: Add CTM support"
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
With HVS5 the gamma block is now only reprogrammed with
a disable/enable. Loading the table from vc4_hvs_init_channel
(called from vc4_hvs_atomic_enable) appears to be at an
invalid point in time and so isn't applied.
Switch to enabling and disabling the gamma table instead. This
isn't safe if the pipeline is running, but it isn't now.
For HVS4 it is safe to enable and disable dynamically, so
adopt that approach there too.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
drm_crtc_legacy_gamma_set updates the gamma_lut blob unconditionally,
which leads to unnecessary reprogramming of hardware.
Check whether the blob contents has actually changed before
signalling that it has been updated.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add predefined modelines for the 240p (NTSC) and 288p (PAL) progressive
modes, and report them through vc4_vec_connector_get_modes().
Signed-off-by: Mateusz Kwiatkowski <kfyatek+publicgit@gmail.com>
Make vc4_vec_encoder_atomic_check() accept arbitrary modelines, as long
as they result in somewhat sane output from the VEC. The bounds have
been determined empirically. Additionally, add support for the
progressive 262-line and 312-line modes.
Signed-off-by: Mateusz Kwiatkowski <kfyatek+publicgit@gmail.com>
The HVS Gamma block can only be updated when idle, so we need to disable
the HVS channel when the gamma property is set in an atomic commit.
Since the pixelvalve cannot have its assigned channel halted without
stalling unless it's disabled as well, in our case that means forcing a
full disable / enable cycle on the pipeline.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
BCM2711 changes from a 256 entry lookup table to a 16 point
piecewise linear function as the pipeline bitdepth has increased
to make a LUT unwieldy.
Implement a simple conversion from a 256 entry LUT that userspace
is likely to expect to 16 evenly spread points in the PWL. This
could be improved with curve fitting at a later date.
Co-developed-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
This is a squash of all firmware-kms related patches from previous
branches, up to and including
"drm/vc4: Set the possible crtcs mask correctly for planes with FKMS"
plus a couple of minor fixups for the 5.9 branch.
Please refer to earlier branches for full history.
This patch includes work by Eric Anholt, James Hughes, Phil Elwell,
Dave Stevenson, Dom Cobley, and Jonathon Bell.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
drm/vc4: Fixup firmware-kms after "drm/atomic: Pass the full state to CRTC atomic enable/disable"
Prototype for those calls changed, so amend fkms (which isn't
upstream) to match.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
drm/vc4: Fixup fkms for API change
Atomic flush and check changed API, so fix up the downstream-only
FKMS driver.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
drm/vc4: Make normalize_zpos conditional on using fkms
Eric's view was that there was no point in having zpos
support on vc4 as all the planes had the same functionality.
Can be later squashed into (and fixes):
drm/vc4: Add firmware-kms mode
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
drm/vc4: FKMS: Change of Broadcast RGB mode needs a mode change
The Broadcast RGB (aka HDMI limited/full range) property is only
notified to the firmware on mode change, so this needs to be
signalled when set.
https://github.com/raspberrypi/firmware/issues/1580
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc4/drv: Only notify firmware of display done with kms
fkms driver still wants firmware display to be active
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
ydrm/vc4: fkms: Fix margin calculations for the right/bottom edges
The calculations clipped the right/bottom edge of the clipped
range based on the left/top margins.
https://github.com/raspberrypi/linux/issues/4447
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
drm/vc4: fkms: Use new devm_rpi_firmware_get api
drm/kms: Add allow_fb_modifiers
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Similar to the ch7006 and nouveau drivers, introduce a "tv_mode" module
parameter that allow setting the TV norm by specifying vc4.tv_norm= on
the kernel command line.
If that is not specified, try inferring one of the most popular norms
(PAL or NTSC) from the video mode specified on the command line. On
Raspberry Pis, this causes the most common cases of the sdtv_mode
setting in config.txt to be respected.
Signed-off-by: Mateusz Kwiatkowski <kfyatek+publicgit@gmail.com>
Under FKMS, the firmware (via FKMS) also requires the VideoCore cache
aliases for image planes, as defined by the dma-ranges under /soc.
Add rpi-firmware-kms to the list of acceptable nodes to look for
to copy dma config from.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
commit 3d07fa1dd1 upstream.
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:
# cat /sys/kernel/debug/tracing/trace_pipe
cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes: c2489bb7e6 ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 202e14222f ]
Given the difficulty of auditing all of userspace to figure out whether
every memfd_create() user has switched to passing MFD_EXEC and
MFD_NOEXEC_SEAL flags, it seems far less distruptive to make it possible
for older programs that don't make use of executable memfds to run under
vm.memfd_noexec=2. Otherwise, a small dependency change can result in
spurious errors. For programs that don't use executable memfds, passing
MFD_NOEXEC_SEAL is functionally a no-op and thus having the same
In addition, every failure under vm.memfd_noexec=2 needs to print to the
kernel log so that userspace can figure out where the error came from.
The concerns about pr_warn_ratelimited() spam that caused the switch to
pr_warn_once()[1,2] do not apply to the vm.memfd_noexec=2 case.
This is a user-visible API change, but as it allows programs to do
something that would be blocked before, and the sysctl itself was broken
and recently released, it seems unlikely this will cause any issues.
[1]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-2-7ff9e3e10ba6@cyphar.com
Fixes: 105ff5339f ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0499942928 ]
Commit 679875d1d8 ("sc16is7xx: Separate GPIOs from modem control lines")
and commit 21144bab4f ("sc16is7xx: Handle modem status lines")
changed the function of the GPIOs pins to act as modem control
lines without any possibility of selecting GPIO function.
As a consequence, applications that depends on GPIO lines configured
by default as GPIO pins no longer work as expected.
Also, the change to select modem control lines function was done only
for channel A of dual UART variants (752/762). This was not documented
in the log message.
Allow to specify GPIO or modem control line function in the device
tree, and for each of the ports (A or B).
Do so by using the new device-tree property named
"nxp,modem-control-line-ports" (property added in separate patch).
When registering GPIO chip controller, mask-out GPIO pins declared as
modem control lines according to this new DT property.
Fixes: 679875d1d8 ("sc16is7xx: Separate GPIOs from modem control lines")
Fixes: 21144bab4f ("sc16is7xx: Handle modem status lines")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Lech Perczak <lech.perczak@camlingroup.com>
Tested-by: Lech Perczak <lech.perczak@camlingroup.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230807214556.540627-5-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 656f9aec07 ]
This is a port of commit 379eb01c21 ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of FP/SIMD registers in the core dump file come from the
thread.fpu. However, kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, kernel will not have a chance to save the latest
values of FP/SIMD registers. So it may cause their values in the core
dump file incorrect. To solve this problem, force fpr_get()/simd_get()
to save the FP/SIMD registers into the thread.fpu if the target task
equals the current task.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2545a2c02b upstream.
This code was previously part of the VIDEO_IPU3_CIO2 driver, which could
be built-in or a loadable module, but after the move it turned into a
builtin-only driver. This fails to link when the I2C subsystem is a
module:
x86_64-linux-ld: drivers/media/pci/intel/ipu-bridge.o: in function `ipu_bridge_unregister_sensors':
ipu-bridge.c:(.text+0x50): undefined reference to `i2c_unregister_device'
x86_64-linux-ld: drivers/media/pci/intel/ipu-bridge.o: in function `ipu_bridge_init':
ipu-bridge.c:(.text+0x9c9): undefined reference to `i2c_acpi_new_device_by_fwnode'
In general, drivers should not have to be built-in, so change the option
to a tristate with the corresponding dependency. This in turn opens a
new problem with the dependency, as the IPU bridge can be a loadable module
while the ipu3 driver itself is built-in, producing a new link failure:
86_64-linux-ld: drivers/media/pci/intel/ipu3/ipu3-cio2.o: in function `cio2_pci_probe':
ipu3-cio2.c:(.text+0x197e): undefined reference to `ipu_bridge_init'
In order to fix this, restore the old Kconfig option that controlled
the ipu bridge driver before it was split out, but make it select a
hidden symbol that now corresponds to the bridge driver.
When other drivers get added that share ipu-bridge, this should cover
all corner cases, and allow any combination of them to be built-in
or modular.
Link: https://lore.kernel.org/linux-media/20230727122331.2421453-1-arnd@kernel.org
Fixes: 881ca25978 ("media: ipu3-cio2: rename cio2 bridge to ipu bridge and move out of ipu3")'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f7f984fa8 upstream.
Starting from SPR, the basic uncore PMON information is retrieved from
the discovery table (resides in an MMIO space populated by BIOS). It is
called the discovery method. The existing value of the type->num_boxes
is from the discovery table.
On some SPR variants, there is a firmware bug that makes the value from the
discovery table incorrect. We use the value from the
SPR_MSR_UNC_CBO_CONFIG MSR to replace the one from the discovery table:
38776cc45e ("perf/x86/uncore: Correct the number of CHAs on SPR")
Unfortunately, the SPR_MSR_UNC_CBO_CONFIG isn't available for the EMR
XCC (Always returns 0), but the above firmware bug doesn't impact the
EMR XCC.
Don't let the value from the MSR replace the existing value from the
discovery table.
Fixes: 38776cc45e ("perf/x86/uncore: Correct the number of CHAs on SPR")
Reported-by: Stephane Eranian <eranian@google.com>
Reported-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20230905134248.496114-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65e710899f upstream.
With ":text =0xcccc", ld.lld fills unused text area with 0xcccc0000.
Example objdump -D output:
ffffffff82b04203: 00 00 add %al,(%rax)
ffffffff82b04205: cc int3
ffffffff82b04206: cc int3
ffffffff82b04207: 00 00 add %al,(%rax)
ffffffff82b04209: cc int3
ffffffff82b0420a: cc int3
Replace it with ":text =0xcccccccc", so we get the following instead:
ffffffff82b04203: cc int3
ffffffff82b04204: cc int3
ffffffff82b04205: cc int3
ffffffff82b04206: cc int3
ffffffff82b04207: cc int3
ffffffff82b04208: cc int3
gcc/ld doesn't seem to have the same issue. The generated code stays the
same for gcc/ld.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: 7705dc8557 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Link: https://lore.kernel.org/r/20230906175215.2236033-1-song@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d7d72a34e upstream.
On large enclaves we hit the softlockup warning with following call trace:
xa_erase()
sgx_vepc_release()
__fput()
task_work_run()
do_exit()
The latency issue is similar to the one fixed in:
8795359e35 ("x86/sgx: Silence softlockup detection when releasing large enclaves")
The test system has 64GB of enclave memory, and all is assigned to a single VM.
Release of 'vepc' takes a longer time and causes long latencies, which triggers
the softlockup warning.
Add cond_resched() to give other tasks a chance to run and reduce
latencies, which also avoids the softlockup detector.
[ mingo: Rewrote the changelog. ]
Fixes: 540745ddbc ("x86/sgx: Introduce virtual EPC for use by KVM guests")
Reported-by: Yu Zhang <yu.zhang@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Yu Zhang <yu.zhang@ionos.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Haitao Huang <haitao.huang@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59cf445754 upstream.
Commit 85d07c5562 ("USB: core: Unite old scheme and new scheme
descriptor reads") altered the way USB devices are enumerated
following detection, and in the process it messed up the
initialization of SuperSpeed (or faster) devices:
[ 31.650759] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
[ 31.663107] usb 2-1: device descriptor read/8, error -71
[ 31.952697] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 3 using xhci_hcd
[ 31.965122] usb 2-1: device descriptor read/8, error -71
[ 32.080991] usb usb2-port1: attempt power cycle
...
The problem was caused by the commit forgetting that in SuperSpeed or
faster devices, the device descriptor uses a logarithmic encoding of
the bMaxPacketSize0 value. (For some reason I thought the 255 case in
the switch statement was meant for these devices, but it isn't -- it
was meant for Wireless USB and is no longer needed.)
We can fix the oversight by testing for buf->bMaxPacketSize0 = 9
(meaning 512, the actual maxpacket size for ep0 on all SuperSpeed
devices) and straightening out the logic that checks and adjusts our
initial guesses of the maxpacket value.
Reported-and-tested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Closes: https://lore.kernel.org/linux-usb/20230810002257.nadxmfmrobkaxgnz@synopsys.com/
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 85d07c5562 ("USB: core: Unite old scheme and new scheme descriptor reads")
Link: https://lore.kernel.org/r/8809e6c5-59d5-4d2d-ac8f-6d106658ad73@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff33299ec8 upstream.
Syzbot reported an out-of-bounds read in sysfs.c:read_descriptors():
BUG: KASAN: slab-out-of-bounds in read_descriptors+0x263/0x280 drivers/usb/core/sysfs.c:883
Read of size 8 at addr ffff88801e78b8c8 by task udevd/5011
CPU: 0 PID: 5011 Comm: udevd Not tainted 6.4.0-rc6-syzkaller-00195-g40f71e7cd3c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
print_report mm/kasan/report.c:462 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:572
read_descriptors+0x263/0x280 drivers/usb/core/sysfs.c:883
...
Allocated by task 758:
...
__do_kmalloc_node mm/slab_common.c:966 [inline]
__kmalloc+0x5e/0x190 mm/slab_common.c:979
kmalloc include/linux/slab.h:563 [inline]
kzalloc include/linux/slab.h:680 [inline]
usb_get_configuration+0x1f7/0x5170 drivers/usb/core/config.c:887
usb_enumerate_device drivers/usb/core/hub.c:2407 [inline]
usb_new_device+0x12b0/0x19d0 drivers/usb/core/hub.c:2545
As analyzed by Khazhy Kumykov, the cause of this bug is a race between
read_descriptors() and hub_port_init(): The first routine uses a field
in udev->descriptor, not expecting it to change, while the second
overwrites it.
Prior to commit 45bf39f8df ("USB: core: Don't hold device lock while
reading the "descriptors" sysfs file") this race couldn't occur,
because the routines were mutually exclusive thanks to the device
locking. Removing that locking from read_descriptors() exposed it to
the race.
The best way to fix the bug is to keep hub_port_init() from changing
udev->descriptor once udev has been initialized and registered.
Drivers expect the descriptors stored in the kernel to be immutable;
we should not undermine this expectation. In fact, this change should
have been made long ago.
So now hub_port_init() will take an additional argument, specifying a
buffer in which to store the device descriptor it reads. (If udev has
not yet been initialized, the buffer pointer will be NULL and then
hub_port_init() will store the device descriptor in udev as before.)
This eliminates the data race responsible for the out-of-bounds read.
The changes to hub_port_init() appear more extensive than they really
are, because of indentation changes resulting from an attempt to avoid
writing to other parts of the usb_device structure after it has been
initialized. Similar changes should be made to the code that reads
the BOS descriptor, but that can be handled in a separate patch later
on. This patch is sufficient to fix the bug found by syzbot.
Reported-and-tested-by: syzbot+18996170f8096c6174d0@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/000000000000c0ffe505fe86c9ca@google.com/#r
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Khazhy Kumykov <khazhy@google.com>
Fixes: 45bf39f8df ("USB: core: Don't hold device lock while reading the "descriptors" sysfs file")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/b958b47a-9a46-4c22-a9f9-e42e42c31251@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de28e469da upstream.
The usb_get_device_descriptor() routine reads the device descriptor
from the udev device and stores it directly in udev->descriptor. This
interface is error prone, because the USB subsystem expects in-memory
copies of a device's descriptors to be immutable once the device has
been initialized.
The interface is changed so that the device descriptor is left in a
kmalloc-ed buffer, not copied into the usb_device structure. A
pointer to the buffer is returned to the caller, who is then
responsible for kfree-ing it. The corresponding changes needed in the
various callers are fairly small.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/d0111bb6-56c1-4f90-adf2-6cfe152f6561@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85d07c5562 upstream.
In preparation for reworking the usb_get_device_descriptor() routine,
it is desirable to unite the two different code paths responsible for
initially determining endpoint 0's maximum packet size in a newly
discovered USB device. Making this determination presents a
chicken-and-egg sort of problem, in that the only way to learn the
maxpacket value is to get it from the device descriptor retrieved from
the device, but communicating with the device to retrieve a descriptor
requires us to know beforehand the ep0 maxpacket size.
In practice this problem is solved in two different ways, referred to
in hub.c as the "old scheme" and the "new scheme". The old scheme
(which is the approach recommended by the USB-2 spec) involves asking
the device to send just the first eight bytes of its device
descriptor. Such a transfer uses packets containing no more than
eight bytes each, and every USB device must have an ep0 maxpacket size
>= 8, so this should succeed. Since the bMaxPacketSize0 field of the
device descriptor lies within the first eight bytes, this is all we
need.
The new scheme is an imitation of the technique used in an early
Windows USB implementation, giving it the happy advantage of working
with a wide variety of devices (some of them at the time would not
work with the old scheme, although that's probably less true now). It
involves making an initial guess of the ep0 maxpacket size, asking the
device to send up to 64 bytes worth of its device descriptor (which is
only 18 bytes long), and then resetting the device to clear any error
condition that might have resulted from the guess being wrong. The
initial guess is determined by the connection speed; it should be
correct in all cases other than full speed, for which the allowed
values are 8, 16, 32, and 64 (in this case the initial guess is 64).
The reason for this patch is that the old- and new-scheme parts of
hub_port_init() use different code paths, one involving
usb_get_device_descriptor() and one not, for their initial reads of
the device descriptor. Since these reads have essentially the same
purpose and are made under essentially the same circumstances, this is
illogical. It makes more sense to have both of them use a common
subroutine.
This subroutine does basically what the new scheme's code did, because
that approach is more general than the one used by the old scheme. It
only needs to know how many bytes to transfer and whether or not it is
being called for the first iteration of a retry loop (in case of
certain time-out errors). There are two main differences from the
former code:
We initialize the bDescriptorType field of the transfer buffer
to 0 before performing the transfer, to avoid possibly
accessing an uninitialized value afterward.
We read the device descriptor into a temporary buffer rather
than storing it directly into udev->descriptor, which the old
scheme implementation used to do.
Since the whole point of this first read of the device descriptor is
to determine the bMaxPacketSize0 value, that is what the new routine
returns (or an error code). The value is stored in a local variable
rather than in udev->descriptor. As a side effect, this necessitates
moving a section of code that checks the bcdUSB field for SuperSpeed
devices until after the full device descriptor has been retrieved.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/495cb5d4-f956-4f4a-a875-1e67e9489510@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f236433064 upstream.
Some usb hubs will negotiate DisplayPort Alt mode with the device
but will then negotiate a data role swap after entering the alt
mode. The data role swap causes the device to unregister all alt
modes, however the usb hub will still send Attention messages
even after failing to reregister the Alt Mode. type_altmode_attention
currently does not verify whether or not a device's altmode partner
exists, which results in a NULL pointer error when dereferencing
the typec_altmode and typec_altmode_ops belonging to the altmode
partner.
Verify the presence of a device's altmode partner before sending
the Attention message to the Alt Mode driver.
Fixes: 8a37d87d72 ("usb: typec: Bus type for alternate modes")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230814180559.923475-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c97cd0b4b5 upstream.
When sending Discover Identity messages to a Port Partner that uses Power
Delivery v2 and SVDM v1, we currently send PD v2 messages with SVDM v2.0,
expecting the port partner to respond with its highest supported SVDM
version as stated in Section 6.4.4.2.3 in the Power Delivery v3
specification. However, sending SVDM v2 to some Power Delivery v2 port
partners results in a NAK whereas sending SVDM v1 does not.
NAK messages can be handled by the initiator (PD v3 section 6.4.4.2.5.1),
and one solution could be to resend Discover Identity on a lower SVDM
version if possible. But, Section 6.4.4.3 of PD v2 states that "A NAK
response Should be taken as an indication not to retry that particular
Command."
Instead, we can set the SVDM version to the maximum one supported by the
negotiated PD revision. When operating in PD v2, this obeys Section
6.4.4.2.3, which states the SVDM field "Shall be set to zero to indicate
Version 1.0." In PD v3, the SVDM field "Shall be set to 01b to indicate
Version 2.0."
Fixes: c34e85fa69 ("usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230731165926.1815338-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e520d0b6be upstream.
Allocate extra space for terminating element at:
drivers/cpufreq/brcmstb-avs-cpufreq.c:
449 table[i].frequency = CPUFREQ_TABLE_END;
and add code comment to make this clear.
This fixes the following -Warray-bounds warning seen after building
ARM with multi_v7_defconfig (GCC 13):
In function 'brcm_avs_get_freq_table',
inlined from 'brcm_avs_cpufreq_init' at drivers/cpufreq/brcmstb-avs-cpufreq.c:623:15:
drivers/cpufreq/brcmstb-avs-cpufreq.c:449:28: warning: array subscript 5 is outside array bounds of 'void[60]' [-Warray-bounds=]
449 | table[i].frequency = CPUFREQ_TABLE_END;
In file included from include/linux/node.h:18,
from include/linux/cpu.h:17,
from include/linux/cpufreq.h:12,
from drivers/cpufreq/brcmstb-avs-cpufreq.c:44:
In function 'devm_kmalloc_array',
inlined from 'devm_kcalloc' at include/linux/device.h:328:9,
inlined from 'brcm_avs_get_freq_table' at drivers/cpufreq/brcmstb-avs-cpufreq.c:437:10,
inlined from 'brcm_avs_cpufreq_init' at drivers/cpufreq/brcmstb-avs-cpufreq.c:623:15:
include/linux/device.h:323:16: note: at offset 60 into object of size 60 allocated by 'devm_kmalloc'
323 | return devm_kmalloc(dev, bytes, flags);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -Warray-bounds.
Link: https://github.com/KSPP/linux/issues/324
Fixes: de322e0859 ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9c83f71ee upstream.
We were reading the length of the scatterlist sg after copying value of
tsg inside.
So we are using the size of the previous scatterlist and for the first
one we are using an unitialised value.
Fix this by copying tsg in sg[0] before reading the size.
Fixes : 8a1012d3f2 ("crypto: stm32 - Support for STM32 HASH module")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Bourgoin <thomas.bourgoin@foss.st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7cf224246 upstream.
Building dasd_eckd.o with latest clang reveals this bug:
CC drivers/s390/block/dasd_eckd.o
drivers/s390/block/dasd_eckd.c:1082:3: warning: 'snprintf' will always be truncated;
specified size is 1, but format string expands to at least 11 [-Wfortify-source]
1082 | snprintf(print_uid, sizeof(*print_uid),
| ^
drivers/s390/block/dasd_eckd.c:1087:3: warning: 'snprintf' will always be truncated;
specified size is 1, but format string expands to at least 10 [-Wfortify-source]
1087 | snprintf(print_uid, sizeof(*print_uid),
| ^
Fix this by moving and using the existing UID_STRLEN for the arrays
that are being written to. Also rename UID_STRLEN to DASD_UID_STRLEN
to clarify its scope.
Fixes: 23596961b4 ("s390/dasd: split up dasd_eckd_read_conf")
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1923
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230828153142.2843753-2-hca@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea5717cb13 upstream.
OS installers are relying on /sys/firmware/ipl/has_secure to be
present on machines supporting secure boot. This file is present
for all IPL types, but not the unknown type, which prevents a secure
installation when an LPAR is booted in HMC via FTP(s), because
this is an unknown IPL type in linux. While at it, also add the secure
file.
Fixes: c9896acc78 ("s390/ipl: Provide has_secure sysfs attribute")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8f40a0bcc upstream.
Commit fb08a1908c ("dax: simplify the dax_device <-> gendisk
association") introduced new logic for gendisk association, requiring
drivers to explicitly call dax_add_host() and dax_remove_host().
For dcssblk driver, some dax_remove_host() calls were missing, e.g. in
device remove path. The commit also broke error handling for out_dax case
in device add path, resulting in an extra put_device() w/o the previous
get_device() in that case.
This lead to stale xarray entries after device add / remove cycles. In the
case when a previously used struct gendisk pointer (xarray index) would be
used again, because blk_alloc_disk() happened to return such a pointer, the
xa_insert() in dax_add_host() would fail and go to out_dax, doing the extra
put_device() in the error path. In combination with an already flawed error
handling in dcssblk (device_register() cleanup), which needs to be
addressed in a separate patch, this resulted in a missing device_del() /
klist_del(), and eventually in the kernel crash with list_add corruption on
a subsequent device_add() / klist_add().
Fix this by adding the missing dax_remove_host() calls, and also move the
put_device() in the error path to restore the previous logic.
Fixes: fb08a1908c ("dax: simplify the dax_device <-> gendisk association")
Cc: <stable@vger.kernel.org> # 5.17+
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f741bd7178 upstream.
iov_iter_extract_pages() doesn't correctly handle skipping over initial
zero-length entries in ITER_KVEC and ITER_BVEC-type iterators.
The problem is that it accidentally reduces maxsize to 0 when it
skipping and thus runs to the end of the array and returns 0.
Fix this by sticking the calculated size-to-copy in a new variable
rather than back in maxsize.
Fixes: 7d58fe7310 ("iov_iter: Add a function to extract a page list from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cd474e573 upstream.
Interrupts are blocked in SDEI context, per the SDEI spec: "The client
interrupts cannot preempt the event handler." If we crashed in the SDEI
handler-running context (as with ACPI's AGDI) then we need to clean up the
SDEI state before proceeding to the crash kernel so that the crash kernel
can have working interrupts.
Track the active SDEI handler per-cpu so that we can COMPLETE_AND_RESUME
the handler, discarding the interrupted context.
Fixes: f5df269618 ("arm64: kernel: Add arch-specific SDEI entry code and CPU masking")
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Cc: stable@vger.kernel.org
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: Mihai Carabas <mihai.carabas@oracle.com>
Link: https://lore.kernel.org/r/20230627002939.2758-1-scott@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe8c3623ab upstream.
After commit 30696378f6 ("pstore/ram: Do not treat empty buffers as
valid"), initialization would assume a prz was valid after seeing that
the buffer_size is zero (regardless of the buffer start position). This
unchecked start value means it could be outside the bounds of the buffer,
leading to future access panics when written to:
sysdump_panic_event+0x3b4/0x5b8
atomic_notifier_call_chain+0x54/0x90
panic+0x1c8/0x42c
die+0x29c/0x2a8
die_kernel_fault+0x68/0x78
__do_kernel_fault+0x1c4/0x1e0
do_bad_area+0x40/0x100
do_translation_fault+0x68/0x80
do_mem_abort+0x68/0xf8
el1_da+0x1c/0xc0
__raw_writeb+0x38/0x174
__memcpy_toio+0x40/0xac
persistent_ram_update+0x44/0x12c
persistent_ram_write+0x1a8/0x1b8
ramoops_pstore_write+0x198/0x1e8
pstore_console_write+0x94/0xe0
...
To avoid this, also check if the prz start is 0 during the initialization
phase. If not, the next prz sanity check case will discover it (start >
size) and zap the buffer back to a sane state.
Fixes: 30696378f6 ("pstore/ram: Do not treat empty buffers as valid")
Cc: Yunlong Xing <yunlong.xing@unisoc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Enlin Mu <enlin.mu@unisoc.com>
Link: https://lore.kernel.org/r/20230801060432.1307717-1-yunlong.xing@unisoc.com
[kees: update commit log with backtrace and clarifications]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b4b53ca0b upstream.
Calls to lookup_user_key() require a corresponding key_put() to
decrement the usage counter. Once it reaches zero, we schedule key GC.
Therefore decrement struct key.usage in alg_set_by_key_serial().
Fixes: 7984ceb134 ("crypto: af_alg - Support symmetric encryption via keyring keys")
Cc: <stable@vger.kernel.org>
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4240e2ebe6 upstream.
The Instruction Fetch (IF) units on current AMD Zen-based systems do not
guarantee a synchronous #MC is delivered for poison consumption errors.
Therefore, MCG_STATUS[EIPV|RIPV] will not be set. However, the
microarchitecture does guarantee that the exception is delivered within
the same context. In other words, the exact rIP is not known, but the
context is known to not have changed.
There is no architecturally-defined method to determine this behavior.
The Code Segment (CS) register is always valid on such IF unit poison
errors regardless of the value of MCG_STATUS[EIPV|RIPV].
Add a quirk to save the CS register for poison consumption from the IF
unit banks.
This is needed to properly determine the context of the error.
Otherwise, the severity grading function will assume the context is
IN_KERNEL due to the m->cs value being 0 (the initialized value). This
leads to unnecessary kernel panics on data poison errors due to the
kernel believing the poison consumption occurred in kernel context.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230814200853.29258-1-yazen.ghannam@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 919dc32095 upstream.
If an fsverity builtin signature is given for a file but the
".fs-verity" keyring is empty, there's no real reason to run the PKCS#7
parser. Skip this to avoid the PKCS#7 attack surface when builtin
signature support is configured into the kernel but is not being used.
This is a hardening improvement, not a fix per se, but I've added
Fixes and Cc stable to get it out to more users.
Fixes: 432434c9f8 ("fs-verity: support builtin file signatures")
Cc: stable@vger.kernel.org
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20230820173237.2579-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef5b52a631 upstream.
When the hash algorithm for the signature is not available the digest size
is 0 and the signature in the certificate is marked as unsupported.
When validating a self-signed certificate, this needs to be checked,
because otherwise trying to validate the signature will fail with an
warning:
Loading compiled-in X.509 certificates
WARNING: CPU: 0 PID: 1 at crypto/rsa-pkcs1pad.c:537 \
pkcs1pad_verify+0x46/0x12c
...
Problem loading in-kernel X.509 certificate (-22)
Signed-off-by: Thore Sommer <public@thson.de>
Cc: stable@vger.kernel.org # v4.7+
Fixes: 6c2dc5ae4a ("X.509: Extract signature digest and make self-signed cert checks earlier")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 977ad86c2a upstream.
There was a previous attempt to fix an out-of-bounds access in the DCCP
error handlers, but that fix assumed that the error handlers only want
to access the first 8 bytes of the DCCP header. Actually, they also look
at the DCCP sequence number, which is stored beyond 8 bytes, so an
explicit pskb_may_pull() is required.
Fixes: 6706a97fec ("dccp: fix out of bound access in dccp_v4_err()")
Fixes: 1aa9d1a0e7 ("ipv6: dccp: fix out of bound access in dccp_v6_err()")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c53e847ff upstream.
All posix lock ops, for all lockspaces (gfs2 file systems) are
sent to userspace (dlm_controld) through a single misc device.
The dlm_controld daemon reads the ops from the misc device
and sends them to other cluster nodes using separate, per-lockspace
cluster api communication channels. The ops for a single lockspace
are ordered at this level, so that the results are received in
the same sequence that the requests were sent. When the results
are sent back to the kernel via the misc device, they are again
funneled through the single misc device for all lockspaces. When
the dlm code in the kernel processes the results from the misc
device, these results will be returned in the same sequence that
the requests were sent, on a per-lockspace basis. A recent change
in this request/reply matching code missed the "per-lockspace"
check (fsid comparison) when matching request and reply, so replies
could be incorrectly matched to requests from other lockspaces.
Cc: stable@vger.kernel.org
Reported-by: Barry Marson <bmarson@redhat.com>
Fixes: 57e2c2f2d9 ("fs: dlm: fix mismatch of plock results from userspace")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72105dcfa3 upstream.
A benchmark stress test (12-40 machines x 48hours) found that DCN315 has
cases where DC writes to an indirect register to set the smu clock msg
id, but when we go to read the same indirect register the returned msg
id doesn't match with what we just set it to. So, to fix this retry the
write until the register's value matches with the requested value.
Cc: stable@vger.kernel.org # 6.1+
Fixes: f949039961 ("drm/amd/display: Add DCN315 CLK_MGR")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Fudong Wang <fudong.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe6518d547 upstream.
Memory is allocated for dynamic loading when audio daemon is trying
to attach to audioPD on DSP side. This memory is allocated from
reserved CMA memory region and needs ownership assignment to
new VMID in order to use it from audioPD.
In the current implementation, arguments are not correctly passed
to the scm call which might result in failure of dynamic loading
on audioPD. Added changes to pass correct arguments during daemon
attach request.
Fixes: 0871561055 ("misc: fastrpc: Add support for audiopd")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f5ba4b3e1 upstream.
The lscpu command is broken since commit cab56b51ec ("parisc: Fix
device names in /proc/iomem") added the PA pathname to all PA
devices, includig the CPUs.
lscpu parses /proc/cpuinfo and now believes it found different CPU
types since every CPU is listed with an unique identifier (PA
pathname).
Fix this problem by simply dropping the PA pathname when listing the
CPUs in /proc/cpuinfo. There is no need to show the pathname in this
procfs file.
Fixes: cab56b51ec ("parisc: Fix device names in /proc/iomem")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccf61486fe upstream.
Due to an oversight in commit 1b3044e39a ("procfs: fix pthread
cross-thread naming if !PR_DUMPABLE") in switching from REG to NOD,
chmod operations on /proc/thread-self/comm were no longer blocked as
they are on almost all other procfs files.
A very similar situation with /proc/self/environ was used to as a root
exploit a long time ago, but procfs has SB_I_NOEXEC so this is simply a
correctness issue.
Ref: https://lwn.net/Articles/191954/
Ref: 6d76fa58b0 ("Don't allow chmod() on the /proc/<pid>/ files")
Fixes: 1b3044e39a ("procfs: fix pthread cross-thread naming if !PR_DUMPABLE")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Message-Id: <20230713141001.27046-1-cyphar@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a721de848 upstream.
Commit a33df75c63 ("block: use an xarray for disk->part_tbl") remove
disk_expand_part_tbl() in add_partition(), which means all kinds of
devices will support extended dynamic `dev_t`.
However, some devices with GENHD_FL_NO_PART are not expected to add or
resize partition.
Fix this by adding check of GENHD_FL_NO_PART before add or resize
partition.
Fixes: a33df75c63 ("block: use an xarray for disk->part_tbl")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230831075900.1725842-1-lilingfeng@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5260bd6d36 upstream.
This reverts commit d5af729dc2.
d5af729dc2 ("PCI: Mark NVIDIA T4 GPUs to avoid bus reset") avoided
Secondary Bus Reset on the T4 because the reset seemed to not work when the
T4 was directly attached to a Root Port.
But NVIDIA thinks the issue is probably related to some issue with the Root
Port, not with the T4. The T4 provides neither PM nor FLR reset, so
masking bus reset compromises this device for assignment scenarios.
Revert d5af729dc2 as requested by Wu Zongyong. This will leave SBR
broken in the specific configuration Wu tested, as it was in v6.5, so Wu
will debug that further.
Link: https://lore.kernel.org/r/ZPqMCDWvITlOLHgJ@wuzongyong-alibaba
Link: https://lore.kernel.org/r/20230908201104.GA305023@bhelgaas
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a7693e6bb upstream.
ntb_transport_tx_free_entry() never returns 0 with the current
calculation. If head == tail, then it would return qp->tx_max_entry.
Change compare to tail >= head and when they are equal, a 0 would be
returned.
Fixes: e74bfeedad ("NTB: Add flow control to the ntb_netdev")
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: renlonglong <ren.longlong@h3c.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc79bd2738 upstream.
The tx tail index is not reset when the link goes down. This causes the
tail index to go out of sync when the link goes down and comes back up.
Refactor the ntb_qp_link_down_reset() and reset the tail index as well.
Fixes: 2849b5d706 ("NTB: Reset transport QP link stats on down")
Reported-by: Yuan Y Lu <yuan.y.lu@intel.com>
Tested-by: Yuan Y Lu <yuan.y.lu@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f195a1a6fe upstream.
Currently when the transport receive packets after netdev has closed the
transport returns error and triggers tx errors to be incremented and
carrier to be stopped. There is no reason to return error if the device is
already closed. Drop the packet and return 0.
Fixes: e26a5843f7 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Reported-by: Yuan Y Lu <yuan.y.lu@intel.com>
Tested-by: Yuan Y Lu <yuan.y.lu@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5694ba13b0 upstream.
For a device with no Power Management Capability, pci_power_up() previously
returned 0 (success) if the platform was able to put the device in D0,
which led to pci_set_full_power_state() trying to read PCI_PM_CTRL, even
though it doesn't exist.
Since dev->pm_cap == 0 in this case, pci_set_full_power_state() actually
read the wrong register, interpreted it as PCI_PM_CTRL, and corrupted
dev->current_state. This led to messages like this in some cases:
pci 0000:01:00.0: Refused to change power state from D3hot to D0
To prevent this, make pci_power_up() always return a negative failure code
if the device lacks a Power Management Capability, even if non-PCI platform
power management has been able to put the device in D0. The failure will
prevent pci_set_full_power_state() from trying to access PCI_PM_CTRL.
Fixes: e200904b27 ("PCI/PM: Split pci_power_up()")
Link: https://lore.kernel.org/r/20230824013738.1894965-1-chenfeiyang@loongson.cn
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ca10f3e31 upstream.
The driver retries certain register reads 3 times if the returned value is
0. This was done because the controller could return 0 for certain
registers if other registers were being accessed concurrently by the BMC.
In certain systems with increased BMC interactions, the register values
returned can be 0 for longer than 3 retries. Change the retry count from 3
to 30 for the affected registers to prevent problems with out-of-band
management.
Fixes: b899202901 ("scsi: mpt3sas: Add separate function for aero doorbell reads")
Cc: stable@vger.kernel.org
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230829090020.5417-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e13d6528b upstream.
I3C masters are expected to support hot-join. This means at initialization
time we might not yet discover any device and this should not be treated
as a fatal error.
During the DAA procedure which happens at probe time, if no device has
joined, all CCC will be NACKed (from a bus perspective). This leads to an
early return with an error code which fails the probe of the master.
Let's avoid this by just telling the core through an I3C_ERROR_M2
return command code that no device was discovered, which is a valid
situation. This way the master will no longer bail out and fail to probe
for a wrong reason.
Cc: stable@vger.kernel.org
Fixes: dd3c52846d ("i3c: master: svc: Add Silvaco I3C master driver")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230831141324.2841525-1-Frank.Li@nxp.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6834c8c59 upstream.
The minimum level of gcc supported for building the kernel is v5.1.
v5.x releases of gcc emitted a three instruction sequence for
-mprofile-kernel:
mflr r0
std r0, 16(r1)
bl _mcount
It is only with the v6.x releases that gcc started emitting the two
instruction sequence for -mprofile-kernel, omitting the second store
instruction.
With the older three instruction sequence, the actual ftrace location
can be the 5th instruction into a function. Update the allowed offset
for ftrace location from 12 to 16 to accommodate the same.
Cc: stable@vger.kernel.org
Fixes: 7af82ff90a ("powerpc/ftrace: Ignore weak functions")
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7b265908a9461e38fc756ef9b569703860a80621.1687166935.git.naveen@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 303be4b335 upstream.
When I do LTP test, LTP test case ksm06 caused panic at
break_ksm_pmd_entry
-> pmd_leaf (Huge page table but False)
-> pte_present (panic)
The reason is pmd_leaf() is not defined, So like commit 501b810467
("mips: mm: add p?d_leaf() definitions") add p?d_leaf() definition for
LoongArch.
Fixes: 09cfefb7fa ("LoongArch: Add memory management")
Cc: stable@vger.kernel.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 687eb3c42f upstream.
With introduction of ERI access control in RG.0 base address of the PMU
unit registers has changed. Add support for the new PMU configuration.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45500dc4e0 upstream.
io-wq will retry iopoll even when it failed with -EAGAIN. If that
races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers,
such workers might potentially infinitely spin retrying iopoll again and
again and each time failing on some allocation / waiting / etc. Don't
keep spinning if io-wq is dying.
Fixes: 561fb04a6a ("io_uring: replace workqueue usage with io-wq")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebdfefc09c upstream.
If we setup the ring with SQPOLL, then that polling thread has its
own io-wq setup. This means that if the application uses
IORING_REGISTER_IOWQ_AFF to set the io-wq affinity, we should not be
setting it for the invoking task, but rather the sqpoll task.
Add an sqpoll helper that parks the thread and updates the affinity,
and use that one if we're using SQPOLL.
Fixes: fe76421d1d ("io_uring: allow user configurable IO thread CPU affinity")
Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/discussions/884
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbc0285433 upstream.
It is possible for xa_load() to observe a sibling entry pointing to
another sibling entry. An example:
Thread A: Thread B:
xa_store_range(xa, entry, 188, 191, gfp);
xa_load(xa, 191);
entry = xa_entry(xa, node, 63);
[entry is a sibling of 188]
xa_store_range(xa, entry, 184, 191, gfp);
if (xa_is_sibling(entry))
offset = xa_to_sibling(entry);
entry = xa_entry(xas->xa, node, offset);
[entry is now a sibling of 184]
It is sufficient to go around this loop until we hit a non-sibling entry.
Sibling entries always point earlier in the node, so we are guaranteed
to terminate this search.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes: 6b24ca4a1a ("mm: Use multi-index entries in the page cache")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 847fb80cc0 upstream.
If function pwrdm_read_prev_pwrst() returns -EINVAL, we will end
up accessing array pwrdm->state_counter through negative index
-22. This is wrong and the compiler is legitimately warning us
about this potential problem.
Fix this by sanity checking the value stored in variable _prev_
before accessing array pwrdm->state_counter.
Address the following -Warray-bounds warning:
arch/arm/mach-omap2/powerdomain.c:178:45: warning: array subscript -22 is below array bounds of 'unsigned int[4]' [-Warray-bounds]
Link: https://github.com/KSPP/linux/issues/307
Fixes: ba20bb1269 ("OMAP: PM counter infrastructure.")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/20230607050639.LzbPn%25lkp@intel.com/
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Message-ID: <ZIFVGwImU3kpaGeH@work>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cf1a126de upstream.
Kmemleak reported the following leak info in try_smi_init():
unreferenced object 0xffff00018ecf9400 (size 1024):
comm "modprobe", pid 2707763, jiffies 4300851415 (age 773.308s)
backtrace:
[<000000004ca5b312>] __kmalloc+0x4b8/0x7b0
[<00000000953b1072>] try_smi_init+0x148/0x5dc [ipmi_si]
[<000000006460d325>] 0xffff800081b10148
[<0000000039206ea5>] do_one_initcall+0x64/0x2a4
[<00000000601399ce>] do_init_module+0x50/0x300
[<000000003c12ba3c>] load_module+0x7a8/0x9e0
[<00000000c246fffe>] __se_sys_init_module+0x104/0x180
[<00000000eea99093>] __arm64_sys_init_module+0x24/0x30
[<0000000021b1ef87>] el0_svc_common.constprop.0+0x94/0x250
[<0000000070f4f8b7>] do_el0_svc+0x48/0xe0
[<000000005a05337f>] el0_svc+0x24/0x3c
[<000000005eb248d6>] el0_sync_handler+0x160/0x164
[<0000000030a59039>] el0_sync+0x160/0x180
The problem was that when an error occurred before handlers registration
and after allocating `new_smi->si_sm`, the variable wouldn't be freed in
the error handling afterwards since `shutdown_smi()` hadn't been
registered yet. Fix it by adding a `kfree()` in the error handling path
in `try_smi_init()`.
Cc: stable@vger.kernel.org # 4.19+
Fixes: 7960f18a56 ("ipmi_si: Convert over to a shutdown handler")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Co-developed-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Message-Id: <20230629123328.2402075-1-gongruiqi@huaweicloud.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d3c7d2a29 upstream.
Select V4L2_FWNODE and VIDEO_V4L2_SUBDEV_API for all sensor drivers. This
also adds the options to drivers that don't specifically need them, these
are still seldom used drivers using old APIs. The upside is that these
should now all compile --- many drivers have had missing dependencies.
The "menu" is replaced by selectable "menuconfig" to select the needed
V4L2_FWNODE and VIDEO_V4L2_SUBDEV_API options.
Also select MEDIA_CONTROLLER which VIDEO_V4L2_SUBDEV_API effectively
depends on, and add the I2C dependency to the menu.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: stable@vger.kernel.org # for >= 6.1
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 607bcc4213 upstream.
Fix the following smatch warning:
drivers/media/i2c/ccs/ccs-data.c:524 ccs_data_parse_rules() warn: address
of NULL pointer 'rules'
The CCS static data rule parser does not check an if rule has been
obtained before checking for other rule types (which depend on the if
rule). In practice this means parsing invalid CCS static data could lead
to dereferencing a NULL pointer.
Reported-by: Hans Verkuil <hverkuil@xs4all.nl>
Fixes: a6b396f410 ("media: ccs: Add CCS static data parser library")
Cc: stable@vger.kernel.org # for 5.11 and up
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b8272ff4a upstream.
Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
CPU1 CPU2
T1 sets cfs_quota
starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU2
T1 initiates offlining of CPU1
Hotplug operation starts
...
'period_timer' expires and is re-enqueued on CPU1
...
take_cpu_down()
CPU1 shuts down and does not handle timers
anymore. They have to be migrated in the
post dead hotplug steps by the control task.
T1 runs the post dead offline operation
T1 is scheduled out
T1 waits for 'period_timer' to expire
T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().
Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c83ad36a18 upstream.
Currently, for double invoke call_rcu(), will dump rcu_head objects memory
info, if the objects is not allocated from the slab allocator, the
vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock need to
be held, since the call_rcu() can be invoked in interrupt context,
therefore, there is a possibility of spinlock deadlock scenarios.
And in Preempt-RT kernel, the rcutorture test also trigger the following
lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
#0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
#1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
#2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xb0
dump_stack+0x14/0x20
__might_resched+0x1aa/0x280
? __pfx_rcu_torture_err_cb+0x10/0x10
rt_spin_lock+0x53/0x130
? find_vmap_area+0x1f/0x70
find_vmap_area+0x1f/0x70
vmalloc_dump_obj+0x20/0x60
mem_dump_obj+0x22/0x90
__call_rcu_common+0x5bf/0x940
? debug_smp_processor_id+0x1b/0x30
call_rcu_hurry+0x14/0x20
rcu_torture_init+0x1f82/0x2370
? __pfx_rcu_torture_leak_cb+0x10/0x10
? __pfx_rcu_torture_leak_cb+0x10/0x10
? __pfx_rcu_torture_init+0x10/0x10
do_one_initcall+0x6c/0x300
? debug_smp_processor_id+0x1b/0x30
kernel_init_freeable+0x2b9/0x540
? __pfx_kernel_init+0x10/0x10
kernel_init+0x1f/0x150
ret_from_fork+0x40/0x50
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
The previous patch fixes this by using the deadlock-safe best-effort
version of find_vm_area. However, in case of failure print the fact that
the pointer was a vmalloc pointer so that we print at least something.
Link: https://lkml.kernel.org/r/20230904180806.1002832-2-joel@joelfernandes.org
Fixes: 98f180837a ("mm: Make mem_dump_obj() handle vmalloc() memory")
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1dbd8a849 upstream.
When doing mkfs.xfs on a pmem device, the following warning was
reported:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct
Modules linked in:
CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:submit_bio_noacct+0x340/0x520
......
Call Trace:
<TASK>
? submit_bio_noacct+0xd5/0x520
submit_bio+0x37/0x60
async_pmem_flush+0x79/0xa0
nvdimm_flush+0x17/0x40
pmem_submit_bio+0x370/0x390
__submit_bio+0xbc/0x190
submit_bio_noacct_nocheck+0x14d/0x370
submit_bio_noacct+0x1ef/0x520
submit_bio+0x55/0x60
submit_bio_wait+0x5a/0xc0
blkdev_issue_flush+0x44/0x60
The root cause is that submit_bio_noacct() needs bio_op() is either
WRITE or ZONE_APPEND for flush bio and async_pmem_flush() doesn't assign
REQ_OP_WRITE when allocating flush bio, so submit_bio_noacct just fail
the flush bio.
Simply fix it by adding the missing REQ_OP_WRITE for flush bio. And we
could fix the flush order issue and do flush optimization later.
Cc: stable@vger.kernel.org # 6.3+
Fixes: b4a6bb3a67 ("block: add a sanity check for non-write flush/fua bios")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 358040e380 upstream.
The update of rate_num/den and msbits were factored out to
fixup_unreferenced_params() function to be called explicitly after the
hw_refine or hw_params procedure. It's called from
snd_pcm_hw_refine_user(), but it's forgotten in the PCM compat ioctl.
This ended up with the incomplete rate_num/den and msbits parameters
when 32bit compat ioctl is used.
This patch adds the missing call in snd_pcm_ioctl_hw_params_compat().
Reported-by: Meng_Cai@novatek.com.cn
Fixes: f9a076bff0 ("ALSA: pcm: calculate non-mask/non-interval parameters always when possible")
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230829134344.31588-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 669281ee7e upstream.
MGLRU has a LRU list for each zone for each type (anon/file) in each
generation:
long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
The min_seq (oldest generation) can progress independently for each
type but the max_seq (youngest generation) is shared for both anon and
file. This is to maintain a common frame of reference.
In order for eviction to advance the min_seq of a type, all the per-zone
lists in the oldest generation of that type must be empty.
The eviction logic only considers pages from eligible zones for
eviction or promotion.
scan_folios() {
...
for (zone = sc->reclaim_idx; zone >= 0; zone--) {
...
sort_folio(); // Promote
...
isolate_folio(); // Evict
}
...
}
Consider the system has the movable zone configured and default 4
generations. The current state of the system is as shown below
(only illustrating one type for simplicity):
Type: ANON
Zone DMA32 Normal Movable Device
Gen 0 0 0 4GB 0
Gen 1 0 1GB 1MB 0
Gen 2 1MB 4GB 1MB 0
Gen 3 1MB 1MB 1MB 0
Now consider there is a GFP_KERNEL allocation request (eligible zone
index <= Normal), evict_folios() will return without doing any work
since there are no pages to scan in the eligible zones of the oldest
generation. Reclaim won't make progress until triggered from a ZONE_MOVABLE
allocation request; which may not happen soon if there is a lot of free
memory in the movable zone. This can lead to OOM kills, although there
is 1GB pages in the Normal zone of Gen 1 that we have not yet tried to
reclaim.
This issue is not seen in the conventional active/inactive LRU since
there are no per-zone lists.
If there are no (not enough) folios to scan in the eligible zones, move
folios from ineligible zone (zone_index > reclaim_index) to the next
generation. This allows for the progression of min_seq and reclaiming
from the next generation (Gen 1).
Qualcomm, Mediatek and raspberrypi [1] discovered this issue independently.
[1] https://github.com/raspberrypi/linux/issues/5395
Link: https://lkml.kernel.org/r/20230802025606.346758-1-kaleshsingh@google.com
Fixes: ac35a49023 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> [mediatek]
Tested-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Steven Barrett <steven@liquorix.net>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ea35288c8 upstream.
Commit bf5c25d608 ("skbuff: in skb_segment, call zerocopy functions
once per nskb") added the call to zero copy functions in skb_segment().
The change introduced a bug in skb_segment() because skb_orphan_frags()
may possibly change the number of fragments or allocate new fragments
altogether leaving nrfrags and frag to point to the old values. This can
cause a panic with stacktrace like the one below.
[ 193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
[ 193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G O 5.15.123+ #26
[ 193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
[ 194.021892] Call Trace:
[ 194.027422] <TASK>
[ 194.072861] tcp_gso_segment+0x107/0x540
[ 194.082031] inet_gso_segment+0x15c/0x3d0
[ 194.090783] skb_mac_gso_segment+0x9f/0x110
[ 194.095016] __skb_gso_segment+0xc1/0x190
[ 194.103131] netem_enqueue+0x290/0xb10 [sch_netem]
[ 194.107071] dev_qdisc_enqueue+0x16/0x70
[ 194.110884] __dev_queue_xmit+0x63b/0xb30
[ 194.121670] bond_start_xmit+0x159/0x380 [bonding]
[ 194.128506] dev_hard_start_xmit+0xc3/0x1e0
[ 194.131787] __dev_queue_xmit+0x8a0/0xb30
[ 194.138225] macvlan_start_xmit+0x4f/0x100 [macvlan]
[ 194.141477] dev_hard_start_xmit+0xc3/0x1e0
[ 194.144622] sch_direct_xmit+0xe3/0x280
[ 194.147748] __dev_queue_xmit+0x54a/0xb30
[ 194.154131] tap_get_user+0x2a8/0x9c0 [tap]
[ 194.157358] tap_sendmsg+0x52/0x8e0 [tap]
[ 194.167049] handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
[ 194.173631] handle_tx+0xcd/0xe0 [vhost_net]
[ 194.176959] vhost_worker+0x76/0xb0 [vhost]
[ 194.183667] kthread+0x118/0x140
[ 194.190358] ret_from_fork+0x1f/0x30
[ 194.193670] </TASK>
In this case calling skb_orphan_frags() updated nr_frags leaving nrfrags
local variable in skb_segment() stale. This resulted in the code hitting
i >= nrfrags prematurely and trying to move to next frag_skb using
list_skb pointer, which was NULL, and caused kernel panic. Move the call
to zero copy functions before using frags and nr_frags.
Fixes: bf5c25d608 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reported-by: Amit Goyal <agoyal@purestorage.com>
Cc: stable@vger.kernel.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e994764976 upstream.
sctp_mt_check doesn't validate the flag_count field. An attacker can
take advantage of that to trigger a OOB read and leak memory
information.
Add the field validation in the checkentry function.
Fixes: 2e4e6a17af ("[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables")
Cc: stable@vger.kernel.org
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69c5d284f6 upstream.
The xt_u32 module doesn't validate the fields in the xt_u32 structure.
An attacker may take advantage of this to trigger an OOB read by setting
the size fields with a value beyond the arrays boundaries.
Add a checkentry function to validate the structure.
This was originally reported by the ZDI project (ZDI-CAN-18408).
Fixes: 1b50b8a371 ("[NETFILTER]: Add u32 match")
Cc: stable@vger.kernel.org
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28427f368f upstream.
Fix skb_ensure_writable() size. Don't use nft_tcp_header_pointer() to
make it explicit that pointers point to the packet (not local buffer).
Fixes: 99d1712bc4 ("netfilter: exthdr: tcp option set support")
Fixes: 7890cbea66 ("netfilter: exthdr: add support for tcp option removal")
Cc: stable@vger.kernel.org
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 050d91c03b upstream.
The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.
Fixes: 886503f34d ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 915d975b2f upstream.
Blamed commit changed:
ptr = kmalloc(size);
if (ptr)
size = ksize(ptr);
to:
size = kmalloc_size_roundup(size);
ptr = kmalloc(size);
This allowed various crash as reported by syzbot [1]
and Kyle Zeng.
Problem is that if @size is bigger than 0x80000001,
kmalloc_size_roundup(size) returns 2^32.
kmalloc_reserve() uses a 32bit variable (obj_size),
so 2^32 is truncated to 0.
kmalloc(0) returns ZERO_SIZE_PTR which is not handled by
skb allocations.
Following trace can be triggered if a netdev->mtu is set
close to 0x7fffffff
We might in the future limit netdev->mtu to more sensible
limit (like KMALLOC_MAX_SIZE).
This patch is based on a syzbot report, and also a report
and tentative fix from Kyle Zeng.
[1]
BUG: KASAN: user-memory-access in __build_skb_around net/core/skbuff.c:294 [inline]
BUG: KASAN: user-memory-access in __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
Write of size 32 at addr 00000000fffffd10 by task syz-executor.4/22554
CPU: 1 PID: 22554 Comm: syz-executor.4 Not tainted 6.1.39-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Call trace:
dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:279
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:286
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x120/0x1a0 lib/dump_stack.c:106
print_report+0xe4/0x4b4 mm/kasan/report.c:398
kasan_report+0x150/0x1ac mm/kasan/report.c:495
kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189
memset+0x40/0x70 mm/kasan/shadow.c:44
__build_skb_around net/core/skbuff.c:294 [inline]
__alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
alloc_skb include/linux/skbuff.h:1316 [inline]
igmpv3_newpack+0x104/0x1088 net/ipv4/igmp.c:359
add_grec+0x81c/0x1124 net/ipv4/igmp.c:534
igmpv3_send_cr net/ipv4/igmp.c:667 [inline]
igmp_ifc_timer_expire+0x1b0/0x1008 net/ipv4/igmp.c:810
call_timer_fn+0x1c0/0x9f0 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers+0x54c/0x710 kernel/time/timer.c:1790
run_timer_softirq+0x28/0x4c kernel/time/timer.c:1803
_stext+0x380/0xfbc
____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:79
call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:84
invoke_softirq kernel/softirq.c:437 [inline]
__irq_exit_rcu+0x1c0/0x4cc kernel/softirq.c:683
irq_exit_rcu+0x14/0x78 kernel/softirq.c:695
el0_interrupt+0x7c/0x2e0 arch/arm64/kernel/entry-common.c:717
__el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:724
el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:729
el0t_64_irq+0x1a0/0x1a4 arch/arm64/kernel/entry.S:584
Fixes: 12d6c1d3a2 ("skbuff: Proactively round up to kmalloc bucket size")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1acfe2c122 ]
In current packed virtqueue implementation, the avail_wrap_counter won't
flip, in the case when the driver supplies a descriptor chain with a
length equals to the queue size; total_sg == vq->packed.vring.num.
Let’s assume the following situation:
vq->packed.vring.num=4
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
Then the driver adds a descriptor chain containing 4 descriptors.
We expect the following result with avail_wrap_counter flipped:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 1
But, the current implementation gives the following result:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
To reproduce the bug, you can set a packed queue size as small as
possible, so that the driver is more likely to provide a descriptor
chain with a length equal to the packed queue size. For example, in
qemu run following commands:
sudo qemu-system-x86_64 \
-enable-kvm \
-nographic \
-kernel "path/to/kernel_image" \
-m 1G \
-drive file="path/to/rootfs",if=none,id=disk \
-device virtio-blk,drive=disk \
-drive file="path/to/disk_image",if=none,id=rwdisk \
-device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
indirect_desc=off \
-append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
Inside the VM, create a directory and mount the rwdisk device on it. The
rwdisk will hang and mount operation will not complete.
This commit fixes the wrap counter error by flipping the
packed.avail_wrap_counter, when start of descriptor chain equals to the
end of descriptor chain (head == i).
Fixes: 1ce9e6055f ("virtio_ring: introduce packed ring support")
Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org>
Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae15aceaa9 ]
We try to build affinity mask via create_affinity_masks()
unconditionally which may lead several issues:
- the affinity mask is not used for parent without affinity support
(only VDUSE support the affinity now)
- the logic of create_affinity_masks() might not work for devices
other than block. For example it's not rare in the networking device
where the number of queues could exceed the number of CPUs. Such
case breaks the current affinity logic which is based on
group_cpus_evenly() who assumes the number of CPUs are not less than
the number of groups. This can trigger a warning[1]:
if (ret >= 0)
WARN_ON(nr_present + nr_others < numgrps);
Fixing this by only build the affinity masks only when
- Driver passes affinity descriptor, driver like virtio-blk can make
sure to limit the number of queues when it exceeds the number of CPUs
- Parent support affinity setting config ops
This help to avoid the warning. More optimizations could be done on
top.
[1]
[ 682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
[ 682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
[ 682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
[ 682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
[ 682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293
[ 682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000
[ 682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030
[ 682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0
[ 682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800
[ 682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041
[ 682.146692] FS: 00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000
[ 682.146695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0
[ 682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 682.146701] Call Trace:
[ 682.146703] <TASK>
[ 682.146705] ? __warn+0x7b/0x130
[ 682.146709] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146712] ? report_bug+0x1c8/0x1e0
[ 682.146717] ? handle_bug+0x3c/0x70
[ 682.146721] ? exc_invalid_op+0x14/0x70
[ 682.146723] ? asm_exc_invalid_op+0x16/0x20
[ 682.146727] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146729] ? group_cpus_evenly+0x15c/0x1c0
[ 682.146731] create_affinity_masks+0xaf/0x1a0
[ 682.146735] virtio_vdpa_find_vqs+0x83/0x1d0
[ 682.146738] ? __pfx_default_calc_sets+0x10/0x10
[ 682.146742] virtnet_find_vqs+0x1f0/0x370
[ 682.146747] virtnet_probe+0x501/0xcd0
[ 682.146749] ? vp_modern_get_status+0x12/0x20
[ 682.146751] ? get_cap_addr.isra.0+0x10/0xc0
[ 682.146754] virtio_dev_probe+0x1af/0x260
[ 682.146759] really_probe+0x1a5/0x410
Fixes: 3dad56823b ("virtio-vdpa: Support interrupt affinity spreading mechanism")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230811091539.1359865-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61bfbf7951 ]
The field 'transition_task' of policy structure is used to track the
task which is performing the frequency transition. Using this field to
print a warning once detect a case where the same task is calling
_begin() again before completing the preivous frequency transition via
the _end().
However, there is a potential race condition in _end() and _begin() APIs
while updating the field 'transition_task' of policy, the scenario is
depicted below:
Task A Task B
/* 1st freq transition */
Invoke _begin() {
...
...
}
/* 2nd freq transition */
Invoke _begin() {
... //waiting for A to
... //clear
... //transition_ongoing
... //in _end() for
... //the 1st transition
|
Change the frequency |
|
Invoke _end() { |
... |
... |
transition_ongoing = false; V
transition_ongoing = true;
transition_task = current;
transition_task = NULL;
... //A overwrites the task
... //performing the transition
... //result in error warning.
}
To fix this race condition, the transition_lock of policy structure is
now acquired before updating policy structure in _end() API. Which ensure
that only one task can update the 'transition_task' field at a time.
Link: https://lore.kernel.org/all/b3c61d8a-d52d-3136-fbf0-d1de9f1ba411@huawei.com/
Fixes: ca654dc3a9 ("cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78e04bbff8 ]
Since the commit referenced in the Fixes: tag below the VMBus client driver
is walking the ACPI namespace up from the VMBus ACPI device to the ACPI
namespace root object trying to find Hyper-V MMIO ranges.
However, if it is not able to find them it ends trying to walk resources of
the ACPI namespace root object itself.
This object has all-ones handle, which causes a NULL pointer dereference
in the ACPI code (from dereferencing this pointer with an offset).
This in turn causes an oops on boot with VMBus host implementations that do
not provide Hyper-V MMIO ranges in their VMBus ACPI device or its
ancestors.
The QEMU VMBus implementation is an example of such implementation.
I guess providing these ranges is optional, since all tested Windows
versions seem to be able to use VMBus devices without them.
Fix this by explicitly terminating the lookup at the ACPI namespace root
object.
Note that Linux guests under KVM/QEMU do not use the Hyper-V PV interface
by default - they only do so if the KVM PV interface is missing or
disabled.
Example stack trace of such oops:
[ 3.710827] ? __die+0x1f/0x60
[ 3.715030] ? page_fault_oops+0x159/0x460
[ 3.716008] ? exc_page_fault+0x73/0x170
[ 3.716959] ? asm_exc_page_fault+0x22/0x30
[ 3.717957] ? acpi_ns_lookup+0x7a/0x4b0
[ 3.718898] ? acpi_ns_internalize_name+0x79/0xc0
[ 3.720018] acpi_ns_get_node_unlocked+0xb5/0xe0
[ 3.721120] ? acpi_ns_check_object_type+0xfe/0x200
[ 3.722285] ? acpi_rs_convert_aml_to_resource+0x37/0x6e0
[ 3.723559] ? down_timeout+0x3a/0x60
[ 3.724455] ? acpi_ns_get_node+0x3a/0x60
[ 3.725412] acpi_ns_get_node+0x3a/0x60
[ 3.726335] acpi_ns_evaluate+0x1c3/0x2c0
[ 3.727295] acpi_ut_evaluate_object+0x64/0x1b0
[ 3.728400] acpi_rs_get_method_data+0x2b/0x70
[ 3.729476] ? vmbus_platform_driver_probe+0x1d0/0x1d0 [hv_vmbus]
[ 3.730940] ? vmbus_platform_driver_probe+0x1d0/0x1d0 [hv_vmbus]
[ 3.732411] acpi_walk_resources+0x78/0xd0
[ 3.733398] vmbus_platform_driver_probe+0x9f/0x1d0 [hv_vmbus]
[ 3.734802] platform_probe+0x3d/0x90
[ 3.735684] really_probe+0x19b/0x400
[ 3.736570] ? __device_attach_driver+0x100/0x100
[ 3.737697] __driver_probe_device+0x78/0x160
[ 3.738746] driver_probe_device+0x1f/0x90
[ 3.739743] __driver_attach+0xc2/0x1b0
[ 3.740671] bus_for_each_dev+0x70/0xc0
[ 3.741601] bus_add_driver+0x10e/0x210
[ 3.742527] driver_register+0x55/0xf0
[ 3.744412] ? 0xffffffffc039a000
[ 3.745207] hv_acpi_init+0x3c/0x1000 [hv_vmbus]
Fixes: 7f163a6fd9 ("drivers:hv: Modify hv_vmbus to search for all MMIO ranges available.")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/fd8e64ceeecfd1d95ff49021080cf699e88dbbde.1691606267.git.maciej.szmigiero@oracle.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8cae665743 ]
There are two issues in the current PRS disable sysfs store function
wq_prs_disable_store():
1. Since PRS disable knob is invisible if PRS disable is not supported
in WQ, it's redundant to check PRS support again in the store function
again. Remove the redundant PRS support check.
2. Since PRS disable is read-only when the device is not configurable,
PRS disable cannot be changed on the device. Add device configurable
check in the store function.
Fixes: f2dc327131 ("dmaengine: idxd: add per wq PRS disable")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230811012635.535413-2-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db4bfcba7b ]
Use "select" to ensure that the required kconfig symbols are set
as expected.
Drop HOSTAUDIO since it is now equivalent to UML_SOUND.
Set CONFIG_SOUND=m in ARCH=um defconfig files to maintain the
status quo of the default configs.
Allow SOUND with UML regardless of HAS_IOMEM. Otherwise there is a
kconfig warning for unmet dependencies. (This was not an issue when
SOUND was defined in arch/um/drivers/Kconfig. I have done 50 randconfig
builds and didn't find any issues.)
This fixes build errors when CONFIG_SOUND is not set:
ld: arch/um/drivers/hostaudio_kern.o: in function `hostaudio_cleanup_module':
hostaudio_kern.c:(.exit.text+0xa): undefined reference to `unregister_sound_mixer'
ld: hostaudio_kern.c:(.exit.text+0x15): undefined reference to `unregister_sound_dsp'
ld: arch/um/drivers/hostaudio_kern.o: in function `hostaudio_init_module':
hostaudio_kern.c:(.init.text+0x19): undefined reference to `register_sound_dsp'
ld: hostaudio_kern.c:(.init.text+0x31): undefined reference to `register_sound_mixer'
ld: hostaudio_kern.c:(.init.text+0x49): undefined reference to `unregister_sound_dsp'
and this kconfig warning:
WARNING: unmet direct dependencies detected for SOUND
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: d886e87cb8 ("sound: make OSS sound core optional")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: lore.kernel.org/r/202307141416.vxuRVpFv-lkp@intel.com
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-um@lists.infradead.org
Cc: Tejun Heo <tj@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: linux-kbuild@vger.kernel.org
Cc: alsa-devel@alsa-project.org
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 730094577e ]
The tty LED trigger uses the obsolete LED_ON & LED_OFF constants when
setting LED brightness. This is bad because the LED_ON constant is equal
to 1, and so when activating the tty LED trigger on a LED class device
with max_brightness greater than 1, the LED is dimmer than it can be
(when max_brightness is 255, the LED is very dimm indeed; some devices
translate 1/255 to 0, so the LED is OFF all the time).
Instead of directly setting brightness to a specific value, use the
led_blink_set_oneshot() function from LED core to configure the blink.
This function takes the current configured brightness as blink
brightness if not zero, and max brightness otherwise.
This also changes the behavior of the TTY LED trigger. Previously if
rx/tx stats kept changing, the LED was ON all the time they kept
changing. With this patch the LED will blink on TTY activity.
Fixes: fd4a641ac8 ("leds: trigger: implement a tty trigger")
Signed-off-by: Marek Behún <kabel@kernel.org>
Link: https://lore.kernel.org/r/20230802090753.13611-1-kabel@kernel.org
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 065d099f1b ]
Given channel intensity, LED brightness and max LED brightness, the
multicolor LED framework helper led_mc_calc_color_components() computes
the color channel brightness as
chan_brightness = brightness * chan_intensity / max_brightness
Consider the situation when (brightness, intensity, max_brightness) is
for example (16, 15, 255), then chan_brightness is computed to 0
although the fractional divison would give 0.94, which should be rounded
to 1.
Use DIV_ROUND_CLOSEST here for the division to give more realistic
component computation:
chan_brightness = DIV_ROUND_CLOSEST(brightness * chan_intensity,
max_brightness)
Fixes: 55d5d3b46b ("leds: multicolor: Introduce a multicolor class definition")
Signed-off-by: Marek Behún <kabel@kernel.org>
Link: https://lore.kernel.org/r/20230801124931.8661-1-kabel@kernel.org
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4afcb58ea4 ]
nvmem_cell_read_u32() may return -EPROBE_DEFER if NVMEM supplier has not
yet been probed. Future reprobe may succeed, so printing:
i.mx8mm_thermal 30260000.tmu: Failed to read OCOTP nvmem cell (-517).
to the log is confusing. Fix this by using dev_err_probe. This also
elevates the message from warning to error, which is more correct: The
log message is only ever printed in probe error path and probe aborts
afterwards, so it really warrants an error-level message.
Fixes: 4032916488 ("thermal/drivers/imx: Add support for loading calibration data from OCOTP")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230708112647.2897294-1-a.fatoum@pengutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2bba1acf7a ]
Each LVTS thermal controller can have up to four sensors, each capable
of triggering its own interrupt when its measured temperature crosses
the configured threshold. The threshold for each sensor is handled
separately by the thermal framework, since each one is registered with
its own thermal zone and trips. However, the temperature thresholds are
configured on the controller, and therefore are shared between all
sensors on that controller.
When the temperature measured by the sensors is different enough to
cause the thermal framework to configure different thresholds for each
one, interrupts start triggering on sensors outside the last threshold
configured.
To address the issue, track the thresholds required by each sensor and
only actually set the highest one in the hardware, and disable
interrupts for all sensors outside the current configured range.
Fixes: f5f633b182 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230706153823.201943-7-nfraprado@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77354eaef8 ]
The thermal framework might leave the low threshold unset if there
aren't any lower trip points. This leaves the register zeroed, which
translates to a very high temperature for the low threshold. The
interrupt for this threshold is then immediately triggered, and the
state machine gets stuck, preventing any other temperature monitoring
interrupts to ever trigger.
(The same happens by not setting the Cold or Hot to Normal thresholds
when using those)
Set the unused threshold to a valid low value. This value was chosen so
that for any valid golden temperature read from the efuse, when the
value is converted to raw and back again to milliCelsius, the result
doesn't underflow.
Fixes: f5f633b182 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230706153823.201943-6-nfraprado@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 487bf099e8 ]
Out of the many interrupts supported by the hardware, the only ones of
interest to the driver currently are:
* The temperature went over the high offset threshold, for any of the
sensors
* The temperature went below the low offset threshold, for any of the
sensors
* The temperature went over the stage3 threshold
These are the only thresholds configured by the driver through the
OFFSETH, OFFSETL, and PROTTC registers, respectively.
The current interrupt mask in LVTS_MONINT_CONF, enables many more
interrupts, including data ready on sensors for both filtered and
immediate mode. These are not only not handled by the driver, but they
are also triggered too often, causing unneeded overhead. Disable these
unnecessary interrupts.
The meaning of each bit can be seen in the comment describing
LVTS_MONINTST in the IRQ handler.
Fixes: f5f633b182 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230706153823.201943-5-nfraprado@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f79e996c7e ]
There are two kinds of temperature monitoring interrupts available:
* High Offset, Low Offset
* Hot, Hot to normal, Cold
The code currently uses the hot/h2n/cold interrupts, however in a way
that doesn't work: the cold threshold is left uninitialized, which
prevents the other thresholds from ever triggering, and the h2n
interrupt is used as the lower threshold, which prevents the hot
interrupt from triggering again after the thresholds are updated by the
thermal framework, since a hot interrupt can only trigger again after
the hot to normal interrupt has been triggered.
But better yet than addressing those issues, is to use the high/low
offset interrupts instead. This way only two thresholds need to be
managed, which have a simpler state machine, making them a better match
to the thermal framework's high and low thresholds.
Fixes: f5f633b182 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230706153823.201943-4-nfraprado@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64de162e34 ]
Each controller can be configured to operate on immediate or filtered
mode. On filtered mode, the sensors are enabled by setting the
corresponding bits in MONCTL0, while on immediate mode, by setting
MSRCTL1.
Previously, the code would set MSRCTL1 for all four sensors when
configured to immediate mode, but given that the controller might not
have all four sensors connected, this would cause interrupts to trigger
for non-existent sensors. Fix this by handling the MSRCTL1 register
analogously to the MONCTL0: only enable the sensors that were declared.
Fixes: f5f633b182 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230706153823.201943-3-nfraprado@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 19a1d46bd6 ]
inno_write is used to configure 0xaa reg, that also hold the
POST_PLL_POWER_DOWN bit.
When POST_PLL_REFCLK_SEL_TMDS is configured the power down bit is not
taken into consideration.
Fix this by keeping the power down bit until configuration is complete.
Also reorder the reg write order for consistency.
Fixes: 53706a1168 ("phy: add Rockchip Innosilicon hdmi phy")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20230615171005.2251032-5-jonas@kwiboo.se
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5ef343c1d ]
inno_hdmi_phy_rk3328_clk_recalc_rate() is returning a rate not found
in the pre pll config table when the fractal divider is used.
This can prevent proper power_on because a tmdsclock for the new rate
is not found in the pre pll config table.
Fix this by saving and returning a rounded pixel rate that exist
in the pre pll config table.
Fixes: 53706a1168 ("phy: add Rockchip Innosilicon hdmi phy")
Signed-off-by: Zheng Yang <zhengyang@rock-chips.com>
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20230615171005.2251032-3-jonas@kwiboo.se
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50c5e6f41d ]
Kernel PASID and user PASID are separately enabled. User needs to know the
user PASID enabling status to decide how to use IDXD device in user space.
This is done via the attribute /sys/bus/dsa/devices/dsa0/pasid_enabled.
It's unnecessary for user to know the kernel PASID enabling status because
user won't use the kernel PASID. But instead of showing the user PASID
enabling status, the attribute shows the kernel PASID enabling status. Fix
the issue by showing the user PASID enabling status in the attribute.
Fixes: 42a1b73852 ("dmaengine: idxd: Separate user and kernel pasid enabling")
Signed-off-by: Rex Zhang <rex.zhang@intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230614062706.1743078-1-rex.zhang@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60177390fa ]
brcmnand controller can only access the flash spare area up to certain
bytes based on the ECC level. It can be less than the actual flash spare
area size. For example, for many NAND chip supporting ECC BCH-8, it has
226 bytes spare area. But controller can only uses 218 bytes. So brcmand
driver overrides the mtd oobsize with the controller's accessible spare
area size. When the nand base driver utilizes the nand_device object, it
resets the oobsize back to the actual flash spare aprea size from
nand_memory_organization structure and controller may not able to access
all the oob area as mtd advises.
This change fixes the issue by overriding the oobsize in the
nand_memory_organization structure to the controller's accessible spare
area size.
Fixes: a7ab085d7c ("mtd: rawnand: Initialize the nand_device object")
Signed-off-by: William Zhang <william.zhang@broadcom.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-6-william.zhang@broadcom.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3163f635b2 ]
Warning happened in rb_end_commit() at code:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:
Write on CPU 0 Swap buffer by per_cpu/cpu0/snapshot on CPU 1
-------- --------
tracing_snapshot_write()
[...]
ring_buffer_lock_reserve()
cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
[...]
rb_reserve_next_event()
[...]
ring_buffer_swap_cpu()
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
if (local_read(&cpu_buffer_b->committing))
goto out_dec;
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
// 2. cpu_buffer has swapped here.
rb_start_commit(cpu_buffer);
if (unlikely(READ_ONCE(cpu_buffer->buffer)
!= buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
[...] // has not changed here.
return NULL;
}
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
[...]
// 4. Reserve event from 'cpu_buffer_a'.
ring_buffer_unlock_commit()
[...]
cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
rb_commit(cpu_buffer)
rb_end_commit() // 6. WARN for the wrong 'committing' state !!!
Based on above analysis, we can easily reproduce by following testcase:
``` bash
#!/bin/bash
dmesg -n 7
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes: f1affcaaa8 ("tracing: Add snapshot in the per_cpu trace directories")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9f4c45c8e ]
The Gather Data Sampling (GDS) vulnerability is common to all Skylake
processors. However, the "client" Skylakes* are now in this list:
https://www.intel.com/content/www/us/en/support/articles/000022396/processors.html
which means they are no longer included for new vulnerabilities here:
https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
or in other GDS documentation. Thus, they were not included in the
original GDS mitigation patches.
Mark SKYLAKE and SKYLAKE_L as vulnerable to GDS to match all the
other Skylake CPUs (which include Kaby Lake). Also group the CPUs
so that the ones that share the exact same vulnerabilities are next
to each other.
Last, move SRBDS to the end of each line. This makes it clear at a
glance that SKYLAKE_X is unique. Of the five Skylakes, it is the
only "server" CPU and has a different implementation from the
clients of the "special register" hardware, making it immune to SRBDS.
This makes the diff much harder to read, but the resulting table is
worth it.
I very much appreciate the report from Michael Zhivich about this
issue. Despite what level of support a hardware vendor is providing,
the kernel very much needs an accurate and up-to-date list of
vulnerable CPUs. More reports like this are very welcome.
* Client Skylakes are CPUID 406E3/506E3 which is family 6, models
0x4E and 0x5E, aka INTEL_FAM6_SKYLAKE and INTEL_FAM6_SKYLAKE_L.
Reported-by: Michael Zhivich <mzhivich@akamai.com>
Fixes: 8974eb5882 ("x86/speculation: Add Gather Data Sampling mitigation")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96c1fa04f0 ]
In commit 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") the
new function report_idle_softirq() was created by breaking code out of the
existing can_stop_idle_tick() for kernels v5.18 and newer.
In doing so, the code essentially went from a one conditional:
if (a && b && c)
warn();
to a three conditional:
if (!a)
return;
if (!b)
return;
if (!c)
return;
warn();
But that conversion got the condition for the RT specific
local_bh_blocked() wrong. The original condition was:
!local_bh_blocked()
but the conversion failed to negate it so it ended up as:
if (!local_bh_blocked())
return false;
This issue lay dormant until another fixup for the same commit was added
in commit a7e282c777 ("tick/rcu: Fix bogus ratelimit condition").
This commit realized the ratelimit was essentially set to zero instead
of ten, and hence *no* softirq pending messages would ever be issued.
Once this commit was backported via linux-stable, both the v6.1 and v6.4
preempt-rt kernels started printing out 10 instances of this at boot:
NOHZ tick-stop error: local softirq work is pending, handler #80!!!
Remove the negation and return when local_bh_blocked() evaluates to true to
bring the correct behaviour back.
Fixes: 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Wen Yang <wenyang.linux@foxmail.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230818200757.1808398-1-paul.gortmaker@windriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f20d32612 ]
Presently, if a call to logi_dj_recv_send_report() fails, we do
not learn about the error until after sending short
HID_OUTPUT_REPORT with hid_hw_raw_request().
To handle this somewhat unlikely issue, return on error in
logi_dj_recv_send_report() (minding ugly sleep workaround) and
take into account the result of hid_hw_raw_request().
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 6a9ddc8978 ("HID: logitech-dj: enable notifications on connect/disconnect")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Link: https://lore.kernel.org/r/20230613101635.77820-1-n.zhandarovich@fintech.ru
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc202c57e9 ]
When trying to destroy QP or CQ, we first decrease the refcount and
potentially free memory regions allocated for the object and then
request the device to destroy the object. If the device fails, the
object isn't fully destroyed so the user/IB core can try to destroy the
object again which will lead to underflow when trying to decrease an
already zeroed refcount.
Deallocate resources in reverse order of allocating them to safely free
them.
Fixes: ff6629f88c ("RDMA/efa: Do not delay freeing of DMA pages")
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Link: https://lore.kernel.org/r/20230822082725.31719-1-ynachum@amazon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b056327bee ]
The siw_connect can go to err in below after cep is allocated successfully:
1. If siw_cm_alloc_work returns failure. In this case socket is not
assoicated with cep so siw_cep_put can't be called by siw_socket_disassoc.
We need to call siw_cep_put twice since cep->kref is increased once after
it was initialized.
2. If siw_cm_queue_work can't find a work, which means siw_cep_get is not
called in siw_cm_queue_work, so cep->kref is increased twice by siw_cep_get
and when associate socket with cep after it was initialized. So we need to
call siw_cep_put three times (one in siw_socket_disassoc).
3. siw_send_mpareqrep returns error, this scenario is similar as 2.
So we need to remove one siw_cep_put in the error path.
Fixes: 6c52fdc244 ("rdma/siw: connection management")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230821133255.31111-2-guoqing.jiang@linux.dev
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a19755519 ]
There is a long call chain that &fip->ctlr_lock is acquired by isr
fnic_isr_msix_wq_copy() under hard IRQ context. Thus other process context
code acquiring the lock should disable IRQ, otherwise deadlock could happen
if the IRQ preempts the execution while the lock is held in process context
on the same CPU.
[ISR]
fnic_isr_msix_wq_copy()
-> fnic_wq_copy_cmpl_handler()
-> fnic_fcpio_cmpl_handler()
-> fnic_fcpio_flogi_reg_cmpl_handler()
-> fnic_flush_tx()
-> fnic_send_frame()
-> fcoe_ctlr_els_send()
-> spin_lock_bh(&fip->ctlr_lock)
[Process Context]
1. fcoe_ctlr_timer_work()
-> fcoe_ctlr_flogi_send()
-> spin_lock_bh(&fip->ctlr_lock)
2. fcoe_ctlr_recv_work()
-> fcoe_ctlr_recv_handler()
-> fcoe_ctlr_recv_els()
-> fcoe_ctlr_announce()
-> spin_lock_bh(&fip->ctlr_lock)
3. fcoe_ctlr_recv_work()
-> fcoe_ctlr_recv_handler()
-> fcoe_ctlr_recv_els()
-> fcoe_ctlr_flogi_retry()
-> spin_lock_bh(&fip->ctlr_lock)
4. -> fcoe_xmit()
-> fcoe_ctlr_els_send()
-> spin_lock_bh(&fip->ctlr_lock)
spin_lock_bh() is not enough since fnic_isr_msix_wq_copy() is a
hardirq.
These flaws were found by an experimental static analysis tool I am
developing for irq-related deadlock.
The patch fix the potential deadlocks by spin_lock_irqsave() to disable
hard irq.
Fixes: 794d98e77f ("[SCSI] libfcoe: retry rejected FLOGI to another FCF if possible")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230817074708.7509-1-dg573847474@gmail.com
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb6d73d9ad ]
Currently irdma allows zero-length STAGs to be programmed in HW during
the kernel mode fast register flow. Zero-length MR or STAG registration
disable HW memory length checks.
Improve gaps in bounds checking in irdma by preventing zero-length STAG or
MR registrations except if the IB_PD_UNSAFE_GLOBAL_RKEY is set.
This addresses the disclosure CVE-2023-25775.
Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Christopher Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230818144838.1758-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0a232f1e1 ]
smp_call_function_single() will allocate an IPI interrupt vector to
the target processor and send a function call request to the interrupt
vector. After the target processor receives the IPI interrupt, it will
execute arm_trbe_remove_coresight_cpu() call request in the interrupt
handler.
According to the device_unregister() stack information, if other process
is useing the device, the down_write() may sleep, and trigger deadlocks
or unexpected errors.
arm_trbe_remove_coresight_cpu
coresight_unregister
device_unregister
device_del
kobject_del
__kobject_del
sysfs_remove_dir
kernfs_remove
down_write ---------> it may sleep
Add a helper arm_trbe_disable_cpu() to disable TRBE precpu irq and reset
per TRBE.
Simply call arm_trbe_remove_coresight_cpu() directly without useing the
smp_call_function_single(), which is the same as registering the TRBE
coresight device.
Fixes: 3fbf7f011f ("coresight: sink: Add TRBE driver")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Link: https://lore.kernel.org/r/20230814093813.19152-2-hejunhao3@huawei.com
[ Remove duplicate cpumask checks during removal ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[ v3 - Remove the operation of assigning NULL to cpudata->drvdata ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230818084052.10116-1-hejunhao3@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20872584b8 ]
For cp error case, there will be dirty meta/node pages remained after
f2fs_write_checkpoint() in f2fs_put_super(), drop them explicitly, and
do sanity check on reference count of dirty pages and inflight IOs.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: eb61c2cca2 ("f2fs: fix to account cp stats correctly")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9bf1dcbdfd ]
As reported, status debugfs entry shows inconsistent GC stats as below:
GC calls: 6008 (BG: 6161)
- data segments : 3053 (BG: 3053)
- node segments : 2955 (BG: 2955)
Total GC calls is larger than BGGC calls, the reason is:
- f2fs_stat_info.call_count accounts total migrated section count
by f2fs_gc()
- f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from
background gc_thread
Another issue is gc_foreground_calls sysfs entry shows total GC call
count rather than FGGC call count.
This patch changes as below for fix:
- account GC calls and migrated segment count separately
- support to account migrated section count if it enables large section
mode
- fix to show correct value in gc_foreground_calls sysfs entry
Fixes: fc7100ea2a ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 958ccbbf1c ]
syzbot reports a f2fs bug as below:
UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3275:19
index 1409 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
inline_data_addr fs/f2fs/f2fs.h:3275 [inline]
__recover_inline_status fs/f2fs/inode.c:113 [inline]
do_read_inode fs/f2fs/inode.c:480 [inline]
f2fs_iget+0x4730/0x48b0 fs/f2fs/inode.c:604
f2fs_fill_super+0x640e/0x80c0 fs/f2fs/super.c:4601
mount_bdev+0x276/0x3b0 fs/super.c:1391
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue was bisected to:
commit d48a7b3a72
Author: Chao Yu <chao@kernel.org>
Date: Mon Jan 9 03:49:20 2023 +0000
f2fs: fix to do sanity check on extent cache correctly
The root cause is we applied both v1 and v2 of the patch, v2 is the right
fix, so it needs to revert v1 in order to fix reported issue.
v1:
commit d48a7b3a72 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230109034920.492914-1-chao@kernel.org/
v2:
commit 269d119481 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230207134808.1827869-1-chao@kernel.org/
Reported-by: syzbot+601018296973a481f302@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000fcf0690600e4d04d@google.com/
Fixes: d48a7b3a72 ("f2fs: fix to do sanity check on extent cache correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33c7ae8f49 ]
Fix the following smatch warning:
drivers/media/i2c/rdacm21.c:373 ov10640_check_id() error: uninitialized
symbol 'val'.
Initialize 'val' to 0 in the ov10640_check_id() function.
Fixes: 2b821698dc ("media: i2c: rdacm21: Power up OV10640 before OV490")
Reported-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84b4bd7e0d ]
When the ov2680_power_on() "sensor soft reset failed" path is hit during
probe() the WARN() about putting an enabled regulator at
drivers/regulator/core.c:2398 triggers 3 times (once for each regulator),
filling dmesg with backtraces.
Fix this by properly disabling the regulators on ov2680_power_on() errors.
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0e97a4b4f ]
ov2680_set_fmt() which == V4L2_SUBDEV_FORMAT_TRY was getting
the try_fmt v4l2_mbus_framefmt struct from the passed in sd_state
and then storing the contents of that into the return by reference
format->format struct.
While the right thing to do would be filling format->format based on
the just looked up mode and then store the results of that in
sd_state->pads[0].try_fmt .
Before the previous change introducing ov2680_fill_format() this
resulted in ov2680_set_fmt() which == V4L2_SUBDEV_FORMAT_TRY always
returning the zero-ed out sd_state->pads[0].try_fmt in format->format
breaking callers using this.
After the introduction of ov2680_fill_format() which at least
initializes sd_state->pads[0].try_fmt properly, format->format
is now always being filled with the default 800x600 mode set by
ov2680_init_cfg() independent of the actual requested mode.
Move the filling of format->format with ov2680_fill_format() to
before the if (which == V4L2_SUBDEV_FORMAT_TRY) and then store
the filled in format->format in sd_state->pads[0].try_fmt to
fix this.
Note this removes the fmt local variable because IMHO having a local
variable which points to a sub-struct of one of the function arguments
just leads to confusion when reading the code.
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d6849b220 ]
Add a ov2680_fill_format() helper function and use this everywhere were
a v4l2_mbus_framefmt struct needs to be filled in so that the driver always
fills it consistently.
This is a preparation patch for fixing ov2680_set_fmt()
which == V4L2_SUBDEV_FORMAT_TRY calls not properly filling in
the passed in v4l2_mbus_framefmt struct.
Note that for ov2680_init_cfg() this now simply always fills
the try_fmt struct of the passed in sd_state. This is correct because
ov2680_init_cfg() is never called with a NULL sd_state so the old
sd_state check is not necessary.
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e521b9cc1a ]
On ov2680_set_fmt() calls with format->which == V4L2_SUBDEV_FORMAT_TRY,
ov2680_set_fmt() does not talk to the sensor.
So in this case there is no need to lock the sensor->lock mutex or
to check that the sensor is streaming.
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49c282d5a8 ]
VIDEO_V4L2_SUBDEV_API is now automatically selected in Kconfig
for all sensor drivers. Remove the ifdef CONFIG_VIDEO_V4L2_SUBDEV_API
checks.
This is a preparation patch for fixing ov2680_set_fmt()
which == V4L2_SUBDEV_FORMAT_TRY calls not properly filling in
the passed in v4l2_mbus_framefmt struct.
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5d08ad330 ]
ov2680_vflip_disable() / ov2680_hflip_disable() pass BIT(0) instead of
0 as value to ov2680_mod_reg().
While fixing this also:
1. Stop having separate enable/disable functions for hflip / vflip
2. Move the is_streaming check, which is unique to hflip / vflip
into the ov2680_set_?flip() functions.
for a nice code cleanup.
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50a7bad4e0 ]
The index into ov2680_hv_flip_bayer_order[] should be 0-3, but
ov2680_bayer_order() was using 0 + BIT(2) + (BIT(2) << 1) as
max index, while the intention was to use: 0 + 1 + 2 as max index.
Fix the index calculation in ov2680_bayer_order(), while at it
also just use the ctrl values rather then reading them back using
a slow i2c-read transaction.
This also allows making the function void, since there now are
no more i2c-reads to error check.
Note the check for the ctrls being NULL is there to allow
adding an ov2680_fill_format() helper later, which will call
ov2680_set_bayer_order() during probe() before the ctrls are created.
[Sakari Ailus: Change all users of ov2680_set_bayer_order() here]
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b5a42e6ae ]
Quoting the OV2680 datasheet:
"3.2 exposure and gain control
In the OV2680, the exposure time and gain are set manually from an external
controller. The OV2680 supports manual gain and exposure control only for
normal applications, no auto mode."
And indeed testing with the atomisp_ov2680 fork of ov2680.c has shown that
auto-exposure and auto-gain do not work.
Note that the code setting the auto-exposure flag was broken, callers
of ov2680_exposure_set() were directly passing !!ctrls->auto_exp->val as
"bool auto_exp" value, but ctrls->auto_exp is a menu control with:
enum v4l2_exposure_auto_type {
V4L2_EXPOSURE_AUTO = 0,
V4L2_EXPOSURE_MANUAL = 1,
...
So instead of passing !!ctrls->auto_exp->val they should have been passing
ctrls->auto_exp->val == V4L2_EXPOSURE_AUTO, iow the passed value was
inverted of what it should have been.
Also remove ov2680_g_volatile_ctrl() since without auto support the gain
and exposure controls are not volatile.
This also fixes the control values not being properly applied in
ov2680_mode_set(). The 800x600 mode register-list also sets gain,
exposure and vflip overriding the last set ctrl values.
ov2680_mode_set() does call ov2680_gain_set() and ov2680_exposure_set()
but did this before writing the mode register-list, so these values
would still be overridden by the mode register-list.
Add a v4l2_ctrl_handler_setup() call after writing the mode register-list
to restore all ctrl values. Also remove the ctrls->gain->is_new check from
ov2680_gain_set() so that the gain always gets restored properly.
Last since ov2680_mode_set() now calls v4l2_ctrl_handler_setup(), remove
the v4l2_ctrl_handler_setup() call after ov2680_mode_restore() since
ov2680_mode_restore() calls ov2680_mode_set().
Fixes: 3ee47cad3e ("media: ov2680: Add Omnivision OV2680 sensor driver")
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a210df337c ]
The initial state of RESETB input signal of OV5640 should be
asserted, i.e. the sensor should be in reset. This is not the
case, make it so.
Since the subsequent assertion of RESETB signal is no longer
necessary and the timing of the power sequencing could be
slightly adjusted, add annotations to the delays which match
OV5640 datasheet rev. 2.03, both:
figure 2-3 power up timing with internal DVDD
figure 2-4 power up timing with external DVDD source
The 5..10ms delay between PWDN assertion and RESETB assertion
is not even documented in the power sequencing diagram, and
with this reset fix, it is no longer even necessary.
Fixes: 19a81c1426 ("[media] add Omnivision OV5640 sensor driver")
Reported-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Tested-by: Jai Luthra <j-luthra@ti.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98cb72d3b9 ]
Set OV5640_REG_IO_MIPI_CTRL00 bit 2 to 1 instead of 0, since 1 means
MIPI CSI2 interface, while 0 means CPI parallel interface.
In the ov5640_set_power_mipi() the interface should obviously be set
to MIPI CSI2 since this functions is used to power up the sensor when
operated in MIPI CSI2 mode. The sensor should not be in CPI mode in
that case.
This fixes a corner case where capturing the first frame on i.MX8MN
with CSI/ISI resulted in corrupted frame.
Fixes: aa4bb8b883 ("media: ov5640: Re-work MIPI startup sequence")
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> # [Test on imx6q]
Signed-off-by: Marek Vasut <marex@denx.de>
Tested-by: Jai Luthra <j-luthra@ti.com> # [Test on bplay, sk-am62]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 276e14e6c3 ]
Some digitizers (notably XP-Pen Artist 24) do not report the Invert
usage when erasing. This causes the device to be permanently stuck with
the BTN_TOOL_RUBBER tool after sending Eraser, as Invert is the only
usage that can release the tool. In this state, Touch and Inrange are
no longer reported to userspace, rendering the pen unusable.
Prior to commit 87562fcd13 ("HID: input: remove the need for
HID_QUIRK_INVERT"), BTN_TOOL_RUBBER was never set and Eraser events were
simply translated into BTN_TOUCH without causing an inconsistent state.
Introduce HID_QUIRK_NOINVERT for such digitizers and detect them during
hidinput_configure_usage(). This quirk causes the tool to be released
as soon as Eraser is reported as not set. Set BTN_TOOL_RUBBER in
input->keybit when mapping Eraser.
Fixes: 87562fcd13 ("HID: input: remove the need for HID_QUIRK_INVERT")
Co-developed-by: Nils Fuhler <nils@nilsfuhler.de>
Signed-off-by: Nils Fuhler <nils@nilsfuhler.de>
Signed-off-by: Illia Ostapyshyn <ostapyshyn@sra.uni-hannover.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 699fb50d99 ]
In the current code, devres_release_all() only gets called if the device
has a bus and has been probed.
This leads to issues when using bus-less or driver-less devices where
the device might never get freed if a managed resource holds a reference
to the device. This is happening in the DRM framework for example.
We should thus call devres_release_all() in the device_del() function to
make sure that the device-managed actions are properly executed when the
device is unregistered, even if it has neither a bus nor a driver.
This is effectively the same change than commit 2f8d16a996 ("devres:
release resources on device_del()") that got reverted by commit
a525a3ddea ("driver core: free devres in device_release") over
memory leaks concerns.
This patch effectively combines the two commits mentioned above to
release the resources both on device_del() and device_release() and get
the best of both worlds.
Fixes: a525a3ddea ("driver core: free devres in device_release")
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20230720-kunit-devm-inconsistencies-test-v3-3-6aa7e074f373@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11e0a7c8e0 ]
Commit 567f97bd38 ("media: ipu3-cio2: support multiple sensors and VCMs
with same HID") introduced an on stack vcm_name and then uses this for
the name field of the software_node struct used for the vcm.
But the software_node struct is much longer lived then the current
stack-frame, so this is no good.
Instead extend the ipu_node_names struct with an extra field to store
the vcm software_node name and use that.
Note this also changes the length of the allocated buffer from
ACPI_ID_LEN + 4 to 16. the name is filled with "<ipu_vcm_types[x]>-%u"
where ipu_vcm_types[x] is not an ACPI_ID. The maximum length of
the strings in the ipu_vcm_types[] array is 11 + 5 bytes for "-255\0"
means 16 bytes are needed in the worst case scenario.
Fixes: 567f97bd38 ("media: ipu3-cio2: support multiple sensors and VCMs with same HID")
Cc: Bingbu Cao <bingbu.cao@intel.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 881ca25978 ]
cio2 bridge was involved along with IPU3. However, in fact all Intel IPUs
besides IPU3 CIO2 need this bridge driver. This patch move bridge driver
out of ipu3 directory and rename as ipu-bridge. Then it can be worked with
IPU3 and other Intel IPUs.
Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Stable-dep-of: 11e0a7c8e0 ("media: ipu-bridge: Do not use on stack memory for software_node.name field")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 284be56931 ]
When ipu_bridge_parse_rotation() and ipu_bridge_parse_orientation() run
sensor->adev is not set yet.
So if either of the dev_warn() calls about unknown values are hit this
will lead to a NULL pointer deref.
Set sensor->adev earlier, with a borrowed ref to avoid making unrolling
on errors harder, to fix this.
Fixes: 485aa3df0d ("media: ipu3-cio2: Parse sensor orientation and rotation")
Cc: Fabian Wüthrich <me@fabwu.ch>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ca2fbab99 ]
CONFIG_VIDEO_IMX_MEDIA isn't needed on arm64 platforms since commit
9f257f502c ("media: imx: Unstage the imx7-media-csi driver") which
moved the last arm64 driver depending on that Kconfig symbol out of
staging. Drop it from the arm64 defconfig.
Fixes: 9f257f502c ("media: imx: Unstage the imx7-media-csi driver")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6283e4834c ]
As per information from Qualcomm [1], this property is not really
supported beyond msm8916 (HFI V1) and some newer HFI versions really
dislike receiving it, going as far as crashing the device.
Only consider toggling it (via the module option) on HFIV1.
While at it, get rid of the global static variable (which defaulted
to zero) which was never explicitly assigned to for V1.
Note: [1] is a reply to the actual message in question, as lore did not
properly receive some of the emails..
[1] https://lore.kernel.org/lkml/955cd520-3881-0c22-d818-13fe9a47e124@linaro.org/
Fixes: 7ed9e0b339 ("media: venus: hfi, vdec: v6 Add IS_V6() to existing IS_V4() if locations")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f33cb49081 ]
The if statement that compares msgs[i].len != 3 is always false because
it is in a code block where msg[i].len is equal to 3. The check is
redundant and can be removed.
As detected by cppcheck static analysis:
drivers/media/usb/go7007/go7007-i2c.c:168:20: warning: Opposite inner
'if' condition leads to a dead code block. [oppositeInnerCondition]
Link: https://lore.kernel.org/linux-media/20230727174007.635572-1-colin.i.king@gmail.com
Fixes: 866b8695d6 ("Staging: add the go7007 video driver")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 948a77aaec ]
The adap_configured() callback was called with the adap->lock mutex
held if the 'configured' argument was false, and without the adap->lock
mutex held if that argument was true.
That was very confusing, and so split this up in a adap_unconfigured()
callback and a high-level configured() callback.
This also makes it easier to understand when the mutex is held: all
low-level adap_* callbacks are called with the mutex held. All other
callbacks are called without that mutex held.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: f1b5716430 ("media: cec: add optional adap_configured callback")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da53c36ddd ]
A potential deadlock was found by Zheng Zhang with a local syzkaller
instance.
The problem is that when a non-blocking CEC transmit is canceled by calling
cec_data_cancel, that in turn can call the high-level received() driver
callback, which can call cec_transmit_msg() to transmit a new message.
The cec_data_cancel() function is called with the adap->lock mutex held,
and cec_transmit_msg() tries to take that same lock.
The root cause is that the received() callback can either be used to pass
on a received message (and then adap->lock is not held), or to report a
canceled transmit (and then adap->lock is held).
This is confusing, so create a new low-level adap_nb_transmit_canceled
callback that reports back that a non-blocking transmit was canceled.
And the received() callback is only called when a message is received,
as was the case before commit f9d0ecbf56 ("media: cec: correctly pass
on reply results") complicated matters.
Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: f9d0ecbf56 ("media: cec: correctly pass on reply results")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7295a996fd ]
If a duplicate attribute is found using kset_find_obj(),
a reference to that attribute is returned. This means
that we need to dispose it accordingly. Use kobject_put()
to dispose the duplicate attribute in such a case.
Compile-tested only.
Fixes: e8a60aa740 ("platform/x86: Introduce support for Systems Management Driver over WMI for Dell Systems")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230805053610.7106-1-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a3b8e63f8 ]
Even the PCI devices don't support pasid capability, PASID table is
mandatory for a PCI device in scalable mode. However flushing cache
of pasid directory table for these devices are not taken after pasid
table is allocated as the "size" of table is zero. Fix it by
calculating the size by page order.
Found this when reading the code, no real problem encountered for now.
Fixes: 194b3348bd ("iommu/vt-d: Fix PASID directory pointer coherency")
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230616081045.721873-1-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26b7d1a271 ]
smatch reports the warning below:
drivers/infiniband/core/uverbs_std_types_counters.c:110
ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ() error: 'uattr'
dereferencing possible ERR_PTR()
The return value of uattr maybe ERR_PTR(-ENOENT), fix this by checking
the value of uattr before using it.
Fixes: ebb6796bd3 ("IB/uverbs: Add read counters support")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Link: https://lore.kernel.org/r/20230804022525.1916766-1-xiangyang3@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 706efac447 ]
Currently, direct wqe is not supported for wr-list. RoCE driver excludes
direct wqe for wr-list by judging whether the number of wr is 1.
For a wr-list where the second wr is a length-error atomic wr, the
post-send driver handles the first wr and adds 1 to the wr number counter
firstly. While handling the second wr, the driver finds out a length error
and terminates the wr handle process, remaining the counter at 1. This
causes the driver mistakenly judges there is only 1 wr and thus enters
the direct wqe process, carrying the current length-error atomic wqe.
This patch fixes the error by adding a judgement whether the current wr
is a bad wr. If so, use the normal doorbell process but not direct wqe
despite the wr number is 1.
Fixes: 01584a5edc ("RDMA/hns: Add support of direct wqe")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b7867b5b8 ]
Remove kernel-doc warnings:
drivers/iommu/iommu.c:3261: warning: Function parameter or member 'group'
not described in 'iommu_group_release_dma_owner'
drivers/iommu/iommu.c:3261: warning: Excess function parameter 'dev'
description in 'iommu_group_release_dma_owner'
drivers/iommu/iommu.c:3275: warning: Function parameter or member 'dev'
not described in 'iommu_device_release_dma_owner'
drivers/iommu/iommu.c:3275: warning: Excess function parameter 'group'
description in 'iommu_device_release_dma_owner'
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Fixes: 89395ccedb ("iommu: Add device-centric DMA ownership interfaces")
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230731112758.214775-1-wangzhu9@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d48a51286c ]
force_aperture was intended to false only by GART drivers that have an
identity translation outside the aperture. This does not describe sprd, so
add the missing 'force_aperture = true'.
Fixes: b23e4fc4e3 ("iommu: add Unisoc IOMMU basic driver")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf69ef46db ]
Prepare for mt8188 to fix a two IOMMU HWs share pagetable issue.
We have two MM IOMMU HWs in mt8188, one is VPP-IOMMU, the other is
VDO-IOMMU. The 2 MM IOMMU HWs share pagetable don't work in this case:
a) VPP-IOMMU probe firstly.
b) VDO-IOMMU probe.
c) The master for VDO-IOMMU probe (means frstdata is vpp-iommu).
d) The master in another domain probe. No matter it is vdo or vpp.
Then it still create a new pagetable in step d). The problem is
"frstdata->bank[0]->m4u_dom" was not initialized. Then when d) enter, it
still create a new one.
In this patch, we create a new variable "share_dom" for this share
pgtable case, it should be helpful for readable. and put all the share
pgtable logic in the mtk_iommu_domain_finalise.
In mt8195, the master of VPP-IOMMU probes before than VDO-IOMMU
from its dtsi node sequence, we don't see this issue in it. Prepare for
mt8188.
Fixes: 645b87c190 ("iommu/mediatek: Fix 2 HW sharing pgtable issue")
Signed-off-by: Chengci.Xu <chengci.xu@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20230602090227.7264-3-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d20a3a8a32 ]
The driver fails to link when CONFIG_POWER_SUPPLY is disabled:
x86_64-linux-ld: vmlinux.o: in function `cht_wc_extcon_psy_get_prop':
extcon-intel-cht-wc.c:(.text+0x15ccda7): undefined reference to `power_supply_get_drvdata'
x86_64-linux-ld: vmlinux.o: in function `cht_wc_extcon_pwrsrc_event':
extcon-intel-cht-wc.c:(.text+0x15cd3e9): undefined reference to `power_supply_changed'
x86_64-linux-ld: vmlinux.o: in function `cht_wc_extcon_probe':
extcon-intel-cht-wc.c:(.text+0x15cd596): undefined reference to `devm_power_supply_register'
It should be possible to change the driver to not require this at
compile time and still provide other functions, but adding a hard
Kconfig dependency does not seem to have any practical downsides
and is simpler since the option is normally enabled anyway.
Fixes: 66e31186cd ("extcon: intel-cht-wc: Add support for registering a power_supply class-device")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79038a9944 ]
In some randconfig builds, kernfs ends up being disabled, so there is no prototype
for kernfs_generic_poll()
In file included from kernel/sched/build_utility.c:97:
kernel/sched/psi.c:1479:3: error: implicit declaration of function 'kernfs_generic_poll' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
kernfs_generic_poll(t->of, wait);
^
Add a stub helper for it, as we have it for other kernfs functions.
Fixes: aff037078e ("sched/psi: use kernfs polling functions for PSI trigger polling")
Fixes: 147e1a97c4 ("fs: kernfs: add poll file operation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20230724121823.1357562-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e048e9b7a ]
Enable the generic .sync_state callback to ensure there are no
outstanding votes that would waste power.
Generally one would need a bunch of interface clocks to access the QoS
registers when trying to go over all possible nodes during sync_state,
but QCM2290 surprisingly does not seem to require any such handling.
Fixes: 1a14b1ac39 ("interconnect: qcom: Add QCM2290 driver support")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230720-topic-qcm2290_icc-v2-2-a2ceb9d3e713@linaro.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd380097cd ]
Perf cs_etm session executed unexpectedly when AUX buffer > 1G.
perf record -C 0 -m ,2G -e cs_etm// -- <workload>
[ perf record: Captured and wrote 2.615 MB perf.data ]
Perf only collect about 2M perf data rather than 2G. This is becasuse
the operation, "nr_pages << PAGE_SHIFT", in coresight tmc driver, will
overflow when nr_pages >= 0x80000(correspond to 1G AUX buffer). The
overflow cause buffer allocation to fail, and TMC driver will alloc
minimal buffer size(1M). You can just get about 2M perf data(1M AUX
buffer + perf data header) at least.
Explicit convert nr_pages to 64 bit to avoid overflow.
Fixes: 22f429f19c ("coresight: etm-perf: Add support for ETR backend")
Fixes: 99443ea19e ("coresight: Add generic TMC sg table framework")
Fixes: 2e499bbc1a ("coresight: tmc: implementing TMC-ETF AUX space API")
Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230804081514.120171-2-tianruidong@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38313c6d2a ]
One-element and zero-length arrays are deprecated. So, replace
one-element array in struct irdma_qvlist_info with flexible-array
member.
A patch for this was sent a while ago[1]. However, it seems that, at
the time, the changes were partially folded[2][3], and the actual
flexible-array transformation was omitted. This patch fixes that.
The only binary difference seen before/after changes is shown below:
| drivers/infiniband/hw/irdma/hw.o
| @@ -868,7 +868,7 @@
| drivers/infiniband/hw/irdma/hw.c:484 (discriminator 2)
| size += struct_size(iw_qvlist, qv_info, rf->msix_count);
| 55b: imul $0x45c,%rdi,%rdi
|- 562: add $0x10,%rdi
|+ 562: add $0x4,%rdi
which is, of course, expected as it reflects the mistake made
while folding the patch I've mentioned above.
Worth mentioning is the fact that with this change we save 12 bytes
of memory, as can be inferred from the diff snapshot above. Notice
that:
$ pahole -C rdma_qv_info idrivers/infiniband/hw/irdma/hw.o
struct irdma_qv_info {
u32 v_idx; /* 0 4 */
u16 ceq_idx; /* 4 2 */
u16 aeq_idx; /* 6 2 */
u8 itr_idx; /* 8 1 */
/* size: 12, cachelines: 1, members: 4 */
/* padding: 3 */
/* last cacheline: 12 bytes */
};
Link: https://lore.kernel.org/linux-hardening/20210525230038.GA175516@embeddedor/ [1]
Link: https://lore.kernel.org/linux-hardening/bf46b428deef4e9e89b0ea1704b1f0e5@intel.com/ [2]
Link: https://lore.kernel.org/linux-rdma/20210520143809.819-1-shiraz.saleem@intel.com/T/#u [3]
Fixes: 44d9e52977 ("RDMA/irdma: Implement device initialization definitions")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZMpsQrZadBaJGkt4@work
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d122db2ff ]
If a send packet is dropped by the IP layer in rxe_requester()
the call to rxe_xmit_packet() can fail with err == -EAGAIN.
To recover, the state of the wqe is restored to the state before
the packet was sent so it can be resent. However, the routines
that save and restore the state miss a significnt part of the
variable state in the wqe, the dma struct which is used to process
through the sge table. And, the state is not saved before the packet
is built which modifies the dma struct.
Under heavy stress testing with many QPs on a fast node sending
large messages to a slow node dropped packets are observed and
the resent packets are corrupted because the dma struct was not
restored. This patch fixes this behavior and allows the test cases
to succeed.
Fixes: 3050b99850 ("IB/rxe: Fix race condition between requester and completer")
Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc28f35115 ]
This patch corrects an error in rxe_modify_srq where if the
caller changes the srq size the actual new value is not returned
to the caller since it may be larger than what is requested.
Additionally it open codes the subroutine rcv_wqe_size() which
adds very little value, and makes some whitespace changes.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230620140142.9452-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e0ba8ff467 ]
This patch:
- Moves code to initialize a qp send work queue to a
subroutine named rxe_init_sq.
- Moves code to initialize a qp recv work queue to a
subroutine named rxe_init_rq.
- Moves initialization of qp request and response packet
queues ahead of work queue initialization so that cleanup
of a qp if it is not fully completed can successfully
attempt to drain the packet queues without a seg fault.
- Makes minor whitespace cleanups.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230620135519.9365-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9608f1887 ]
The global pointer 'sprd_port' may not zero when sprd_probe returns
failure, that is a risk for sprd_port to be accessed afterward, and
may lead to unexpected errors.
For example:
There are two UART ports, UART1 is used for console and configured in
kernel command line, i.e. "console=";
The UART1 probe failed and the memory allocated to sprd_port[1] was
released, but sprd_port[1] was not set to NULL;
In UART2 probe, the same virtual address was allocated to sprd_port[2],
and UART2 probe process finally will go into sprd_console_setup() to
register UART1 as console since it is configured as preferred console
(filled to console_cmdline[]), but the console parameters (sprd_port[1])
belong to UART2.
So move the sprd_port[] assignment to where the port already initialized
can avoid the above issue.
Fixes: b7396a38fb ("tty/serial: Add Spreadtrum sc9836-uart driver support")
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Link: https://lore.kernel.org/r/20230725064053.235448-1-chunyan.zhang@unisoc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47cd3770e3 ]
There are three places that qla4xxx parses nlattrs:
- qla4xxx_set_chap_entry()
- qla4xxx_iface_set_param()
- qla4xxx_sysfs_ddb_set_param()
and each of them directly converts the nlattr to specific pointer of
structure without length checking. This could be dangerous as those
attributes are not validated and a malformed nlattr (e.g., length 0) could
result in an OOB read that leaks heap dirty data.
Add the nla_len check before accessing the nlattr data and return EINVAL if
the length check fails.
Fixes: 26ffd7b45f ("[SCSI] qla4xxx: Add support to set CHAP entries")
Fixes: 1e9e2be3ee ("[SCSI] qla4xxx: Add flash node mgmt support")
Fixes: 00c31889f7 ("[SCSI] qla4xxx: fix data alignment and use nl helpers")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723080053.3714534-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee0268f230 ]
beiscsi_iface_set_param() parses nlattr with nla_for_each_attr and assumes
every attributes can be viewed as struct iscsi_iface_param_info.
This is not true because there is no any nla_policy to validate the
attributes passed from the upper function iscsi_set_iface_params().
Add the nla_len check before accessing the nlattr data and return EINVAL if
the length check fails.
Fixes: 0e43895ec1 ("[SCSI] be2iscsi: adding functionality to change network settings using iscsiadm")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723075938.3713864-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce51c81700 ]
The functions iscsi_if_set_param() and iscsi_if_set_host_param() convert an
nlattr payload to type char* and then call C string handling functions like
sscanf and kstrdup:
char *data = (char*)ev + sizeof(*ev);
...
sscanf(data, "%d", &value);
However, since the nlattr is provided by the user-space program and the
nlmsg skb is allocated with GFP_KERNEL instead of GFP_ZERO flag (see
netlink_alloc_large_skb() in netlink_sendmsg()), dirty data on the heap can
lead to an OOB access for those string handling functions.
By investigating how the bug is introduced, we find it is really
interesting as the old version parsing code starting from commit
fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up") treated
the nlattr as integer bytes instead of string and had length check in
iscsi_copy_param():
if (ev->u.set_param.len != sizeof(uint32_t))
BUG();
But, since the commit a54a52caad ("[SCSI] iscsi: fixup set/get param
functions"), the code treated the nlattr as C string while forgetting to
add any strlen checks(), opening the possibility of an OOB access.
Fix the potential OOB by adding the strlen() check before accessing the
buf. If the data passes this check, all low-level set_param handlers can
safely treat this buf as legal C string.
Fixes: fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up")
Fixes: 1d9bf13a9c ("[SCSI] iscsi class: add iscsi host set param event")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723075820.3713119-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 971dfcb74a ]
The current NETLINK_ISCSI netlink parsing loop checks every nlmsg to make
sure the length is bigger than sizeof(struct iscsi_uevent) and then calls
iscsi_if_recv_msg().
nlh = nlmsg_hdr(skb);
if (nlh->nlmsg_len < sizeof(*nlh) + sizeof(*ev) ||
skb->len < nlh->nlmsg_len) {
break;
}
...
err = iscsi_if_recv_msg(skb, nlh, &group);
Hence, in iscsi_if_recv_msg() the nlmsg_data can be safely converted to
iscsi_uevent as the length is already checked.
However, in other cases the length of nlattr payload is not checked before
the payload is converted to other data structures. One example is
iscsi_set_path() which converts the payload to type iscsi_path without any
checks:
params = (struct iscsi_path *)((char *)ev + sizeof(*ev));
Whereas iscsi_if_transport_conn() correctly checks the pdu_len:
pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
if ((ev->u.send_pdu.hdr_size > pdu_len) ..
err = -EINVAL;
To sum up, some code paths called in iscsi_if_recv_msg() do not check the
length of the data (see below picture) and directly convert the data to
another data structure. This could result in an out-of-bound reads and heap
dirty data leakage.
_________ nlmsg_len(nlh) _______________
/ \
+----------+--------------+---------------------------+
| nlmsghdr | iscsi_uevent | data |
+----------+--------------+---------------------------+
\ /
iscsi_uevent->u.set_param.len
Fix the issue by adding the length check before accessing it. To clean up
the code, an additional parameter named rlen is added. The rlen is
calculated at the beginning of iscsi_if_recv_msg() which avoids duplicated
calculation.
Fixes: ac20c7bf07 ("[SCSI] iscsi_transport: Added Ping support")
Fixes: 43514774ff ("[SCSI] iscsi class: Add new NETLINK_ISCSI messages for cnic/bnx2i driver.")
Fixes: 1d9bf13a9c ("[SCSI] iscsi class: add iscsi host set param event")
Fixes: 01cb225dad ("[SCSI] iscsi: add target discvery event to transport class")
Fixes: 264faaaa12 ("[SCSI] iscsi: add transport end point callbacks")
Fixes: fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230725024529.428311-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89e637c19b ]
Although the code for residual handling in the SRP initiator follows the
SCSI documentation, that documentation has never been correct. Because
scsi_finish_command() starts from the data buffer length and subtracts the
residual, scsi_set_resid() must not be called if a residual overflow
occurs. Hence remove the scsi_set_resid() calls from the SRP initiator if a
residual overflow occurrs.
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Fixes: 9237f04e12 ("scsi: core: Fix scsi_get/set_resid() interface")
Fixes: e714531a34 ("IB/srp: Fix residual handling")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230724200843.3376570-3-bvanassche@acm.org
Acked-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5eda42aebb ]
The function mxs_phy_is_otg_host() will return true if OTG_ID_VALUE is
0 at USBPHY_CTRL register. However, OTG_ID_VALUE will not reflect the real
state if the ID pin is float, such as Host-only or Type-C cases. The value
of OTG_ID_VALUE is always 1 which means device mode.
This patch will fix the issue by judging the current mode based on
last_event. The controller will update last_event in time.
Fixes: 7b09e67639 ("usb: phy: mxs: refine mxs_phy_disconnect_line")
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20230627110353.1879477-2-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf10b0bb50 ]
If we encounter any error in the vdec_msg_queue_init() then we need
to set "msg_queue->wdma_addr.size = 0;". Normally, this is done
inside the vdec_msg_queue_deinit() function. However, if the
first call to allocate &msg_queue->wdma_addr fails, then the
vdec_msg_queue_deinit() function is a no-op. For that situation, just
set the size to zero explicitly and return.
There were two other error paths which did not clean up before returning.
Change those error paths to goto mem_alloc_err.
Fixes: b199fe46f3 ("media: mtk-vcodec: Add msg queue feature for lat and core architecture")
Fixes: 2f5d0aef37 ("media: mediatek: vcodec: support stateless AV1 decoder")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be40f524b6 ]
The "lat_buf->private_data" needs to be set to NULL to prevent a
double free. How this would happen is if vdec_msg_queue_init() failed
twice in a row and on the second time it failed earlier than on the
first time.
The vdec_msg_queue_init() function has a loop which does:
for (i = 0; i < NUM_BUFFER_COUNT; i++) {
Each iteration initializes one element in the msg_queue->lat_buf[] array
and then the clean up function vdec_msg_queue_deinit() frees each
element of the msg_queue->lat_buf[] array. This clean up code relies
on the assumption that every element is either initialized or zeroed.
Leaving a freed pointer which is non-zero breaks the assumption.
Fixes: b199fe46f3 ("media: mtk-vcodec: Add msg queue feature for lat and core architecture")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dfa2d6e074 ]
"fb_use_list" is used to store used or referenced frame buffers for
vp9 stateful decoder. "NULL" should be returned when getting target
frame buffer failed from "fb_use_list", not a random unexpected one.
Fixes: f77e89854b ("[media] vcodec: mediatek: Add Mediatek VP9 Video Decoder Driver")
Signed-off-by: Irui Wang <irui.wang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5bd28eae48 ]
the supported_instance_count determine the instance index range,
it shouldn't exceed the bits number of instance_mask,
otherwise the bitops of instance_mask may cross boundaries
Fixes: 9f599f351e ("media: amphion: add vpu core driver")
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b69713f502 ]
the firmware only support low latency mode for h264,
but firmware will notify an event to driver
when one frame is decoded,
if V4L2_CID_MPEG_VIDEO_DEC_DISPLAY_DELAY_ENABLE is enabled,
and V4L2_CID_MPEG_VIDEO_DEC_DISPLAY_DELAY is set to 0,
driver can display the decoded frame immediately.
Fixes: ffa331d9bf ("media: amphion: decoder implement display delay enable")
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c677d7ae83 ]
In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
and mtk_jpeg_enc_device_run may be called to start the
work.
If we remove the module which will call mtk_jpeg_remove
to make cleanup, there may be a unfinished work. The
possible sequence is as follows, which will cause a
typical UAF bug.
Fix it by canceling the work before cleanup in the mtk_jpeg_remove
CPU0 CPU1
|mtk_jpeg_job_timeout_work
mtk_jpeg_remove |
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use
Fixes: b2f0d2724b ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3b4c9d3cb ]
Commit f100ce3bbd ("media: verisilicon: Fix crash when probing
encoder") removed vpu_fmt from hantro_try_fmt(), since it was
initialized from vpu_dst_fmt, which may not be initialized, when TRY_FMT
is called. It was replaced by fmt, which is found using the pixelformat.
For the encoder, this changed the fmt to contain the raw format instead
of the coded format. The format constraints as of fmt->frmsize are only
valid for the coded format and are 0 for the raw formats. Therefore, the
size of a encoder OUTPUT device is constrained to 0 and the
v4l2-compliance tests for G_FMT, TRY_FMT, and SET_FMT fail.
Bring back vpu_fmt to use the coded format on an encoder OUTPUT device,
but initialize it using the currently set pixelformat on dst_fmt, which
is the coded format on an encoder.
Fixes: f100ce3bbd ("media: verisilicon: Fix crash when probing encoder")
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73e3f09292 ]
according to v4l2 stateful decoder document 4.5.1.3. State Machine,
the state should change from seek to initialization
if call VIDIOC_REQBUFS(OUTPUT, 0).
so reinit the vpu decoder if reqbufs output 0
Fixes: 6de8d628df ("media: amphion: add v4l2 m2m vpu decoder stateful driver")
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bad5b6e34f ]
LOOPBACK and NONE (tunnel) devices have all-zero MAC addresses.
Currently, siw_device_create() falls back to copying the IB device's
name in those cases, because an all-zero MAC address breaks the RDMA
core address resolution mechanism.
However, at the point when siw_device_create() constructs a GID, the
ib_device::name field is uninitialized, leaving the MAC address to
remain in an all-zero state.
Fabricate a random artificial GID for such devices, and ensure this
artificial GID is returned for all device query operations.
Link: https://lore.kernel.org/r/168960673260.3007.12378736853793339110.stgit@manet.1015granger.net
Reported-by: Tom Talpey <tom@talpey.com>
Fixes: a2d36b02c1 ("RDMA/siw: Enable siw on tunnel devices")
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96002c0ac8 ]
If cx24120_message_send() returns error, we should keep local struct
unchanged.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 5afc9a25be ("[media] Add support for TechniSat Skystar S2")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea9ef6c2e0 ]
'read' is freed when it is known to be NULL, but not when a read error
occurs.
Revert the logic to avoid a small leak, should a m920x_read() call fail.
Fixes: a2ab06d7c4 ("media: m920x: don't use stack on USB reads")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1db7b2c55 ]
Variable loopdiv can be assigned 0, then it is used as a denominator,
without checking it for 0.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 713d54a8bd ("[media] DiB7090: add support for the dib7090 based")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil: (bw != NULL) -> bw]
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9c7141f38 ]
The previous commit 4b208f8b56 ("[media] siano: register media controller
earlier")moves siano_media_device_register before smscore_register_device,
and adds corresponding error handling code if smscore_register_device
fails. However, it misses the following error handling code of
smsusb_init_device.
Fix this by moving error handling code at the end of smsusb_init_device
and adding a goto statement in the following error handling parts.
Fixes: 4b208f8b56 ("[media] siano: register media controller earlier")
Signed-off-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6df63b7ebd ]
The physical address to the directory table is currently encoded using
the following bit layout for IOMMU v2.
31:12 - Address bit 31:0
11: 4 - Address bit 39:32
This is also the bit layout used by the vendor kernel.
However, testing has shown that addresses to the directory/page tables
and memory pages are all encoded using the same bit layout.
IOMMU v1:
31:12 - Address bit 31:0
IOMMU v2:
31:12 - Address bit 31:0
11: 8 - Address bit 35:32
7: 4 - Address bit 39:36
Change to use the mk_dtentries ops to encode the directory table address
correctly. The value written to DTE_ADDR may include the valid bit set,
a bit that is ignored and DTE_ADDR reg read it back as 0.
This also update the bit layout comment for the page address and the
number of nybbles that are read back for DTE_ADDR comment.
These changes render the dte_addr_phys and dma_addr_dte ops unused and
is removed.
Fixes: 227014b33f ("iommu: rockchip: Add internal ops to handle variants")
Fixes: c55356c534 ("iommu: rockchip: Add support for iommu v2")
Fixes: c987b65a57 ("iommu/rockchip: Fix physical address decoding")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230617182540.3091374-2-jonas@kwiboo.se
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 534103bcd5 ]
When unbinding pasid - a race condition exists vs outstanding page faults.
To prevent this, the pasid_state object contains a refcount.
* set to 1 on pasid bind
* incremented on each ppr notification start
* decremented on each ppr notification done
* decremented on pasid unbind
Since refcount_dec assumes that refcount will never reach 0:
the current implementation causes the following to be invoked on
pasid unbind:
REFCOUNT_WARN("decrement hit 0; leaking memory")
Fix this issue by changing refcount_dec to refcount_dec_and_test
to explicitly handle refcount=1.
Fixes: 8bc54824da ("iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count")
Signed-off-by: Daniel Marcovitch <dmarcovitch@nvidia.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230609105146.7773-2-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d7b13edd4c ]
If fwnode_graph_get_remote_endpoint() fails, 'fwnode' is known to be NULL,
so fwnode_handle_put() is a no-op.
Release the reference taken from a previous fwnode_graph_get_port_parent()
call instead.
Also handle fwnode_graph_get_port_parent() failures.
In order to fix these issues, add an error handling path to the function
and the needed gotos.
Fixes: ca50c197bd ("[media] v4l: fwnode: Support generic fwnode for parsing standardised properties")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f126ff7e40 ]
The supported ad5820 and ad5821 VCMs both use a single 16 bit register
which is written by sending 2 bytes with the data directly after sending
the i2c-client address.
The ad5823 OTOH has a more typical i2c / smbus device setup with multiple
8 bit registers where the first byte send after the i2c-client address is
the register address and the actual data only starts from the second byte
after the i2c-client address.
The ad5823 i2c_ and of_device_id-s was added at the same time as
the ad5821 ids with as rationale:
"""
Some camera modules also refer that AD5823 is a replacement of AD5820:
https://download.kamami.com/p564094-OV8865_DS.pdf
"""
The AD5823 may be an electrical and functional replacement of the AD5820,
but from a software pov it is not compatible at all and it is going to
need its own driver, drop its id from the ad5820 driver.
Fixes: b8bf73136b ("media: ad5820: Add support for ad5821 and ad5823")
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Ricardo Ribalda Delgado <ribalda@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a828002f38 ]
OV5640 will output abnormal image data when work at low resolution
(320x240, 176x144 and 160x120) after switching from high resolution,
such as 1080P, the time interval between high and low switching must
be less than 1000ms in order to OV5640 don't enter suspend state during
the time.
The reason is by 0x3824 value don't restore to initialize value when
do resolution switching. In high resolution setting array, 0x3824 is
set to 0x04, but low resolution setting array remove 0x3824 in commit
db15c1957a ("media: ov5640: Remove duplicated mode settings"). So
when do resolution switching from high to low, such as 1080P to 320x240,
and the time interval is less than auto suspend delay time which means
global initialize setting array will not be loaded, the output image
data are abnormal. Hence move 0x3824 from ov5640_init_setting[] table
to ov5640_setting_low_res[] table and also move 0x4407 0x460b, 0x460c
to avoid same issue.
Fixes: db15c1957a ("media: ov5640: Remove duplicated mode settings")
Signed-off-by: Guoniu.zhou <guoniu.zhou@nxp.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c8c926200c ]
Since commit f28e22441f ("cgroup/cpuset: Add a new isolated
cpus.partition type"), the CS_SCHED_LOAD_BALANCE bit of a v2 cpuset
can be on or off. The child cpusets of a partition root must have the
same setting as its parent or it may screw up the rebuilding of sched
domains. Fix this problem by making sure the a child v2 cpuset will
follows its parent cpuset load balance state unless the child cpuset
is a new partition root itself.
Fixes: f28e22441f ("cgroup/cpuset: Add a new isolated cpus.partition type")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4a123d2e8 ]
The comma at the end of the line was leftover from an earlier refactor
of the _nfs4_pnfs_v3_ds_connect() function. This is technically valid C,
so the compilers didn't catch it, but if I'm understanding how it works
correctly it assigns the return value of rpc_clnt_add_xprtr() to
xprtdata.cred.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: a12f996d34 ("NFSv4/pNFS: Use connections to a DS that are all of the same protocol family")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5690eed941 ]
If the client sent a synchronous copy and the server replied with
ERR_OFFLOAD_NO_REQ indicating that it wants an asynchronous
copy instead, the client should retry with asynchronous copy.
Fixes: 539f57b3e0 ("NFS handle COPY ERR_OFFLOAD_NO_REQS")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f67b55b658 ]
Commit 64cfca85ba asserts the only valid return values for
nfs2/3_decode_dirent should not include -ENAMETOOLONG, but for a server
that sends a filename3 which exceeds MAXNAMELEN in a READDIR response the
client's behavior will be to endlessly retry the operation.
We could map -ENAMETOOLONG into -EBADCOOKIE, but that would produce
truncated listings without any error. The client should return an error
for this case to clearly assert that the server implementation must be
corrected.
Fixes: 64cfca85ba ("NFS: Return valid errors from nfs2/3_decode_dirent()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6372e2ee62 ]
The XDR specification in RFC 8881 looks like this:
struct device_addr4 {
layouttype4 da_layout_type;
opaque da_addr_body<>;
};
struct GETDEVICEINFO4resok {
device_addr4 gdir_device_addr;
bitmap4 gdir_notification;
};
union GETDEVICEINFO4res switch (nfsstat4 gdir_status) {
case NFS4_OK:
GETDEVICEINFO4resok gdir_resok4;
case NFS4ERR_TOOSMALL:
count4 gdir_mincount;
default:
void;
};
Looking at nfsd4_encode_getdeviceinfo() ....
When the client provides a zero gd_maxcount, then the Linux NFS
server implementation encodes the da_layout_type field and then
skips the da_addr_body field completely, proceeding directly to
encode gdir_notification field.
There does not appear to be an option in the specification to skip
encoding da_addr_body. Moreover, Section 18.40.3 says:
> If the client wants to just update or turn off notifications, it
> MAY send a GETDEVICEINFO operation with gdia_maxcount set to zero.
> In that event, if the device ID is valid, the reply's da_addr_body
> field of the gdir_device_addr field will be of zero length.
Since the layout drivers are responsible for encoding the
da_addr_body field, put this fix inside the ->encode_getdeviceinfo
methods.
Fixes: 9cf514ccfa ("nfsd: implement pNFS operations")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Tom Haynes <loghyr@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de8d38cf44 ]
clang's static analysis warning: fs/lockd/mon.c: line 293, column 2:
Null pointer passed as 2nd argument to memory copy function.
Assuming 'hostname' is NULL and calling 'nsm_create_handle()', this will
pass NULL as 2nd argument to memory copy function 'memcpy()'. So return
NULL if 'hostname' is invalid.
Fixes: 77a3ef33e2 ("NSM: More clean up of nsm_get_handle()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1524773425 ]
Running generic/475(filesystem consistent tests after power cut) could
easily trigger unattached inode error while doing fsck:
Unattached zero-length inode 39405. Clear? no
Unattached inode 39405
Connect to /lost+found? no
Above inconsistence is caused by following process:
P1 P2
ext4_create
inode = ext4_new_inode_start_handle // itable records nlink=1
ext4_add_nondir
err = ext4_add_entry // ENOSPC
ext4_append
ext4_bread
ext4_getblk
ext4_map_blocks // returns ENOSPC
drop_nlink(inode) // won't be updated into disk inode
ext4_orphan_add(handle, inode)
ext4_orphan_file_add
ext4_journal_stop(handle)
jbd2_journal_commit_transaction // commit success
>> power cut <<
ext4_fill_super
ext4_load_and_init_journal // itable records nlink=1
ext4_orphan_cleanup
ext4_process_orphan
if (inode->i_nlink) // true, inode won't be deleted
Then, allocated inode will be reserved on disk and corresponds to no
dentries, so e2fsck reports 'unattached inode' problem.
The problem won't happen if orphan file feature is disabled, because
ext4_orphan_add() will update disk inode in orphan list mode. There
are several places not updating disk inode while putting inode into
orphan area, such as ext4_add_nondir(), ext4_symlink() and whiteout
in ext4_rename(). Fix it by updating inode into disk in all error
branches of these places.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217605
Fixes: 02f310fcf4 ("ext4: Speedup ext4 orphan inode handling")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230628132011.650383-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c37b6908f7 ]
fail_iommu_setup() registers the fail_iommu_bus_notifier struct to both
PCI and VIO buses. struct notifier_block is a linked list node, so this
causes any notifiers later registered to either bus type to also be
registered to the other since they share the same node.
This causes issues in (at least) the vgaarb code, which registers a
notifier for PCI buses. pci_notify() ends up being called on a vio
device, converted with to_pci_dev() even though it's not a PCI device,
and finally makes a bad access in vga_arbiter_add_pci_device() as
discovered with KASAN:
BUG: KASAN: slab-out-of-bounds in vga_arbiter_add_pci_device+0x60/0xe00
Read of size 4 at addr c000000264c26fdc by task swapper/0/1
Call Trace:
dump_stack_lvl+0x1bc/0x2b8 (unreliable)
print_report+0x3f4/0xc60
kasan_report+0x244/0x698
__asan_load4+0xe8/0x250
vga_arbiter_add_pci_device+0x60/0xe00
pci_notify+0x88/0x444
notifier_call_chain+0x104/0x320
blocking_notifier_call_chain+0xa0/0x140
device_add+0xac8/0x1d30
device_register+0x58/0x80
vio_register_device_node+0x9ac/0xce0
vio_bus_scan_register_devices+0xc4/0x13c
__machine_initcall_pseries_vio_device_init+0x94/0xf0
do_one_initcall+0x12c/0xaa8
kernel_init_freeable+0xa48/0xba8
kernel_init+0x64/0x400
ret_from_kernel_thread+0x5c/0x64
Fix this by creating separate notifier_block structs for each bus type.
Fixes: d6b9a81b2a ("powerpc: IOMMU fault injection")
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Add #ifdef to fix CONFIG_IBMVIO=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230322035322.328709-1-ruscur@russell.cc
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9bbbf4979 ]
In mpc5xxx_fwnode_get_bus_frequency(), we should add
fwnode_handle_put() when break out of the iteration
fwnode_for_each_parent_node() as it will automatically
increase and decrease the refcounter.
Fixes: de06fba62a ("powerpc/mpc5xxx: Switch mpc5xxx_get_bus_frequency() to use fwnode")
Signed-off-by: Liang He <windhl@126.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230322030423.1855440-1-windhl@126.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 08b45fcb2d ]
This allocation should use the passed in GFP_ flags instead of
GFP_KERNEL. One places where this matters is in filelayout_pg_init_write()
which uses GFP_NOFS as the allocation flags.
Fixes: 5c83746a0c ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eac030b22e ]
lppaca_shared_proc() takes a pointer to the lppaca which is typically
accessed through get_lppaca(). With DEBUG_PREEMPT enabled, this leads
to checking if preemption is enabled, for example:
BUG: using smp_processor_id() in preemptible [00000000] code: grep/10693
caller is lparcfg_data+0x408/0x19a0
CPU: 4 PID: 10693 Comm: grep Not tainted 6.5.0-rc3 #2
Call Trace:
dump_stack_lvl+0x154/0x200 (unreliable)
check_preemption_disabled+0x214/0x220
lparcfg_data+0x408/0x19a0
...
This isn't actually a problem however, as it does not matter which
lppaca is accessed, the shared proc state will be the same.
vcpudispatch_stats_procfs_init() already works around this by disabling
preemption, but the lparcfg code does not, erroring any time
/proc/powerpc/lparcfg is accessed with DEBUG_PREEMPT enabled.
Instead of disabling preemption on the caller side, rework
lppaca_shared_proc() to not take a pointer and instead directly access
the lppaca, bypassing any potential preemption checks.
Fixes: f13c13a005 ("powerpc: Stop using non-architected shared_proc field in lppaca")
Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Rework to avoid needing a definition in paca.h and lppaca.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-4-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1aa0006676 ]
By adding a forward declaration for struct lppaca we can untangle paca.h
and lppaca.h. Also move get_lppaca() into lppaca.h for consistency.
Add includes of lppaca.h to some files that need it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-3-mpe@ellerman.id.au
Stable-dep-of: eac030b22e ("powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 303a780520 ]
I found that the read code might send multiple requests using the same
nfs_pgio_header, but nfs4_proc_read_setup() is only called once. This is
how we ended up occasionally double-freeing the scratch buffer, but also
means we set a NULL pointer but non-zero length to the xdr scratch
buffer. This results in an oops the first time decoding needs to copy
something to scratch, which frequently happens when decoding READ_PLUS
hole segments.
I fix this by moving scratch handling into the pageio read code. I
provide a function to allocate scratch space for decoding read replies,
and free the scratch buffer when the nfs_pgio_header is freed.
Fixes: fbd2a05f29 (NFSv4.2: Rework scratch handling for READ_PLUS)
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d18f6c5bb ]
I bump the decode_read_plus_maxsz to account for hole segments, but I
need to subtract out this increase when calling
rpc_prepare_reply_pages() so the common case of single data segment
replies can be directly placed into the xdr pages without needing to be
shifted around.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: d3b00a802c ("NFS: Replace the READ_PLUS decoding code")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb05a617f0 ]
Smatch reports:
fs/nfs/nfs42xdr.c:1131 decode_read_plus() warn: missing error code? 'status'
Which Dan suggests to fix by doing a hardcoded "return 0" from the
"if (segments == 0)" check.
Additionally, smatch reports that the "status = -EIO" assignment is not
used. This patch addresses both these issues.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202305222209.6l5VM2lL-lkp@intel.com/
Fixes: d3b00a802c ("NFS: Replace the READ_PLUS decoding code")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7189576e8a ]
Don't assume that only the driver would be accessing LNKCTL. ASPM policy
changes can trigger write to LNKCTL outside of driver's control. And in
the case of upstream bridge, the driver does not even own the device it's
changing the registers for.
Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Fixes: 8a7cd27679 ("drm/radeon/cik: add support for pcie gen1/2/3 switching")
Fixes: b9d305dfb6 ("drm/radeon: implement pcie gen2/3 support for SI")
Link: https://lore.kernel.org/r/20230717120503.15276-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34daf445f8 ]
CC arch/powerpc/perf/core-fsl-emb.o
arch/powerpc/perf/core-fsl-emb.c:675:6: error: no previous prototype for 'hw_perf_event_setup' [-Werror=missing-prototypes]
675 | void hw_perf_event_setup(int cpu)
| ^~~~~~~~~~~~~~~~~~~
Looks like fsl_emb was completely missed by commit 3f6da39053 ("perf:
Rework and fix the arch CPU-hotplug hooks")
So, apply same changes as commit 3f6da39053 ("perf: Rework and fix
the arch CPU-hotplug hooks") then commit 57ecde42cc ("powerpc/perf:
Convert book3s notifier to state machine callbacks")
While at it, also fix following error:
arch/powerpc/perf/core-fsl-emb.c: In function 'perf_event_interrupt':
arch/powerpc/perf/core-fsl-emb.c:648:13: error: variable 'found' set but not used [-Werror=unused-but-set-variable]
648 | int found = 0;
| ^~~~~
Fixes: 3f6da39053 ("perf: Rework and fix the arch CPU-hotplug hooks")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/603e1facb32608f88f40b7d7b9094adc50e7b2dc.1692349125.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1eb75e0df ]
In case fadump_reserve_mem() fails to reserve memory, the
reserve_dump_area_size variable will retain the reserve area size. This
will lead to /sys/kernel/fadump/mem_reserved node displaying an incorrect
memory reserved by fadump.
To fix this problem, reserve dump area size variable is set to 0 if fadump
failed to reserve memory.
Fixes: 8255da95e5 ("powerpc/fadump: release all the memory above boot memory size")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230704050715.203581-1-sourabhjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cd24e2a60a ]
Fix an information leak where an uninitialized hole in struct
vfio_iommu_type1_info_cap_migration on the stack is exposed to userspace.
The definition of struct vfio_iommu_type1_info_cap_migration contains a hole as
shown in this pahole(1) output:
struct vfio_iommu_type1_info_cap_migration {
struct vfio_info_cap_header header; /* 0 8 */
__u32 flags; /* 8 4 */
/* XXX 4 bytes hole, try to pack */
__u64 pgsize_bitmap; /* 16 8 */
__u64 max_dirty_bitmap_size; /* 24 8 */
/* size: 32, cachelines: 1, members: 4 */
/* sum members: 28, holes: 1, sum holes: 4 */
/* last cacheline: 32 bytes */
};
The cap_mig variable is filled in without initializing the hole:
static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
struct vfio_info_cap *caps)
{
struct vfio_iommu_type1_info_cap_migration cap_mig;
cap_mig.header.id = VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION;
cap_mig.header.version = 1;
cap_mig.flags = 0;
/* support minimum pgsize */
cap_mig.pgsize_bitmap = (size_t)1 << __ffs(iommu->pgsize_bitmap);
cap_mig.max_dirty_bitmap_size = DIRTY_BITMAP_SIZE_MAX;
return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
}
The structure is then copied to a temporary location on the heap. At this point
it's already too late and ioctl(VFIO_IOMMU_GET_INFO) copies it to userspace
later:
int vfio_info_add_capability(struct vfio_info_cap *caps,
struct vfio_info_cap_header *cap, size_t size)
{
struct vfio_info_cap_header *header;
header = vfio_info_cap_add(caps, size, cap->id, cap->version);
if (IS_ERR(header))
return PTR_ERR(header);
memcpy(header + 1, cap + 1, size - sizeof(*header));
return 0;
}
This issue was found by code inspection.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Fixes: ad721705d0 ("vfio iommu: Add migration capability to report supported features")
Link: https://lore.kernel.org/r/20230801155352.1391945-1-stefanha@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a9dd8f292 ]
With skiboot_defconfig, Clang reports:
CC arch/powerpc/mm/book3s64/radix_tlb.o
arch/powerpc/mm/book3s64/radix_tlb.c:419:20: error: unused function '_tlbie_pid_lpid' [-Werror,-Wunused-function]
static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
^
arch/powerpc/mm/book3s64/radix_tlb.c:663:20: error: unused function '_tlbie_va_range_lpid' [-Werror,-Wunused-function]
static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
^
This is because those functions are only called from functions
enclosed in a #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
Move below functions inside that #ifdef
* __tlbie_pid_lpid(unsigned long pid,
* __tlbie_va_lpid(unsigned long va, unsigned long pid,
* fixup_tlbie_pid_lpid(unsigned long pid, unsigned long lpid)
* _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
* fixup_tlbie_va_range_lpid(unsigned long va,
* __tlbie_va_range_lpid(unsigned long start, unsigned long end,
* _tlbie_va_range_lpid(unsigned long start, unsigned long end,
Fixes: f0c6fbbb90 ("KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307260802.Mjr99P5O-lkp@intel.com/
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/3d72efd39f986ee939d068af69fdce28bd600766.1691568093.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4dd432d985 ]
Reconfiguring the clock divider to the exact same value is observed
on an i.MX8MN to often cause a longer than usual clock pause, probably
because the divider restarts counting whenever the register is rewritten.
This issue doesn't show up normally, because the clock framework will
take care to not call set_rate when the clock rate is the same.
However, when we reconfigure an upstream clock, the common code will
call set_rate with the newly calculated rate on all children, e.g.:
- sai5 is running normally and divides Audio PLL out by 16.
- Audio PLL rate is increased by 32Hz (glitch-free kdiv change)
- rates for children are recalculated and rates are set recursively
- imx8m_clk_composite_divider_set_rate(sai5) is called with
32/16 = 2Hz more
- imx8m_clk_composite_divider_set_rate computes same divider as before
- divider register is written, so it restarts counting from zero and
MCLK is briefly paused, so instead of e.g. 40ns, MCLK is low for 120ns.
Some external clock consumers can be upset by such unexpected clock pauses,
so let's make sure we only rewrite the divider value when the value to be
written is actually different.
Fixes: d3ff972813 ("clk: imx: Add imx composite clock")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20230807082201.2332746-1-a.fatoum@pengutronix.de
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e09060b3b6 ]
Don't assume that the device is fully under the control of ASPM and use RMW
capability accessors which do proper locking to avoid losing concurrent
updates to the register values.
If configuration fails in pcie_aspm_configure_common_clock(), the
function attempts to restore the old PCI_EXP_LNKCTL_CCC settings. Store
only the old PCI_EXP_LNKCTL_CCC bit for the relevant devices rather
than the content of the whole LNKCTL registers. It aligns better with
how pcie_lnkctl_clear_and_set() expects its parameter and makes the
code more obvious to understand.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Fixes: 2a42d9dba7 ("PCIe: ASPM: Break out of endless loop waiting for PCI config bits to switch")
Fixes: 7d715a6c1a ("PCI: add PCI Express ASPM support")
Link: https://lore.kernel.org/r/20230717120503.15276-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e70d0acf0 ]
Many places in the kernel write the Link Control and Root Control PCI
Express Capability Registers without proper concurrency control and this
could result in losing the changes one of the writers intended to make.
Add pcie_cap_lock spinlock into the struct pci_dev and use it to protect
bit changes made in the RMW capability accessors. Protect only a selected
set of registers by differentiating the RMW accessor internally to
locked/unlocked variants using a wrapper which has the same signature as
pcie_capability_clear_and_set_word(). As the Capability Register (pos)
given to the wrapper is always a constant, the compiler should be able to
simplify all the dead-code away.
So far only the Link Control Register (ASPM, hotplug, link retraining,
various drivers) and the Root Control Register (AER & PME) seem to
require RMW locking.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Fixes: c7f486567c ("PCI PM: PCIe PME root port service driver")
Fixes: f12eb72a26 ("PCI/ASPM: Use PCI Express Capability accessors")
Fixes: 7d715a6c1a ("PCI: add PCI Express ASPM support")
Fixes: affa48de84 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM")
Fixes: 849a9366cb ("misc: rtsx: Add support new chip rts5228 mmc: rtsx: Add support MMC_CAP2_NO_MMC")
Fixes: 3d1e7aa80d ("misc: rtsx: Use pcie_capability_clear_and_set_word() for PCI_EXP_LNKCTL")
Fixes: c0e5f4e73a ("misc: rtsx: Add support for RTS5261")
Fixes: 3df4fce739 ("misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG")
Fixes: 121e9c6b5c ("misc: rtsx: modify and fix init_hw function")
Fixes: 19f3bd548f ("mfd: rtsx: Remove LCTLR defination")
Fixes: 773ccdfd9c ("mfd: rtsx: Read vendor setting from config space")
Fixes: 8275b77a15 ("mfd: rts5249: Add support for RTS5250S power saving")
Fixes: 5da4e04ae4 ("misc: rtsx: Add support for RTS5260")
Fixes: 0f49bfbd0f ("tg3: Use PCI Express Capability accessors")
Fixes: 5e7dfd0fb9 ("tg3: Prevent corruption at 10 / 100Mbps w CLKREQ")
Fixes: b726e493e8 ("r8169: sync existing 8168 device hardware start sequences with vendor driver")
Fixes: e6de30d63e ("r8169: more 8168dp support.")
Fixes: 8a06127602 ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards")
Fixes: 6f461f6c7c ("e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata")
Fixes: 1eae4eb2a1 ("e1000e: Disable L1 ASPM power savings for 82573 mobile variants")
Fixes: 8060e169e0 ("ath9k: Enable extended synch for AR9485 to fix L0s recovery issue")
Fixes: 69ce674bfa ("ath9k: do btcoex ASPM disabling at initialization time")
Fixes: f37f055035 ("mt76: mt76x2e: disable pcie_aspm by default")
Link: https://lore.kernel.org/r/20230717120503.15276-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e0f4f2918 ]
kvm_vfio_group_add() creates kvg instance, links it to kv->group_list,
and calls kvm_vfio_file_set_kvm() with kvg->file as an argument after
dropping kv->lock. If we race group addition and deletion calls, kvg
instance may get freed by the time we get around to calling
kvm_vfio_file_set_kvm().
Previous iterations of the code did not reference kvg->file outside of
the critical section, but used a temporary variable. Still, they had
similar problem of the file reference being owned by kvg structure and
potential for kvm_vfio_group_del() dropping it before
kvm_vfio_group_add() had a chance to complete.
Fix this by moving call to kvm_vfio_file_set_kvm() under the protection
of kv->lock. We already call it while holding the same lock when vfio
group is being deleted, so it should be safe here as well.
Fixes: 2fc1bec158 ("kvm: set/clear kvm to/from vfio_group when group add/delete")
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230714224538.404793-1-dmitry.torokhov@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60c672b7f2 ]
ngroups is ext4_group_t (unsigned int) while next_linear_group treat it
in int. If ngroups is bigger than max number described by int, it will
be treat as a negative number. Then "return group + 1 >= ngroups ? 0 :
group + 1;" may keep returning 0.
Switch int to ext4_group_t in next_linear_group to fix the overflow.
Fixes: 196e402adf ("ext4: improve cr 0 / cr 1 group scanning")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce53ad81ed ]
Current igen6_edac checks for pending errors before the registration
of the error handler. However, there is a possibility that the error
occurs during the registration process, leading to unhandled pending
errors and no future error events. This issue can be reproduced by
repeatedly injecting errors during the loading of the igen6_edac.
Fix this issue by moving the pending error handler after the registration
of the error handler, ensuring that no pending errors are left unhandled.
Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: Ee Wey Lim <ee.wey.lim@intel.com>
Tested-by: Ee Wey Lim <ee.wey.lim@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3a3a097ea ]
The following debug object splat was observed in testing:
ODEBUG: free active (active state 0) object: 0000000097d23782 object type: work_struct hint: doe_statemachine_work+0x0/0x510
WARNING: CPU: 1 PID: 71 at lib/debugobjects.c:514 debug_print_object+0x7d/0xb0
...
Workqueue: pci 0000:36:00.0 DOE [1 doe_statemachine_work
RIP: 0010:debug_print_object+0x7d/0xb0
...
Call Trace:
? debug_print_object+0x7d/0xb0
? __pfx_doe_statemachine_work+0x10/0x10
debug_object_free.part.0+0x11b/0x150
doe_statemachine_work+0x45e/0x510
process_one_work+0x1d4/0x3c0
This occurs because destroy_work_on_stack() was called after signaling
the completion in the calling thread. This creates a race between
destroy_work_on_stack() and the task->work struct going out of scope in
pci_doe().
Signal the work complete after destroying the work struct. This is safe
because signal_task_complete() is the final thing the work item does and
the workqueue code is careful not to access the work struct after.
Fixes: abf04be0e7 ("PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y")
Link: https://lore.kernel.org/r/20230726-doe-fix-v1-1-af07e614d4dd@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31422dff18 ]
Due to the auto_domains mechanism the ioas->mutex must be held until
the hwpt is completely setup by iommufd_object_abort_and_destroy() or
iommufd_object_finalize().
This prevents a concurrent iommufd_device_auto_get_domain() from seeing
an incompletely initialized object through the ioas->hwpt_list.
To make this more consistent move the unlock until after finalize.
Fixes: e8d5721003 ("iommufd: Add kAPI toward external drivers for physical devices")
Link: https://lore.kernel.org/r/11-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9cbc06049 ]
Currently, as part of the qcom_pcie_perst_deassert() function, instead
of writing the updated value to clear PARF_MSTR_AXI_CLK_EN, the variable
"val" is re-read.
This must be fixed to ensure that the master clock supplied to the MHI
bus is correctly gated during L1.1/L1.2 to save power.
Thus, replace the line that re-reads "val" with a line that writes the
updated value to the register to clear PARF_MSTR_AXI_CLK_EN.
[kwilczynski: commit log]
Fixes: c457ac029e ("PCI: qcom-ep: Gate Master AXI clock to MHI bus during L1SS")
Link: https://lore.kernel.org/linux-pci/20230627141036.11600-1-manivannan.sadhasivam@linaro.org
Reported-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8650c0c2a ]
The apple_pcie_setup_port() function computes ilog2(pcie->nvecs) to set
up the number of MSIs available for each port. However, it's called
before apple_msi_init(), which initializes pcie->nvecs.
Luckily, pcie->nvecs is part of kzalloc()-ed structure and, as such,
initialized as zero. ilog2(0) happens to be 0xffffffff which then simply
configures more MSIs in hardware than we have. This doesn't break
anything because we never hand out those vectors.
Thus, swap the order of the two calls so that the correctly initialized
value is then used.
[kwilczynski: commit log]
Link: https://lore.kernel.org/linux-pci/20230311133453.63246-1-sven@svenpeter.dev
Fixes: 476c41ed45 ("PCI: apple: Implement MSI support")
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8d72e32e1 ]
The adapter scan ssif_info_find() sets info->adapter_name if the adapter
info came from SMBIOS, as it's not set in that case. However, this
function can be called more than once, and it will leak the adapter name
if it had already been set. So check for NULL before setting it.
Fixes: c4436c9149 ("ipmi_ssif: avoid registering duplicate ssif interface")
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67de40c9df ]
Before committing 79597c8bf6, *rac97 always be NULL if there is
an error. When error happens, make sure *rac97 is NULL is safer.
For examble, in snd_vortex_mixer():
err = snd_ac97_mixer(pbus, &ac97, &vortex->codec);
vortex->isquad = ((vortex->codec == NULL) ?
0 : (vortex->codec->ext_id&0x80));
If error happened but vortex->codec isn't NULL, this may cause some
problems.
Move the judgement order to be clearer and better.
Fixes: 79597c8bf6 ("ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer")
Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20230823025212.1000961-1-suhui@nfschina.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9515ff4fb ]
When of_overlay_fdt_apply() fails, the changeset may be partially
applied, and the caller is still expected to call of_overlay_remove() to
clean up this partial state.
However, of_overlay_apply() calls of_resolve_phandles() before
init_overlay_changeset(). Hence if the overlay fails to apply due to an
unresolved symbol, the overlay_changeset.cset.entries list is still
uninitialized, and cleanup will crash with a NULL-pointer dereference in
overlay_removal_is_ok().
Fix this by moving the call to of_changeset_init() from
init_overlay_changeset() to of_overlay_fdt_apply(), where all other
early initialization is done.
Fixes: f948d6d8b7 ("of: overlay: avoid race condition between applying multiple overlays")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/4f1d6d74b61cba2599026adb6d1948ae559ce91f.1690533838.git.geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38592ae6dc ]
DSP_SW_INTR_STAT_OFFSET is a common interrupt register which will be
accessed by both ACP firmware and driver. This register contains register
bits corresponds to host to dsp interrupts and vice versa.
when dsp to host interrupt is reported, only clear dsp to host
interrupt bit in DSP_SW_INTR_STAT_OFFSET.
Fixes: 2e7c6652f9 ("ASoC: SOF: amd: Fix for handling spurious interrupts from DSP")
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Link: https://lore.kernel.org/r/20230823073340.2829821-7-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc22b5407e ]
When a bio is split by md raid0, the newly created bio will not be tracked
by md for I/O accounting. Only the portion of I/O still assigned to the
original bio which was reduced by the split will be accounted for. This
results in md iostat data sometimes showing I/O values far below the actual
amount of data being sent through md.
md_account_bio() needs to be called for all bio generated by the bio split.
A simple example of the issue was generated using a raid0 device on partitions
to the same device. Since all raid0 I/O then goes to one device, it makes it
easy to see a gap between the md device and its sd storage. Reading an lvm
device on top of the md device, the iostat output (some 0 columns and extra
devices removed to make the data more compact) was:
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read
md2 0.00 0.00 0.00 0.00 0
sde 0.00 0.00 0.00 0.00 0
md2 1364.00 411496.00 0.00 0.00 411496
sde 1734.00 646144.00 0.00 0.00 646144
md2 1699.00 510680.00 0.00 0.00 510680
sde 2155.00 802784.00 0.00 0.00 802784
md2 803.00 241480.00 0.00 0.00 241480
sde 1016.00 377888.00 0.00 0.00 377888
md2 0.00 0.00 0.00 0.00 0
sde 0.00 0.00 0.00 0.00 0
I/O was generated doing large direct I/O reads (12M) with dd to a linear
lvm volume on top of the 4 leg raid0 device.
The md2 reads were showing as roughly 2/3 of the reads to the sde device
containing all of md2's raid partitions. The sum of reads to sde was
1826816 kB, which was the expected amount as it was the amount read by
dd. With the patch, the total reads from md will match the reads from
sde and be consistent with the amount of I/O generated.
Fixes: 10764815ff ("md: add io accounting for raid0 and raid5")
Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230816181433.13289-1-djeffery@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 319ff40a54 ]
Commit f00d7c85be ("md/raid0: fix up bio splitting.") among other
things changed how bio that needs to be split is submitted. Before this
commit, we have split the bio, mapped and submitted each part. After
this commit, we map only the first part of the split bio and submit the
second part unmapped. Due to bio sorting in __submit_bio_noacct() this
results in the following request ordering:
9,0 18 1181 0.525037895 15995 Q WS 1479315464 + 63392
Split off chunk-sized (1024 sectors) request:
9,0 18 1182 0.629019647 15995 X WS 1479315464 / 1479316488
Request is unaligned to the chunk so it's split in
raid0_make_request(). This is the first part mapped and punted to
bio_list:
8,0 18 7053 0.629020455 15995 A WS 739921928 + 1016 <- (9,0) 1479315464
Now raid0_make_request() returns, second part is postponed on
bio_list. __submit_bio_noacct() resorts the bio_list, mapped request
is submitted to the underlying device:
8,0 18 7054 0.629022782 15995 G WS 739921928 + 1016
Now we take another request from the bio_list which is the remainder
of the original huge request. Split off another chunk-sized bit from
it and the situation repeats:
9,0 18 1183 0.629024499 15995 X WS 1479316488 / 1479317512
8,16 18 6998 0.629025110 15995 A WS 739921928 + 1016 <- (9,0) 1479316488
8,16 18 6999 0.629026728 15995 G WS 739921928 + 1016
...
9,0 18 1184 0.629032940 15995 X WS 1479317512 / 1479318536 [libnetacq-write]
8,0 18 7059 0.629033294 15995 A WS 739922952 + 1016 <- (9,0) 1479317512
8,0 18 7060 0.629033902 15995 G WS 739922952 + 1016
...
This repeats until we consume the whole original huge request. Now we
finally get to processing the second parts of the split off requests
(in reverse order):
8,16 18 7181 0.629161384 15995 A WS 739952640 + 8 <- (9,0) 1479377920
8,0 18 7239 0.629162140 15995 A WS 739952640 + 8 <- (9,0) 1479376896
8,16 18 7186 0.629163881 15995 A WS 739951616 + 8 <- (9,0) 1479375872
8,0 18 7242 0.629164421 15995 A WS 739951616 + 8 <- (9,0) 1479374848
...
I guess it is obvious that this IO pattern is extremely inefficient way
to perform sequential IO. It also makes bio_list to grow to rather long
lengths.
Change raid0_make_request() to map both parts of the split bio. Since we
know we are provided with at most chunk-sized bios, we will always need
to split the incoming bio at most once.
Fixes: f00d7c85be ("md/raid0: fix up bio splitting.")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230814092720.3931-2-jack@suse.cz
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec14a87ee1 ]
blk-iocost sometimes causes the following crash:
BUG: kernel NULL pointer dereference, address: 00000000000000e0
...
RIP: 0010:_raw_spin_lock+0x17/0x30
Code: be 01 02 00 00 e8 79 38 39 ff 31 d2 89 d0 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 65 ff 05 48 d0 34 7e b9 01 00 00 00 31 c0 <f0> 0f b1 0f 75 02 5d c3 89 c6 e8 ea 04 00 00 5d c3 0f 1f 84 00 00
RSP: 0018:ffffc900023b3d40 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000001
RDX: ffffc900023b3d20 RSI: ffffc900023b3cf0 RDI: 00000000000000e0
RBP: ffffc900023b3d40 R08: ffffc900023b3c10 R09: 0000000000000003
R10: 0000000000000064 R11: 000000000000000a R12: ffff888102337000
R13: fffffffffffffff2 R14: ffff88810af408c8 R15: ffff8881070c3600
FS: 00007faaaf364fc0(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000e0 CR3: 00000001097b1000 CR4: 0000000000350ea0
Call Trace:
<TASK>
ioc_weight_write+0x13d/0x410
cgroup_file_write+0x7a/0x130
kernfs_fop_write_iter+0xf5/0x170
vfs_write+0x298/0x370
ksys_write+0x5f/0xb0
__x64_sys_write+0x1b/0x20
do_syscall_64+0x3d/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
This happens because iocg->ioc is NULL. The field is initialized by
ioc_pd_init() and never cleared. The NULL deref is caused by
blkcg_activate_policy() installing blkg_policy_data before initializing it.
blkcg_activate_policy() was doing the following:
1. Allocate pd's for all existing blkg's and install them in blkg->pd[].
2. Initialize all pd's.
3. Online all pd's.
blkcg_activate_policy() only grabs the queue_lock and may release and
re-acquire the lock as allocation may need to sleep. ioc_weight_write()
grabs blkcg->lock and iterates all its blkg's. The two can race and if
ioc_weight_write() runs during #1 or between #1 and #2, it can encounter a
pd which is not initialized yet, leading to crash.
The crash can be reproduced with the following script:
#!/bin/bash
echo +io > /sys/fs/cgroup/cgroup.subtree_control
systemd-run --unit touch-sda --scope dd if=/dev/sda of=/dev/null bs=1M count=1 iflag=direct
echo 100 > /sys/fs/cgroup/system.slice/io.weight
bash -c "echo '8:0 enable=1' > /sys/fs/cgroup/io.cost.qos" &
sleep .2
echo 100 > /sys/fs/cgroup/system.slice/io.weight
with the following patch applied:
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index fc49be622e05..38d671d5e10c 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -1553,6 +1553,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
> pd->online = false;
> }
>
> + if (system_state == SYSTEM_RUNNING) {
> + spin_unlock_irq(&q->queue_lock);
> + ssleep(1);
> + spin_lock_irq(&q->queue_lock);
> + }
> +
> /* all allocated, init in the same order */
> if (pol->pd_init_fn)
> list_for_each_entry_reverse(blkg, &q->blkg_list, q_node)
I don't see a reason why all pd's should be allocated, initialized and
onlined together. The only ordering requirement is that parent blkgs to be
initialized and onlined before children, which is guaranteed from the
walking order. Let's fix the bug by allocating, initializing and onlining pd
for each blkg and holding blkcg->lock over initialization and onlining. This
ensures that an installed blkg is always fully initialized and onlined
removing the the race window.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Breno Leitao <leitao@debian.org>
Fixes: 9d179b8654 ("blkcg: Fix multiple bugs in blkcg_activate_policy()")
Link: https://lore.kernel.org/r/ZN0p5_W-Q9mAHBVY@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ac1102b22 ]
Before adding a new FW control, its name is checked against
existing controls list. But the string length in strncmp used
to compare controls names is taken from the list, so if beginnings
of the controls are matching, then the new control is not created.
For example, if CAL_R control already exists, CAL_R_SELECTED
is not created.
The fix is to compare string lengths as well.
Fixes: 6477960755 ("ASoC: wm_adsp: Move check for control existence")
Signed-off-by: Vlad Karpovich <vkarpovi@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230815172908.3454056-1-vkarpovi@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d0bd28c50 ]
r5l_flush_stripe_to_raid() will check if the list 'flushing_ios' is
empty, and then submit 'flush_bio', however, r5l_log_flush_endio()
is clearing the list first and then clear the bio, which will cause
null-ptr-deref:
T1: submit flush io
raid5d
handle_active_stripes
r5l_flush_stripe_to_raid
// list is empty
// add 'io_end_ios' to the list
bio_init
submit_bio
// io1
T2: io1 is done
r5l_log_flush_endio
list_splice_tail_init
// clear the list
T3: submit new flush io
...
r5l_flush_stripe_to_raid
// list is empty
// add 'io_end_ios' to the list
bio_init
bio_uninit
// clear bio->bi_blkg
submit_bio
// null-ptr-deref
Fix this problem by clearing bio before clearing the list in
r5l_log_flush_endio().
Fixes: 0dd00cba99 ("raid5-cache: fully initialize flush_bio when needed")
Reported-and-tested-by: Corey Hickey <bugfood-ml@fatooh.org>
Closes: https://lore.kernel.org/all/cddd7213-3dfd-4ab7-a3ac-edd54d74a626@fatooh.org/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a705b11b35 ]
Commit b13015af94 ("md/raid5-cache: Clear conf->log after finishing
work") introduce a new problem:
// caller hold reconfig_mutex
r5l_exit_log
flush_work(&log->disable_writeback_work)
r5c_disable_writeback_async
wait_event
/*
* conf->log is not NULL, and mddev_trylock()
* will fail, wait_event() can never pass.
*/
conf->log = NULL
Fix this problem by setting 'config->log' to NULL before wake_up() as it
used to be, so that wait_event() from r5c_disable_writeback_async() can
exist. In the meantime, move forward md_unregister_thread() so that
null-ptr-deref this commit fixed can still be fixed.
Fixes: b13015af94 ("md/raid5-cache: Clear conf->log after finishing work")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230708091727.1417894-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0a8f16ae4 ]
Right now each MSM8916 device has a huge block of regulator constraints
with allowed voltages for each regulator. For lack of better
documentation these voltages are often copied as-is from the vendor
device tree, without much extra thought.
Unfortunately, the voltages in the vendor device trees are often
misleading or even wrong, e.g. because:
- There is a large voltage range allowed and the actual voltage is
only set somewhere hidden in some messy vendor driver. This is often
the case for pm8916_{l14,l15,l16} because they have a broad range of
1.8-3.3V by default.
- The voltage is actually wrong but thanks to the voltage constraints
in the RPM firmware it still ends up applying the correct voltage.
To have proper regulator constraints it is important to review them in
context of the usage. The current setup in the MSM8916 device trees
makes this quite hard because each device duplicates the standard
voltages for components of the SoC and mixes those with minor
device-specific additions and dummy voltages for completely unused
regulators.
The actual usage of the regulators for the SoC components is in
msm8916-pm8916.dtsi, so it can and should also define the related
voltage constraints. These are not board-specific but defined in the
APQ8016E/PM8916 Device Specification. The board DT can then focus on
describing the actual board-specific regulators, which makes it much
easier to review and spot potential mistakes there.
Note that this commit does not make any functional change. All used
regulators still have the same regulator constraints as before. Unused
regulators do not have regulator constraints anymore because most of
these were too broad or even entirely wrong. They should be added back
with proper voltage constraints when there is an actual usage.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230510-msm8916-regulators-v1-7-54d4960a05fc@gerhold.net
Stable-dep-of: 4facccb44a ("arm64: dts: qcom: apq8016-sbc: Rename ov5640 enable-gpios to powerdown-gpios")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 355750828c ]
The regulator constraints for most MSM8916 devices (except DB410c) were
originally taken from Qualcomm's msm-3.10 vendor device tree (for lack
of better documentation). Unfortunately it turns out that Qualcomm's
voltages are slightly off as well and do not match the voltage
constraints applied by the RPM firmware.
This means that we sometimes request a specific voltage but the RPM
firmware actually applies a much lower or higher voltage. This is
particularly critical for pm8916_l11 which is used as SD card VMMC
regulator: The SD card can choose a voltage from the current range of
1.8 - 2.95V. If it chooses to run at 1.8V we pretend that this is fine
but the RPM firmware will still silently end up configuring 2.95V.
This can be easily reproduced with a multimeter or by checking the
SPMI hardware registers of the regulator.
Fix this by making the voltages match the actual "specified range" in
the PM8916 Device Specification which is enforced by the RPM firmware.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230510-msm8916-regulators-v1-3-54d4960a05fc@gerhold.net
Stable-dep-of: 4facccb44a ("arm64: dts: qcom: apq8016-sbc: Rename ov5640 enable-gpios to powerdown-gpios")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 43a6845808 ]
The ov5640 driver expects DOVDD, AVDD and DVDD as regulator supply names.
The ov5640 has depended on these names since the driver was committed
upstream in 2017. Similarly apq8016-sbc.dtsi has had completely different
regulator names since its own initial commit in 2020.
Perhaps the regulators were left on in previous 410c bootloaders. In any
case today on 6.5 we won't switch on the ov5640 without correctly naming
the regulators.
Fixes: 39e0ce6cd1 ("arm64: dts: qcom: apq8016-sbc: Add CCI/Sensor nodes")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230811234738.2859417-3-bryan.odonoghue@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73387da70f ]
The Display Data Channel (DDC) transactions between an HDMI transmitter
(SIL9022A in this case) and an HDMI monitor, occur at a maximum of
100KHz. That's the maximum supported frequency within DDC standards.
While the SIL9022A can transact with the core at 400KHz, it needs to
drop the frequency to 100KHz when communicating with the monitor,
otherwise, the i2c controller times out and shows warning like this.
[ 985.773431] omap_i2c 20010000.i2c: controller timed out
That feature, however, has not been enabled in the SIL9022 driver.
Since, dropping the frequency doesn't affect any other devices on the
bus, drop the main-i2c1 frequency from 400KHz to 100KHz.
Fixes: a841581451 ("arm64: dts: ti: Refractor AM625 SK dts")
Signed-off-by: Aradhya Bhatia <a-bhatia1@ti.com>
Link: https://lore.kernel.org/r/20230809084559.17322-2-a-bhatia1@ti.com
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e1e1e9bb9d ]
Fix "warning: cast from pointer to integer of different size" on 64-bit
builds.
Note that this is a cosmetic fix at this point as the driver is not yet
used for 64-bit systems.
Fixes: feaa8baee8 ("bus: ti-sysc: Implement SoC revision handling")
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d47f9717e5 ]
The original formula was inaccurate:
dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
For write requests, when we assign a tags from sched_tags,
data->shallow_depth will be passed to sbitmap_find_bit,
see the following code:
nr = sbitmap_find_bit_in_word(&sb->map[index],
min_t (unsigned int,
__map_depth(sb, index),
depth),
alloc_hint, wrap);
The smaller of data->shallow_depth and __map_depth(sb, index)
will be used as the maximum range when allocating bits.
For a mmc device (one hw queue, deadline I/O scheduler):
q->nr_requests = sched_tags = 128, so according to the previous
calculation method, dd->async_depth = data->shallow_depth = 96,
and the platform is 64bits with 8 cpus, sched_tags.bitmap_tags.sb.shift=5,
sb.maps[]=32/32/32/32, 32 is smaller than 96, whether it is a read or
a write I/O, tags can be allocated to the maximum range each time,
which has not throttling effect.
In addition, refer to the methods of bfg/kyber I/O scheduler,
limit ratiois are calculated base on sched_tags.bitmap_tags.sb.shift.
This patch can throttle write requests really.
Fixes: 07757588e5 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/1691061162-22898-1-git-send-email-zhiguo.niu@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b59bc6e372 ]
Tracefs or debugfs maybe cause hundreds to thousands of PATH records,
too many PATH records maybe cause soft lockup.
For example:
1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
2. auditctl -a exit,always -S open -k key
3. sysctl -w kernel.watchdog_thresh=5
4. mkdir /sys/kernel/debug/tracing/instances/test
There may be a soft lockup as follows:
watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
Kernel panic - not syncing: softlockup: hung tasks
Call trace:
dump_backtrace+0x0/0x30c
show_stack+0x20/0x30
dump_stack+0x11c/0x174
panic+0x27c/0x494
watchdog_timer_fn+0x2bc/0x390
__run_hrtimer+0x148/0x4fc
__hrtimer_run_queues+0x154/0x210
hrtimer_interrupt+0x2c4/0x760
arch_timer_handler_phys+0x48/0x60
handle_percpu_devid_irq+0xe0/0x340
__handle_domain_irq+0xbc/0x130
gic_handle_irq+0x78/0x460
el1_irq+0xb8/0x140
__audit_inode_child+0x240/0x7bc
tracefs_create_file+0x1b8/0x2a0
trace_create_file+0x18/0x50
event_create_dir+0x204/0x30c
__trace_add_new_event+0xac/0x100
event_trace_add_tracer+0xa0/0x130
trace_array_create_dir+0x60/0x140
trace_array_create+0x1e0/0x370
instance_mkdir+0x90/0xd0
tracefs_syscall_mkdir+0x68/0xa0
vfs_mkdir+0x21c/0x34c
do_mkdirat+0x1b4/0x1d4
__arm64_sys_mkdirat+0x4c/0x60
el0_svc_common.constprop.0+0xa8/0x240
do_el0_svc+0x8c/0xc0
el0_svc+0x20/0x30
el0_sync_handler+0xb0/0xb4
el0_sync+0x160/0x180
Therefore, we add cond_resched() to __audit_inode_child() to fix it.
Fixes: 5195d8e217 ("audit: dynamically allocate audit_names when not enough space is in the names array")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db07ce5da8 ]
The adreno_is_a20x() and adreno_is_a225() functions rely on the
GPU revision, but such information is retrieved inside adreno_gpu_init(),
which is called afterwards.
Fix this problem by caling adreno_gpu_init() earlier, so that
the GPU information revision is available when adreno_is_a20x()
and adreno_is_a225() run.
Tested on a imx53-qsb board.
Fixes: 21af872cd8 ("drm/msm/adreno: add a2xx")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/543456/
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ad49d37cf ]
There is a upper bound to "catlen" but no lower bound to prevent
negatives. I don't see that this necessarily causes a problem but we
may as well be safe.
Fixes: e114e47377 ("Smack: Simplified Mandatory Access Control Kernel")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f908786cf ]
This patch fixes the following sparse error:
drivers/soc/qcom/smem.c:738:30: error: incompatible types in comparison expression (different add ress spaces):
drivers/soc/qcom/smem.c:738:30: void *
drivers/soc/qcom/smem.c:738:30: void [noderef] __iomem *
In addr_in_range(), "base" is of type void __iomem *, converting
void *addr to the same type to fix above sparse error.
Fixes: 20bb6c9de1 ("soc: qcom: smem: map only partitions used by local HOST")
Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
Link: https://lore.kernel.org/r/20230801094807.4146779-1-chenjiahao16@huawei.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f81bb0ac78 ]
Aspeed always report the display port as "connected", because it
doesn't set a .detect_ctx callback.
Fix this by providing the proper detect callback for astdp and dp501.
This also fixes the following regression:
Since commit fae7d18640 ("drm/probe-helper: Default to 640x480 if no
EDID on DP") The default resolution is now 640x480 when no monitor is
connected. But Aspeed graphics is mostly used in servers, where no monitor
is attached. This also affects the remote BMC resolution to 640x480, which
is inconvenient, and breaks the anaconda installer.
v2: Add .detect callback to the dp/dp501 connector (Jani Nikula)
v3: Use .detect_ctx callback, and refactors (Thomas Zimmermann)
Add a BMC virtual connector
v4: Better indent detect_ctx() functions (Thomas Zimmermann)
v5: Enable polling of the dp and dp501 connector status
(Thomas Zimmermann)
v6: Change check order in ast_astdp_is_connected (Jammy Huang)
Fixes: fae7d18640 ("drm/probe-helper: Default to 640x480 if no EDID on DP")
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230713134316.332502-2-jfalempe@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44abfa6a95 ]
Several reasons why 'reconfig_mutex' should be held:
1) rdev_for_each() is not safe to be called without the lock, because
rdev can be removed concurrently.
2) mddev_destroy_serial_pool() and mddev_create_serial_pool() should not
be called concurrently.
3) mddev_suspend() from mddev_destroy/create_serial_pool() should be
protected by the lock.
Fixes: 10c92fca63 ("md-bitmap: create and destroy wb_info_pool with the change of backlog")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230706083727.608914-3-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 673643490b ]
Commit 2ae6aaf769 ("md/raid10: fix io loss while replacement replace
rdev") reads replacement first to prevent io loss. However, there are same
issue in wait_blocked_dev() and raid10_handle_discard(), too. Fix it by
using dereference_rdev_and_rrdev() to get devices.
Fixes: d30588b273 ("md/raid10: improve raid10 discard request")
Fixes: f2e7e269a7 ("md/raid10: pull the code that wait for blocked dev into one function")
Signed-off-by: Li Nan <linan122@huawei.com>
Link: https://lore.kernel.org/r/20230701080529.2684932-4-linan666@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e24ed04389 ]
memalloc_noio_save() is called for the first mddev_suspend(), and
repeated mddev_suspend() only increase 'suspended'. However,
memalloc_noio_restore() is also called for the first mddev_resume(),
which means that memory reclaim will be enabled before the last
mddev_resume() is called, while the array is still suspended.
Fix this problem by restore 'noio_flag' for the last mddev_resume().
Fixes: 78f57ef9d5 ("md: use memalloc scope APIs in mddev_suspend()/mddev_resume()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230628012931.88911-3-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5befe22b3e ]
Running sparse on fsl_qmc_audio (make C=1) raises the following warnings:
fsl_qmc_audio.c:387:26: warning: restricted snd_pcm_format_t degrades to integer
fsl_qmc_audio.c:389:59: warning: incorrect type in argument 1 (different base types)
fsl_qmc_audio.c:389:59: expected restricted snd_pcm_format_t [usertype] format
fsl_qmc_audio.c:389:59: got unsigned int [assigned] i
fsl_qmc_audio.c:564:26: warning: restricted snd_pcm_format_t degrades to integer
fsl_qmc_audio.c:569:50: warning: incorrect type in argument 1 (different base types)
fsl_qmc_audio.c:569:50: expected restricted snd_pcm_format_t [usertype] format
fsl_qmc_audio.c:569:50: got int [assigned] i
fsl_qmc_audio.c:573:62: warning: incorrect type in argument 1 (different base types)
fsl_qmc_audio.c:573:62: expected restricted snd_pcm_format_t [usertype] format
fsl_qmc_audio.c:573:62: got int [assigned] i
These warnings are due to snd_pcm_format_t values handling done in the
driver. Some macros and functions exist to handle safely these values.
Use dedicated macros and functions to remove these warnings.
Fixes: 075c7125b1 ("ASoC: fsl: Add support for QMC audio")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Link: https://lore.kernel.org/r/20230726161620.495298-1-herve.codina@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a87852e37f ]
Despite its name, the regulator vcc3v3_pcie30x1 has nothing to do with
pcie30x1. Instead, it supply power to VBAT1-5 on the M.2 KEY B port as
seen on page 8 of the schematic [1].
pcie30x1 is used for the mini PCIe slot, and as seen on page 9 the
vcc3v3_minipcie regulator is instead related to pcie30x1.
The M.2 KEY B port can be used for WWAN USB2 modules or SATA drives.
Use correct regulator vcc3v3_minipcie for pcie30x1.
[1] https://dl.radxa.com/cm3p/e25/radxa-e25-v1.4-sch.pdf
Fixes: 2bf2f4d9f6 ("arm64: dts: rockchip: Add Radxa CM3I E25")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20230724145213.3833099-1-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0cc8e1512 ]
Fixes the following:
WARNING: min() should probably be min_t(size_t, size, sizeof(ip))
+ ret = copy_to_user(out, &ip, min((size_t)size, sizeof(ip)));
And other style fixes:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Missing a blank line after declarations
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44ad820780 ]
Both Luxul's XAP devices (XAP-810 and XAP-1440) are access points that
use a non-default design. They don't include switch but have a single
Ethernet port and BCM54210E PHY connected to the Ethernet controller's
MDIO bus.
Support for those devices regressed due to two changes:
1. Describing MDIO bus with switch
After commit 9fb90ae6ca ("ARM: dts: BCM53573: Describe on-SoC BCM53125
rev 4 switch") Linux stopped probing for MDIO devices.
2. Dropping hardcoded BCM54210E delays
In commit fea7fda7f5 ("net: phy: broadcom: Fix RGMII delays
configuration for BCM54210E") support for other PHY modes was added but
that requires a proper "phy-mode" value in DT.
Both above changes are correct (they don't need to be reverted or
anything) but they need this fix for DT data to be correct and for Linux
to work properly.
Fixes: 9fb90ae6ca ("ARM: dts: BCM53573: Describe on-SoC BCM53125 rev 4 switch")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20230713111145.14864-1-zajec5@gmail.com
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 548cb93205 ]
Visible glitches have been observed when running graphics applications on
Linux under Xen hypervisor. Those observations have been confirmed with
failures from kms_pwrite_crc Intel GPU test that verifies data coherency
of DRM frame buffer objects using hardware CRC checksums calculated by
display controllers, exposed to userspace via debugfs. Affected
processing paths have then been identified with new IGT test variants that
mmap the objects using different methods and caching modes [1].
When running as a Xen PV guest, Linux uses Xen provided PAT configuration
which is different from its native one. In particular, Xen specific PTE
encoding of write-combining caching, likely used by graphics applications,
differs from the Linux default one found among statically defined minimal
set of supported modes. Since Xen defines PTE encoding of the WC mode as
_PAGE_PAT, it no longer belongs to the minimal set, depends on correct
handling of _PAGE_PAT bit, and can be mismatched with write-back caching.
When a user calls mmap() for a DRM buffer object, DRM device specific
.mmap file operation, called from mmap_region(), takes care of setting PTE
encoding bits in a vm_page_prot field of an associated virtual memory area
structure. Unfortunately, _PAGE_PAT bit is not preserved when the vma's
.vm_flags are then applied to .vm_page_prot via vm_set_page_prot(). Bits
to be preserved are determined with _PAGE_CHG_MASK symbol that doesn't
cover _PAGE_PAT. As a consequence, WB caching is requested instead of WC
when running under Xen (also, WP is silently changed to WT, and UC
downgraded to UC_MINUS). When running on bare metal, WC is not affected,
but WP and WT extra modes are unintentionally replaced with WC and UC,
respectively.
WP and WT modes, encoded with _PAGE_PAT bit set, were introduced by commit
281d4078be ("x86: Make page cache mode a real type"). Care was taken
to extend _PAGE_CACHE_MASK symbol with that additional bit, but that
symbol has never been used for identification of bits preserved when
applying page protection flags. Support for all cache modes under Xen,
including the problematic WC mode, was then introduced by commit
47591df505 ("xen: Support Xen pv-domains using PAT").
The issue needs to be fixed by including _PAGE_PAT bit into a bitmask used
by pgprot_modify() for selecting bits to be preserved. We can do that
either internally to pgprot_modify() (as initially proposed), or by making
_PAGE_PAT a part of _PAGE_CHG_MASK. If we go for the latter then, since
_PAGE_PAT is the same as _PAGE_PSE, we need to note that _HPAGE_CHG_MASK
-- a huge pmds' counterpart of _PAGE_CHG_MASK, introduced by commit
c489f1257b ("thp: add pmd_modify"), defined as (_PAGE_CHG_MASK |
_PAGE_PSE) -- will no longer differ from _PAGE_CHG_MASK. If such
modification of _PAGE_CHG_MASK was irrelevant to its users then one might
wonder why that new _HPAGE_CHG_MASK symbol was introduced instead of
reusing the existing one with that otherwise irrelevant bit (_PAGE_PSE in
that case) added.
Add _PAGE_PAT to _PAGE_CHG_MASK and _PAGE_PAT_LARGE to _HPAGE_CHG_MASK for
symmetry. Split out common bits from both symbols to a common symbol for
clarity.
[ dhansen: tweak the solution changelog description ]
[1] https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/commit/0f0754413f14
Fixes: 281d4078be ("x86: Make page cache mode a real type")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7648
Link: https://lore.kernel.org/all/20230710073613.8006-2-janusz.krzysztofik%40linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 43c9835b14 ]
Currently the write_cache attribute allows enabling the QUEUE_FLAG_WC
flag on devices that never claimed the capability.
Fix that by adding a QUEUE_FLAG_HW_WC flag that is set by
blk_queue_write_cache and guards re-enabling the cache through sysfs.
Note that any rescan that calls blk_queue_write_cache will still
re-enable the write cache as in the current code.
Fixes: 93e9d8e836 ("block: add ability to flag write back caching on a device")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230707094239.107968-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20faf2005e ]
gpu->mmu_context is the MMU context of the last job in the HW queue, which
isn't necessarily the same as the context from the bad job. Dump the MMU
context from the scheduler determined bad submit to make it work as intended.
Fixes: 17e4660ae3 ("drm/etnaviv: implement per-process address spaces on MMUv2")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 590bfe5183 ]
After commit 71de0a054d ("arm64: tegra: Drop serial clock-names and
reset-names") was applied, the HSUART failed to probe and the following
error is seen:
serial-tegra 70006300.serial: Couldn't get the reset
serial-tegra: probe of 70006300.serial failed with error -2
Commit 71de0a054d ("arm64: tegra: Drop serial clock-names and
reset-names") is correct because the "reset-names" property is not
needed for 8250 UARTs. However, the "reset-names" is required for the
HSUART and should have been populated as part of commit a63c0cd837
("arm64: dts: tegra: smaug: Add Bluetooth node") that enabled the HSUART
for the Pixel C. Fix this by populating the "reset-names" property for
the HSUART on the Pixel C.
Fixes: a63c0cd837 ("arm64: dts: tegra: smaug: Add Bluetooth node")
Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 861dbb2b15 ]
After commit 71de0a054d ("arm64: tegra: Drop serial clock-names and
reset-names") was applied, the HSUART failed to probe and the following
error is seen:
serial-tegra 3100000.serial: Couldn't get the reset
serial-tegra: probe of 3100000.serial failed with error -2
Commit 71de0a054d ("arm64: tegra: Drop serial clock-names and
reset-names") is correct because the "reset-names" property is not
needed for 8250 UARTs. However, the "reset-names" is required for the
HSUART and should have been populated as part of commit ff578db7b6
("arm64: tegra: Enable UART instance on 40-pin header") that
enabled the HSUART for Jetson AGX Orin. Fix this by populating the
"reset-names" property for the HSUART on Jetson AGX Orin.
Fixes: ff578db7b6 ("arm64: tegra: Enable UART instance on 40-pin header")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3392ef368d ]
This fixes:
arch/arm/boot/dts/broadcom/bcm47189-luxul-xap-1440.dtb: pcie@2000: '#address-cells' is a required property
From schema: /lib/python3.10/site-packages/dtschema/schemas/pci/pci-bus.yaml
arch/arm/boot/dts/broadcom/bcm47189-luxul-xap-1440.dtb: pcie@2000: '#size-cells' is a required property
From schema: /lib/python3.10/site-packages/dtschema/schemas/pci/pci-bus.yaml
Two properties that need to be added later are "device_type" and
"ranges". Adding "device_type" on its own causes a new warning and the
value of "ranges" needs to be determined yet.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20230707114004.2740-3-zajec5@gmail.com
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05d2c3d552 ]
Such property simply doesn't exist (is not documented or used anywhere).
This fixes:
arch/arm/boot/dts/broadcom/bcm47189-luxul-xap-1440.dtb: usb@d000: Unevaluated properties are not allowed ('#usb-cells' was unexpected)
From schema: Documentation/devicetree/bindings/usb/generic-ohci.yaml
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20230707114004.2740-2-zajec5@gmail.com
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be7e1e5b0f ]
There is no such trigger documented or implemented in Linux. It was a
copy & paste mistake.
This fixes:
arch/arm/boot/dts/broadcom/bcm47189-luxul-xap-1440.dtb: leds: led-wlan:linux,default-trigger: 'oneOf' conditional failed, one must be fixed:
'default-off' is not one of ['backlight', 'default-on', 'heartbeat', 'disk-activity', 'disk-read', 'disk-write', 'timer', 'pattern', 'audio-micmute', 'audio-mute', 'bluetooth-power', 'flash', 'kbd-capslock', 'mtd', 'nand-disk', 'none', 'torch', 'usb-gadget', 'usb-host', 'usbport']
'default-off' does not match '^cpu[0-9]*$'
'default-off' does not match '^hci[0-9]+-power$'
'default-off' does not match '^mmc[0-9]+$'
'default-off' does not match '^phy[0-9]+tx$'
From schema: Documentation/devicetree/bindings/leds/leds-gpio.yaml
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20230707114004.2740-1-zajec5@gmail.com
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 822130b5e8 ]
On 32-bit architectures comparing a resource against a value larger than
U32_MAX can cause a warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1344:18: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
res->start > 0x100000000ull)
~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
As gcc does not warn about this in dead code, add an IS_ENABLED() check at
the start of the function. This will always return success but not actually resize
the BAR on 32-bit architectures without high memory, which is exactly what
we want here, as the driver can fall back to bank switching the VRAM
access.
Fixes: 31b8adab32 ("drm/amdgpu: require a root bus window above 4GB for BAR resize")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f38de940f ]
Add missing "detach" mailbox to this board to permit the CPU to inform
the remote processor on a detach. This signal allows the remote processor
firmware to stop IPC communication and to reinitialize the resources for
a re-attach.
Without this mailbox, detach is not possible and kernel log contains the
following warning to, so make sure all the STM32MP15xx platform DTs are
in sync regarding the mailboxes to fix the detach issue and the warning:
"
stm32-rproc 10000000.m4: mbox_request_channel_byname() could not locate channel named "detach"
"
Fixes: 6257dfc1c4 ("ARM: dts: stm32: Add coprocessor detach mbox on stm32mp15x-dkx boards")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit deb7edbc27 ]
Add missing "detach" mailbox to this board to permit the CPU to inform
the remote processor on a detach. This signal allows the remote processor
firmware to stop IPC communication and to reinitialize the resources for
a re-attach.
Without this mailbox, detach is not possible and kernel log contains the
following warning to, so make sure all the STM32MP15xx platform DTs are
in sync regarding the mailboxes to fix the detach issue and the warning:
"
stm32-rproc 10000000.m4: mbox_request_channel_byname() could not locate channel named "detach"
"
Fixes: 6257dfc1c4 ("ARM: dts: stm32: Add coprocessor detach mbox on stm32mp15x-dkx boards")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9bcfc3cdc9 ]
The generic ADC channel binding is recommended over legacy one, update the
DT to the modern binding. No functional change. For further details, see
commit which adds the generic binding to STM32 ADC binding document:
'664b9879f56e ("dt-bindings: iio: stm32-adc: add generic channel binding")'
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Stable-dep-of: deb7edbc27 ("ARM: dts: stm32: Add missing detach mailbox for DHCOM SoM")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 966f04a89d ]
Add missing "detach" mailbox to this board to permit the CPU to inform
the remote processor on a detach. This signal allows the remote processor
firmware to stop IPC communication and to reinitialize the resources for
a re-attach.
Without this mailbox, detach is not possible and kernel log contains the
following warning to, so make sure all the STM32MP15xx platform DTs are
in sync regarding the mailboxes to fix the detach issue and the warning:
"
stm32-rproc 10000000.m4: mbox_request_channel_byname() could not locate channel named "detach"
"
Fixes: 6257dfc1c4 ("ARM: dts: stm32: Add coprocessor detach mbox on stm32mp15x-dkx boards")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ee0ef38aa ]
Add missing "detach" mailbox to this board to permit the CPU to inform
the remote processor on a detach. This signal allows the remote processor
firmware to stop IPC communication and to reinitialize the resources for
a re-attach.
Without this mailbox, detach is not possible and kernel log contains the
following warning to, so make sure all the STM32MP15xx platform DTs are
in sync regarding the mailboxes to fix the detach issue and the warning:
"
stm32-rproc 10000000.m4: mbox_request_channel_byname() could not locate channel named "detach"
"
Fixes: 6257dfc1c4 ("ARM: dts: stm32: Add coprocessor detach mbox on stm32mp15x-dkx boards")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c46e9b6cc9 ]
Use STM32 ADC generic bindings instead of legacy bindings on
emtrion GmbH Argon boards.
The STM32 ADC specific binding to declare channels has been deprecated,
hence adopt the generic IIO channels bindings, instead.
The STM32MP151 device tree now exposes internal channels using the
generic binding. This makes the change mandatory here to avoid a mixed
use of legacy and generic binding, which is not supported by the driver.
Signed-off-by: Olivier Moysan <olivier.moysan@foss.st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Stable-dep-of: 0ee0ef38aa ("ARM: dts: stm32: Add missing detach mailbox for emtrion emSBC-Argon")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b8a633507 ]
Sony ever so graciously provides GPIO line names in their downstream
kernel (though sometimes they are not 100% accurate and you can judge
that by simply looking at them and with what drivers they are used).
Add these to the PDX203&206 DTSIs to better document the hardware.
Diff between 203 and 206:
pm8009_gpios
< "CAM_PWR_LD_EN",
> "NC",
pm8150_gpios
< "NC",
> "G_ASSIST_N",
< "WLC_EN_N", /* GPIO_10 */
> "NC", /* GPIO_10 */
Which is due to 5 II having an additional Google Assistant hardware
button and 1 II having a wireless charger & different camera wiring
to accommodate the additional 3D iToF sensor.
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230614-topic-edo_pinsgpiopmic-v2-2-6f90bba54c53@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Stable-dep-of: a422c6a91a ("arm64: dts: qcom: sm8250-edo: Rectify gpio-keys")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 40b398beab ]
Sony ever so graciously provides GPIO line names in their downstream
kernel (though sometimes they are not 100% accurate and you can judge
that by simply looking at them and with what drivers they are used).
Add these to the PDX203&206 DTSIs to better document the hardware.
Diff between 203 and 206:
< "CAM_PWR_A_CS",
> "FRONTC_PWR_EN",
< "CAM4_MCLK",
< "TOF_RST_N",
> "NC",
> "NC",
< "WLC_I2C_SDA",
< "WLC_I2C_SCL", /* GPIO_120 */
> "NC",
> "NC",
< "WLC_INT_N",
> "NC",
Which makes sense, as 203 has a 3D iToF, slightly different camera
power wiring and WLC (WireLess Charging).
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230614-topic-edo_pinsgpiopmic-v2-1-6f90bba54c53@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Stable-dep-of: a422c6a91a ("arm64: dts: qcom: sm8250-edo: Rectify gpio-keys")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 775a5283c2 ]
sm8250 faces the same problem with its Energy Model as sdm845. The energy
cost of LITTLE cores is reported to be higher than medium or big cores
EM computes the energy with formula:
energy = OPP's cost / maximum cpu capacity * utilization
On v6.4-rc6 we have:
max capacity of CPU0 = 284
capacity of CPU0's OPP(1612800 Hz) = 253
cost of CPU0's OPP(1612800 Hz) = 191704
max capacity of CPU4 = 871
capacity of CPU4's OPP(710400 Hz) = 255
cost of CPU4's OPP(710400 Hz) = 343217
Both OPPs have almost the same compute capacity but the estimated energy
per unit of utilization will be estimated to:
energy CPU0 = 191704 / 284 * 1 = 675
energy CPU4 = 343217 / 871 * 1 = 394
EM estimates that little CPU0 will consume 71% more than medium CPU4 for
the same compute capacity. According to [1], little consumes 25% less than
medium core for Coremark benchmark at those OPPs for the same duration.
Set the dynamic-power-coefficient of CPU0-3 to 105 to fix the energy model
for little CPUs.
[1] https://github.com/kdrag0n/freqbench/tree/master/results/sm8250/k30s
Fixes: 6aabed5526 ("arm64: dts: qcom: sm8250: Add CPU capacities and energy model")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230615154852.130076-1-vincent.guittot@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7b484b1c9 ]
Since we're using these two macros to read a value from a register, we
need to use the FIELD_GET instead of the FIELD_PREP macro, otherwise
we're getting wrong values.
So instead of:
[ 3.111779] ocmem fdd00000.sram: 2 ports, 1 regions, 512 macros, not interleaved
we now get the correct value of:
[ 3.129672] ocmem fdd00000.sram: 2 ports, 1 regions, 2 macros, not interleaved
Fixes: 88c1e9404f ("soc: qcom: add OCMEM driver")
Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Link: https://lore.kernel.org/r/20230506-msm8226-ocmem-v3-1-79da95a2581f@z3ntu.xyz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d1077cf2e ]
Fixes the following build errors on arm64:
drivers/video/fbdev/hyperv_fb.c: In function 'hvfb_getmem':
>> drivers/video/fbdev/hyperv_fb.c:1033:24: error: 'screen_info' undeclared (first use in this function)
1033 | base = screen_info.lfb_base;
| ^~~~~~~~~~~
drivers/video/fbdev/hyperv_fb.c:1033:24: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/gpu/drm/hyperv/hyperv_drm_drv.c:75:54: error: 'screen_info' undeclared (first use in this function)
75 | drm_aperture_remove_conflicting_framebuffers(screen_info.lfb_base,
| ^~~~~~~~~~~
drivers/gpu/drm/hyperv/hyperv_drm_drv.c:75:54: note: each undeclared identifier is reported only once for each function it appears in
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307090823.nxnT8Kk5-lkp@intel.com/
Fixes: 81d2393485 ("fbdev/hyperv-fb: Do not set struct fb_info.apertures")
Fixes: 8b0d13545b ("efi: Do not include <linux/screen_info.h> from EFI header")
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230709100514.703759-1-suijingfeng@loongson.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e101bf95ea ]
[WHY]
Writing to DRR registers such as OTG_V_TOTAL_MIN on the same frame as a
pipe commit can cause underflow.
[HOW]
Move DMUB p-state delegate into optimze_bandwidth; enabling FAMS sets
optimized_required.
This change expects that Freesync requests are blocked when
optimized_required is true.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dabc8b2075 ]
The dquot_mark_dquot_dirty() using dquot references from the inode
should be protected by dquot_srcu. quota_off code takes care to call
synchronize_srcu(&dquot_srcu) to not drop dquot references while they
are used by other users. But dquot_transfer() breaks this assumption.
We call dquot_transfer() to drop the last reference of dquot and add
it to free_dquots, but there may still be other users using the dquot
at this time, as shown in the function graph below:
cpu1 cpu2
_________________|_________________
wb_do_writeback CHOWN(1)
...
ext4_da_update_reserve_space
dquot_claim_block
...
dquot_mark_dquot_dirty // try to dirty old quota
test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE
if (test_bit(DQ_MOD_B, &dquot->dq_flags))
// test no dirty, wait dq_list_lock
...
dquot_transfer
__dquot_transfer
dqput_all(transfer_from) // rls old dquot
dqput // last dqput
dquot_release
clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
atomic_dec(&dquot->dq_count)
put_dquot_last(dquot)
list_add_tail(&dquot->dq_free, &free_dquots)
// add the dquot to free_dquots
if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
add dqi_dirty_list // add released dquot to dirty_list
This can cause various issues, such as dquot being destroyed by
dqcache_shrink_scan() after being added to free_dquots, which can trigger
a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots
and then to dirty_list, it is added to free_dquots again after
dquot_writeback_dquots() is executed, which causes the free_dquots list to
be corrupted and triggers a UAF when dqcache_shrink_scan() is called for
freeing dquot twice.
As Honza said, we need to fix dquot_transfer() to follow the guarantees
dquot_srcu should provide. But calling synchronize_srcu() directly from
dquot_transfer() is too expensive (and mostly unnecessary). So we add
dquot whose last reference should be dropped to the new global dquot
list releasing_dquots, and then queue work item which would call
synchronize_srcu() and after that perform the final cleanup of all the
dquots on releasing_dquots.
Fixes: 4580b30ea8 ("quota: Do not dirty bad dquots")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230630110822.3881712-5-libaokun1@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b9bdfa165 ]
Now we have a helper function dquot_dirty() to determine if dquot has
DQ_MOD_B bit. dquot_active() can easily be misunderstood as a helper
function to determine if dquot has DQ_ACTIVE_B bit. So we avoid this by
renaming it to inode_quota_active() and later on we will add the helper
function dquot_active() to determine if dquot has DQ_ACTIVE_B bit.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230630110822.3881712-3-libaokun1@huawei.com>
Stable-dep-of: dabc8b2075 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05aa613345 ]
Before this patch, booting to Linux VT and doing a simple:
echo 2 > /sys/class/graphics/fb0/blank
echo 0 > /sys/class/graphics/fb0/blank
would result in failures to re-enable the panel. Mode set callback is
called only once during boot in this scenario, while calls to
enable/disable callbacks are balanced afterwards. The driver doesn't
work unless userspace calls modeset before enabling the CRTC/connector.
This patch moves enabling of the DSI host from mode_set into pre_enable
callback, and removes some old hacks where this bridge driver is
directly calling into other bridge driver's callbacks.
pre_enable_prev_first flag is set on the panel's bridge so that panel
drivers will get their prepare function called between DSI host's
pre_enable and enable callbacks, so that they get a chance to
perform panel setup while DSI host is already enabled in command
mode. Otherwise panel's prepare would be called before DSI host
is enabled, and any DSI communication used in prepare callback
would fail.
With all these changes, the enable/disable sequence is now well
balanced, and host's and panel's callbacks are called in proper order
documented in the drm_panel API documentation without needing the old
hacks. (Mainly that panel->prepare is called when DSI host is ready to
allow the panel driver to send DSI commands and vice versa during
disable.)
Tested on Pinephone Pro. Trace of the callbacks follows.
Before:
[ 1.253882] dw-mipi-dsi-rockchip ff960000.dsi: mode_set
[ 1.290732] panel-himax-hx8394 ff960000.dsi.0: prepare
[ 1.475576] dw-mipi-dsi-rockchip ff960000.dsi: enable
[ 1.475593] panel-himax-hx8394 ff960000.dsi.0: enable
echo 2 > /sys/class/graphics/fb0/blank
[ 13.722799] panel-himax-hx8394 ff960000.dsi.0: disable
[ 13.774502] dw-mipi-dsi-rockchip ff960000.dsi: post_disable
[ 13.774526] panel-himax-hx8394 ff960000.dsi.0: unprepare
echo 0 > /sys/class/graphics/fb0/blank
[ 17.735796] panel-himax-hx8394 ff960000.dsi.0: prepare
[ 17.923522] dw-mipi-dsi-rockchip ff960000.dsi: enable
[ 17.923540] panel-himax-hx8394 ff960000.dsi.0: enable
[ 17.944330] dw-mipi-dsi-rockchip ff960000.dsi: failed to write command FIFO
[ 17.944335] panel-himax-hx8394 ff960000.dsi.0: sending command 0xb9 failed: -110
[ 17.944340] panel-himax-hx8394 ff960000.dsi.0: Panel init sequence failed: -110
echo 2 > /sys/class/graphics/fb0/blank
[ 431.148583] panel-himax-hx8394 ff960000.dsi.0: disable
[ 431.169259] dw-mipi-dsi-rockchip ff960000.dsi: failed to write command FIFO
[ 431.169268] panel-himax-hx8394 ff960000.dsi.0: Failed to enter sleep mode: -110
[ 431.169282] dw-mipi-dsi-rockchip ff960000.dsi: post_disable
[ 431.169316] panel-himax-hx8394 ff960000.dsi.0: unprepare
[ 431.169357] pclk_mipi_dsi0 already disabled
echo 0 > /sys/class/graphics/fb0/blank
[ 432.796851] panel-himax-hx8394 ff960000.dsi.0: prepare
[ 432.981537] dw-mipi-dsi-rockchip ff960000.dsi: enable
[ 432.981568] panel-himax-hx8394 ff960000.dsi.0: enable
[ 433.002290] dw-mipi-dsi-rockchip ff960000.dsi: failed to write command FIFO
[ 433.002299] panel-himax-hx8394 ff960000.dsi.0: sending command 0xb9 failed: -110
[ 433.002312] panel-himax-hx8394 ff960000.dsi.0: Panel init sequence failed: -110
-----------------------------------------------------------------------
After:
[ 1.248372] dw-mipi-dsi-rockchip ff960000.dsi: mode_set
[ 1.248704] dw-mipi-dsi-rockchip ff960000.dsi: pre_enable
[ 1.285377] panel-himax-hx8394 ff960000.dsi.0: prepare
[ 1.468392] dw-mipi-dsi-rockchip ff960000.dsi: enable
[ 1.468421] panel-himax-hx8394 ff960000.dsi.0: enable
echo 2 > /sys/class/graphics/fb0/blank
[ 16.210357] panel-himax-hx8394 ff960000.dsi.0: disable
[ 16.261315] dw-mipi-dsi-rockchip ff960000.dsi: post_disable
[ 16.261339] panel-himax-hx8394 ff960000.dsi.0: unprepare
echo 0 > /sys/class/graphics/fb0/blank
[ 19.161453] dw-mipi-dsi-rockchip ff960000.dsi: pre_enable
[ 19.197869] panel-himax-hx8394 ff960000.dsi.0: prepare
[ 19.382141] dw-mipi-dsi-rockchip ff960000.dsi: enable
[ 19.382158] panel-himax-hx8394 ff960000.dsi.0: enable
(But depends on functionality intorduced in Linux 6.3, so this patch will
not build on older kernels when applied to older stable branches.)
Fixes: 46fc51546d ("drm/bridge/synopsys: Add MIPI DSI host controller bridge")
Signed-off-by: Ondrej Jirman <megi@xff.cz>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230617224915.1923630-1-megi@xff.cz
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 597d0ec0e4 ]
MAC (CGX or RPM) asserts backpressure at TL3 or TL2 node of the egress
hierarchical scheduler tree depending on link level config done. If
there are multiple PFC priorities enabled at a time and for all such
flows to backoff, each priority will have to assert backpressure at
different TL3/TL2 scheduler nodes and these flows will need to submit
egress pkts to these nodes.
Current PFC configuration has an issue where in only one backpressure
scheduler node is being allocated which is resulting in only one PFC
priority to work. This patch fixes this issue.
Fixes: 99c969a83d ("octeontx2-pf: Add egress PFC support")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824081032.436432-4-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47bcc9c1cf ]
Suppose user has enabled pfc with prio 0,1 on a PF netdev(eth0)
dcb pfc set dev eth0 prio-pfc o:on 1:on
later user enabled pfc priorities 2 and 3 on the VF interface(eth1)
dcb pfc set dev eth1 prio-pfc 2:on 3:on
Instead of enabling pfc on all priorities (0..3), the driver only
enables on priorities 2,3. This patch corrects the issue by using
the proper CSR address.
Fixes: b9d0fedc62 ("octeontx2-af: cn10kb: Add RPM_USX MAC support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824081032.436432-3-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b4b2ded9c ]
1. Upon txschq free request, the transmit schedular config in hardware
is not getting reset. This patch adds necessary changes to do the same.
2. Current implementation calls txschq alloc during interface
initialization and in response handler updates the default txschq array.
This creates a problem for htb offload where txsch alloc will be called
for every tc class. This patch addresses the issue by reading txschq
response in mbox caller function instead in the response handler.
3. Current otx2_txschq_stop routine tries to free all txschq nodes
allocated to the interface. This creates a problem for htb offload.
This patch introduces the otx2_txschq_free_one to free txschq in a
given level.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: a9ac2e1877 ("octeontx2-pf: Fix PFC TX scheduler free")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3fc134a074 ]
Transceiver module temperature sensors are indexed after ASIC and
platform sensors. The current label printing method does not take this
into account and simply prints the index of the transceiver module
sensor.
On new systems that have platform sensors this results in incorrect
(shifted) transceiver module labels being printed:
$ sensors
[...]
front panel 002: +37.0°C (crit = +70.0°C, emerg = +75.0°C)
front panel 003: +47.0°C (crit = +70.0°C, emerg = +75.0°C)
[...]
Fix by taking the sensor count into account. After the fix:
$ sensors
[...]
front panel 001: +37.0°C (crit = +70.0°C, emerg = +75.0°C)
front panel 002: +47.0°C (crit = +70.0°C, emerg = +75.0°C)
[...]
Fixes: a53779de6a ("mlxsw: core: Add QSFP module temperature label attribute to hwmon")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d7248f1cc8 ]
Maximum size of buffer is obtained from underlying I2C adapter and in
case adapter allows I2C transaction buffer size greater than 100 bytes,
transaction will fail due to firmware limitation.
As a result driver will fail initialization.
Limit the maximum size of transaction buffer by 100 bytes to fit to
firmware.
Remove unnecessary calculation:
max_t(u16, MLXSW_I2C_BLK_DEF, quirk_size).
This condition can not happened.
Fixes: 3029a693be ("mlxsw: i2c: Allow flexible setting of I2C transactions size")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 146c7c3305 ]
The driver reads commands output from the output mailbox. If the size
of the output mailbox is not a multiple of the transaction /
block size, then the driver will not issue enough read transactions
to read the entire output, which can result in driver initialization
errors.
Fix by determining the number of transactions using DIV_ROUND_UP().
Fixes: 3029a693be ("mlxsw: i2c: Allow flexible setting of I2C transactions size")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 786c96e92f ]
It is not allowed to call kfree_skb() from hardware interrupt
context or with hardware interrupts being disabled.
So replace kfree_skb() with dev_kfree_skb_irq() under
local_irq_disable(). Compile tested only.
Fixes: 05fcd31cc4 ("arcnet: add err_skb package for package status feedback")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0aacec49c2 ]
The ice hardware has a synchronization mechanism used to drive the
simultaneous application of commands on both PHY ports and the source timer
in the MAC.
When issuing a sync via ice_ptp_exec_tmr_cmd(), the hardware will
simultaneously apply the commands programmed for the main timer and each
PHY port. Neither the main timer command register, nor the PHY port command
registers auto clear on command execution.
During the execution of a timer command intended for a single port on E822
devices, such as those used to configure a PHY during link up, the driver
is not correctly clearing the previous commands.
This results in unintentionally executing the last programmed command on
the main timer and other PHY ports whenever performing reconfiguration on
E822 ports after link up. This results in unintended side effects on other
timers, depending on what command was previously programmed.
To fix this, the driver must ensure that the main timer and all other PHY
ports are properly initialized to perform no action.
The enumeration for timer commands does not include an enumeration value
for doing nothing. Introduce ICE_PTP_NOP for this purpose. When writing a
timer command to hardware, leave the command bits set to zero which
indicates that no operation should be performed on that port.
Modify ice_ptp_one_port_cmd() to always initialize all ports. For all ports
other than the one being configured, write their timer command register to
ICE_PTP_NOP. This ensures that no side effect happens on the timer command.
To fix this for the PHY ports, modify ice_ptp_one_port_cmd() to always
initialize all other ports to ICE_PTP_NOP. This ensures that no side
effects happen on the other ports.
Call ice_ptp_src_cmd() with a command value if ICE_PTP_NOP in
ice_sync_phy_timer_e822() and ice_start_phy_timer_e822().
With both of these changes, the driver should no longer execute a stale
command on the main timer or another PHY port when reconfiguring one of the
PHY ports after link up.
Fixes: 3a7496234d ("ice: implement basic E822 PTP support")
Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a05334d7f ]
It is not allowed to call kfree_skb() from hardware interrupt
context or with hardware interrupts being disabled.
So replace kfree_skb() with dev_kfree_skb_irq() under
spin_lock_irqsave(). Compile tested only.
Fixes: baac6276c0 ("Bluetooth: btusb: handle mSBC audio over USB Endpoints")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3344d31833 ]
Not calling hci_(dis)connect_cfm before deleting conn referred to by a
socket generally results to use-after-free.
When cleaning up SCO connections when the parent ACL is deleted too
early, use hci_conn_failed to do the connection cleanup properly.
We also need to clean up ISO connections in a similar situation when
connecting has started but LE Create CIS is not yet sent, so do it too
here.
Fixes: ca1fd42e7d ("Bluetooth: Fix potential double free caused by hci_conn_unlink")
Reported-by: syzbot+cf54c1da6574b6c1b049@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-bluetooth/00000000000013b93805fbbadc50@google.com/
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94d9ba9f98 ]
Use-after-free can occur in hci_disconnect_all_sync if a connection is
deleted by concurrent processing of a controller event.
To prevent this the code now tries to iterate over the list backwards
to ensure the links are cleanup before its parents, also it no longer
relies on a cursor, instead it always uses the last element since
hci_abort_conn_sync is guaranteed to call hci_conn_del.
UAF crash log:
==================================================================
BUG: KASAN: slab-use-after-free in hci_set_powered_sync
(net/bluetooth/hci_sync.c:5424) [bluetooth]
Read of size 8 at addr ffff888009d9c000 by task kworker/u9:0/124
CPU: 0 PID: 124 Comm: kworker/u9:0 Tainted: G W
6.5.0-rc1+ #10
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work [bluetooth]
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x90
print_report+0xcf/0x670
? __virt_addr_valid+0xdd/0x160
? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
kasan_report+0xa6/0xe0
? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
? __pfx_set_powered_sync+0x10/0x10 [bluetooth]
hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
? __pfx_hci_set_powered_sync+0x10/0x10 [bluetooth]
? __pfx_lock_release+0x10/0x10
? __pfx_set_powered_sync+0x10/0x10 [bluetooth]
hci_cmd_sync_work+0x137/0x220 [bluetooth]
process_one_work+0x526/0x9d0
? __pfx_process_one_work+0x10/0x10
? __pfx_do_raw_spin_lock+0x10/0x10
? mark_held_locks+0x1a/0x90
worker_thread+0x92/0x630
? __pfx_worker_thread+0x10/0x10
kthread+0x196/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
Allocated by task 1782:
kasan_save_stack+0x33/0x60
kasan_set_track+0x25/0x30
__kasan_kmalloc+0x8f/0xa0
hci_conn_add+0xa5/0xa80 [bluetooth]
hci_bind_cis+0x881/0x9b0 [bluetooth]
iso_connect_cis+0x121/0x520 [bluetooth]
iso_sock_connect+0x3f6/0x790 [bluetooth]
__sys_connect+0x109/0x130
__x64_sys_connect+0x40/0x50
do_syscall_64+0x60/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Freed by task 695:
kasan_save_stack+0x33/0x60
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2b/0x50
__kasan_slab_free+0x10a/0x180
__kmem_cache_free+0x14d/0x2e0
device_release+0x5d/0xf0
kobject_put+0xdf/0x270
hci_disconn_complete_evt+0x274/0x3a0 [bluetooth]
hci_event_packet+0x579/0x7e0 [bluetooth]
hci_rx_work+0x287/0xaa0 [bluetooth]
process_one_work+0x526/0x9d0
worker_thread+0x92/0x630
kthread+0x196/0x1e0
ret_from_fork+0x2c/0x50
==================================================================
Fixes: 182ee45da0 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5af1f84ed1 ]
Connections may be cleanup while waiting for the commands to complete so
this attempts to check if the connection handle remains valid in case of
errors that would lead to call hci_conn_failed:
BUG: KASAN: slab-use-after-free in hci_conn_failed+0x1f/0x160
Read of size 8 at addr ffff888001376958 by task kworker/u3:0/52
CPU: 0 PID: 52 Comm: kworker/u3:0 Not tainted
6.5.0-rc1-00527-g2dfe76d58d3a #5615
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl+0x1d/0x70
print_report+0xce/0x620
? __virt_addr_valid+0xd4/0x150
? hci_conn_failed+0x1f/0x160
kasan_report+0xd1/0x100
? hci_conn_failed+0x1f/0x160
hci_conn_failed+0x1f/0x160
hci_abort_conn_sync+0x237/0x360
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 94d9ba9f98 ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f777d88278 ]
Some use cases require the user to be informed if BIG synchronization
fails. This commit makes it so that even if the BIG sync established
event arrives with error status, a new hconn is added for each BIS,
and the iso layer is notified about the failed connections.
Unsuccesful bis connections will be marked using the
HCI_CONN_BIG_SYNC_FAILED flag. From the iso layer, the POLLERR event
is triggered on the newly allocated bis sockets, before adding them
to the accept list of the parent socket.
From user space, a new fd for each failed bis connection will be
obtained by calling accept. The user should check for the POLLERR
event on the new socket, to determine if the connection was successful
or not.
The HCI_CONN_BIG_SYNC flag has been added to mark whether the BIG sync
has been successfully established. This flag is checked at bis cleanup,
so the HCI LE BIG Terminate Sync command is only issued if needed.
The BT_SK_BIG_SYNC flag indicates if BIG create sync has been called
for a listening socket, to avoid issuing the command everytime a BIGInfo
advertising report is received.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 94d9ba9f98 ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a13f316e90 ]
This consolidates code for aborting connections using
hci_cmd_sync_queue so it is synchronized with other threads, but
because of the fact that some commands may block the cmd_sync_queue
while waiting specific events this attempt to cancel those requests by
using hci_cmd_sync_cancel.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 94d9ba9f98 ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6785b2edf4 ]
The commit being fixed introduced a hunk into check_func_arg_reg_off
that bypasses reg->off == 0 enforcement when offset points to a graph
node or root. This might possibly be done for treating bpf_rbtree_remove
and others as KF_RELEASE and then later check correct reg->off in helper
argument checks.
But this is not the case, those helpers are already not KF_RELEASE and
permit non-zero reg->off and verify it later to match the subobject in
BTF type.
However, this logic leads to bpf_obj_drop permitting free of register
arguments with non-zero offset when they point to a graph root or node
within them, which is not ok.
For instance:
struct foo {
int i;
int j;
struct bpf_rb_node node;
};
struct foo *f = bpf_obj_new(typeof(*f));
if (!f) ...
bpf_obj_drop(f); // OK
bpf_obj_drop(&f->i); // still ok from verifier PoV
bpf_obj_drop(&f->node); // Not OK, but permitted right now
Fix this by dropping the whole part of code altogether.
Fixes: 6a3cd3318f ("bpf: Migrate release_on_unlock logic to non-owning ref semantics")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230822175140.1317749-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7a2ef0c4b ]
While looking at a bug, I got rather confused by the layout of the
'status' field in ieee80211_tx_info. Apparently, the intention is that
status_driver_data[] is used for driver specific data, and fills up the
size of the union to 40 bytes, just like the other ones.
This is indeed what actually happens, but only because of the
combination of two mistakes:
- "void *status_driver_data[18 / sizeof(void *)];" is intended
to be 18 bytes long but is actually two bytes shorter because of
rounding-down in the division, to a multiple of the pointer
size (4 bytes or 8 bytes).
- The other fields combined are intended to be 22 bytes long, but
are actually 24 bytes because of padding in front of the
unaligned tx_time member, and in front of the pointer array.
The two mistakes cancel out. so the size ends up fine, but it seems
more helpful to make this explicit, by having a multiple of 8 bytes
in the size calculation and explicitly describing the padding.
Fixes: ea5907db2a ("mac80211: fix struct ieee80211_tx_info size")
Fixes: 02219b3abc ("mac80211: add WMM admission control support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230623152443.2296825-2-arnd@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 218d690c49 ]
The previous commit dd3e4fc75b ("nl80211/cfg80211: add BSS color to
NDP ranging parameters") adds a parameter for NDP ranging by introducing
a new attribute type named NL80211_PMSR_FTM_REQ_ATTR_BSS_COLOR.
However, the author forgot to also describe the nla_policy at
nl80211_pmsr_ftm_req_attr_policy (net/wireless/nl80211.c). Just
complement it to avoid malformed attribute that causes out-of-attribute
access.
Fixes: dd3e4fc75b ("nl80211/cfg80211: add BSS color to NDP ranging parameters")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809033151.768910-1-linma@zju.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 927521170c ]
Code inspection reveals that we switch the puncturing bitmap
before the real channel switch, since that happens only in
the second round of the worker after the channel context is
switched by ieee80211_link_use_reserved_context().
Fixes: 2cc25e4b2a ("wifi: mac80211: configure puncturing bitmap")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 454994cfa9 ]
If ath9k_wmi_cmd() has exited with a timeout, it is possible that during
next ath9k_wmi_cmd() call the wmi_rsp callback for previous wmi command
writes to new wmi->cmd_rsp_buf and makes a completion. This results in an
invalid ath9k_wmi_cmd() return value.
Move the replacement of WMI command response buffer and length under
wmi_lock. Note that last_seq_id value is updated there, too.
Thus, the buffer cannot be written to by a belated wmi_rsp callback
because that path is properly rejected by the last_seq_id check.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: fb9987d0f7 ("ath9k_htc: Support for AR9271 chipset.")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230425192607.18015-2-pchelkin@ispras.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b674fb513e ]
Currently, the synchronization between ath9k_wmi_cmd() and
ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().
Consider the following scenario:
CPU0 CPU1
ath9k_wmi_cmd(...)
mutex_lock(&wmi->op_mutex)
ath9k_wmi_cmd_issue(...)
wait_for_completion_timeout(...)
---
timeout
---
/* the callback is being processed
* before last_seq_id became zero
*/
ath9k_wmi_ctrl_rx(...)
spin_lock_irqsave(...)
/* wmi->last_seq_id check here
* doesn't detect timeout yet
*/
spin_unlock_irqrestore(...)
/* last_seq_id is zeroed to
* indicate there was a timeout
*/
wmi->last_seq_id = 0
mutex_unlock(&wmi->op_mutex)
return -ETIMEDOUT
ath9k_wmi_cmd(...)
mutex_lock(&wmi->op_mutex)
/* the buffer is replaced with
* another one
*/
wmi->cmd_rsp_buf = rsp_buf
wmi->cmd_rsp_len = rsp_len
ath9k_wmi_cmd_issue(...)
spin_lock_irqsave(...)
spin_unlock_irqrestore(...)
wait_for_completion_timeout(...)
/* the continuation of the
* callback left after the first
* ath9k_wmi_cmd call
*/
ath9k_wmi_rsp_callback(...)
/* copying data designated
* to already timeouted
* WMI command into an
* inappropriate wmi_cmd_buf
*/
memcpy(...)
complete(&wmi->cmd_wait)
/* awakened by the bogus callback
* => invalid return result
*/
mutex_unlock(&wmi->op_mutex)
return 0
To fix this, update last_seq_id on timeout path inside ath9k_wmi_cmd()
under the wmi_lock. Move ath9k_wmi_rsp_callback() under wmi_lock inside
ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
initially designated wmi_cmd call, otherwise the path would be rejected
with last_seq_id check.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: fb9987d0f7 ("ath9k_htc: Support for AR9271 chipset.")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230425192607.18015-1-pchelkin@ispras.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d93a7cf6ca ]
In the commit 7c4cd051ad ("bpf: Fix syscall's stackmap lookup
potential deadlock"), a potential deadlock issue was addressed, which
resulted in *_map_lookup_elem not triggering BPF programs.
(prior to lookup, bpf_disable_instrumentation() is used)
To resolve the broken map lookup probe using "htab_map_lookup_elem",
this commit introduces an alternative approach. Instead, it utilize
"bpf_map_copy_value" and apply a filter specifically for the hash table
with map_type.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Fixes: 7c4cd051ad ("bpf: Fix syscall's stackmap lookup potential deadlock")
Link: https://lore.kernel.org/r/20230818090119.477441-8-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92632115fb ]
Recently, a new tracepoint for the block layer, specifically the
block_io_start/done tracepoints, was introduced in commit 5a80bd075f
("block: introduce block_io_start/block_io_done tracepoints").
Previously, the kprobe entry used for this purpose was quite unstable
and inherently broke relevant probes [1]. Now that a stable tracepoint
is available, this commit replaces the bio latency check with it.
One of the changes made during this replacement is the key used for the
hash table. Since 'struct request' cannot be used as a hash key, the
approach taken follows that which was implemented in bcc/biolatency [2].
(uses dev:sector for the key)
[1]: https://github.com/iovisor/bcc/issues/4261
[2]: https://github.com/iovisor/bcc/pull/4691
Fixes: 450b7879e3 ("block: move blk_account_io_{start,done} to blk-mq.c")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230818090119.477441-7-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86684c2481 ]
Comparing .dts files to built .dtb files yielded a few .dts files which
are never built. Add them to the build.
Signed-off-by: Rob Herring <robh@kernel.org>
Stable-dep-of: 92632115fb ("samples/bpf: fix bio latency check with tracepoint")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99f34659e7 ]
Patch series "memfd: cleanups for vm.memfd_noexec", v2.
The most critical issue with vm.memfd_noexec=2 (the fact that passing
MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's
tree[2], but there are still some outstanding issues that need to be
addressed:
* vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
because it will make it far to difficult to ever migrate. Instead it
should imply MFD_EXEC.
* The dmesg warnings are pr_warn_once(), which on most systems means
that they will be used up by systemd or some other boot process and
userspace developers will never see it.
- For the !(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)) case, outputting a
rate-limited message to the kernel log is necessary to tell
userspace that they should add the new flags.
Arguably the most ideal way to deal with the spam concern[3,4]
while still prompting userspace to switch to the new flags would be
to only log the warning once per task or something similar.
However, adding something to task_struct for tracking this would be
needless bloat for a single pr_warn_ratelimited().
So just switch to pr_info_ratelimited() to avoid spamming the log
with something that isn't a real warning. There's lots of
info-level stuff in dmesg, it seems really unlikely that this
should be an actual problem. Most programs are already switching to
the new flags anyway.
- For the vm.memfd_noexec=2 case, we need to log a warning for every
failure because otherwise userspace will have no idea why their
previously working program started returning -EACCES (previously
-EINVAL) from memfd_create(2). pr_warn_once() is simply wrong here.
* The racheting mechanism for vm.memfd_noexec makes it incredibly
unappealing for most users to enable the sysctl because enabling it
on &init_pid_ns means you need a system reboot to unset it. Given the
actual security threat being protected against, CAP_SYS_ADMIN users
being restricted in this way makes little sense.
The argument for this ratcheting by the original author was that it
allows you to have a hierarchical setting that cannot be unset by
child pidnses, but this is not accurate -- changing the parent
pidns's vm.memfd_noexec setting to be more restrictive didn't affect
children.
Instead, switch the vm.memfd_noexec sysctl to be properly
hierarchical and allow CAP_SYS_ADMIN users (in the pidns's owning
userns) to lower the setting as long as it is not lower than the
parent's effective setting. This change also makes it so that
changing a parent pidns's vm.memfd_noexec will affect all
descendants, providing a properly hierarchical setting. The
performance impact of this is incredibly minimal since the maximum
depth of pidns is 32 and it is only checked during memfd_create(2)
and unshare(CLONE_NEWPID).
* The memfd selftests would not exit with a non-zero error code when
certain tests that ran in a forked process (specifically the ones
related to MFD_EXEC and MFD_NOEXEC_SEAL) failed.
[1]: https://lore.kernel.org/all/ZJwcsU0vI-nzgOB_@codewreck.org/
[2]: https://lore.kernel.org/all/20230705063315.3680666-1-jeffxu@google.com/
[3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/
This patch (of 5):
Before this change, a test runner using this self test would see a return
code of 0 when the tests using a child process (namely the MFD_NOEXEC_SEAL
and MFD_EXEC tests) failed, masking test failures.
Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com
Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-1-7ff9e3e10ba6@cyphar.com
Fixes: 11f75a0144 ("selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Jeff Xu <jeffxu@google.com>
Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc1fb82ae1 ]
sk_getsockopt() runs locklessly. This means sk->sk_lingertime
can be read while other threads are changing its value.
Other reads also happen without socket lock being held,
and must be annotated.
Remove preprocessor logic using BITS_PER_LONG, compilers
are smart enough to figure this by themselves.
v2: fixed a clang W=1 (-Wtautological-constant-out-of-range-compare) warning
(Jakub)
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab104318f6 ]
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), so use IS_ERR() to fix the return value issue.
Fixes: 72df3489fb ("net: lan966x: Add ptp trap rules")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a171fbec88 ]
LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2,
such that any positive return value from a xmit hook could cause
unexpected continue behavior, despite that related skb may have been
freed. This could be error-prone for future xmit hook ops. One of the
possible errors is to return statuses of dst_output directly.
To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to
distinguish from dst_output statuses and check the continue
condition explicitly.
Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29b22badb7 ]
BPF encap ops can return different types of positive values, such like
NET_RX_DROP, NET_XMIT_CN, NETDEV_TX_BUSY, and so on, from function
skb_do_redirect and bpf_lwt_xmit_reroute. At the xmit hook, such return
values would be treated implicitly as LWTUNNEL_XMIT_CONTINUE in
ip(6)_finish_output2. When this happens, skbs that have been freed would
continue to the neighbor subsystem, causing use-after-free bug and
kernel crashes.
To fix the incorrect behavior, skb_do_redirect return values can be
simply discarded, the same as tc-egress behavior. On the other hand,
bpf_lwt_xmit_reroute returns useful errors to local senders, e.g. PMTU
information. Thus convert its return values to avoid the conflict with
LWTUNNEL_XMIT_CONTINUE.
Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Reported-by: Jordan Griege <jgriege@cloudflare.com>
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/0d2b878186cfe215fec6b45769c1cd0591d3628d.1692326837.git.yan@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e03dd62e5 ]
Chips such as BCM7278 support system wide suspend/resume which will
cause the HWRNG block to lose its state and reset to its power on reset
register values. We need to cleanup and re-initialize the HWRNG for it
to be functional coming out of a system suspend cycle.
Fixes: c3577f6100 ("hwrng: iproc-rng200 - Add support for BCM7278")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac8a529621 ]
Now there are two indicators of socket memory pressure sit inside
struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
memory reclaim pressure in memcg->memory and ->tcpmem respectively.
When in legacy mode (cgroupv1), the socket memory is charged into
->tcpmem which is independent of ->memory, so socket_pressure has
nothing to do with socket's pressure at all. Things could be worse
by taking socket_pressure into consideration in legacy mode, as a
pressure in ->memory can lead to premature reclamation/throttling
in socket.
While for the default mode (cgroupv2), the socket memory is charged
into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.
So {socket,tcpmem}_pressure are only used in default/legacy mode
respectively for indicating socket memory pressure. This patch fixes
the pieces of code that make mixed use of both.
Fixes: 8e8ae64524 ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36122201ee ]
In the original RPU query command, the status register values of
multiple RPU tunnels are accumulated by default, which is unreasonable.
This patch Fix it by querying the specified tunnel ID.
The tunnel number of the device can be obtained from firmware
during initialization.
Fixes: ddb54554fa ("net: hns3: add DFX registers information for ethtool -d")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8634b7c3f ]
The dump register function is being refactored.
The second step in refactoring is to support tlv info in regs data for
HNS3 PF driver.
Currently, if we use "ethtool -d" to dump regs value,
the output is as follows:
offset1: 00 01 02 03 04 05 ...
offset2:10 11 12 13 14 15 ...
......
We can't get the value of a register directly.
This patch deletes the original separator information and
add tag_len_value information in regs data.
ethtool can parse register data in key-value format by -d command.
a patch will be added to the ethtool to parse regs data
in the following format:
reg1 : value2
reg2 : value2
......
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 36122201ee ("net: hns3: fix wrong rpu tln reg issue")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 939ccd107f ]
The dump register function is being refactored.
The first step in refactoring is put the dump regs function
into a separate file.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 36122201ee ("net: hns3: fix wrong rpu tln reg issue")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 83a89c4b6a ]
Running the bench_rename test script, the following error occurs:
# ./benchs/run_bench_rename.sh
base : 0.819 ± 0.012M/s
kprobe : 0.538 ± 0.009M/s
kretprobe : 0.503 ± 0.004M/s
rawtp : 0.779 ± 0.020M/s
fentry : 0.726 ± 0.007M/s
fexit : 0.691 ± 0.007M/s
benchmark 'rename-fmodret' not found
The bench_rename_fmodret has been removed in commit b000def2e0
("selftests: Remove fmod_ret from test_overhead"), thus remove it
from the runners in the test script.
Fixes: b000def2e0 ("selftests: Remove fmod_ret from test_overhead")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230814030727.3010390-1-zouyipeng@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e89688e3e9 ]
In tcp_retransmit_timer(), a window shrunk connection will be regarded
as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
right all the time.
The retransmits will become zero-window probes in tcp_retransmit_timer()
if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to
TCP_RTO_MAX sooner or later.
However, the timer can be delayed and be triggered after 122877ms, not
TCP_RTO_MAX, as I tested.
Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
once the RTO come up to TCP_RTO_MAX, and the socket will die.
Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
which is exact the timestamp of the timeout.
However, "tp->rcv_tstamp" can restart from idle, then tp->rcv_tstamp
could already be a long time (minutes or hours) in the past even on the
first RTO. So we double check the timeout with the duration of the
retransmission.
Meanwhile, making "2 * TCP_RTO_MAX" as the timeout to avoid the socket
dying too soon.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAMM096pg@mail.gmail.com/
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66dee21524 ]
When user tries to connect a new CIS when its CIG is not configurable,
that connection shall fail, but pre-existing connections shall not be
affected. However, currently hci_cc_le_set_cig_params deletes all CIS
of the CIG on error so it doesn't work, even though controller shall not
change CIG/CIS configuration if the command fails.
Fix by failing on command error only the connections that are not yet
bound, so that we keep the previous CIS configuration like the
controller does.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f78191cc9 ]
This attempts to always allocate a unique handle for connections so they
can be properly aborted by the likes of hci_abort_conn, so this uses the
invalid range as a pool of unset handles that way if userspace is trying
to create multiple connections at once each will be given a unique
handle which will be considered unset.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 66dee21524 ("Bluetooth: hci_event: drop only unbound CIS if Set CIG Parameters fails")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2bcd2b632 ]
KSAN reports use-after-free in hci_add_adv_monitor().
While adding an adv monitor,
hci_add_adv_monitor() calls ->
msft_add_monitor_pattern() calls ->
msft_add_monitor_sync() calls ->
msft_le_monitor_advertisement_cb() calls in an error case ->
hci_free_adv_monitor() which frees the *moniter.
This is referenced by bt_dev_dbg() in hci_add_adv_monitor().
Fix the bt_dev_dbg() by using handle instead of monitor->handle.
Fixes: b747a83690 ("Bluetooth: hci_sync: Refactor add Adv Monitor")
Signed-off-by: Manish Mandlik <mmandlik@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f55eea116 ]
The hci_add_adv_monitor() hci_remove_adv_monitor() functions call
bt_dev_dbg() to print some debug statements. The bt_dev_dbg() macro
automatically adds in the device's name. That means that we shouldn't
include the name in the bt_dev_dbg() calls.
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: a2bcd2b632 ("Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_add_adv_monitor()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3673952cf0 ]
Similar to commit c5d2b6fa26 ("Bluetooth: Fix use-after-free in
hci_remove_ltk/hci_remove_irk"). We can not access k after kfree_rcu()
call.
Fixes: d7d41682ef ("Bluetooth: Fix Suspicious RCU usage warnings")
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a091289218 ]
When running with concurrent task only one CIS was being assigned so
this attempts to rework the way the PDU is constructed so it is handled
later at the callback instead of in place.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2f84a70f9 ]
Only the number of CIS shall be limited to 0x1f, the CIS ID in the
other hand is up to 0xef.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7f923b1ef ]
Valid range of CIG/CIS are 0x00 to 0xEF, so this checks they are
properly checked before attempting to use HCI_OP_LE_SET_CIG_PARAMS.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8b5aed313 ]
in nokia_bluetooth_serdev_probe(), check the return value of
clk_prepare_enable() and return the error code if
clk_prepare_enable() returns an unexpected value.
Fixes: 7bb318680e ("Bluetooth: add nokia driver")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f74563e61 ]
LE Create CIS command shall not be sent before all CIS Established
events from its previous invocation have been processed. Currently it is
sent via hci_sync but that only waits for the first event, but there can
be multiple.
Make it wait for all events, and simplify the CIS creation as follows:
Add new flag HCI_CONN_CREATE_CIS, which is set if Create CIS has been
sent for the connection but it is not yet completed.
Make BT_CONNECT state to mean the connection wants Create CIS.
On events after which new Create CIS may need to be sent, send it if
possible and some connections need it. These events are:
hci_connect_cis, iso_connect_cfm, hci_cs_le_create_cis,
hci_le_cis_estabilished_evt.
The Create CIS status/completion events shall queue new Create CIS only
if at least one of the connections transitions away from BT_CONNECT, so
that we don't loop if controller is sending bogus events.
This fixes sending multiple CIS Create for the same CIS in the
"ISO AC 6(i) - Success" BlueZ test case:
< HCI Command: LE Create Co.. (0x08|0x0064) plen 9 #129 [hci0]
Number of CIS: 2
CIS Handle: 257
ACL Handle: 42
CIS Handle: 258
ACL Handle: 42
> HCI Event: Command Status (0x0f) plen 4 #130 [hci0]
LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 29 #131 [hci0]
LE Connected Isochronous Stream Established (0x19)
Status: Success (0x00)
Connection Handle: 257
...
< HCI Command: LE Setup Is.. (0x08|0x006e) plen 13 #132 [hci0]
...
> HCI Event: Command Complete (0x0e) plen 6 #133 [hci0]
LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
...
< HCI Command: LE Create Co.. (0x08|0x0064) plen 5 #134 [hci0]
Number of CIS: 1
CIS Handle: 258
ACL Handle: 42
> HCI Event: Command Status (0x0f) plen 4 #135 [hci0]
LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1
Status: ACL Connection Already Exists (0x0b)
> HCI Event: LE Meta Event (0x3e) plen 29 #136 [hci0]
LE Connected Isochronous Stream Established (0x19)
Status: Success (0x00)
Connection Handle: 258
...
Fixes: c09b80be6f ("Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHED")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0bfde167b ]
It is required for some configurations to have multiple BISes as part
of the same BIG.
Similar to the flow implemented for unicast, DEFER_SETUP will also be
used to bind multiple BISes for the same BIG, before starting Periodic
Advertising and creating the BIG.
The user will have to open a new socket for each BIS. By setting the
BT_DEFER_SETUP socket option and calling connect, a new connection
will be added for the BIG and advertising handle set by the socket
QoS parameters. Since all BISes will be bound for the same BIG and
advertising handle, the socket QoS options and base parameters should
match for all connections.
By calling connect on a socket that does not have the BT_DEFER_SETUP
option set, periodic advertising will be started and the BIG will
be created, with a BIS for each previously bound connection. Since
a BIG cannot be reconfigured with additional BISes after creation,
no more connections can be bound for the BIG after the start periodic
advertising and create BIG commands have been queued.
The bis_cleanup function has also been updated, so that the advertising
set and the BIG will not be terminated unless there are no more
bound or connected BISes.
The HCI_CONN_BIG_CREATED connection flag has been added to indicate
that the BIG has been successfully created. This flag is checked at
bis_cleanup, so that the BIG is only terminated if the
HCI_LE_Create_BIG_Complete has been received.
This implementation has been tested on hardware, using the "isotest"
tool with an additional command line option, to specify the number of
BISes to create as part of the desired BIG:
tools/isotest -i hci0 -s 00:00:00:00:00:00 -N 2 -G 1 -T 1
The btmon log shows that a BIG containing 2 BISes has been created:
< HCI Command: LE Create Broadcast Isochronous Group (0x08|0x0068) plen 31
Handle: 0x01
Advertising Handle: 0x01
Number of BIS: 2
SDU Interval: 10000 us (0x002710)
Maximum SDU size: 40
Maximum Latency: 10 ms (0x000a)
RTN: 0x02
PHY: LE 2M (0x02)
Packing: Sequential (0x00)
Framing: Unframed (0x00)
Encryption: 0x00
Broadcast Code: 00000000000000000000000000000000
> HCI Event: Command Status (0x0f) plen 4
LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 23
LE Broadcast Isochronous Group Complete (0x1b)
Status: Success (0x00)
Handle: 0x01
BIG Synchronization Delay: 1974 us (0x0007b6)
Transport Latency: 1974 us (0x0007b6)
PHY: LE 2M (0x02)
NSE: 3
BN: 1
PTO: 1
IRC: 3
Maximum PDU: 40
ISO Interval: 10.00 msec (0x0008)
Connection Handle #0: 10
Connection Handle #1: 11
< HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13
Handle: 10
Data Path Direction: Input (Host to Controller) (0x00)
Data Path: HCI (0x00)
Coding Format: Transparent (0x03)
Company Codec ID: Ericsson Technology Licensing (0)
Vendor Codec ID: 0
Controller Delay: 0 us (0x000000)
Codec Configuration Length: 0
Codec Configuration:
> HCI Event: Command Complete (0x0e) plen 6
LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
Status: Success (0x00)
Handle: 10
< HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13
Handle: 11
Data Path Direction: Input (Host to Controller) (0x00)
Data Path: HCI (0x00)
Coding Format: Transparent (0x03)
Company Codec ID: Ericsson Technology Licensing (0)
Vendor Codec ID: 0
Controller Delay: 0 us (0x000000)
Codec Configuration Length: 0
Codec Configuration:
> HCI Event: Command Complete (0x0e) plen 6
LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
Status: Success (0x00)
Handle: 11
< ISO Data TX: Handle 10 flags 0x02 dlen 44
< ISO Data TX: Handle 11 flags 0x02 dlen 44
> HCI Event: Number of Completed Packets (0x13) plen 5
Num handles: 1
Handle: 10
Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5
Num handles: 1
Handle: 11
Count: 1
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 7f74563e61 ("Bluetooth: ISO: do not emit new LE Create CIS if previous is pending")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2be22f1941 ]
The ISO Interval on CIS Established Event uses 1.25 ms slots:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E
page 2304:
Time = N * 1.25 ms
In addition to that this always update the QoS settings based on CIS
Established Event.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 7f74563e61 ("Bluetooth: ISO: do not emit new LE Create CIS if previous is pending")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aec4880516 ]
If pm_runtime_get() (disguised as pm_runtime_resume_and_get()) fails, this
means the clk wasn't prepared and enabled. Returning early in this case
however is wrong as then the following resource frees are skipped and this
is never catched up. So do all the cleanups but clk_disable_unprepare().
Also don't emit a warning, as stm32_hash_runtime_resume() already emitted
one.
Note that the return value of stm32_hash_remove() is mostly ignored by
the device core. The only effect of returning zero instead of an error
value is to suppress another warning in platform_remove(). So return 0
even if pm_runtime_resume_and_get() failed.
Fixes: 8b4d566de6 ("crypto: stm32/hash - Add power management support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dee3a6b819 ]
rust_is_available.sh uses cc-version.sh to identify which C compiler is
in use, as scripts/Kconfig.include does. cc-version.sh isn't designed to
be able to handle multiple arguments in one variable, i.e. "ccache clang".
Its invocation in rust_is_available.sh quotes "$CC", which makes
$1 == "ccache clang" instead of the intended $1 == ccache & $2 == clang.
cc-version.sh could also be changed to handle having "ccache clang" as one
argument, but it only has the one consumer upstream, making it simpler to
fix the caller here.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Fixes: 78521f3399 ("scripts: add `rust_is_available.sh`")
Link: https://github.com/Rust-for-Linux/linux/pull/873
[ Reworded title prefix and reflow line to 75 columns. ]
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230616001631.463536-3-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de5e92cb5c ]
There is two warnings reported by coccinelle:
./drivers/spi/spi-mpc512x-psc.c:493:5-13: WARNING:
Unsigned expression compared with zero: mps -> irq < 0
./drivers/spi/spi-mpc52xx-psc.c:332:5-13: WARNING:
Unsigned expression compared with zero: mps -> irq < 0
The commit "208ee586f862"
("spi: mpc5xxx-psc: Return immediately if IRQ resource is unavailable")
was to check whether the IRQ resource is unavailable. When the IRQ
resource is unavailable, an error code is returned, however, the type
of "mps->irq" is "unsigned int", causing the error code to flip. Modify
the type of "mps->irq" to solve this problem.
Fixes: 208ee586f8 ("spi: mpc5xxx-psc: Return immediately if IRQ resource is unavailable")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20230803134805.1037251-1-lizetao1@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 416c6d0124 ]
commit bdeeed3498 ("libbpf: fix offsetof() and container_of() to work with CO-RE")
...was backported to stable trees such as 5.15. The problem is that with older
LLVM/clang (14/15) - which is often used for older kernels - we see compilation
failures in BPF selftests now:
In file included from progs/test_cls_redirect_subprogs.c:2:
progs/test_cls_redirect.c:90:2: error: static assertion expression is not an integral constant expression
sizeof(flow_ports_t) !=
^~~~~~~~~~~~~~~~~~~~~~~
progs/test_cls_redirect.c:91:3: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression
offsetofend(struct bpf_sock_tuple, ipv4.dport) -
^
progs/test_cls_redirect.c:32:3: note: expanded from macro 'offsetofend'
(offsetof(TYPE, MEMBER) + sizeof((((TYPE *)0)->MEMBER)))
^
tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:86:33: note: expanded from macro 'offsetof'
^
In file included from progs/test_cls_redirect_subprogs.c:2:
progs/test_cls_redirect.c:95:2: error: static assertion expression is not an integral constant expression
sizeof(flow_ports_t) !=
^~~~~~~~~~~~~~~~~~~~~~~
progs/test_cls_redirect.c:96:3: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression
offsetofend(struct bpf_sock_tuple, ipv6.dport) -
^
progs/test_cls_redirect.c:32:3: note: expanded from macro 'offsetofend'
(offsetof(TYPE, MEMBER) + sizeof((((TYPE *)0)->MEMBER)))
^
tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:86:33: note: expanded from macro 'offsetof'
^
2 errors generated.
make: *** [Makefile:594: tools/testing/selftests/bpf/test_cls_redirect_subprogs.bpf.o] Error 1
The problem is the new offsetof() does not play nice with static asserts.
Given that the context is a static assert (and CO-RE relocation is not
needed at compile time), offsetof() usage can be replaced by restoring
the original offsetof() definition as __builtin_offsetof().
Fixes: bdeeed3498 ("libbpf: fix offsetof() and container_of() to work with CO-RE")
Reported-by: Colm Harrington <colm.harrington@oracle.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230802073906.3197480-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 603cf6c2fc ]
Two memory copies in this function copy from a short array into a longer one,
using the wrong size, which leads to an out-of-bounds access:
include/linux/fortify-string.h:592:4: error: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
__read_overflow2_field(q_size_field, size);
^
include/linux/fortify-string.h:592:4: error: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
2 errors generated.
Fixes: d889913205 ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230703123737.3420464-1-arnd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72c8caf904 ]
5 GHz band channel 177 support was added with the commit e5e94d10c8 ("wifi:
ath11k: add channel 177 into 5 GHz channel list"). However, during processing
for the received ppdu in ath11k_dp_rx_h_ppdu(), channel number is checked only
till 173. This leads to driver code checking for channel and then fetching the
band from it which is extra effort since firmware has already given the channel
number in the metadata.
Fix this issue by checking the channel number till 177 since we support
it now.
Found via code review. Compile tested only.
Fixes: e5e94d10c8 ("wifi: ath11k: add channel 177 into 5 GHz channel list")
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230726044624.20507-1-quic_adisi@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 288c63d5cb ]
Add missing 'kfree_skb()' in 'mwifiex_init_rxq_ring()' and never do
'kfree(card->rxbd_ring_vbase)' because this area is DMAed and should
be released with 'dma_free_coherent()'. The latter is performed in
'mwifiex_pcie_delete_rxbd_ring()', which is now called to recover
from possible errors in 'mwifiex_pcie_create_rxbd_ring()'. Likewise
for 'mwifiex_pcie_init_evt_ring()', 'kfree(card->evtbd_ring_vbase)'
'mwifiex_pcie_delete_evtbd_ring()' and 'mwifiex_pcie_create_rxbd_ring()'.
Fixes: d930faee14 ("mwifiex: add support for Marvell pcie8766 chipset")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230731074334.56463-1-dmantipov@yandex.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 942999c48c ]
When using compressed firmware, the early firmware load feature will fail.
In most cases, the only downside is that if a device has more than one
firmware version available, only the last one listed will be loaded.
In at least two cases, there is no firmware loaded, and the device fails
initialization. See https://github.com/lwfinger/rtw89/issues/259 and
https://bugzilla.opensuse.org/show_bug.cgi?id=1212808 for examples of
the failure.
When firmware_class.dyndbg=+p" added to the kernel boot parameters, the
following is found:
finger@localhost:~/rtw89>sudo dmesg -t | grep rtw89
firmware_class: __allocate_fw_priv: fw-rtw89/rtw8852b_fw-1.bin fw_priv=00000000638862fb
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/5.14.21-150500.53-default/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/5.14.21-150500.53-default/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: Direct firmware load for rtw89/rtw8852b_fw-1.bin failed with error -2
firmware_class: __free_fw_priv: fw-rtw89/rtw8852b_fw-1.bin fw_priv=00000000638862fb data=00000000307c30c7 size=0
firmware_class: __allocate_fw_priv: fw-rtw89/rtw8852b_fw.bin fw_priv=00000000638862fb
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/5.14.21-150500.53-default/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/5.14.21-150500.53-default/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: Direct firmware load for rtw89/rtw8852b_fw.bin failed with error -2
firmware_class: __free_fw_priv: fw-rtw89/rtw8852b_fw.bin fw_priv=00000000638862fb data=00000000307c30c7 size=0
rtw89_8852be 0000:02:00.0: failed to early request firmware: -2
firmware_class: __allocate_fw_priv: fw-rtw89/rtw8852b_fw.bin fw_priv=00000000638862fb
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/5.14.21-150500.53-default/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/5.14.21-150500.53-default/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/rtw89/rtw8852b_fw.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/5.14.21-150500.53-default/rtw89/rtw8852b_fw.bin.xz failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/rtw89/rtw8852b_fw.bin.xz failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/5.14.21-150500.53-default/rtw89/rtw8852b_fw.bin.xz failed for no such file or directory.
rtw89_8852be 0000:02:00.0: Loading firmware from /lib/firmware/rtw89/rtw8852b_fw.bin.xz
rtw89_8852be 0000:02:00.0: f/w decompressing rtw89/rtw8852b_fw.bin
firmware_class: fw_set_page_data: fw-rtw89/rtw8852b_fw.bin fw_priv=00000000638862fb data=000000004ed6c2f7 size=1035232
rtw89_8852be 0000:02:00.0: Firmware version 0.27.32.1, cmd version 0, type 1
rtw89_8852be 0000:02:00.0: Firmware version 0.27.32.1, cmd version 0, type 3
The key is that firmware version 0.27.32.1 is loaded.
With this patch, the following is obtained:
firmware_class: __free_fw_priv: fw-rtw89/rtw8852b_fw.bin fw_priv=000000000849addc data=00000000fd3cabe2 size=1035232
firmware_class: fw_name_devm_release: fw_name-rtw89/rtw8852b_fw.bin devm-000000002d8c3343 released
firmware_class: __allocate_fw_priv: fw-rtw89/rtw8852b_fw-1.bin fw_priv=000000009e1a6364
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/6.4.3-1-default/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/6.4.3-1-default/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/rtw89/rtw8852b_fw-1.bin failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/6.4.3-1-default/rtw89/rtw8852b_fw-1.bin.zst failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/rtw89/rtw8852b_fw-1.bin.zst failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/6.4.3-1-default/rtw89/rtw8852b_fw-1.bin.zst failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/rtw89/rtw8852b_fw-1.bin.zst failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/6.4.3-1-default/rtw89/rtw8852b_fw-1.bin.xz failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/updates/rtw89/rtw8852b_fw-1.bin.xz failed for no such file or directory.
rtw89_8852be 0000:02:00.0: loading /lib/firmware/6.4.3-1-default/rtw89/rtw8852b_fw-1.bin.xz failed for no such file or directory.
rtw89_8852be 0000:02:00.0: Loading firmware from /lib/firmware/rtw89/rtw8852b_fw-1.bin.xz
rtw89_8852be 0000:02:00.0: f/w decompressing rtw89/rtw8852b_fw-1.bin
firmware_class: fw_set_page_data: fw-rtw89/rtw8852b_fw-1.bin fw_priv=000000009e1a6364 data=00000000fd3cabe2 size=1184992
rtw89_8852be 0000:02:00.0: Loaded FW: rtw89/rtw8852b_fw-1.bin, sha256: 8539efc75f513f4585cf0cd6e79e6507da47fce87225f2d0de391a03aefe9ac8
rtw89_8852be 0000:02:00.0: loaded firmware rtw89/rtw8852b_fw-1.bin
rtw89_8852be 0000:02:00.0: Firmware version 0.29.29.1, cmd version 0, type 5
rtw89_8852be 0000:02:00.0: Firmware version 0.29.29.1, cmd version 0, type 3
Now, version 0.29.29.1 is loaded.
Fixes: ffde7f3476 ("wifi: rtw89: add firmware format version to backward compatible with older drivers")
Cc: Ping-Ke Shih <pkshih@realtek.com>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230724183927.28553-1-Larry.Finger@lwfinger.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c0570bc21 ]
If user changes the channel without completely disabling the interface the
txpower_sku values reported track the old channel the device was operating on.
If user bounces the interface the correct power tables are applied.
mt7915_sku_group_len array gets updated before the channel switch happens so it
uses data from the old channel.
Fixes: ecb187a74e ("mt76: mt7915: rework the flow of txpower setting")
Fixes: f1d962369d ("mt76: mt7915: implement HE per-rate tx power support")
Reported-By: Chad Monroe <chad.monroe@smartrg.com>
Tested-by: Chad Monroe <chad.monroe@smartrg.com>
Signed-off-by: Allen Ye <allen.ye@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 74f12d5116 ]
It seems that the nla_policy in mt76_tm_policy is missed for attribute
MT76_TM_ATTR_TX_LENGTH. This patch adds the correct description to make
sure the
u32 val = nla_get_u32(tb[MT76_TM_ATTR_TX_LENGTH]);
in function mt76_testmode_cmd() is safe and will not result in
out-of-attribute read.
Fixes: f0efa86215 ("mt76: add API for testmode support")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ec5ac12ac ]
The IEEE80211_VHT_CAP_EXT_NSS_BW value already indicates support for half-NSS
160 MHz support, so it is wrong to also advertise full 160 MHz support.
Fixes: c2f73eacee ("wifi: mt76: mt7915: add back 160MHz channel width support for MT7915")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02a894046d ]
Capabilities in vif->bss_conf are only initialized in AP mode.
For other modes, they should be enabled by default, in order to avoid a
mismatch.
Fixes: 885f7af7e5 ("wifi: mt76: mt7915: remove mt7915_mcu_beacon_check_caps()")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4f0755823 ]
Due to AP stop improperly, mt7915 driver would face random command timeout
by chip fw problem. Migrate AP start/stop process to .start_ap/.stop_ap and
congiure BSS network settings in both hooks.
The new flow is shown below.
* AP start
.start_ap()
configure BSS network resource
set BSS to connected state
.bss_info_changed()
enable fw beacon offload
* AP stop
.bss_info_changed()
disable fw beacon offload (skip this command)
.stop_ap()
set BSS to disconnected state (beacon offload disabled automatically)
destroy BSS network resource
Based on "mt76: mt7921: fix command timeout in AP stop period"
Signed-off-by: Rany Hany <rany_hany@riseup.net>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Stable-dep-of: 02a894046d ("wifi: mt76: mt7915: fix capabilities in non-AP mode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67312adc96 ]
The semantics for bpf_sk_assign are as follows:
sk = some_lookup_func()
bpf_sk_assign(skb, sk)
bpf_sk_release(sk)
That is, the sk is not consumed by bpf_sk_assign. The function
therefore needs to make sure that sk lives long enough to be
consumed from __inet_lookup_skb. The path through the stack for a
TCPv4 packet is roughly:
netif_receive_skb_core: takes RCU read lock
__netif_receive_skb_core:
sch_handle_ingress:
tcf_classify:
bpf_sk_assign()
deliver_ptype_list_skb:
deliver_skb:
ip_packet_type->func == ip_rcv:
ip_rcv_core:
ip_rcv_finish_core:
dst_input:
ip_local_deliver:
ip_local_deliver_finish:
ip_protocol_deliver_rcu:
tcp_v4_rcv:
__inet_lookup_skb:
skb_steal_sock
The existing helper takes advantage of the fact that everything
happens in the same RCU critical section: for sockets with
SOCK_RCU_FREE set bpf_sk_assign never takes a reference.
skb_steal_sock then checks SOCK_RCU_FREE again and does sock_put
if necessary.
This approach assumes that SOCK_RCU_FREE is never set on a sk
between bpf_sk_assign and skb_steal_sock, but this invariant is
violated by unhashed UDP sockets. A new UDP socket is created
in TCP_CLOSE state but without SOCK_RCU_FREE set. That flag is only
added in udp_lib_get_port() which happens when a socket is bound.
When bpf_sk_assign was added it wasn't possible to access unhashed
UDP sockets from BPF, so this wasn't a problem. This changed
in commit 0c48eefae7 ("sock_map: Lift socket state restriction
for datagram sockets"), but the helper wasn't adjusted accordingly.
The following sequence of events will therefore lead to a refcount
leak:
1. Add socket(AF_INET, SOCK_DGRAM) to a sockmap.
2. Pull socket out of sockmap and bpf_sk_assign it. Since
SOCK_RCU_FREE is not set we increment the refcount.
3. bind() or connect() the socket, setting SOCK_RCU_FREE.
4. skb_steal_sock will now set refcounted = false due to
SOCK_RCU_FREE.
5. tcp_v4_rcv() skips sock_put().
Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
This matches the behaviour of __inet_lookup_skb which is ultimately
the goal of bpf_sk_assign().
Fixes: cf7fbe660f ("bpf: Add socket assign support")
Cc: Joe Stringer <joe@cilium.io>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-2-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0ea27e7bf ]
Contrary to TCP, UDP reuseport groups can contain TCP_ESTABLISHED
sockets. To support these properly we remember whether a group has
a connected socket and skip the fast reuseport early-return. In
effect we continue scoring all reuseport sockets and then choose the
one with the highest score.
The current code fails to re-calculate the score for the result of
lookup_reuseport. According to Kuniyuki Iwashima:
1) SO_INCOMING_CPU is set
-> selected sk might have +1 score
2) BPF prog returns ESTABLISHED and/or SO_INCOMING_CPU sk
-> selected sk will have more than 8
Using the old score could trigger more lookups depending on the
order that sockets are created.
sk -> sk (SO_INCOMING_CPU) -> sk (ESTABLISHED)
| |
`-> select the next SO_INCOMING_CPU sk
|
`-> select itself (We should save this lookup)
Fixes: efc6b6f6c3 ("udp: Improve load balancing for SO_REUSEPORT.")
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-1-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1634de418b ]
Fix rx ring size of WA event to get rid of event loss and queue overflow
problems.
Fixes: 98686cd216 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
Signed-off-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ffe0d5690 ]
If driver directly uses the band_idx reported from the radar event to
access mt76_phy array, it will get the wrong phy for background radar.
Fix this by adjusting the statement.
Fixes: 98686cd216 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
Signed-off-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc945b5462 ]
The bmc_tx_wlan_idx should be the wlan_idx of the current bss rather
than peer AP's wlan_idx, otherwise there will appear some frame
decryption problems on station mode.
Fixes: 98686cd216 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
Reviewed-by: Shayne Chen <shayne.chen@mediatek.com>
Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e5911bb7c ]
Due to the scan command may only request legacy bands and PSC channel
in 6GHz band, we are unable to scan the APs on non-PSC channel in this
case. Enable WIPHY_FLAG_SPLIT_SCAN_6GHZ to support non-PSC channel
(obtained during scan on legacy bands) in 6GHz scan request.
Fixes: 50ac15a511 ("mt76: mt7921: add 6GHz support")
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f39d499345 ]
Concurrent binding/non-binding skbs could be handled anywhere which leads
to mixed byte counting, so switch to use PPDU TxS reporting regardless Tx
paths when WED is active.
Fixes: 43eaa36895 ("wifi: mt76: add PPDU based TxS support for WED device")
Co-developed-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 161a7528e4 ]
PPDU TxS can only report MPDU count whereas mac80211 requires MSDU scale
(NL80211_STA_INFO_TX_PACKETS), so switch to get MSDU counts from WA
statistic.
Note that mt7915 WA firmware only counts tx_packet for WED path, so driver
needs to take care of host path additionally.
Fixes: 43eaa36895 ("wifi: mt76: add PPDU based TxS support for WED device")
Co-developed-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c55b4e788f ]
When header translation failure is indicated, the hardware will insert
an extra 2-byte field containing the data length after the protocol
type field. This happens either when the LLC-SNAP pattern did not match,
or if a VLAN header was detected.
The previous commit accidentally breaks the logic, so reverts back.
Fixes: 27db47ab1f (wifi: mt76: mt7996: enable mesh HW amsdu/de-amsdu support)
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59b4cc439f ]
If there is a failure during kstrtobool_from_user()
rtw89_debug_priv_btc_manual_set should return a negative error code
instead of returning the count directly.
Fix this bug by returning an error code instead of a count after
a failed call of the function "kstrtobool_from_user". Moreover
I omitted the label "out" with this source code correction.
Fixes: e3ec7017f6 ("rtw89: add Realtek 802.11ax driver")
Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/tencent_1C09B99BD7DA9CAD18B00C8F0F050F540607@qq.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0393e1fe4 ]
REGCACHE_MAPLE needs to allocate memory for regmap operations.
This results in lockdep splats if used with fast_io since fast_io uses
spinlocks for locking.
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 167, name: kunit_try_catch
preempt_count: 1, expected: 0
1 lock held by kunit_try_catch/167:
#0: 838e9c10 (regmap_kunit:86:(config)->lock){....}-{2:2}, at: regmap_lock_spinlock+0x14/0x1c
irq event stamp: 146
hardirqs last enabled at (145): [<8078bfa8>] crng_make_state+0x1a0/0x294
hardirqs last disabled at (146): [<80c5f62c>] _raw_spin_lock_irqsave+0x7c/0x80
softirqs last enabled at (0): [<80110cc4>] copy_process+0x810/0x216c
softirqs last disabled at (0): [<00000000>] 0x0
CPU: 0 PID: 167 Comm: kunit_try_catch Tainted: G N 6.5.0-rc1-00028-gc4be22597a36-dirty #6
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x38/0x5c
dump_stack_lvl from __might_resched+0x188/0x2d0
__might_resched from __kmem_cache_alloc_node+0x1f4/0x258
__kmem_cache_alloc_node from __kmalloc+0x48/0x170
__kmalloc from regcache_maple_write+0x194/0x248
regcache_maple_write from _regmap_write+0x88/0x140
_regmap_write from regmap_write+0x44/0x68
regmap_write from basic_read_write+0x8c/0x27c
basic_read_write from kunit_generic_run_threadfn_adapter+0x1c/0x28
kunit_generic_run_threadfn_adapter from kthread+0xf8/0x120
kthread from ret_from_fork+0x14/0x3c
Exception stack(0x881a5fb0 to 0x881a5ff8)
5fa0: 00000000 00000000 00000000 00000000
5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Use map->alloc_flags instead of GFP_KERNEL for memory allocations to fix
the problem.
Fixes: f033c26de5 ("regmap: Add maple tree based register cache")
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230720172021.2617326-1-linux@roeck-us.net
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a48d2127f ]
Currently we use the normal single register write function to load the
default values into the cache, resulting in a large number of reallocations
when there are blocks of registers as we extend the memory region we are
using to store the values. Instead scan through the list of defaults for
blocks of adjacent registers and do a single allocation and insert for each
such block. No functional change.
We do not take advantage of the maple tree preallocation, this is purely at
the regcache level. It is not clear to me yet if the maple tree level would
help much here or if we'd have more overhead from overallocating and then
freeing maple tree data.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230523-regcache-maple-load-defaults-v1-1-0c04336f005d@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Stable-dep-of: b0393e1fe4 ("regmap: maple: Use alloc_flags for memory allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6755ad74aa ]
Use devm_clk_get_enabled in the pic32 driver. Ensure that the clock is
enabled as long as the driver is registered with the hwrng core.
Fixes: 7ea39973d1 ("hwrng: pic32 - Use device-managed registration API")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 039980de89 ]
The nomadik driver uses devres to register itself with the hwrng core,
the driver will be unregistered from hwrng when its device goes out of
scope. This happens after the driver's remove function is called.
However, nomadik's clock is disabled in the remove function. There's a
short timeframe where nomadik is still registered with the hwrng core
although its clock is disabled. I suppose the clock must be active to
access the hardware and serve requests from the hwrng core.
Switch to devm_clk_get_enabled and let devres disable the clock and
unregister the hwrng. This avoids the race condition.
Fixes: 3e75241be8 ("hwrng: drivers - Use device-managed registration API")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f942bdfe9 ]
The power management configuration of 4xxx devices is too aggressive
and in some conditions the device might be prematurely put to a low
power state.
Increase the idle filter value to prevent that.
In future, this will be set by firmware.
Fixes: e5745f3411 ("crypto: qat - enable power management for QAT GEN4")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33937607ef ]
We are utilizing BPF LSM to monitor BPF operations within our container
environment. When we add support for raw_tracepoint, it hits below
error.
; (const void *)attr->raw_tracepoint.name);
27: (79) r3 = *(u64 *)(r2 +0)
access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8
It can be reproduced with below BPF prog.
SEC("lsm/bpf")
int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size)
{
switch (cmd) {
case BPF_RAW_TRACEPOINT_OPEN:
bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name);
break;
default:
break;
}
return 0;
}
The reason is that when accessing a field in a union, such as bpf_attr,
if the field is located within a nested struct that is not the first
member of the union, it can result in incorrect field verification.
union bpf_attr {
struct {
__u32 map_type; <<<< Actually it will find that field.
__u32 key_size;
__u32 value_size;
...
};
...
struct {
__u64 name; <<<< We want to verify this field.
__u32 prog_fd;
} raw_tracepoint;
};
Considering the potential deep nesting levels, finding a perfect
solution to address this issue has proven challenging. Therefore, I
propose a solution where we simply skip the verification process if the
field in question is located within a union.
Fixes: 7e3617a72d ("bpf: Add array support to btf_struct_access")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ce4dc3e4a ]
Per discussion with Alexei, the PTR_UNTRUSTED flag should not been
cleared when we start to walk a new struct, because the struct in
question may be a struct nested in a union. We should also check and set
this flag before we walk its each member, in case itself is a union.
We will clear this flag if the field is BTF_TYPE_SAFE_RCU_OR_NULL.
Fixes: 6fcd486b3a ("bpf: Refactor RCU enforcement in the verifier.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a0260dbf6 ]
realloc() and reallocarray() can either return NULL or a special
non-NULL pointer, if their size argument is zero. This requires a bit
more care to handle NULL-as-valid-result situation differently from
NULL-as-error case. This has caused real issues before ([0]), and just
recently bit again in production when performing bpf_program__attach_usdt().
This patch fixes 4 places that do or potentially could suffer from this
mishandling of NULL, including the reported USDT-related one.
There are many other places where realloc()/reallocarray() is used and
NULL is always treated as an error value, but all those have guarantees
that their size is always non-zero, so those spot don't need any extra
handling.
[0] d08ab82f59 ("libbpf: Fix double-free when linker processes empty sections")
Fixes: 999783c8bb ("libbpf: Wire up spec management and other arch-independent USDT logic")
Fixes: b63b3c490e ("libbpf: Add bpf_program__set_insns function")
Fixes: 697f104db8 ("libbpf: Support custom SEC() handlers")
Fixes: b126882672 ("libbpf: Change the order of data and text relocations.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230711024150.1566433-1-andrii@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 658ac06801 ]
Fix the following error when building bpftool:
CLANG profiler.bpf.o
CLANG pid_iter.bpf.o
skeleton/profiler.bpf.c:18:21: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_perf_event_value'
__uint(value_size, sizeof(struct bpf_perf_event_value));
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:13:39: note: expanded from macro '__uint'
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helper_defs.h:7:8: note: forward declaration of 'struct bpf_perf_event_value'
struct bpf_perf_event_value;
^
struct bpf_perf_event_value is being used in the kernel only when
CONFIG_BPF_EVENTS is enabled, so it misses a BTF entry then.
Define struct bpf_perf_event_value___local with the
`preserve_access_index` attribute inside the pid_iter BPF prog to
allow compiling on any configs. It is a full mirror of a UAPI
structure, so is compatible both with and w/o CO-RE.
bpf_perf_event_read_value() requires a pointer of the original type,
so a cast is needed.
Fixes: 47c09d6a9f ("bpftool: Introduce "prog profile" command")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-5-quentin@isovalent.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44ba7b30e8 ]
In order to allow the BPF program in bpftool's pid_iter.bpf.c to compile
correctly on hosts where vmlinux.h does not define
BPF_LINK_TYPE_PERF_EVENT (running kernel versions lower than 5.15, for
example), define and use a local copy of the enum value. This requires
LLVM 12 or newer to build the BPF program.
Fixes: cbdaf71f7e ("bpftool: Add bpf_cookie to link output")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-4-quentin@isovalent.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67a43462ee ]
When building bpftool with !CONFIG_PERF_EVENTS:
skeleton/pid_iter.bpf.c:47:14: error: incomplete definition of type 'struct bpf_perf_link'
perf_link = container_of(link, struct bpf_perf_link, link);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:74:22: note: expanded from macro 'container_of'
((type *)(__mptr - offsetof(type, member))); \
^~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:68:60: note: expanded from macro 'offsetof'
#define offsetof(TYPE, MEMBER) ((unsigned long)&((TYPE *)0)->MEMBER)
~~~~~~~~~~~^
skeleton/pid_iter.bpf.c:44:9: note: forward declaration of 'struct bpf_perf_link'
struct bpf_perf_link *perf_link;
^
&bpf_perf_link is being defined and used only under the ifdef.
Define struct bpf_perf_link___local with the `preserve_access_index`
attribute inside the pid_iter BPF prog to allow compiling on any
configs. CO-RE will substitute it with the real struct bpf_perf_link
accesses later on.
container_of() uses offsetof(), which does the necessary CO-RE
relocation if the field is specified with `preserve_access_index` - as
is the case for struct bpf_perf_link___local.
Fixes: cbdaf71f7e ("bpftool: Add bpf_cookie to link output")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-3-quentin@isovalent.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4cbeeb0dc0 ]
When CONFIG_PERF_EVENTS is not set, struct perf_event remains empty.
However, the structure is being used by bpftool indirectly via BTF.
This leads to:
skeleton/pid_iter.bpf.c:49:30: error: no member named 'bpf_cookie' in 'struct perf_event'
return BPF_CORE_READ(event, bpf_cookie);
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
...
skeleton/pid_iter.bpf.c:49:9: error: returning 'void' from a function with incompatible result type '__u64' (aka 'unsigned long long')
return BPF_CORE_READ(event, bpf_cookie);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tools and samples can't use any CONFIG_ definitions, so the fields
used there should always be present.
Define struct perf_event___local with the `preserve_access_index`
attribute inside the pid_iter BPF prog to allow compiling on any
configs. CO-RE will substitute it with the real struct perf_event
accesses later on.
Fixes: cbdaf71f7e ("bpftool: Add bpf_cookie to link output")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-2-quentin@isovalent.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c628747cc8 ]
Don't reset recorded sec_def handler unconditionally on
bpf_program__set_type(). There are two situations where this is wrong.
First, if the program type didn't actually change. In that case original
SEC handler should work just fine.
Second, catch-all custom SEC handler is supposed to work with any BPF
program type and SEC() annotation, so it also doesn't make sense to
reset that.
This patch fixes both issues. This was reported recently in the context
of breaking perf tool, which uses custom catch-all handler for fancy BPF
prologue generation logic. This patch should fix the issue.
[0] https://lore.kernel.org/linux-perf-users/ab865e6d-06c5-078e-e404-7f90686db50d@amd.com/
Fixes: d6e6286a12 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230707231156.1711948-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17e8e5d6e0 ]
Alexei reported:
After fast forwarding bpf-next today bpf_nf test started to fail when
run twice:
$ ./test_progs -t bpf_nf
#17 bpf_nf:OK
Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED
$ ./test_progs -t bpf_nf
All error logs:
test_bpf_nf_ct:PASS:test_bpf_nf__open_and_load 0 nsec
test_bpf_nf_ct:PASS:iptables-legacy -t raw -A PREROUTING -j CONNMARK
--set-mark 42/0 0 nsec
(network_helpers.c:102: errno: Address already in use) Failed to bind socket
test_bpf_nf_ct:FAIL:start_server unexpected start_server: actual -1 < expected 0
#17/1 bpf_nf/xdp-ct:FAIL
test_bpf_nf_ct:PASS:test_bpf_nf__open_and_load 0 nsec
test_bpf_nf_ct:PASS:iptables-legacy -t raw -A PREROUTING -j CONNMARK
--set-mark 42/0 0 nsec
(network_helpers.c:102: errno: Address already in use) Failed to bind socket
test_bpf_nf_ct:FAIL:start_server unexpected start_server: actual -1 < expected 0
#17/2 bpf_nf/tc-bpf-ct:FAIL
#17 bpf_nf:FAIL
Summary: 0/8 PASSED, 0 SKIPPED, 1 FAILED
I was able to locally reproduce as well. Rearrange the connection teardown
so that the client closes its connection first so that we don't need to
linger in TCP time-wait.
Fixes: e81fbd4c1b ("selftests/bpf: Add existing connection bpf_*_ct_lookup() test")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/CAADnVQ+0dnDq_v_vH1EfkacbfGnHANaon7zsw10pMb-D9FS0Pw@mail.gmail.com
Link: https://lore.kernel.org/bpf/20230626131942.5100-1-daniel@iogearbox.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de0e85b29e ]
Add exit hook and remove OPP table when the device gets unregistered.
This will fix the error messages when the CPU FREQ driver module is
removed and then re-inserted. It also fixes these messages while
onlining the first CPU from a policy whose all CPU's were previously
offlined.
debugfs: File 'cpu5' in directory 'opp' already present!
debugfs: File 'cpu6' in directory 'opp' already present!
debugfs: File 'cpu7' in directory 'opp' already present!
Fixes: f41e1442ac ("cpufreq: tegra194: add OPP support and set bandwidth")
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
[ Viresh: Dropped irrelevant change from it ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03997da042 ]
Since the 'cpus' field of policy structure will become empty in the
cpufreq core API, it is better to use 'related_cpus' in the exit()
callback of driver.
Fixes: c3274763bf ("cpufreq: powernow-k8: Initialize per-cpu data-structures properly")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b94da9255 ]
preserve_pci_rom_image() was accessing the romsize field in
efi_pci_io_protocol_t directly instead of using the efi_table_attr()
helper. This prevents the ROM image from being saved correctly during a
mixed mode boot.
Fixes: 2c3625cb9f ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function")
Signed-off-by: Mikel Rychliski <mikel@mikelr.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d6e5e8268 ]
In amd-pstate-ut, shared memory-based systems call
get_shared_mem() as part of amd_pstate_ut_check_enabled()
function. This function was written when CONFIG_X86_AMD_PSTATE
was tristate config and amd_pstate can be built as a module.
Currently CONFIG_X86_AMD_PSTATE is a boolean config and module
parameter shared_mem is removed. But amd-pstate-ut code still
accesses this module parameter. Remove those accesses.
Fixes: 456ca88d8a ("cpufreq: amd-pstate: change amd-pstate driver to be built-in type")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f96801f0cf ]
If of_parse_phandle_with_args() called from __thermal_of_bind() or
__thermal_of_unbind() fails, cooling_spec.np will not be initialized,
so move the of_node_put() calls below the respective return value checks
to avoid dereferencing an uninitialized pointer.
Fixes: 3fd6d6e2b4 ("thermal/of: Rework the thermal device tree initialization")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9cc8cd086f ]
The constraints table should be resetting the `list` object
after running through all of `info_obj` iterations.
This adjusts whitespace as well as less code will now be included
with each loop. This fixes a functional problem is fixed where a
badly formed package in the inner loop may have incorrect data.
Fixes: 146f1ed852 ("ACPI: PM: s2idle: Add AMD support to handle _DSM")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c6b1212d2 ]
When code uses a pre-increment it makes the reader question "why".
In the constraint fetching code there is no reason for the variables
to be pre-incremented so adjust to post-increment.
No intended functional changes.
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Stable-dep-of: 9cc8cd086f ("ACPI: x86: s2idle: Fix a logic error parsing AMD constraints table")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cba33db3fc ]
Commit 'fa6999e326fe ("s390/pkey: support CCA and EP11 secure ECC
private keys")' introduced PKEY_TYPE_EP11_AES securekey blobs as a
supplement to the PKEY_TYPE_EP11 (which won't work in environments
with session-bound keys). This new keyblobs has a different maximum
size, so fix paes crypto module to accept also these larger keyblobs.
Fixes: fa6999e326 ("s390/pkey: support CCA and EP11 secure ECC private keys")
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9352e4b9b ]
Commit 'fa6999e326fe ("s390/pkey: support CCA and EP11 secure ECC
private keys")' introduced a new PKEY_TYPE_EP11_AES securekey type as
a supplement to the existing PKEY_TYPE_EP11 (which won't work in
environments with session-bound keys). The pkey EP11 securekey
attributes use PKEY_TYPE_EP11_AES (instead of PKEY_TYPE_EP11)
keyblobs, to make the generated keyblobs usable also in environments,
where session-bound keys are required.
There should be no negative impacts to userspace because the internal
structure of the keyblobs is opaque. The increased size of the
generated keyblobs is reflected by the changed size of the attributes.
Fixes: fa6999e326 ("s390/pkey: support CCA and EP11 secure ECC private keys")
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fb249ce7f7 ]
Commit 'fa6999e326fe ("s390/pkey: support CCA and EP11 secure ECC
private keys")' introduced PKEY_TYPE_EP11_AES for the PKEY_GENSECK2
IOCTL, to enable userspace to generate securekey blobs of this
type. Unfortunately, all PKEY_GENSECK2 IOCTL requests for
PKEY_TYPE_EP11_AES return with an error (-EINVAL). Fix the handling
for PKEY_TYPE_EP11_AES in PKEY_GENSECK2 IOCTL, so that userspace can
generate securekey blobs of this type.
The start of the header and the keyblob, as well as the length need
special handling, depending on the internal keyversion. Add a helper
function that splits an uninitialized buffer into start and size of
the header as well as start and size of the payload, depending on the
requested keyversion.
Do the header-related calculations and the raw genkey request handling
in separate functions. Use the raw genkey request function for
internal purposes.
Fixes: fa6999e326 ("s390/pkey: support CCA and EP11 secure ECC private keys")
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 37a08f010b ]
Commit 'fa6999e326fe ("s390/pkey: support CCA and EP11 secure ECC
private keys")' introduced PKEY_TYPE_EP11_AES as a supplement to
PKEY_TYPE_EP11. All pkeys have an internal header/payload structure,
which is opaque to the userspace. The header structures for
PKEY_TYPE_EP11 and PKEY_TYPE_EP11_AES are nearly identical and there
is no reason, why different structures are used. In preparation to fix
the keyversion handling in the broken PKEY IOCTLs, the same header
structure is used for PKEY_TYPE_EP11 and PKEY_TYPE_EP11_AES. This
reduces the number of different code paths and increases the
readability.
Fixes: fa6999e326 ("s390/pkey: support CCA and EP11 secure ECC private keys")
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fbf4dec702 ]
Observed occassional failures in the futex_wait_timeout test:
ok 1 futex_wait relative succeeds
ok 2 futex_wait_bitset realtime succeeds
ok 3 futex_wait_bitset monotonic succeeds
ok 4 futex_wait_requeue_pi realtime succeeds
ok 5 futex_wait_requeue_pi monotonic succeeds
not ok 6 futex_lock_pi realtime returned 0
......
The test expects the child thread to complete some steps before
the parent thread gets to run. There is an implicit expectation
of the order of invocation of futex_lock_pi between the child thread
and the parent thread. Make this order explicit. If the order is
not met, the futex_lock_pi call in the parent thread succeeds and
will not timeout.
Fixes: f4addd54b1 ("selftests: futex: Expand timeout test")
Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4e2bd91dd ]
In current driver, counter0 will be enabled after ddr_perf_pmu_enable()
is called even though none of the 4 counters are used. This will cause
counter0 continue to count until ddr_perf_pmu_disabled() is called. If
pmu is not disabled all the time, the pmu interrupt will be asserted
from time to time due to counter0 will overflow and irq handler will
clear it. It's not an expected behavior. This patch will not enable
counter0 if none of 4 counters are used.
Fixes: 9a66d36cc7 ("drivers/perf: imx_ddr: Add DDR performance counter support to perf")
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20230811015438.1999307-2-xu.yang_2@nxp.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7fcb99877 ]
There is a 10% rounding error in the intial value of the
sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the
sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and
fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it
introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication
and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed8 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Petr Vorel <pvorel@suse.cz>
Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01948b09ed ]
For both SVE and SME we abuse the generic register field comparison
support in the cpufeature code as part of our detection of unsupported
variations in the vector lengths available to PEs, reporting the maximum
vector lengths via ZCR_EL1.LEN and SMCR_EL1.LEN. Since these are
configuration registers rather than identification registers the
assumptions the cpufeature code makes about how unknown bitfields behave
are invalid, leading to warnings when SME features like FA64 are enabled
and we hotplug a CPU:
CPU features: SANITY CHECK: Unexpected variation in SYS_SMCR_EL1. Boot CPU: 0x0000000000000f, CPU3: 0x0000008000000f
CPU features: Unsupported CPU feature variation detected.
SVE has no controls other than the vector length so is not yet impacted
but the same issue will apply there if any are defined.
Since the only field we are interested in having the cpufeature code
handle is the length field and we use a custom read function to obtain
the value we can avoid these warnings by filtering out all other bits
when we return the register value, if we're doing that we don't need to
bother reading the register at all and can simply use the RDVL/RDSVL
value we were filling in instead.
Fixes: 2e0f2478ea ("arm64/sve: Probe SVE capabilities and usable vector lengths")
FixeS: b42990d3bf ("arm64/sme: Identify supported SME vector lengths at boot")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230731-arm64-sme-fa64-hotplug-v2-1-7714c00dd902@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 264b82fdb4 ]
The 4-to-5 level mode switch trampoline disables long mode and paging in
order to be able to flick the LA57 bit. According to section 3.4.1.1 of
the x86 architecture manual [0], 64-bit GPRs might not retain the upper
32 bits of their contents across such a mode switch.
Given that RBP, RBX and RSI are live at this point, preserve them on the
stack, along with the return address that might be above 4G as well.
[0] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture
"Because the upper 32 bits of 64-bit general-purpose registers are
undefined in 32-bit modes, the upper 32 bits of any general-purpose
register are not preserved when switching from 64-bit mode to a 32-bit
mode (to protected mode or compatibility mode). Software must not
depend on these bits to maintain a value after a 64-bit to 32-bit
mode switch."
Fixes: 194a9749c7 ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230807162720.545787-2-ardb@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f0b0966b3 ]
The TEO governor takes CPU utilization into account by refining idle state
selection when the utilization is above a certain threshold. This is done by
choosing an idle state shallower than the previously selected one.
However, when doing this, the idle duration estimate needs to be
adjusted so as to prevent the scheduler tick from being stopped when the
candidate idle state is shallow, which may lead to excessive energy
usage if the CPU is not woken up quickly enough going forward.
Moreover, if the scheduler tick has been stopped already and the new
idle duration estimate is too small, the replacement candidate state
cannot be used.
Modify the relevant code to take the above observations into account.
Fixes: 9ce0f7c4bc ("cpuidle: teo: Introduce util-awareness")
Link: https://lore.kernel.org/linux-pm/CAJZ5v0jJxHj65r2HXBTd3wfbZtsg=_StzwO1kA5STDnaPe_dWA@mail.gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98dfdd9ee9 ]
Users of KERNFS should select it to enforce its being built, so
do this to prevent a build error.
In file included from ../kernel/sched/build_utility.c:97:
../kernel/sched/psi.c: In function 'psi_trigger_poll':
../kernel/sched/psi.c:1479:17: error: implicit declaration of function 'kernfs_generic_poll' [-Werror=implicit-function-declaration]
1479 | kernfs_generic_poll(t->of, wait);
Fixes: aff037078e ("sched/psi: use kernfs polling functions for PSI trigger polling")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Link: lore.kernel.org/r/202307310732.r65EQFY0-lkp@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51a0c3b7f0 ]
Perf event fd (fd_lm) is not closed when run_fill_buf() returns error.
Close fd_lm only in cat_val() to make it easier to track it is always
closed.
Fixes: 790bf585b0 ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f99e413eb5 ]
A child calls PARENT_EXIT() when it fails to run a benchmark to kill
the parent process. PARENT_EXIT() lacks unmount for the resctrl FS and
the parent won't be there to unmount it either after it gets killed.
Add the resctrl FS unmount also to PARENT_EXIT().
Fixes: 591a6e8588 ("selftests/resctrl: Add basic resctrl file system operations and data")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d320b1029 ]
The error path in fill_cache() does return before the allocated buffer
is freed leaking the buffer.
The leak was introduced when fill_cache_read() started to return errors
in commit c7b607fa93 ("selftests/resctrl: Fix null pointer
dereference on open failed"), before that both fill functions always
returned 0.
Move free() earlier to prevent the mem leak.
Fixes: c7b607fa93 ("selftests/resctrl: Fix null pointer dereference on open failed")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d920920f85 ]
If dev_pm_domain_attach_by_name() returns NULL, then 0 will be passed to
PTR_ERR() as reported by the smatch warning below:
drivers/opp/core.c:2456 _opp_attach_genpd() warn: passing zero to 'PTR_ERR'
Fix it by checking for the non-NULL virt_dev pointer before passing it to
PTR_ERR. Otherwise return -ENODEV.
Fixes: 4ea9496cbc ("opp: Fix error check in dev_pm_opp_attach_genpd()")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e767d6850 ]
Powercap zones can be defined as arranged in a hierarchy of trees and when
registering a zone with powercap_register_zone(), the kernel powercap
subsystem expects this to happen starting from the root zones down to the
leaves; on the other side, de-registration by powercap_deregister_zone()
must begin from the leaf zones.
Available SCMI powercap zones are retrieved dynamically from the platform
at probe time and, while any defined hierarchy between the zones is
described properly in the zones descriptor, the platform returns the
availables zones with no particular well-defined order: as a consequence,
the trees possibly composing the hierarchy of zones have to be somehow
walked properly to register the retrieved zones from the root.
Currently the ARM SCMI Powercap driver walks the zones using a recursive
algorithm; this approach, even though correct and tested can lead to kernel
stack overflow when processing a returned hierarchy of zones composed by
particularly high trees.
Avoid possible kernel stack overflow by substituting the recursive approach
with an iterative one supported by a dynamically allocated stack-like data
structure.
Fixes: b55eef5226 ("powercap: arm_scmi: Add SCMI Powercap based driver")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e40806e9bc ]
The nanosecond-to-millisecond skew computation uses unsigned arithmetic,
which produces user-unfriendly large positive numbers for negative skews.
Therefore, use signed arithmetic for this computation in order to preserve
the negativity.
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Reported-by: Feng Tang <feng.tang@intel.com>
Fixes: dd02926994 ("clocksource: Improve "skew is too large" messages")
Reviewed-by: Feng Tang <feng.tang@intel.com>
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5063e8948 ]
Running the refscale test occasionally crashes the kernel with the
following error:
[ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
[ 8569.952900] #PF: supervisor read access in kernel mode
[ 8569.952902] #PF: error_code(0x0000) - not-present page
[ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
[ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
[ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
:
[ 8569.952940] Call Trace:
[ 8569.952941] <TASK>
[ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
[ 8569.952959] kthread+0x10e/0x130
[ 8569.952966] ret_from_fork+0x1f/0x30
[ 8569.952973] </TASK>
The likely cause is that init_waitqueue_head() is called after the call to
the torture_create_kthread() function that creates the ref_scale_reader
kthread. Although this init_waitqueue_head() call will very likely
complete before this kthread is created and starts running, it is
possible that the calling kthread will be delayed between the calls to
torture_create_kthread() and init_waitqueue_head(). In this case, the
new kthread will use the waitqueue head before it is properly initialized,
which is not good for the kernel's health and well-being.
The above crash happened here:
static inline void __add_wait_queue(...)
{
:
if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here
The offset of flags from list_head entry in wait_queue_entry is
-0x18. If reader_tasks[i].wq.head.next is NULL as allocated reader_task
structure is zero initialized, the instruction will try to access address
0xffffffffffffffe8, which is exactly the fault address listed above.
This commit therefore invokes init_waitqueue_head() before creating
the kthread.
Fixes: 653ed64b01 ("refperf: Add a test to measure performance of read-side synchronization")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0200679fc7 ]
A while ago we received the following report:
"The other outstanding issue I noticed comes from the fact that
fsconfig syscalls may occur in a different userns than that which
called fsopen. That means that resolving the uid/gid via
current_user_ns() can save a kuid that isn't mapped in the associated
namespace when the filesystem is finally mounted. This means that it
is possible for an unprivileged user to create files owned by any
group in a tmpfs mount (since we can set the SUID bit on the tmpfs
directory), or a tmpfs that is owned by any user, including the root
group/user."
The contract for {g,u}id mount options and {g,u}id values in general set
from userspace has always been that they are translated according to the
caller's idmapping. In so far, tmpfs has been doing the correct thing.
But since tmpfs is mountable in unprivileged contexts it is also
necessary to verify that the resulting {k,g}uid is representable in the
namespace of the superblock to avoid such bugs as above.
The new mount api's cross-namespace delegation abilities are already
widely used. After having talked to a bunch of userspace this is the
most faithful solution with minimal regression risks. I know of one
users - systemd - that makes use of the new mount api in this way and
they don't set unresolable {g,u}ids. So the regression risk is minimal.
Link: https://lore.kernel.org/lkml/CALxfFW4BXhEwxR0Q5LSkg-8Vb4r2MONKCcUCVioehXQKr35eHg@mail.gmail.com
Fixes: f32356261d ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org>
Reported-by: Seth Jenkins <sethjenkins@google.com>
Message-Id: <20230801-vfs-fs_context-uidgid-v1-1-daf46a050bbf@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a221ab717c ]
We do not need to release the iomap_page in iomap_invalidate_folio()
to allow the folio to be split. The splitting code will call
->release_folio() if there is still per-fs private data attached to
the folio. At that point, we will check if the folio is still dirty
and decline to release the iomap_page. It is possible to trigger the
warning in perfectly legitimate circumstances (eg if a disk read fails,
we do a partial write to the folio, then we truncate the folio), which
will cause those writes to be lost.
Fixes: 60d8231089 ("iomap: Support large folios in invalidatepage")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d5a4f8f77 ]
The d_hash_and_lookup() function returns error pointers or NULL.
Most incorrect error checks were fixed, but the one in int path_pts()
was forgotten.
Fixes: eedf265aa0 ("devpts: Make each mount of devpts an independent filesystem.")
Signed-off-by: Wang Ming <machel@vivo.com>
Message-Id: <20230713120555.7025-1-machel@vivo.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 758b492047 ]
For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
An underflow can happen with EFD_SEMAPHORE eventfds in at least the
following three subsystems:
(1) virt/kvm/eventfd.c
(2) drivers/vfio/virqfd.c
(3) drivers/virt/acrn/irqfd.c
where (2) and (3) are just modeled after (1). An eventfd must be
specified for use with the KVM_IRQFD ioctl(). This can also be an
EFD_SEMAPHORE eventfd. When the eventfd count is zero or has been
decremented to zero an underflow can be triggered when the irqfd is shut
down by raising the KVM_IRQFD_FLAG_DEASSIGN flag in the KVM_IRQFD
ioctl():
// ctx->count == 0
kvm_vm_ioctl()
-> kvm_irqfd()
-> kvm_irqfd_deassign()
-> irqfd_deactivate()
-> irqfd_shutdown()
-> eventfd_ctx_remove_wait_queue(&cnt)
-> eventfd_ctx_do_read(&cnt)
Userspace polling on the eventfd wouldn't notice the underflow because 1
is always returned as the value from eventfd_read() while ctx->count
would've underflowed. It's not a huge deal because this should only be
happening when the irqfd is shutdown but we should still fix it and
avoid the spurious wakeup.
Fixes: cb289d6244 ("eventfd - allow atomic read and waitqueue remove")
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <tencent_7588DFD1F365950A757310D764517A14B306@qq.com>
[brauner: rewrite commit message and add explanation how this underflow can happen]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit ebfde1584d upstream.
After commit 4fb8e46c1b ("PCI: tegra194: Enable support for 256 Byte
payload"), we initialize MPS=256 for tegra194 Root Ports before enumerating
the hierarchy.
Consider an Endpoint that supports only MPS=128. In the default situation
(CONFIG_PCIE_BUS_DEFAULT set and no "pci=pcie_bus_*" parameter), Linux
tries to configure the MPS of every device to match the upstream bridge.
If the Endpoint is directly below the Root Port, Linux can reduce the Root
Port MPS to 128 to match the Endpoint. But if there's a switch in the
middle, Linux doesn't reduce the Root Port MPS because other devices below
the switch may already be configured with MPS larger than 128.
This scenario results in uncorrectable Malformed TLP errors if the Root
Port sends TLPs with payloads larger than 128 bytes. These errors can
be avoided by using the "pci=pcie_bus_safe" parameter, but it doesn't
seem to be a good idea to always have this parameter even for basic
functionality to work.
Revert commit 4fb8e46c1b ("PCI: tegra194: Enable support for 256 Byte
payload") so the Root Ports default to MPS=128, which all devices
support.
If peer-to-peer DMA is not required, one can use "pci=pcie_bus_perf" to
get the benefit of larger MPS settings.
[bhelgaas: commit log; kwilczynski: retain "u16 val_16" declaration at
the top, add missing acked by tag]
Fixes: 4fb8e46c1b ("PCI: tegra194: Enable support for 256 Byte payload")
Link: https://lore.kernel.org/linux-pci/20230619102604.3735001-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Cc: stable@vger.kernel.org # v6.0-rc1+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91ec6c8559 upstream.
This reverts commit 5a8bee63b1.
Jürg Billeter reports the following regression:
Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
someone wants the return code") `fput()` is called asynchronously if a
file is closed as part of a process exiting, i.e., if there was no
explicit `close()` before exit.
If the file was open for writing, also `put_write_access()` is called
asynchronously as part of the async `fput()`.
If that newly written file is an executable, attempting to `execve()` the
new file can fail with `ETXTBSY` if it's called after the writer process
exited but before the async `fput()` has run.
Reported-and-tested-by: "Jürg Billeter" <j@bitron.ch>
Cc: <stable@vger.kernel.org> # v6.3
Link: https://lore.kernel.org/all/4f66cded234462964899f2a661750d6798a57ec0.camel@bitron.ch/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb09074bdb upstream.
The touchpad of this device is both connected via PS/2 and i2c. This causes
strange behavior when both driver fight for control. The easy fix is to
prevent the PS/2 driver from accessing the mouse port as the full feature
set of the touchpad is only supported in the i2c interface anyway.
The strange behavior in this case is, that when an external screen is
connected and the notebook is closed, the pointer on the external screen is
moving to the lower right corner. When the notebook is opened again, this
movement stops, but the touchpad clicks are unresponsive afterwards until
reboot.
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230607173331.851192-1-wse@tuxedocomputers.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7c0cad0dc upstream.
We should be checking to see if async flips are supported in
amdgpu_dm_atomic_check() (i.e. not dm_crtc_helper_atomic_check()). Also,
async flipping isn't supported if a plane's framebuffer changes memory
domains during an atomic commit. So, move the check from
dm_crtc_helper_atomic_check() to amdgpu_dm_atomic_check() and check if
the memory domain has changed in amdgpu_dm_atomic_check().
Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2733
Fixes: c1e18c44dc ("drm/amd/display: only accept async flips for fast updates")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0bdf399342 upstream.
BPF programs that run on connect can rewrite the connect address. For
the connect system call this isn't a problem, because a copy of the address
is made when it is moved into kernel space. However, kernel_connect
simply passes through the address it is given, so the caller may observe
its address value unexpectedly change.
A practical example where this is problematic is where NFS is combined
with a system such as Cilium which implements BPF-based load balancing.
A common pattern in software-defined storage systems is to have an NFS
mount that connects to a persistent virtual IP which in turn maps to an
ephemeral server IP. This is usually done to achieve high availability:
if your server goes down you can quickly spin up a replacement and remap
the virtual IP to that endpoint. With BPF-based load balancing, mounts
will forget the virtual IP address when the address rewrite occurs
because a pointer to the only copy of that address is passed down the
stack. Server failover then breaks, because clients have forgotten the
virtual IP address. Reconnects fail and mounts remain broken. This patch
was tested by setting up a scenario like this and ensuring that NFS
reconnects worked after applying the patch.
Signed-off-by: Jordan Rife <jrife@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f7f35e5aa upstream.
The vendor check introduced by commit 554b841d47 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the
reported systems the TPM doesn't reply at bootup and returns back the
command code. This makes the TPM fail probe on Lenovo Legion Y540 laptop.
Since only Microsoft Pluton is the only known combination of AMD CPU and
fTPM from other vendor, disable hwrng otherwise. In order to make sysadmin
aware of this, print also info message to the klog.
Cc: stable@vger.kernel.org
Fixes: 554b841d47 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Reported-by: Patrick Steinhardt <ps@pks.im>
Reported-by: Raymond Jay Golo <rjgolo@gmail.com>
Reported-by: Ronan Pigott <ronan@rjp.ie>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d51847acb0 upstream.
The intel_pstate CPU frequency scaling driver does not
use policy->cur and it is 0.
When the CPU frequency is outdated arch_freq_get_on_cpu()
will default to the nominal clock frequency when its call to
cpufreq_quick_getpolicy_cur returns the never updated 0.
Thus, the listed frequency might be outside of currently
set limits. Some users are complaining about the high
reported frequency, albeit stale, when their system is
idle and/or it is above the reduced maximum they have set.
This patch will maintain policy_cur for the intel_pstate
driver at the current minimum CPU frequency.
Reported-by: Yang Jie <yang.jie@linux.intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217597
Signed-off-by: Doug Smythies <dsmythies@telus.net>
[ rjw: White space damage fixes and comment adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 106397376c upstream.
Current code supposes that it is enough to provide forward progress by
just waking up one wait queue after one completion batch is done.
Unfortunately this way isn't enough, cause waiter can be added to wait
queue just after it is woken up.
Follows one example(64 depth, wake_batch is 8)
1) all 64 tags are active
2) in each wait queue, there is only one single waiter
3) each time one completion batch(8 completions) wakes up just one
waiter in each wait queue, then immediately one new sleeper is added
to this wait queue
4) after 64 completions, 8 waiters are wakeup, and there are still 8
waiters in each wait queue
5) after another 8 active tags are completed, only one waiter can be
wakeup, and the other 7 can't be waken up anymore.
Turns out it isn't easy to fix this problem, so simply wakeup enough
waiters for single batch.
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20230721095715.232728-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c2489bb7e6 ]
There is race issue when concurrently splice_read main trace_pipe and
per_cpu trace_pipes which will result in data read out being different
from what actually writen.
As suggested by Steven:
> I believe we should add a ref count to trace_pipe and the per_cpu
> trace_pipes, where if they are opened, nothing else can read it.
>
> Opening trace_pipe locks all per_cpu ref counts, if any of them are
> open, then the trace_pipe open will fail (and releases any ref counts
> it had taken).
>
> Opening a per_cpu trace_pipe will up the ref count for just that
> CPU buffer. This will allow multiple tasks to read different per_cpu
> trace_pipe files, but will prevent the main trace_pipe file from
> being opened.
But because we only need to know whether per_cpu trace_pipe is open or
not, using a cpumask instead of using ref count may be easier.
After this patch, users will find that:
- Main trace_pipe can be opened by only one user, and if it is
opened, all per_cpu trace_pipes cannot be opened;
- Per_cpu trace_pipes can be opened by multiple users, but each per_cpu
trace_pipe can only be opened by one user. And if one of them is
opened, main trace_pipe cannot be opened.
Link: https://lore.kernel.org/linux-trace-kernel/20230818022645.1948314-1-zhengyejian1@huawei.com
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db1a6ad77c ]
Handle extended compliance code 0x1 (SFF8024_ECC_100G_25GAUI_C2M_AOC)
for active optical cables supporting 25G and 100G speeds.
Since the specification makes no statement about transmitter range, and
as the specific sfp module that had been tested features only 2m fiber -
short-range (SR) modes are selected.
The 100G speed is irrelevant because it would require multiple fibers /
multiple SFP28 modules combined under one netdev.
sfp-bus.c only handles a single module per netdev, so only 25Gbps modes
are selected.
sfp_parse_support already handles SFF8024_ECC_100GBASE_SR4_25GBASE_SR
with compatible properties, however that entry is a contradiction in
itself since with SFP(28) 100GBASE_SR4 is impossible - that would likely
be a mode for qsfp modules only.
Add a case for SFF8024_ECC_100G_25GAUI_C2M_AOC selecting 25gbase-r
interface mode and 25000baseSR link mode.
Also enforce SFP28 bitrate limits on the values read from sfp eeprom as
requested by Russell King.
Tested with fs.com S28-AO02 AOC SFP28 module.
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3386fb86ec ]
After we remove a GPIO chip that still has some requested descriptors,
gpiod_free_commit() will fail and we will never put the references to the
GPIO device and the owning module in gpiod_free().
Rework this function to:
- not warn on desc == NULL as this is a use-case on which most free
functions silently return
- put the references to desc->gdev and desc->gdev->owner unconditionally
so that the release callback actually gets called when the remaining
references are dropped by external GPIO users
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9944d203fa ]
Return result of b44_writephy() instead of zero to
deal with possible error.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0650d5098f ]
Since platform_get_irq() never returned zero, so it need not to check
whether it returned zero, and we use the return error code of
platform_get_irq() to replace the current return error code.
Please refer to the commit a85a6c86c2 ("driver core: platform: Clarify
that IRQ 0 is invalid") to get that platform_get_irq() never returned
zero.
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b610c4bbd1 ]
On MX8X platforms, the default clock rate is 0 if without explicit
clock setting in dts nodes. I2c can't work when i2c peripheral clk
rate is 0.
Add a i2c peripheral clk rate check before configuring the clock
register. When i2c peripheral clk rate is 0 and directly return
-EINVAL.
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Acked-by: Dong Aisheng <Aisheng.dong@nxp.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f55484fd7b ]
If we repeatedly fail to fake offline memory to unplug it, we won't be
sending any unplug requests to the device. However, we only check if the
config changed when sending such (un)plug requests.
We could end up trying for a long time to unplug memory, even though
the config changed already and we're not supposed to unplug memory
anymore. For example, the hypervisor might detect a low-memory situation
while unplugging memory and decide to replug some memory. Continuing
trying to unplug memory in that case can be problematic.
So let's check on a more regular basis.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-5-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a31648fd4f ]
In case offline_and_remove_memory() fails in SBM, we leave a completely
unplugged Linux memory block stick around until we try plugging memory
again. We won't try removing that memory block again.
offline_and_remove_memory() may, for example, fail if we're racing with
another alloc_contig_range() user, if allocating temporary memory fails,
or if some memory notifier rejected the offlining request.
Let's handle that case better, by simple retrying to offline and remove
such memory.
Tested using CONFIG_MEMORY_NOTIFIER_ERROR_INJECT.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-4-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddf4098514 ]
Just like we do with alloc_contig_range(), let's convert all unknown
errors to -EBUSY, but WARN so we can look into the issue. For example,
offline_pages() could fail with -EINTR, which would be unexpected in our
case.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-3-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f504e15b94 ]
When "unsafe unplug" is enabled, we don't fake-offline all memory ahead of
actual memory offlining using alloc_contig_range(). Instead, we rely on
offline_pages() to also perform actual page migration, which might fail
or take a very long time.
In that case, it's possible to easily run into endless loops that cannot be
aborted anymore (as offlining is triggered by a workqueue then): For
example, a single (accidentally) permanently unmovable page in
ZONE_MOVABLE results in an endless loop. For ZONE_NORMAL, races between
isolating the pageblock (and checking for unmovable pages) and
concurrent page allocation are possible and similarly result in endless
loops.
The idea of the unsafe unplug mode was to make it possible to more
reliably unplug large memory blocks. However, (a) we really should be
tackling that differently, by extending the alloc_contig_range()-based
mechanism; and (b) this mode is not the default and as far as I know,
it's unused either way.
So let's simply get rid of it.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-2-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ced58bfa1 ]
The linux block layer requires bios/requests to have lengths with a 512
byte alignment. Some drivers/layers like dm-crypt and the directi IO code
will test for it and just fail. Other drivers like SCSI just assume the
requirement is met and will end up in infinte retry loops. The problem
for drivers like SCSI is that it uses functions like blk_rq_cur_sectors
and blk_rq_sectors which divide the request's length by 512. If there's
lefovers then it just gets dropped. But other code in the block/scsi
layer may use blk_rq_bytes/blk_rq_cur_bytes and end up thinking there is
still data left and try to retry the cmd. We can then end up getting
stuck in retry loops where part of the block/scsi thinks there is data
left, but other parts think we want to do IOs of zero length.
Linux will always check for alignment, but windows will not. When
vhost-scsi then translates the iovec it gets from a windows guest to a
scatterlist, we can end up with sg items where the sg->length is not
divisible by 512 due to the misaligned offset:
sg[0].offset = 255;
sg[0].length = 3841;
sg...
sg[N].offset = 0;
sg[N].length = 255;
When the lio backends then convert the SG to bios or other iovecs, we
end up sending them with the same misaligned values and can hit the
issues above.
This just has us drop down to allocating a temp page and copying the data
when we detect a misaligned buffer and the IO is large enough that it
will get split into multiple bad IOs.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230709202859.138387-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d6f7e3938 ]
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning: GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE.
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230707063335.13317-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8183bb7e29 ]
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning: GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE.
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230707063335.13317-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3de41ee5f ]
On PSP v13.x ASICs, boot loader will set only the MSB to 1 and clear the
least significant bits for any command submission. Hence match against
the exact register value, otherwise a register value of all 0xFFs also
could falsely indicate that boot loader is ready. Also, from PSP v13.0.6
and newer, bits[7:0] will be used to indicate command error status.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06f2ab86a5 ]
If cfg80211 is providing extraie's for a scanning process then ath12k will
copy that over to the firmware. The extraie.len is a 32 bit value in struct
element_info and describes the amount of bytes for the vendor information
elements.
The problem is the allocation of the buffer. It has to align the TLV
sections by 4 bytes. But the code was using an u8 to store the newly
calculated length of this section (with alignment). And the new
calculated length was then used to allocate the skbuff. But the actual
code to copy in the data is using the extraie.len and not the calculated
"aligned" length.
The length of extraie with IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS enabled
was 264 bytes during tests with a wifi card. But it only allocated 8
bytes (264 bytes % 256) for it. As consequence, the code to memcpy the
extraie into the skb was then just overwriting data after skb->end. Things
like shinfo were therefore corrupted. This could usually be seen by a crash
in skb_zcopy_clear which tried to call a ubuf_info callback (using a bogus
address).
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Link: https://lore.kernel.org/r/20230809081241.32765-1-quic_wgong@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd64f80587 ]
As &qedi_percpu->p_work_lock is acquired by hard IRQ qedi_msix_handler(),
other acquisitions of the same lock under process context should disable
IRQ, otherwise deadlock could happen if the IRQ preempts the execution
while the lock is held in process context on the same CPU.
qedi_cpu_offline() is one such function which acquires the lock in process
context.
[Deadlock Scenario]
qedi_cpu_offline()
->spin_lock(&p->p_work_lock)
<irq>
->qedi_msix_handler()
->edi_process_completions()
->spin_lock_irqsave(&p->p_work_lock, flags); (deadlock here)
This flaw was found by an experimental static analysis tool I am developing
for IRQ-related deadlocks.
The tentative patch fix the potential deadlock by spin_lock_irqsave()
under process context.
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230726125655.4197-1-dg573847474@gmail.com
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1e213a9e3 ]
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that it
won't be built to cause below compiling error if PCI is unset.
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: dmaengine@vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e2d0c3365 ]
Hyper-V can run VMs at different privilege "levels" known as Virtual
Trust Levels (VTL). Sometimes, it chooses to run two different VMs
at different levels but they share some of their address space. In
such setups VTL2 (higher level VM) has visibility of all of the
VTL0 (level 0) memory space.
When the CONFIG_X86_MPPARSE is enabled for VTL2, the VTL2 kernel
performs a search within the low memory to locate MP tables. However,
in systems where VTL0 manages the low memory and may contain valid
tables, this scanning can result in incorrect MP table information
being provided to the VTL2 kernel, mistakenly considering VTL0's MP
table as its own
Add noop functions to avoid MP parse scan by VTL2.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/1687537688-5397-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86582e6189 ]
On a powermac platform, via the call path:
start_kernel()
time_init()
ppc_md.calibrate_decr() (pmac_calibrate_decr)
via_calibrate_decr()
ioremap() and iounmap() are called. The unmap can enable interrupts
unexpectedly (cond_resched() in vunmap_pmd_range()), which causes a
warning later in the boot sequence in start_kernel().
Use the early_* variants of these IO functions to prevent this.
The issue is pre-existing, but is surfaced by commit 721255b982
("genirq: Use a maple tree for interrupt descriptor management").
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230706010816.72682-1-bgray@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 16e455a465 ]
Using brcmfmac with 6.5-rc3 on a brcmfmac43241b4-sdio triggers
a backtrace caused by the following field-spanning warning:
memcpy: detected field-spanning write (size 120) of single field
"¶ms_le->channel_list[0]" at
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1072 (size 2)
The driver still works after this warning. The warning was introduced by the
new field-spanning write checks which were enabled recently.
Fix this by replacing the channel_list[1] declaration at the end of
the struct with a flexible array declaration.
Most users of struct brcmf_scan_params_le calculate the size to alloc
using the size of the non flex-array part of the struct + needed extra
space, so they do not care about sizeof(struct brcmf_scan_params_le).
brcmf_notify_escan_complete() however uses the struct on the stack,
expecting there to be room for at least 1 entry in the channel-list
to store the special -1 abort channel-id.
To make this work use an anonymous union with a padding member
added + the actual channel_list flexible array.
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230729140500.27892-1-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 285975dd67 ]
sk_getsockopt() runs without locks, we must add annotations
to sk->sk_rcvtimeo and sk->sk_sndtimeo.
In the future we might allow fetching these fields before
we lock the socket in TCP fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d7ae22ae9 ]
The commit (SHA1: 5c844d57aa) provided code
to apply "Module 6: Certain PHY registers must be written as pairs instead
of singly" errata for KSZ9477 as this chip for certain PHY registers
(0xN120 to 0xN13F, N=1,2,3,4,5) must be accesses as 32 bit words instead
of 16 or 8 bit access.
Otherwise, adjacent registers (no matter if reserved or not) are
overwritten with 0x0.
Without this patch some registers (e.g. 0x113c or 0x1134) required for 32
bit access are out of valid regmap ranges.
As a result, following error is observed and KSZ9477 is not properly
configured:
ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0
The solution is to modify regmap_reg_range to allow accesses with 4 bytes
boundaries.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed0cf84e9c ]
It is incorrect in python to compare integer values using the "is" keyword.
The "is" keyword in python is used to compare references to two objects,
not their values. Newer version of python3 (version 3.8) throws a warning
when such incorrect comparison is made. For value comparison, "==" should
be used.
Fix this in the code and suppress the following warning:
/usr/sbin/vmbus_testing:167: SyntaxWarning: "is" with a literal. Did you mean "=="?
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20230705134408.6302-1-anisinha@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7dd44f4f3 ]
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built to cause below compiling error if PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: linux-clk@vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d7f105edb ]
If the current task fails the check for the queried capability via
`capable(CAP_SYS_ADMIN)` LSMs like SELinux generate a denial message.
Issuing such denial messages unnecessarily can lead to a policy author
granting more privileges to a subject than needed to silence them.
Reorder CAP_SYS_ADMIN checks after the check whether the operation is
actually privileged.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 83da30d73b ]
On FDT systems these command line processing are already taken care of
by early_init_dt_scan_chosen(). Add similar handling to the ACPI (non-
FDT) code path to allow these config options to work for ACPI (non-FDT)
systems too.
Signed-off-by: Zhihong Dong <donmor3000@hotmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54c2c9df08 ]
This is a port of commit 4fe4a6374c ("MIPS: Only fiddle with
CHECKFLAGS if `need-compiler'") to LoongArch.
We have originally guarded fiddling with CHECKFLAGS in our arch Makefile
by checking for the CONFIG_LOONGARCH variable, not set for targets such
as `distclean', etc. that neither include `.config' nor use the compiler.
Starting from commit 805b2e1d42 ("kbuild: include Makefile.compiler
only when compiler is needed") we have had a generic `need-compiler'
variable explicitly telling us if the compiler will be used and thus its
capabilities need to be checked and expressed in the form of compilation
flags. If this variable is not set, then `make' functions such as
`cc-option' are undefined, causing all kinds of weirdness to happen if
we expect specific results to be returned.
It doesn't cause problems on LoongArch now. But as a guard we replace
the check for CONFIG_LOONGARCH with one for `need-compiler' instead, so
as to prevent the compiler from being ever called for CHECKFLAGS when
not needed.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4139f992c4 ]
It is possible for dma_request_chan() to return EPROBE_DEFER, which
means acdev->host->dev is not ready yet. At this point dev_err() will
have no output. Use dev_err_probe() instead.
Signed-off-by: Minjie Du <duminjie@vivo.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18b44bc5a6 ]
Commit db1d1e8b98 ("IMA: use vfs_getattr_nosec to get the i_version")
partially closed an IMA integrity issue when directly modifying a file
on the lower filesystem. If the overlay file is first opened by a user
and later the lower backing file is modified by root, but the extended
attribute is NOT updated, the signature validation succeeds with the old
original signature.
Update the super_block s_iflags to SB_I_IMA_UNVERIFIABLE_SIGNATURE to
force signature reevaluation on every file access until a fine grained
solution can be found.
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a37c55b85 ]
Report current GFX clock also from average clock value as the original
CurrClock data is not valid/accurate any more as per FW team
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c21733754c ]
Currently huawei-wmi causes a lot of spam in dmesg on my
Huawei MateBook X Pro 2022:
...
[36409.328463] input input9: Unknown key pressed, code: 0x02c1
[36411.335104] input input9: Unknown key pressed, code: 0x02c1
[36412.338674] input input9: Unknown key pressed, code: 0x02c1
[36414.848564] input input9: Unknown key pressed, code: 0x02c1
[36416.858706] input input9: Unknown key pressed, code: 0x02c1
...
Fix that by ignoring events generated by ambient light sensor.
This issue was reported on GitHub and resolved with the following merge
request:
https://github.com/aymanbagabas/Huawei-WMI/pull/70
I've contacted the mainter of this repo and he gave me the "go ahead" to
send this patch to the maling list.
Signed-off-by: Konstantin Shelekhin <k.shelekhin@ftml.net>
Link: https://lore.kernel.org/r/20230722155922.173856-1-k.shelekhin@ftml.net
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7783e97f85 ]
HP Elite Dragonfly G2 (a convertible laptop/tablet) has a reliable VGBS
method. If VGBS is not called on boot, the firmware sends an initial
0xcd event shortly after calling the BTNL method, but only if the device
is booted in the laptop mode. However, if the device is booted in the
tablet mode and VGBS is not called, there is no initial 0xcc event, and
the input device for SW_TABLET_MODE is not registered up until the user
turns the device into the laptop mode.
Call VGBS on boot on this device to get the initial state of
SW_TABLET_MODE in a reliable way.
Tested with BIOS 1.13.1.
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://lore.kernel.org/r/20230716183213.64173-1-maxtram95@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3ab18de2b ]
On a HP Elite Dragonfly G2 the 0xcc and 0xcd events for SW_TABLET_MODE
are only send after the BTNL ACPI method has been called.
Likely more devices need this, so make the BTNL ACPI method unconditional
instead of only doing it on devices with a 5 button array.
Note this also makes the intel_button_array_enable() call in probe()
unconditional, that function does its own priv->array check. This makes
the intel_button_array_enable() call in probe() consistent with the calls
done on suspend/resume which also rely on the priv->array check inside
the function.
Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Closes: https://lore.kernel.org/platform-driver-x86/20230712175023.31651-1-maxtram95@gmail.com/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230715181516.5173-1-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3da4350637 ]
Microsoft Modern Wireless Headset (appearing on the host as "Microsoft
USB Link") has a playback and a capture mixer volume/switch, but they
are fairly broken. The descriptor reports wrong dB ranges for
playback, and the capture volume/switch don't influence on the actual
recording at all. Moreover, there seem instabilities in the
connection, and at best, we should disable the runtime PM.
So this ended up with a quirk entry for:
- Correct the playback dB range;
I picked up some reasonable values but it's a guess work
- Disable the capture mixer;
it's completely useless and confuses PA/PW
- Suppress get-sample-rate, apply the delay for message handling,
and suppress the auto-suspend
The behavior of the wheel control on the headset is somehow flaky,
too, but it's an issue of HID.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1207129
Link: https://lore.kernel.org/r/20230725092057.15115-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a43f95fdd3 ]
We need to specify charset, like "iocharset=utf-8", in mount options for
Chinese path if the nls_default don't support it, such as iso8859-1, the
default value for CONFIG_NLS_DEFAULT.
But now in reconnection the nls_default is used, instead of the one we
specified and used in mount, and this can lead to mount failure.
Signed-off-by: Winston Wen <wentao@uniontech.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1ed39ec11 ]
load_nls() take a char * parameter, use it to find nls module in list or
construct the module name to load it.
This change make load_nls() take a const parameter, so we don't need do
some cast like this:
ses->local_nls = load_nls((char *)ctx->local_nls->charset);
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Winston Wen <wentao@uniontech.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a2278ce9c ]
The DASD device driver has a function to requeue requests to the
blocklayer.
This function is used in various cases when basic settings for the device
have to be changed like High Performance Ficon related parameters or copy
pair settings.
The functions iterates over the device->ccw_queue and also removes the
requests from the block->ccw_queue.
In case the device is started on an alias device instead of the base
device it might be removed from the block->ccw_queue without having it
canceled properly before. This might lead to a hanging device since the
request is no longer on a queue and can not be handled properly.
Fix by iterating over the block->ccw_queue instead of the
device->ccw_queue. This will take care of all blocklayer related requests
and handle them on all associated DASD devices.
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Link: https://lore.kernel.org/r/20230721193647.3889634-4-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit acea28a6b7 ]
If a DASD request fails an error recovery procedure (ERP) request might
be built as a copy of the original request to do error recovery.
The ERP request gets a number of retries assigned.
This number is always 256 no matter what other value might have been set
for the original request. This is not what is expected when a user
specifies a certain amount of retries for the device via sysfs.
Correctly use the number of retries of the original request for ERP
requests.
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Link: https://lore.kernel.org/r/20230721193647.3889634-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d01da0a1d ]
in atl1c_tso_csum, it should check the return value of pskb_trim(),
and return an error code if an unexpected value is returned
by pskb_trim().
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d2336120a ]
When the tm module is configured with traffic, traffic
may be abnormal. This patch fixes this problem.
Before the tm module is configured, traffic processing
should be stopped. After the tm module is configured,
traffic processing is enabled.
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80ddce5f2d ]
Since commit 3d439b1a2a ("thermal/core: Alloc-copy-free the thermal zone
parameters structure"), thermal_zone_device_register() allocates a copy
of the tzp argument and callers need not explicitly manage its lifetime.
This means the function no longer cares about the parameter being
mutable, so constify it.
No functional change.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0691dc162 ]
When handling an AAD interrupt, if IRQ events read failed (for example,
due to i2c "Transfer while suspended" failure, i.e. when attempting to
read it while DA7219 is suspended, which may happen due to a spurious
AAD interrupt), the events array contains garbage uninitialized values.
So instead of trying to interprete those values and doing any actions
based on them (potentially resulting in misbehavior, e.g. reporting
bogus events), refuse to handle the interrupt.
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
Link: https://lore.kernel.org/r/20230717193737.161784-3-dmy@semihalf.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91e292917d ]
da7219_aad_suspend() disables jack detection, which should prevent
generating new interrupts by DA7219 while suspended. However, there is a
theoretical possibility that there is a pending interrupt generated just
before suspending DA7219 and not handled yet, so the IRQ handler may
still run after DA7219 is suspended. To prevent that, wait until the
pending IRQ handling is done.
This patch arose as an attempt to fix the following I2C failure
occurring sometimes during system suspend or resume:
[ 355.876211] i2c_designware i2c_designware.3: Transfer while suspended
[ 355.876245] WARNING: CPU: 2 PID: 3576 at drivers/i2c/busses/i2c-designware-master.c:570 i2c_dw_xfer+0x411/0x440
...
[ 355.876462] Call Trace:
[ 355.876468] <TASK>
[ 355.876475] ? update_load_avg+0x1b3/0x615
[ 355.876484] __i2c_transfer+0x101/0x1d8
[ 355.876494] i2c_transfer+0x74/0x10d
[ 355.876504] regmap_i2c_read+0x6a/0x9c
[ 355.876513] _regmap_raw_read+0x179/0x223
[ 355.876521] regmap_raw_read+0x1e1/0x28e
[ 355.876527] regmap_bulk_read+0x17d/0x1ba
[ 355.876532] ? __wake_up+0xed/0x1bb
[ 355.876542] da7219_aad_irq_thread+0x54/0x2c9 [snd_soc_da7219 5fb8ebb2179cf2fea29af090f3145d68ed8e2184]
[ 355.876556] irq_thread+0x13c/0x231
[ 355.876563] ? irq_forced_thread_fn+0x5f/0x5f
[ 355.876570] ? irq_thread_fn+0x4d/0x4d
[ 355.876576] kthread+0x13a/0x152
[ 355.876581] ? synchronize_irq+0xc3/0xc3
[ 355.876587] ? kthread_blkcg+0x31/0x31
[ 355.876592] ret_from_fork+0x1f/0x30
[ 355.876601] </TASK>
which indicates that the AAD IRQ handler is unexpectedly running when
DA7219 is suspended, and as a result, is trying to read data from DA7219
over I2C and is hitting the I2C driver "Transfer while suspended"
failure.
However, with this patch the above failure is still reproducible. So
this patch does not fix any real observed issue so far, but at least is
useful for confirming that the above issue is not caused by a pending
IRQ but rather looks like a DA7219 hardware issue with an IRQ
unexpectedly generated after jack detection is already disabled.
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
Link: https://lore.kernel.org/r/20230717193737.161784-2-dmy@semihalf.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 536bb492d3 ]
If client send smb2 negotiate request and then send smb1 negotiate
request, init_smb2_rsp_hdr is called for smb1 negotiate request since
need_neg is set to false. This patch ignore smb1 packets after ->need_neg
is set to false.
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21541
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e202a1e863 ]
ksmbd doesn't support compound read. If client send read-read in
compound to ksmbd, there can be memory leak from read buffer.
Windows and linux clients doesn't send it to server yet. For now,
No response from compound read. compound read will be supported soon.
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21587, ZDI-CAN-21588
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3df0411e13 ]
`smb2_get_msg()` in smb2_get_ksmbd_tcon() and smb2_check_user_session()
will always return the first request smb2 header in a compound request.
if `SMB2_TREE_CONNECT_HE` is the first command in compound request, will
return 0, i.e. The tree id check is skipped.
This patch use ksmbd_req_buf_next() to get current command in compound.
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21506
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a73edab69 ]
Similarly to the previous patch: offs can be used in handle_rerrors
without initializing on small payloads; in this case handle_rerrors will
not use it because of the size check, but it doesn't hurt to make sure
it is zero to please scan-build.
This fixes the following warning:
net/9p/trans_virtio.c:539:3: warning: 3rd function call argument is an uninitialized value [core.CallAndMessage]
handle_rerror(req, in_hdr_len, offs, in_pages);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 13ade4ac5c ]
handle_rerror can dereference the pages pointer, but it is not
necessarily set for small payloads.
In practice these should be filtered out by the size check, but
might as well double-check explicitly.
This fixes the following scan-build warnings:
net/9p/trans_virtio.c:401:24: warning: Dereference of null pointer [core.NullDereference]
memcpy_from_page(to, *pages++, offs, n);
^~~~~~~~
net/9p/trans_virtio.c:406:23: warning: Dereference of null pointer (loaded from variable 'pages') [core.NullDereference]
memcpy_from_page(to, *pages, offs, size);
^~~~~~
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4aaa96b59d ]
After having been assigned to NULL value at cx23885-dvb.c:1202,
pointer '0' is dereferenced at cx23885-dvb.c:2469.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Nikolay Burykin <burikin@ivk.ru>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92cbf865ea ]
Handle (and warn about) possible error waiting for MSGCODE_PING result.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53ebeea505 ]
imx jpeg encoder and decoder support 4 slots each,
aim to support some virtualization scenarios.
driver should only enable one slot one time.
but due to some hardware issue,
only slot 0 can be enabled in imx8q platform,
and they may be fixed in imx9 platform.
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0266a2f791 ]
The return value of the ksmbd_vfs_getcasexattr() is signed.
However, the return value is being assigned to an unsigned
variable and subsequently recasted, causing warnings. Use
a signed type.
Signed-off-by: Wang Ming <machel@vivo.com>
Acked-by: Tom Talpey <tom@talpey.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a99a4ff6ef ]
This partially reverts commit de231189e7 ("drm/amd/display: Fix
possible underflow for displays with large vblank").
[Why]
The increased value of VBlankNomDefaultUS causes underflow at the
desktop of an IP KVM setup
[How]
Change the value from 800 back to 668
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ebd00a5a2 ]
This reverts commit 56a16035bb.
Since the previous commit, STP works on bridge in netns.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
[ 50.558135] br0: port 1(veth0) entered blocking state
[ 50.558366] br0: port 1(veth0) entered disabled state
[ 50.558798] veth0: entered allmulticast mode
[ 50.564401] veth0: entered promiscuous mode
# ip link set veth1 master br0 up
[ 54.215487] br0: port 2(veth1) entered blocking state
[ 54.215657] br0: port 2(veth1) entered disabled state
[ 54.215848] veth1: entered allmulticast mode
[ 54.219577] veth1: entered promiscuous mode
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
[ 61.960726] br0: port 2(veth1) entered blocking state
[ 61.961097] br0: port 2(veth1) entered listening state
[ 61.961495] br0: port 1(veth0) entered blocking state
[ 61.961653] br0: port 1(veth0) entered listening state
[ 63.998835] br0: port 2(veth1) entered blocking state
[ 77.437113] br0: port 1(veth0) entered learning state
[ 86.653501] br0: received packet on veth0 with own address as source address (addr:6e:0f:e7:6f:5f:5f, vlan:0)
[ 92.797095] br0: port 1(veth0) entered forwarding state
[ 92.797398] br0: topology change detected, propagating
Let's remove the warning.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit cdaac8e7e5 upstream.
A syzbot stress test using a corrupted disk image reported that
mark_buffer_dirty() called from __nilfs_mark_inode_dirty() or
nilfs_palloc_commit_alloc_entry() may output a kernel warning, and can
panic if the kernel is booted with panic_on_warn.
This is because nilfs2 keeps buffer pointers in local structures for some
metadata and reuses them, but such buffers may be forcibly discarded by
nilfs_clear_dirty_page() in some critical situations.
This issue is reported to appear after commit 28a65b49eb ("nilfs2: do
not write dirty data after degenerating to read-only"), but the issue has
potentially existed before.
Fix this issue by checking the uptodate flag when attempting to reuse an
internally held buffer, and reloading the metadata instead of reusing the
buffer if the flag was lost.
Link: https://lkml.kernel.org/r/20230818131804.7758-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+cdfcae656bac88ba0e2d@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000003da75f05fdeffd12@google.com
Fixes: 8c26c4e269 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9baeea723c upstream.
When configuring a pin as an output pin with a value of logic 0, we
end up as having a value of logic 1 on the output pin. Setting a
logic 0 a second time (or more) after that will correctly output a
logic 0 on the output pin.
By default, all GPIO pins are configured as inputs. When we enter
sc16is7xx_gpio_direction_output() for the first time, we first set the
desired value in IOSTATE, and then we configure the pin as an output.
The datasheet states that writing to IOSTATE register will trigger a
transfer of the value to the I/O pin configured as output, so if the
pin is configured as an input, nothing will be transferred.
Therefore, set the direction first in IODIR, and then set the desired
value in IOSTATE.
This is what is done in NXP application note AN10587.
Fixes: dfeae619d7 ("serial: sc16is7xx")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Lech Perczak <lech.perczak@camlingroup.com>
Tested-by: Lech Perczak <lech.perczak@camlingroup.com>
Link: https://lore.kernel.org/r/20230807214556.540627-6-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2861ed4d6e upstream.
The sc16is7xx_config_rs485() function is called only for the second
port (index 1, channel B), causing initialization problems for the
first port.
For the sc16is7xx driver, port->membase and port->mapbase are not set,
and their default values are 0. And we set port->iobase to the device
index. This means that when the first device is registered using the
uart_add_one_port() function, the following values will be in the port
structure:
port->membase = 0
port->mapbase = 0
port->iobase = 0
Therefore, the function uart_configure_port() in serial_core.c will
exit early because of the following check:
/*
* If there isn't a port here, don't do anything further.
*/
if (!port->iobase && !port->mapbase && !port->membase)
return;
Typically, I2C and SPI drivers do not set port->membase and
port->mapbase.
The max310x driver sets port->membase to ~0 (all ones). By
implementing the same change in this driver, uart_configure_port() is
now correctly executed for all ports.
Fixes: dfeae619d7 ("serial: sc16is7xx")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Lech Perczak <lech.perczak@camlingroup.com>
Tested-by: Lech Perczak <lech.perczak@camlingroup.com>
Link: https://lore.kernel.org/r/20230807214556.540627-2-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29d15589f0 upstream.
When a function is using functions from mac80211 to free an skb then it
should do it consistently and not switch to the generic dev_kfree_skb_any
(or similar functions). Otherwise (like in the error handlers), mac80211
will will not be aware of the freed skb and thus not clean up related
information in its internal data structures.
Not doing so lead in the past to filled up structure which then prevented
new clients to connect.
Fixes: d5c65159f2 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Fixes: 6257c70226 ("wifi: ath11k: fix tx status reporting in encap offload mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230802-ath11k-ack_status_leak-v2-2-c0af729d6229@narfation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 400ece6c7f upstream.
When a station idles for a long time, hostapd will try to send a QoS Null
frame to the station as "poll". NL80211_CMD_PROBE_CLIENT is used for this
purpose. And the skb will be added to ack_status_frame - waiting for a
completion via ieee80211_report_ack_skb().
But when the peer was already removed before the tx_complete arrives, the
peer will be missing. And when using dev_kfree_skb_any (instead of going
through mac80211), the entry will stay inside ack_status_frames. This IDR
will therefore run full after 8K request were generated for such clients.
At this point, the access point will then just stall and not allow any new
clients because idr_alloc() for ack_status_frame will fail.
ieee80211_free_txskb() on the other hand will (when required) call
ieee80211_report_ack_skb() and make sure that (when required) remove the
entry from the ack_status_frame.
Tested-on: IPQ6018 hw1.0 WLAN.HK.2.5.0.1-01100-QCAHKSWPL_SILICONZ-1
Fixes: 6257c70226 ("wifi: ath11k: fix tx status reporting in encap offload mode")
Fixes: 94739d45c3 ("ath11k: switch to using ieee80211_tx_status_ext()")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230802-ath11k-ack_status_leak-v2-1-c0af729d6229@narfation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 290564367a upstream.
After rtw_usb_alloc_rx_bufs() has been called rx urbs have been
allocated and must be freed in the error path. After rtw_usb_init_rx()
has been called they are submitted, so they also must be killed.
Add these forgotten steps to the probe error path.
Besides the lost memory this also fixes a problem when the driver
fails to download the firmware in rtw_chip_info_setup(). In this
case it can happen that the completion of the rx urbs handler runs
at a time when we already freed our data structures resulting in
a kernel crash.
Fixes: a82dfd33d1 ("wifi: rtw88: Add common USB chip support")
Cc: stable@vger.kernel.org
Reported-by: Ilgaz Öcal <ilgaz@ilgaz.gen.tr>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230823075021.588596-1-s.hauer@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b642f4c5f3 upstream.
txs may be dropped if the frame is aggregated in AMSDU. When the problem
shows up, some SKBs would be hold in driver to cause network stopped
temporarily. Even if the problem can be recovered by txs timeout handling,
mt7921 still need to disable txs in AMSDU to avoid this issue.
Cc: stable@vger.kernel.org
Fixes: 163f4d22c1 ("mt76: mt7921: add MAC support")
Reviewed-by: Shayne Chen <shayne.chen@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ac6678b95 upstream.
Currently the EKR battery remains even after we stop getting information
from the device. This can lead to a stale battery persisting indefinitely
in userspace.
The remote sends a heartbeat every 10 seconds. Delete the battery if we
miss two heartbeats (after 21 seconds). Restore the battery once we see
a heartbeat again.
Signed-off-by: Aaron Skomra <skomra@gmail.com>
Signed-off-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Reviewed-by: Jason Gerecke <jason.gerecke@wacom.com>
Fixes: 9f1015d45f ("HID: wacom: EKR: attach the power_supply on first connection")
CC: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fa206bb76 upstream.
Device connected to usb otg port of GXL-based boards can not be
recognised after resumption, doesn't recover even if disconnect and
reconnect the device. dmesg shows it disconnects during resumption.
[ 41.492911] usb 1-2: USB disconnect, device number 3
[ 41.499346] usb 1-2: unregistering device
[ 41.511939] usb 1-2: unregistering interface 1-2:1.0
Calling usb_post_init() will fix this issue, and it's tested and
verified on libretech's aml-s905x-cc board.
Cc: stable@vger.kernel.org # v5.8+
Fixes: c99993376f ("usb: dwc3: Add Amlogic G12A DWC3 glue")
Signed-off-by: Luke Lu <luke.lu@libre.computer>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20230809212911.18903-1-luke.lu@libre.computer
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fadc941d0 upstream.
There have been reports of USB-audio driver spewing errors at the
probe time on a few devices like Jabra and Logitech. The suggested
fix there couldn't be applied as is, unfortunately, because it'll
likely break other devices.
But, the patch suggested an interesting point: looking at the current
init code in stream.c, one may notice that it does initialize
differently from the device setup in endpoint.c. Namely, for UAC1, we
should call snd_usb_init_pitch() and snd_usb_init_sample_rate() after
setting the interface, while the init sequence at parsing calls them
before setting the interface blindly.
This patch changes the init sequence at parsing for UAC1 (and other
devices that need a similar behavior) to be aligned with the rest of
the code, setting the interface at first. And, this fixes the
long-standing problems on a few UAC1 devices like Jabra / Logitech,
as reported, too.
Reported-and-tested-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Closes: https://lore.kernel.org/r/202bbbc0f51522e8545783c4c5577d12a8e2d56d.camel@infinera.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230821111857.28926-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9011e49d54 upstream.
It has recently come to my attention that nvidia is circumventing the
protection added in 262e6ae708 ("modules: inherit
TAINT_PROPRIETARY_MODULE") by importing exports from their proprietary
modules into an allegedly GPL licensed module and then rexporting them.
Given that symbol_get was only ever intended for tightly cooperating
modules using very internal symbols it is logical to restrict it to
being used on EXPORT_SYMBOL_GPL and prevent nvidia from costly DMCA
Circumvention of Access Controls law suites.
All symbols except for four used through symbol_get were already exported
as EXPORT_SYMBOL_GPL, and the remaining four ones were switched over in
the preparation patches.
Fixes: 262e6ae708 ("modules: inherit TAINT_PROPRIETARY_MODULE")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95e7ebc682 upstream.
ds1685_rtc_poweroff is only used externally via symbol_get, which was
only ever intended for very internal symbols like this one. Use
EXPORT_SYMBOL_GPL for it so that symbol_get can enforce only being used
on EXPORT_SYMBOL_GPL symbols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Joshua Kinard <kumba@gentoo.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 569820befb upstream.
enetc_phc_index is only used via symbol_get, which was only ever
intended for very internal symbols like this one. Use EXPORT_SYMBOL_GPL
for it so that symbol_get can enforce only being used on
EXPORT_SYMBOL_GPL symbols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4a5c59a95 upstream.
au1xmmc is split somewhat awkwardly into the main mmc subsystem driver,
and callbacks in platform_data that sit under arch/mips/ and are
always built in. The latter than call mmc_detect_change through
symbol_get. Remove the use of symbol_get by requiring the driver
to be built in. In the future the interrupt handlers for card
insert/eject detection should probably be moved into the main driver,
and which point it can be built modular again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Manuel Lauss <manuel.lauss@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
[mcgrof: squashed in depends on MMC=y suggested by Arnd]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0faa29c420 upstream.
The spitz board file uses the obscure symbol_get() function
to optionally call a function from sharpsl_pm.c if that is
built. However, the two files are always built together
these days, and have been for a long time, so this can
be changed to a normal function call.
Link: https://lore.kernel.org/lkml/20230731162639.GA9441@lst.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e628bf939a upstream.
Create 3 kinds of files to reproduce this problem.
dd if=/dev/urandom of=127k.bin bs=1024 count=127
dd if=/dev/urandom of=128k.bin bs=1024 count=128
dd if=/dev/urandom of=129k.bin bs=1024 count=129
When copying files from ksmbd share to windows or cifs.ko, The following
error message happen from windows client.
"The file '129k.bin' is too large for the destination filesystem."
We can see the error logs from ksmbd debug prints
[48394.611537] ksmbd: RDMA r/w request 0x0: token 0x669d, length 0x20000
[48394.612054] ksmbd: smb_direct: RDMA write, len 0x20000, needed credits 0x1
[48394.612572] ksmbd: filename 129k.bin, offset 131072, len 131072
[48394.614189] ksmbd: nbytes 1024, offset 132096 mincount 0
[48394.614585] ksmbd: Failed to process 8 [-22]
And we can reproduce it with cifs.ko,
e.g. dd if=129k.bin of=/dev/null bs=128KB count=2
This problem is that ksmbd rdma return error if remaining bytes is less
than Length of Buffer Descriptor V1 Structure.
smb_direct_rdma_xmit()
...
if (desc_buf_len == 0 || total_length > buf_len ||
total_length > t->max_rdma_rw_size)
return -EINVAL;
This patch reduce descriptor size with remaining bytes and remove the
check for total_length and buf_len.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4c1cf523d upstream.
This was accidentally fixed up in commit e4c1cf523d but we can't
take the full change due to other dependancy issues, so here is just
the actual bugfix that is needed.
[Background]
keltargw reported an issue [1] that with mmaped I/Os, sometimes the
tail of the last page (after file ends) is not filled with zeroes.
The root cause is that such tail page could be wrongly selected for
inplace I/Os so the zeroed part will then be filled with compressed
data instead of zeroes.
A simple fix is to avoid doing inplace I/Os for such tail parts,
actually that was already fixed upstream in commit e4c1cf523d
("erofs: tidy up z_erofs_do_read_page()") by accident.
[1] https://lore.kernel.org/r/3ad8b469-25db-a297-21f9-75db2d6ad224@linux.alibaba.com
Reported-by: keltargw <keltar.gw@gmail.com>
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33f0467fe0 upstream.
Kernel test robot reported a kallsyms_test failure when clang lto is
enabled (thin or full) and CONFIG_KALLSYMS_SELFTEST is also enabled.
I can reproduce in my local environment with the following error message
with thin lto:
[ 1.877897] kallsyms_selftest: Test for 1750th symbol failed: (tsc_cs_mark_unstable) addr=ffffffff81038090
[ 1.877901] kallsyms_selftest: abort
It appears that commit 8cc32a9bbf ("kallsyms: strip LTO-only suffixes
from promoted global functions") caused the failure. Commit 8cc32a9bbf
changed cleanup_symbol_name() based on ".llvm." instead of '.' where
".llvm." is appended to a before-lto-optimization local symbol name.
We need to propagate such knowledge in kallsyms_selftest.c as well.
Further more, compare_symbol_name() in kallsyms.c needs change as well.
In scripts/kallsyms.c, kallsyms_names and kallsyms_seqs_of_names are used
to record symbol names themselves and index to symbol names respectively.
For example:
kallsyms_names:
...
__amd_smn_rw._entry <== seq 1000
__amd_smn_rw._entry.5 <== seq 1001
__amd_smn_rw.llvm.<hash> <== seq 1002
...
kallsyms_seqs_of_names are sorted based on cleanup_symbol_name() through, so
the order in kallsyms_seqs_of_names actually has
index 1000: seq 1002 <== __amd_smn_rw.llvm.<hash> (actual symbol comparison using '__amd_smn_rw')
index 1001: seq 1000 <== __amd_smn_rw._entry
index 1002: seq 1001 <== __amd_smn_rw._entry.5
Let us say at a particular point, at index 1000, symbol '__amd_smn_rw.llvm.<hash>'
is comparing to '__amd_smn_rw._entry' where '__amd_smn_rw._entry' is the one to
search e.g., with function kallsyms_on_each_match_symbol(). The current implementation
will find out '__amd_smn_rw._entry' is less than '__amd_smn_rw.llvm.<hash>' and
then continue to search e.g., index 999 and never found a match although the actual
index 1001 is a match.
To fix this issue, let us do cleanup_symbol_name() first and then do comparison.
In the above case, comparing '__amd_smn_rw' vs '__amd_smn_rw._entry' and
'__amd_smn_rw._entry' being greater than '__amd_smn_rw', the next comparison will
be > index 1000 and eventually index 1001 will be hit an a match is found.
For any symbols not having '.llvm.' substr, there is no functionality change
for compare_symbol_name().
Fixes: 8cc32a9bbf ("kallsyms: strip LTO-only suffixes from promoted global functions")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202308232200.1c932a90-oliver.sang@intel.com
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Reviewed-by: Song Liu <song@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230825034659.1037627-1-yonghong.song@linux.dev
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a6b58c5cd upstream.
On the parisc architecture, lockdep reports for all static objects which
are in the __initdata section (e.g. "setup_done" in devtmpfs,
"kthreadd_done" in init/main.c) this warning:
INFO: trying to register non-static key.
The warning itself is wrong, because those objects are in the __initdata
section, but the section itself is on parisc outside of range from
_stext to _end, which is why the static_obj() functions returns a wrong
answer.
While fixing this issue, I noticed that the whole existing check can
be simplified a lot.
Instead of checking against the _stext and _end symbols (which include
code areas too) just check for the .data and .bss segments (since we check a
data object). This can be done with the existing is_kernel_core_data()
macro.
In addition objects in the __initdata section can be checked with
init_section_contains(), and is_kernel_rodata() allows keys to be in the
_ro_after_init section.
This partly reverts and simplifies commit bac59d18c7 ("x86/setup: Fix static
memory detection").
Link: https://lkml.kernel.org/r/ZNqrLRaOi/3wPAdp@p100
Fixes: bac59d18c7 ("x86/setup: Fix static memory detection")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6846234f4 upstream.
Today module_frob_arch_sections() spots init sections from their
'init' prefix, and uses this to keep the init PLTs separate from the rest.
get_module_plt() uses within_module_init() to determine if a
location is in the init text or not, but this depends on whether
core code thought this was an init section.
Naturally the logic is different.
module_init_layout_section() groups the init and exit text together if
module unloading is disabled, as the exit code will never run. The result
is kernels with this configuration can't load all their modules because
there are not enough PLTs for the combined init+exit section.
A previous patch exposed module_init_layout_section(), use that so the
logic is the same.
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f928f8b1a2 upstream.
Today module_frob_arch_sections() spots init sections from their
'init' prefix, and uses this to keep the init PLTs separate from the rest.
module_emit_plt_entry() uses within_module_init() to determine if a
location is in the init text or not, but this depends on whether
core code thought this was an init section.
Naturally the logic is different.
module_init_layout_section() groups the init and exit text together if
module unloading is disabled, as the exit code will never run. The result
is kernels with this configuration can't load all their modules because
there are not enough PLTs for the combined init+exit section.
This results in the following:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
| module_emit_plt_entry+0x184/0x1cc
| apply_relocate_add+0x2bc/0x8e4
| load_module+0xe34/0x1bd4
| init_module_from_file+0x84/0xc0
| __arm64_sys_finit_module+0x1b8/0x27c
| invoke_syscall.constprop.0+0x5c/0x104
| do_el0_svc+0x58/0x160
| el0_svc+0x38/0x110
| el0t_64_sync_handler+0xc0/0xc4
| el0t_64_sync+0x190/0x194
A previous patch exposed module_init_layout_section(), use that so the
logic is the same.
Reported-by: Adam Johnston <adam.johnston@arm.com>
Tested-by: Adam Johnston <adam.johnston@arm.com>
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: <stable@vger.kernel.org> # 5.15.x: 60a0aab746 arm64: module-plts: inline linux/moduleloader.h
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2abcc4b5a6 upstream.
module_init_layout_section() choses whether the core module loader
considers a section as init or not. This affects the placement of the
exit section when module unloading is disabled. This code will never run,
so it can be free()d once the module has been initialised.
arm and arm64 need to count the number of PLTs they need before applying
relocations based on the section name. The init PLTs are stored separately
so they can be free()d. arm and arm64 both use within_module_init() to
decide which list of PLTs to use when applying the relocation.
Because within_module_init()'s behaviour changes when module unloading
is disabled, both architecture would need to take this into account when
counting the PLTs.
Today neither architecture does this, meaning when module unloading is
disabled there are insufficient PLTs in the init section to load some
modules, resulting in warnings:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
| module_emit_plt_entry+0x184/0x1cc
| apply_relocate_add+0x2bc/0x8e4
| load_module+0xe34/0x1bd4
| init_module_from_file+0x84/0xc0
| __arm64_sys_finit_module+0x1b8/0x27c
| invoke_syscall.constprop.0+0x5c/0x104
| do_el0_svc+0x58/0x160
| el0_svc+0x38/0x110
| el0t_64_sync_handler+0xc0/0xc4
| el0t_64_sync+0x190/0x194
Instead of duplicating module_init_layout_section()s logic, expose it.
Reported-by: Adam Johnston <adam.johnston@arm.com>
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f641174a1 upstream.
The `nocrt` module parameter has no code associated with it and does
nothing. As `crt=-1` has same functionality as what nocrt should be
doing drop `nocrt` and associated documentation.
This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.
Fixes: 8c99fdce30 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08713cb006 upstream.
Jakub Kicinski says:
We've got some new kdoc warnings here:
net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc'
net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc'
include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set'
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd0a7ec379 upstream.
The vangogh driver just gained a link time dependency that now causes
randconfig builds to fail:
x86_64-linux-ld: sound/soc/amd/vangogh/pci-acp5x.o: in function `snd_acp5x_probe':
pci-acp5x.c:(.text+0xbb): undefined reference to `snd_amd_acp_find_config'
Fixes: e89f45edb7 ("ASoC: amd: vangogh: Add check for acp config flags in vangogh platform")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230605085839.2157268-1-arnd@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cfeb6ae8bc ]
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e39c1ac68 ]
Associate the swnode of the GPIO device's (which is the interrupt
controller here) with the irq domain. Otherwise the interrupt-controller
device attribute is a no-op.
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab4109f91b ]
If a GPIO simulator device is unbound with interrupts still requested,
we will hit a use-after-free issue in __irq_domain_deactivate_irq(). The
owner of the irq domain must dispose of all mappings before destroying
the domain object.
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f982b9d57e ]
Fix the below random NULL pointer crash during boot by serializing
pinctrl group and function creation/remove calls in
rzv2m_dt_subnode_to_map() with mutex lock.
Crash logs:
pc : __pi_strcmp+0x20/0x140
lr : pinmux_func_name_to_selector+0x68/0xa4
Call trace:
__pi_strcmp+0x20/0x140
pinmux_generic_add_function+0x34/0xcc
rzv2m_dt_subnode_to_map+0x2e4/0x418
rzv2m_dt_node_to_map+0x15c/0x18c
pinctrl_dt_to_map+0x218/0x37c
create_pinctrl+0x70/0x3d8
While at it, add a comment for lock.
Fixes: 92a9b82525 ("pinctrl: renesas: Add RZ/V2M pin and gpio controller driver")
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230815131558.33787-3-biju.das.jz@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 661efa2284 ]
Fix the below random NULL pointer crash during boot by serializing
pinctrl group and function creation/remove calls in
rzg2l_dt_subnode_to_map() with mutex lock.
Crash log:
pc : __pi_strcmp+0x20/0x140
lr : pinmux_func_name_to_selector+0x68/0xa4
Call trace:
__pi_strcmp+0x20/0x140
pinmux_generic_add_function+0x34/0xcc
rzg2l_dt_subnode_to_map+0x314/0x44c
rzg2l_dt_node_to_map+0x164/0x194
pinctrl_dt_to_map+0x218/0x37c
create_pinctrl+0x70/0x3d8
While at it, add comments for bitmap_lock and lock.
Fixes: c4c4637eb5 ("pinctrl: renesas: Add RZ/G2L pin and gpio controller driver")
Tested-by: Chris Paterson <Chris.Paterson2@renesas.com>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230815131558.33787-2-biju.das.jz@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 897a6b5a03 ]
Use a device property "cirrus,firmware-uid" to get the unique firmware
identifier instead of using ACPI _SUB. There aren't any products that use
_SUB.
There will not usually be a _SUB in Soundwire nodes. The ACPI can use a
_DSD section for custom properties.
There is also a need to support instantiating this driver using software
nodes. This is for systems where the CS35L56 is a back-end device and the
ACPI refers only to the front-end audio device - there will not be any ACPI
references to CS35L56.
Fixes: e496112529 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230817112712.16637-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1bd3a76880 upstream.
Commit 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add()
fails") fixed the memory leak caused by dev_set_name() when device_add()
failed. However, it did not consider that 'tgt' has already been released
when put_device(&tgt->dev) is called. Remove kfree(tgt) in the error path
to avoid double free of 'tgt' and move put_device(&tgt->dev) after the
removed kfree(tgt) to avoid a use-after-free.
Fixes: 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add() fails")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230819083941.164365-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e0e9bd5f7 upstream.
Commit 98b211d641 ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d641 ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f406263e3 upstream.
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But it's
not correct as folio_mapcount() returns total mapcount of large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel.com/T/#mbf0f2ec7fbe45da47526de1d7036183981691e81
and I confirmed this patchset make it work again.
This patch (of 3):
Commit 07e8c82b5e ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
It's not correct for large folio. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5e ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 583893a66d upstream.
Previously, on unplug events, the TMU mode was disabled first
followed by the Time Synchronization Handshake, irrespective of
whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3)
devices, as the TSPacketInterval bits were always enabled by default,
leading the host router to assume that the device router's TMU was
already enabled and preventing it from initiating the Time
Synchronization Handshake. As a result, TBT3 monitors experienced
display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the
Time Synchronization Handshake during TMU disable if the
tb_switch_tmu_rate_write() function is successful. This ensures that
the TBT3 devices function correctly and eliminates the display
flickering issue.
Co-developed-by: Sanath S <Sanath.S@amd.com>
Signed-off-by: Sanath S <Sanath.S@amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[ USB4v2 introduced support for uni-directional TMU mode as part of
d49b4f043d ("thunderbolt: Add support for enhanced uni-directional TMU mode")
This is not a stable candidate commit, so adjust the code for backport. ]
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc22522fd5 upstream.
40613da52b ("PCI: acpiphp: Reassign resources on bridge if necessary")
changed acpiphp hotplug to use pci_assign_unassigned_bridge_resources()
which depends on bridge being available, however enable_slot() can be
called without bridge associated:
1. Legitimate case of hotplug on root bus (widely used in virt world)
2. A (misbehaving) firmware, that sends ACPI Bus Check notifications to
non existing root ports (Dell Inspiron 7352/0W6WV0), which end up at
enable_slot(..., bridge = 0) where bus has no bridge assigned to it.
acpihp doesn't know that it's a bridge, and bus specific 'PCI
subsystem' can't augment ACPI context with bridge information since
the PCI device to get this data from is/was not available.
Issue is easy to reproduce with QEMU's 'pc' machine, which supports PCI
hotplug on hostbridge slots. To reproduce, boot kernel at commit
40613da52b in VM started with following CLI (assuming guest root fs is
installed on sda1 partition):
# qemu-system-x86_64 -M pc -m 1G -enable-kvm -cpu host \
-monitor stdio -serial file:serial.log \
-kernel arch/x86/boot/bzImage \
-append "root=/dev/sda1 console=ttyS0" \
guest_disk.img
Once guest OS is fully booted at qemu prompt:
(qemu) device_add e1000
(check serial.log) it will cause NULL pointer dereference at:
void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
{
struct pci_bus *parent = bridge->subordinate;
BUG: kernel NULL pointer dereference, address: 0000000000000018
? pci_assign_unassigned_bridge_resources+0x1f/0x260
enable_slot+0x21f/0x3e0
acpiphp_hotplug_notify+0x13d/0x260
acpi_device_hotplug+0xbc/0x540
acpi_hotplug_work_fn+0x15/0x20
process_one_work+0x1f7/0x370
worker_thread+0x45/0x3b0
The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with following
sequence:
1. Suspend to RAM
2. Wake up with the same backtrace being observed:
3. 2nd suspend to RAM attempt makes laptop freeze
Fix it by using __pci_bus_assign_resources() instead of
pci_assign_unassigned_bridge_resources() as we used to do, but only in case
when bus doesn't have a bridge associated (to cover for the case of ACPI
event on hostbridge or non existing root port).
That lets us keep hotplug on root bus working like it used to and at the
same time keeps resource reassignment usable on root ports (and other 1st
level bridges) that was fixed by 40613da52b.
Fixes: 40613da52b ("PCI: acpiphp: Reassign resources on bridge if necessary")
Link: https://lore.kernel.org/r/20230726123518.2361181-2-imammedo@redhat.com
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7f2e65699 upstream.
variable *nplanes is provided by user via system call argument. The
possible value of q_data->fmt->num_planes is 1-3, while the value
of *nplanes can be 1-8. The array access by index i can cause array
out-of-bounds.
Fix this bug by checking *nplanes against the array size.
Fixes: 4e855a6efa ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver")
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bc3462a0f upstream.
Shubhra reports that their laptop is heating up over s2idle. Even though
it's getting into the deepest state, it appears to be having spurious
wakeup events.
While debugging a tangential issue with the RTC Carsten reports that recent
6.1.y based kernel face a similar problem.
Looking at acpidump and GPIO register comparisons these spurious wakeup
events are from the GPIO associated with the I2C touchpad on both laptops
and occur even when the touchpad is not marked as a wake source by the
kernel.
This means that the boot firmware has programmed these bits and because
Linux didn't touch them lead to spurious wakeup events from that GPIO.
To fix this issue, restore most of the code that previously would clear all
the bits associated with wakeup sources. This will allow the kernel to only
program the wake up sources that are necessary.
This is similar to what was done previously; but only the wake bits are
cleared by default instead of interrupts and wake bits. If any other
problems are reported then it may make sense to clear interrupts again too.
Cc: Sachi King <nakato@nakato.io>
Cc: stable@vger.kernel.org
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 65f6c7c91c ("pinctrl: amd: Revert "pinctrl: amd: disable and mask interrupts on probe"")
Reported-by: Shubhra Prakash Nandi <email2shubhra@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217754
Reported-by: Carsten Hatger <xmb8dsv4@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217626#c28
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230818144850.1439-1-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 914d9d831e upstream.
While originally it was fine to format strings using "%pOF" while
holding devtree_lock, this now causes a deadlock. Lockdep reports:
of_get_parent from of_fwnode_get_parent+0x18/0x24
^^^^^^^^^^^^^
of_fwnode_get_parent from fwnode_count_parents+0xc/0x28
fwnode_count_parents from fwnode_full_name_string+0x18/0xac
fwnode_full_name_string from device_node_string+0x1a0/0x404
device_node_string from pointer+0x3c0/0x534
pointer from vsnprintf+0x248/0x36c
vsnprintf from vprintk_store+0x130/0x3b4
Fix this by moving the printing in __of_changeset_entry_apply() outside
the lock. As the only difference in the multiple prints is the action
name, use the existing "action_names" to refactor the prints into a
single print.
Fixes: a92eb7621b ("lib/vsprintf: Make use of fwnode API to obtain node names and separators")
Cc: stable@vger.kernel.org
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230801-dt-changeset-fixes-v3-2-5f0410e007dd@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 382d4cd184 upstream.
The gcc compiler translates on some architectures the 64-bit
__builtin_clzll() function to a call to the libgcc function __clzdi2(),
which should take a 64-bit parameter on 32- and 64-bit platforms.
But in the current kernel code, the built-in __clzdi2() function is
defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
32, thus the return values on 32-bit kernels are in the range from
[0..31] instead of the expected [0..63] range.
This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
take a 64-bit parameter on 32-bit kernels as well, thus it makes the
functions identical for 32- and 64-bit kernels.
This bug went unnoticed since kernel 3.11 for over 10 years, and here
are some possible reasons for that:
a) Some architectures have assembly instructions to count the bits and
which are used instead of calling __clzdi2(), e.g. on x86 the bsr
instruction and on ppc cntlz is used. On such architectures the
wrong __clzdi2() implementation isn't used and as such the bug has
no effect and won't be noticed.
b) Some architectures link to libgcc.a, and the in-kernel weak
functions get replaced by the correct 64-bit variants from libgcc.a.
c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many
places in the kernel, and most likely only in uncritical functions,
e.g. when printing hex values via seq_put_hex_ll(). The wrong return
value will still print the correct number, but just in a wrong
formatting (e.g. with too many leading zeroes).
d) 32-bit kernels aren't used that much any longer, so they are less
tested.
A trivial testcase to verify if the currently running 32-bit kernel is
affected by the bug is to look at the output of /proc/self/maps:
Here the kernel uses a correct implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat
00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat
0001a000-0003b000 rwxp 00000000 00:00 0 [heap]
f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
and this kernel uses the broken implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat
0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat
000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap]
00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 4df87bb7b6 ("lib: add weak clz/ctz functions")
Cc: Chanho Min <chanho.min@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d25ddb7e78 upstream.
When a client roamed back to a node before it got time to destroy the
pending local entry (i.e. within the same originator interval) the old
global one is directly removed from hash table and left as such.
But because this entry had an extra reference taken at lookup (i.e using
batadv_tt_global_hash_find) there is no way its memory will be reclaimed
at any time causing the following memory leak:
unreferenced object 0xffff0000073c8000 (size 18560):
comm "softirq", pid 0, jiffies 4294907738 (age 228.644s)
hex dump (first 32 bytes):
06 31 ac 12 c7 7a 05 00 01 00 00 00 00 00 00 00 .1...z..........
2c ad be 08 00 80 ff ff 6c b6 be 08 00 80 ff ff ,.......l.......
backtrace:
[<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
[<000000000ff2fdbc>] batadv_tt_global_add+0x700/0xe20
[<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
[<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
[<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
[<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
[<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
[<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
[<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
[<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
[<000000000f39a009>] __netif_receive_skb+0x48/0xe0
[<00000000f2cd8888>] process_backlog+0x174/0x344
[<00000000507d6564>] __napi_poll+0x58/0x1f4
[<00000000b64ef9eb>] net_rx_action+0x504/0x590
[<00000000056fa5e4>] _stext+0x1b8/0x418
[<00000000878879d6>] run_ksoftirqd+0x74/0xa4
unreferenced object 0xffff00000bae1a80 (size 56):
comm "softirq", pid 0, jiffies 4294910888 (age 216.092s)
hex dump (first 32 bytes):
00 78 b1 0b 00 00 ff ff 0d 50 00 00 00 00 00 00 .x.......P......
00 00 00 00 00 00 00 00 50 c8 3c 07 00 00 ff ff ........P.<.....
backtrace:
[<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
[<00000000d9aaa49e>] batadv_tt_global_add+0x53c/0xe20
[<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
[<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
[<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
[<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
[<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
[<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
[<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
[<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
[<000000000f39a009>] __netif_receive_skb+0x48/0xe0
[<00000000f2cd8888>] process_backlog+0x174/0x344
[<00000000507d6564>] __napi_poll+0x58/0x1f4
[<00000000b64ef9eb>] net_rx_action+0x504/0x590
[<00000000056fa5e4>] _stext+0x1b8/0x418
[<00000000878879d6>] run_ksoftirqd+0x74/0xa4
Releasing the extra reference from batadv_tt_global_hash_find even at
roam back when batadv_tt_global_free is called fixes this memory leak.
Cc: stable@vger.kernel.org
Fixes: 068ee6e204 ("batman-adv: roaming handling mechanism redesign")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by; Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8e42a2b0a upstream.
If the user set an MTU value, it usually means that there are special
requirements for the MTU. But if an interface gots activated, the MTU was
always recalculated and then the user set value was overwritten.
The only reason why this user set value has to be overwritten, is when the
MTU has to be decreased because batman-adv is not able to transfer packets
with the user specified size.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6a953cce8 upstream.
If an interface changes the MTU, it is expected that an NETDEV_PRECHANGEMTU
and NETDEV_CHANGEMTU notification events is triggered. This worked fine for
.ndo_change_mtu based changes because core networking code took care of it.
But for auto-adjustments after hard-interfaces changes, these events were
simply missing.
Due to this problem, non-batman-adv components weren't aware of MTU changes
and thus couldn't perform their own tasks correctly.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70d91dc9b2 upstream.
Set the next pointer in filename_trans_read_helper() before attaching
the new node under construction to the list, otherwise garbage would be
dereferenced on subsequent failure during cleanup in the out goto label.
Cc: <stable@vger.kernel.org>
Fixes: 4300590243 ("selinux: implement new format of filename transitions")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b816601e2 upstream.
We have some reports of linux NFS clients that cannot satisfy a linux knfsd
server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
those clients repeatedly walk all their known state using TEST_STATEID and
receive NFS4_OK for all.
Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
cl_revoked. This would produce the observed client/server effect.
Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
and move to cl_revoked happens within the same cl_lock. This will allow
nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # v4.17+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6867c7a332 upstream.
When a memcg is in the process of being released mem_cgroup_tryget will
fail because its reference count has already reached 0. This can happen
during reclaim if the memcg has already been offlined, and we reclaim all
remaining pages attributed to the offlined memcg. shrink_many attempts to
skip the empty memcg in this case, and continue reclaiming from the
remaining memcgs in the old generation. If there is only one memcg
remaining, or if all remaining memcgs are in the process of being released
then shrink_many will spin until all memcgs have finished being released.
The release occurs through a workqueue, so it can take a while before
kswapd is able to make any further progress.
This fix results in reductions in kswapd activity and direct reclaim in
a test where 28 apps (working set size > total memory) are repeatedly
launched in a random sequence:
A B delta ratio(%)
allocstall_movable 5962 3539 -2423 -40.64
allocstall_normal 2661 2417 -244 -9.17
kswapd_high_wmark_hit_quickly 53152 7594 -45558 -85.71
pageoutrun 57365 11750 -45615 -79.52
Link: https://lkml.kernel.org/r/20230814151636.1639123-1-tjmercier@google.com
Fixes: e4dde56cd2 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ef269ef1a upstream.
cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks
have been checked. DL BW is not allocated per-task but as a sum over
all DL tasks migrating.
If multiple controllers are attached to the cgroup next to the cpuset
controller a non-cpuset can_attach() can fail. In this case free DL BW
in cpuset_cancel_attach().
Finally, update cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().
Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85989106fe upstream.
While moving a set of tasks between exclusive cpusets,
cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for
DL BW overflow checking and per-task DL BW allocation on the destination
root_domain for the DL tasks in this set.
This approach has the issue of not freeing already allocated DL BW in
the following error cases:
(1) The set of tasks includes multiple DL tasks and DL BW overflow
checking fails for one of the subsequent DL tasks.
(2) Another controller next to the cpuset controller which is attached
to the same cgroup fails in its can_attach().
To address this problem rework dl_cpu_busy():
(1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a
dedicated dl_bw_free().
(2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of
a `struct task_struct *p` used in dl_cpu_busy(). This allows to
allocate DL BW for a set of tasks too rather than only for a single
task.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c24849f55 upstream.
Qais reported that iterating over all tasks when rebuilding root domains
for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume).
To fix the problem keep track of the number of DEADLINE tasks belonging
to each cpuset and then use this information (followup patch) to only
perform the above iteration if DEADLINE tasks are actually present in
the cpuset for which a corresponding root domain is being rebuilt.
Reported-by: Qais Yousef (Google) <qyousef@layalina.io>
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 111cd11bbc upstream.
Turns out percpu_cpuset_rwsem - commit 1243dc518c ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea,
as it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently and it is also not implementing
priority inheritance (which causes troubles with realtime workloads).
Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad3a557daf upstream.
rebuild_root_domains() and update_tasks_root_domain() have neutral
names, but actually deal with DEADLINE bandwidth accounting.
Rename them to use 'dl_' prefix so that intent is more clear.
No functional change.
Suggested-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12e6f6dc78 upstream.
For platforms with GMD_ID support (i.e., everything MTL and beyond),
identification of the display IP present should be based on the contents
of the GMD_ID register rather than a PCI devid match.
Note that since GMD_ID readout requires access to the PCI BAR, a slight
change to the driver init sequence is needed --- pci_enable_device() is
now called before i915_driver_create().
v2:
- Fix use of uninitialized i915 pointer in error path if
pci_enable_device() fails before the i915 device is created. (lkp)
- Use drm_device parameter to intel_display_device_probe. This goes
against i915 conventions, but since the primary goal here is to make
it easy to call this function from other drivers (like Xe) and since
we don't need anything from the i915 structure, this seems like an
exception where drm_device is a more natural fit.
v3:
- Go back do drm_i915_private for intel_display_device_probe. (Jani)
- Move forward decl to top of header. (Jani)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230523195609.73627-6-matthew.d.roper@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c66ca3949 upstream.
0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit b81fac906a ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:
arch_cpu_finalize_init
identify_boot_cpu
identify_cpu
generic_identify
get_cpu_cap --> setup cpu capability
...
fpu__init_cpu
fpu__init_cpu_xstate
cr4_set_bits(X86_CR4_OSXSAVE);
As the FPU is not yet initialized the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.
Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE
enabling.
[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]
Fixes: b81fac906a ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com
Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f69383b20 upstream.
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
Fixes: 33344368cb ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9730870b48 upstream.
In hw_breakpoint_control(), encode_ctrl_reg() has already encoded the
MWPnCFG3_LoadEn/MWPnCFG3_StoreEn bits in info->ctrl. We don't need to
add (1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn) unconditionally.
Otherwise we can't set read watchpoint and write watchpoint separately.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1dcc437427 upstream.
After the commit in the Fixes: line below, HPD polling stopped working
on i915, since after that change calling drm_kms_helper_poll_enable()
doesn't restart drm_mode_config::output_poll_work if the work was
stopped (no connectors needing polling) and enabling polling for a
connector (during runtime suspend or detecting an HPD IRQ storm).
After the above change calling drm_kms_helper_poll_enable() is a nop
after it's been called already and polling for some connectors was
disabled/re-enabled.
Fix this by calling drm_kms_helper_poll_reschedule() added in the
previous patch instead, which reschedules the work whenever expected.
Fixes: d33a54e399 ("drm/probe_helper: sort out poll_running vs poll_enabled")
CC: stable@vger.kernel.org # 6.4+
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230822113015.41224-2-imre.deak@intel.com
(cherry picked from commit 50452f2f76)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9e96bf190 upstream.
vmw_bo_unreference sets the input buffer to null on exit, resulting in
null ptr deref's on the subsequent drm gem put calls.
This went unnoticed because only very old userspace would be exercising
those paths but it wouldn't be hard to hit on old distros with brand
new kernels.
Introduce a new function that abstracts unrefing of user bo's to make
the code cleaner and more explicit.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reported-by: Ian Forbes <iforbes@vmware.com>
Fixes: 9ef8d83e8e ("drm/vmwgfx: Do not drop the reference to the handle too soon")
Cc: <stable@vger.kernel.org> # v6.4+
Reviewed-by: Maaz Mombasawala<mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230818041301.407636-1-zack@kde.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5805192c7b upstream.
In contrast to most other GUP code, GUP-fast common page table walking
code like gup_pte_range() also handles hugetlb pages. But in contrast to
other hugetlb page table walking code, it does not look at the hugetlb PTE
abstraction whereby we have only a single logical hugetlb PTE per hugetlb
page, even when using multiple cont-PTEs underneath -- which is for
example what huge_ptep_get() abstracts.
So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast
might stumble over a PTE that does not map the head page of a hugetlb page
-- not the first "head" PTE of such a cont mapping.
Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we
might end up calling gup_must_unshare() with a tail page of a hugetlb
page.
We only maintain a single PageAnonExclusive flag per hugetlb page (as
hugetlb pages cannot get partially COW-shared), stored for the head page.
That flag is clear for all tail pages.
So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail
page of a hugetlb page:
1) With CONFIG_DEBUG_VM_PGFLAGS
Stumbles over the:
VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
For example, when executing the COW selftests with 64k hugetlb pages on
arm64:
[ 61.082187] page:00000000829819ff refcount:3 mapcount:1 mapping:0000000000000000 index:0x1 pfn:0x11ee11
[ 61.082842] head:0000000080f79bf7 order:4 entire_mapcount:1 nr_pages_mapped:0 pincount:2
[ 61.083384] anon flags: 0x17ffff80003000e(referenced|uptodate|dirty|head|mappedtodisk|node=0|zone=2|lastcpupid=0xfffff)
[ 61.084101] page_type: 0xffffffff()
[ 61.084332] raw: 017ffff800000000 fffffc00037b8401 0000000000000402 0000000200000000
[ 61.084840] raw: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000
[ 61.085359] head: 017ffff80003000e ffffd9e95b09b788 ffffd9e95b09b788 ffff0007ff63cf71
[ 61.085885] head: 0000000000000000 0000000000000002 00000003ffffffff 0000000000000000
[ 61.086415] page dumped because: VM_BUG_ON_PAGE(PageHuge(page) && !PageHead(page))
[ 61.086914] ------------[ cut here ]------------
[ 61.087220] kernel BUG at include/linux/page-flags.h:990!
[ 61.087591] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 61.087999] Modules linked in: ...
[ 61.089404] CPU: 0 PID: 4612 Comm: cow Kdump: loaded Not tainted 6.5.0-rc4+ #3
[ 61.089917] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 61.090409] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 61.090897] pc : gup_must_unshare.part.0+0x64/0x98
[ 61.091242] lr : gup_must_unshare.part.0+0x64/0x98
[ 61.091592] sp : ffff8000825eb940
[ 61.091826] x29: ffff8000825eb940 x28: 0000000000000000 x27: fffffc00037b8440
[ 61.092329] x26: 0400000000000001 x25: 0000000000080101 x24: 0000000000080000
[ 61.092835] x23: 0000000000080100 x22: ffff0000cffb9588 x21: ffff0000c8ec6b58
[ 61.093341] x20: 0000ffffad6b1000 x19: fffffc00037b8440 x18: ffffffffffffffff
[ 61.093850] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548
[ 61.094358] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021
[ 61.094858] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffd9e958b7a1c0
[ 61.095359] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 00000000002bffa8
[ 61.095873] x5 : ffff0008bb19e708 x4 : 0000000000000000 x3 : 0000000000000000
[ 61.096380] x2 : 0000000000000000 x1 : ffff0000cf6636c0 x0 : 0000000000000046
[ 61.096894] Call trace:
[ 61.097080] gup_must_unshare.part.0+0x64/0x98
[ 61.097392] gup_pte_range+0x3a8/0x3f0
[ 61.097662] gup_pgd_range+0x1ec/0x280
[ 61.097942] lockless_pages_from_mm+0x64/0x1a0
[ 61.098258] internal_get_user_pages_fast+0xe4/0x1d0
[ 61.098612] pin_user_pages_fast+0x58/0x78
[ 61.098917] pin_longterm_test_start+0xf4/0x2b8
[ 61.099243] gup_test_ioctl+0x170/0x3b0
[ 61.099528] __arm64_sys_ioctl+0xa8/0xf0
[ 61.099822] invoke_syscall.constprop.0+0x7c/0xd0
[ 61.100160] el0_svc_common.constprop.0+0xe8/0x100
[ 61.100500] do_el0_svc+0x38/0xa0
[ 61.100736] el0_svc+0x3c/0x198
[ 61.100971] el0t_64_sync_handler+0x134/0x150
[ 61.101280] el0t_64_sync+0x17c/0x180
[ 61.101543] Code: aa1303e0 f00074c1 912b0021 97fffeb2 (d4210000)
2) Without CONFIG_DEBUG_VM_PGFLAGS
Always detects "not exclusive" for passed tail pages and refuses to PIN
the tail pages R/O, as gup_must_unshare() == true. GUP-fast will fallback
to ordinary GUP. As ordinary GUP properly considers the logical hugetlb
PTE abstraction in hugetlb_follow_page_mask(), pinning the page will
succeed when looking at the PageAnonExclusive on the head page only.
So the only real effect of this is that with cont-PTE hugetlb pages, we'll
always fallback from GUP-fast to ordinary GUP when not working on the head
page, which ends up checking the head page and do the right thing.
Consequently, the cow selftests pass with cont-PTE hugetlb pages as well
without CONFIG_DEBUG_VM_PGFLAGS.
Note that this only applies to anon hugetlb pages that are mapped using
cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.
... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into
the page table R/O using GUP-fast.
On production kernels (and even most debug kernels, that don't set
CONFIG_DEBUG_VM_PGFLAGS) this patch should theoretically not be required
to be backported. But of course, it does not hurt.
Link: https://lkml.kernel.org/r/20230805101256.87306-1-david@redhat.com
Fixes: a7f2266041 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d74943a2f3 upstream.
Unfortunately commit 474098edac ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297 ("mm: numa: Support NUMA hinting
page faults from gup/gup_fast"): "Other follow_page callers like KSM
should not use FOLL_NUMA, or they would fail to get the pages if they use
follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indictaes "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not
really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().
As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fallback to
ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users.
So we should really see if we can instead let relevant GUP callers specify
it manually, and not trigger NUMA hinting faults from GUP as default.
Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
adding appropriate documenation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That
comment for pmd_protnone() was added in commit 2b4847e730 ("mm: numa:
serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration
entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.
[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com
Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com
Fixes: 474098edac ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: liubo <liubo254@huawei.com>
Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
Reported-by: Peter Xu <peterx@redhat.com>
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49b0638502 upstream.
walk_page_range() and friends often operate under write-locked mmap_lock.
With introduction of vma locks, the vmas have to be locked as well during
such walks to prevent concurrent page faults in these areas. Add an
additional member to mm_walk_ops to indicate locking requirements for the
walk.
The change ensures that page walks which prevent concurrent page faults
by write-locking mmap_lock, operate correctly after introduction of
per-vma locks. With per-vma locks page faults can be handled under vma
lock without taking mmap_lock at all, so write locking mmap_lock would
not stop them. The change ensures vmas are properly locked during such
walks.
A sample issue this solves is do_mbind() performing queue_pages_range()
to queue pages for migration. Without this change a concurrent page
can be faulted into the area and be left out of migration.
Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Suggested-by: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1738b94962 upstream.
After commit 2c2241081f ("mm/gup: move private gup FOLL_ flags to
internal.h") FOLL_LONGTERM flag value got updated from 0x10000 to 0x100 at
include/linux/mm_types.h.
As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM Updating same
here as well.
Before this change test goes in an infinite assert loop in
hmm.hmm_device_private.hmm_gup_test
==========================================================
RUN hmm.hmm_device_private.hmm_gup_test ...
hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE..
..(2) == m[2] (34)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
...
==========================================================
Call Trace:
<TASK>
? sched_clock+0xd/0x20
? __lock_acquire.constprop.0+0x120/0x6c0
? ktime_get+0x2c/0xd0
? sched_clock+0xd/0x20
? local_clock+0x12/0xd0
? lock_release+0x26e/0x3b0
pin_user_pages_fast+0x4c/0x70
gup_test_ioctl+0x4ff/0xbb0
? gup_test_ioctl+0x68c/0xbb0
__x64_sys_ioctl+0x99/0xd0
do_syscall_64+0x60/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? irqentry_exit_to_user_mode+0xd/0x20
? irqentry_exit+0x3f/0x50
? exc_page_fault+0x96/0x200
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f6aaa31aaff
After this change test is able to pass successfully.
Link: https://lkml.kernel.org/r/20230808124347.79163-1-ayush.jain3@amd.com
Fixes: 2c2241081f ("mm/gup: move private gup FOLL_ flags to internal.h")
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e5548f85b4 upstream.
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
Fixes: 2301003215 ("mm/smaps: simplify shmem handling of pte holes")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0362a2536 upstream.
The code calling ima_free_kexec_buffer runs long after the memblock
allocator has already been torn down, potentially resulting in a use
after free in memblock_isolate_range.
With KASAN or KFENCE, this use after free will result in a BUG
from the idle task, and a subsequent kernel panic.
Switch ima_free_kexec_buffer over to memblock_free_late to avoid
that issue.
Fixes: fee3ff99bc ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c")
Cc: stable@kernel.org
Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: Mike Rappoport <rppt@kernel.org>
Link: https://lore.kernel.org/r/20230817135759.0888e5ef@imladris.surriel.com
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66fbfb35da upstream.
Problem can be reproduced by unloading snd_soc_simple_card, because in
devm_get_clk_from_child() devres data is allocated as `struct clk`, but
devm_clk_release() expects devres data to be `struct devm_clk_state`.
KASAN report:
==================================================================
BUG: KASAN: slab-out-of-bounds in devm_clk_release+0x20/0x54
Read of size 8 at addr ffffff800ee09688 by task (udev-worker)/287
Call trace:
dump_backtrace+0xe8/0x11c
show_stack+0x1c/0x30
dump_stack_lvl+0x60/0x78
print_report+0x150/0x450
kasan_report+0xa8/0xf0
__asan_load8+0x78/0xa0
devm_clk_release+0x20/0x54
release_nodes+0x84/0x120
devres_release_all+0x144/0x210
device_unbind_cleanup+0x1c/0xac
really_probe+0x2f0/0x5b0
__driver_probe_device+0xc0/0x1f0
driver_probe_device+0x68/0x120
__driver_attach+0x140/0x294
bus_for_each_dev+0xec/0x160
driver_attach+0x38/0x44
bus_add_driver+0x24c/0x300
driver_register+0xf0/0x210
__platform_driver_register+0x48/0x54
asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
do_one_initcall+0xac/0x340
do_init_module+0xd0/0x300
load_module+0x2ba4/0x3100
__do_sys_init_module+0x2c8/0x300
__arm64_sys_init_module+0x48/0x5c
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x124/0x154
do_el0_svc+0x44/0xdc
el0_svc+0x14/0x50
el0t_64_sync_handler+0xec/0x11c
el0t_64_sync+0x14c/0x150
Allocated by task 287:
kasan_save_stack+0x38/0x60
kasan_set_track+0x28/0x40
kasan_save_alloc_info+0x20/0x30
__kasan_kmalloc+0xac/0xb0
__kmalloc_node_track_caller+0x6c/0x1c4
__devres_alloc_node+0x44/0xb4
devm_get_clk_from_child+0x44/0xa0
asoc_simple_parse_clk+0x1b8/0x1dc [snd_soc_simple_card_utils]
simple_parse_node.isra.0+0x1ec/0x230 [snd_soc_simple_card]
simple_dai_link_of+0x1bc/0x334 [snd_soc_simple_card]
__simple_for_each_link+0x2ec/0x320 [snd_soc_simple_card]
asoc_simple_probe+0x468/0x4dc [snd_soc_simple_card]
platform_probe+0x90/0xf0
really_probe+0x118/0x5b0
__driver_probe_device+0xc0/0x1f0
driver_probe_device+0x68/0x120
__driver_attach+0x140/0x294
bus_for_each_dev+0xec/0x160
driver_attach+0x38/0x44
bus_add_driver+0x24c/0x300
driver_register+0xf0/0x210
__platform_driver_register+0x48/0x54
asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
do_one_initcall+0xac/0x340
do_init_module+0xd0/0x300
load_module+0x2ba4/0x3100
__do_sys_init_module+0x2c8/0x300
__arm64_sys_init_module+0x48/0x5c
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x124/0x154
do_el0_svc+0x44/0xdc
el0_svc+0x14/0x50
el0t_64_sync_handler+0xec/0x11c
el0t_64_sync+0x14c/0x150
The buggy address belongs to the object at ffffff800ee09600
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 136 bytes inside of
256-byte region [ffffff800ee09600, ffffff800ee09700)
The buggy address belongs to the physical page:
page:000000002d97303b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ee08
head:000000002d97303b order:1 compound_mapcount:0 compound_pincount:0
flags: 0x10200(slab|head|zone=0)
raw: 0000000000010200 0000000000000000 dead000000000122 ffffff8002c02480
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffff800ee09580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff800ee09600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffff800ee09680: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffff800ee09700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff800ee09780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Fixes: abae8e57e4 ("clk: generalize devm_clk_get() a bit")
Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Link: https://lore.kernel.org/r/20230805084847.3110586-1-andrej.skvortzov@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cbc11aaa0 upstream.
Commmit f5ea16137a ("NFSv4: Retry LOCK on OLD_STATEID during delegation
return") attempted to solve this problem by using nfs4's generic async error
handling, but introduced a regression where v4.0 lock recovery would hang.
The additional complexity introduced by overloading that error handling is
not necessary for this case. This patch expects that commit to be
reverted.
The problem as originally explained in the above commit is:
There's a small window where a LOCK sent during a delegation return can
race with another OPEN on client, but the open stateid has not yet been
updated. In this case, the client doesn't handle the OLD_STATEID error
from the server and will lose this lock, emitting:
"NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
Fix this by using the old_stateid refresh helpers if the server replies
with OLD_STATEID.
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a260f7d726 upstream.
The Lenovo Thinkbook 14s Yoga ITL has 4 new symbols/shortcuts on their
F9-F11 and PrtSc keys:
F9: Has a symbol of a head with a headset, the manual says "Service key"
F10: Has a symbol of a telephone horn which has been picked up from the
receiver, the manual says: "Answer incoming calls"
F11: Has a symbol of a telephone horn which is resting on the receiver,
the manual says: "Reject incoming calls"
PrtSc: Has a symbol of a siccor and a dashed ellipse, the manual says:
"Open the Windows 'Snipping' Tool app"
This commit adds support for these 4 new hkey events.
Signed-off-by: André Apitzsch <git@apitzsch.eu>
Link: https://lore.kernel.org/r/20230819-lenovo_keys-v1-1-9d34eac88e0a@apitzsch.eu
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b98c16107c upstream.
The commit 06470f7468 ("mac80211: add API to allow filtering frames in BA sessions")
added reorder_buf_filtered to mark frames filtered by firmware, and it
can only work correctly if hw.max_rx_aggregation_subframes <= 64 since
it stores the bitmap in a u64 variable.
However, new HE or EHT devices can support BlockAck number up to 256 or
1024, and then using a higher subframe index leads UBSAN warning:
UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39
shift exponent 215 is too large for 64-bit type 'long long unsigned int'
Call Trace:
<IRQ>
dump_stack_lvl+0x48/0x70
dump_stack+0x10/0x20
__ubsan_handle_shift_out_of_bounds+0x1ac/0x360
ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211]
ieee80211_sta_reorder_release+0x9c/0x400 [mac80211]
ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211]
ieee80211_rx_list+0xaef/0xf60 [mac80211]
ieee80211_rx_napi+0x53/0xd0 [mac80211]
Since only old hardware that supports <=64 BlockAck uses
ieee80211_mark_rx_ba_filtered_frames(), limit the use as it is, so add a
WARN_ONCE() and comment to note to avoid using this function if hardware
capability is not suitable.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bfedba3b2c upstream.
When building for power4, newer binutils don't recognise the "dcbfl"
extended mnemonic.
dcbfl RA, RB is equivalent to dcbf RA, RB, 1.
Switch to "dcbf" to avoid the build error.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e74216b8de ]
The commit 14af9963ba ("bonding: Support macvlans on top of tlb/rlb mode
bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
the current rlb bond mode only handles ARP packets to update remote neighbor
entries. This causes an issue when a macvlan is on top of the bond, and
remote devices send packets to the macvlan using the bond's MAC address
as the destination. After delivering the packets to the macvlan, the macvlan
will rejects them as the MAC address is incorrect. Consequently, this commit
makes macvlan over bond non-functional.
To address this problem, one potential solution is to check for the presence
of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
and return NULL in the rlb_arp_xmit() function. However, this approach
doesn't fully resolve the situation when a VLAN exists between the bond and
macvlan.
So let's just do a partial revert for commit 14af9963ba in rlb_arp_xmit().
As the comment said, Don't modify or load balance ARPs that do not originate
locally.
Fixes: 14af9963ba ("bonding: Support macvlans on top of tlb/rlb mode bonds")
Reported-by: susan.zheng@veritas.com
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e51830e29 ]
Don't queue more gc work, else we may queue the same elements multiple
times.
If an element is flagged as dead, this can mean that either the previous
gc request was invalidated/discarded by a transaction or that the previous
request is still pending in the system work queue.
The latter will happen if the gc interval is set to a very low value,
e.g. 1ms, and system work queue is backlogged.
The sets refcount is 1 if no previous gc requeusts are queued, so add
a helper for this and skip gc run if old requests are pending.
Add a helper for this and skip the gc run in this case.
Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e1be4cdc9 ]
Several instances of pipapo_resize() don't propagate allocation failures,
this causes a crash when fault injection is enabled for gfp_kernel slabs.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8357bc946a ]
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
protect the gc list.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 720344340f ]
Abort path is missing a synchronization point with GC transactions. Add
GC sequence number hence any GC transaction losing race will be
discarded.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c9f029328 ]
Destroy work waits for the RCU grace period then it releases the objects
with no mutex held. All releases objects follow this path for
transactions, therefore, order is guaranteed and references to top-level
objects in the hierarchy remain valid.
However, netlink notifier might interfer with pending destroy work.
rcu_barrier() is not correct because objects are not release via RCU
callback. Flush destroy work before releasing objects from netlink
notifier path.
Fixes: d4bc8271db ("netfilter: nf_tables: netlink notifier might race to release objects")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b80ced971 ]
We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.
Moreover, we can't init table->validate to _SKIP when a table object
is allocated.
If we do, then if a transcaction creates a new table and then
fails the transaction, nfnetlink will loop and nft will hang until
user cancels the command.
Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).
Fixes: 00c320f9b7 ("netfilter: nf_tables: make validation state per table")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da71714e35 ]
When replacing an existing root qdisc, with one that is of the same kind, the
request boils down to essentially a parameterization change i.e not one that
requires allocation and grafting of a new qdisc. syzbot was able to create a
scenario which resulted in a taprio qdisc replacing an existing taprio qdisc
with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to
create and graft scenario.
The fix ensures that only when the qdisc kinds are different that we should
allow a create and graft, otherwise it goes into the "change" codepath.
While at it, fix the code and comments to improve readability.
While syzbot was able to create the issue, it did not zone on the root cause.
Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.
v1->V2 changes:
- remove "inline" function definition (Vladmir)
- remove extrenous braces in branches (Vladmir)
- change inline function names (Pedro)
- Run tdc tests (Victor)
v2->v3 changes:
- dont break else/if (Simon)
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0bfe711592 ]
The original implementation had a very simple handling for single frame
transmissions as it just sent the single frame without a timeout handling.
With the new echo frame handling the echo frame was also introduced for
single frames but the former exception ('simple without timers') has been
maintained by accident. This leads to a 1 second timeout when closing the
socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected.
As the echo handling is always active (also for single frames) remove the
wrong extra condition for single frames.
Fixes: 9f39d36530 ("can: isotp: add support for transmission without flow control")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20230821144547.6658-2-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ecff05e6c ]
This reverts commit 7255355a06.
After this commit we are not able to attach VF to VM:
virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
error: Failed to attach interface
error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
ice_check_vf_ready_for_cfg() already contain waiting for reset.
New condition in ice_check_vf_ready_for_reset() causing only problems.
Fixes: 7255355a06 ("ice: Fix ice VF reset during iavf initialization")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 10083aef78 ]
The driver is misconfiguring the hardware for some values of MTU such that
it could use multiple descriptors to receive a packet when it could have
simply used one.
Change the driver to use a round-up instead of the result of a shift, as
the shift can truncate the lower bits of the size, and result in the
problem noted above. It also aligns this driver with similar code in i40e.
The insidiousness of this problem is that everything works with the wrong
size, it's just not working as well as it could, as some MTU sizes end up
using two or more descriptors, and there is no way to tell that is
happening without looking at ice_trace or a bus analyzer.
Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f866fbc842 ]
UDP sendmsg() is lockless, so ip_select_ident_segs()
can very well be run from multiple cpus [1]
Convert inet->inet_id to an atomic_t, but implement
a dedicated path for TCP, avoiding cost of a locked
instruction (atomic_add_return())
Note that this patch will cause a trivial merge conflict
because we added inet->flags in net-next tree.
v2: added missing change in
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
(David Ahern)
[1]
BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
ip_select_ident_segs include/net/ip.h:542 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
ip_select_ident_segs include/net/ip.h:541 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x184d -> 0x184e
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
==================================================================
Fixes: 23f57406b8 ("ipv4: avoid using shared IP generator for connected sockets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f534f6581e ]
veth and vxcan need to make sure the ifindexes of the peer
are not negative, core does not validate this.
Using iproute2 with user-space-level checking removed:
Before:
# ./ip link add index 10 type veth peer index -1
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
-1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
Now:
$ ./ip link add index 10 type veth peer index -1
Error: ifindex can't be negative.
This problem surfaced in net-next because an explicit WARN()
was added, the root cause is older.
Fixes: e6f8f1a739 ("veth: Allow to create peer link with given ifindex")
Fixes: a8f820a380 ("can: add Virtual CAN Tunnel driver (vxcan)")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32bbe64a13 ]
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Fixes: b0ba512e25 ("net: bcmgenet: enable driver to work without a device tree")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23a14488ea ]
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Fixes: c25b23b8a3 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2572ce6241 ]
Based on the original code semantic in case of Clause 45 MDIO, the address
command is supposed to be followed by the command sending the MMD address,
not the CSR address. The commit 002dd3de09 ("net: mdio: mdio-bitbang:
Separate C22 and C45 transactions") has erroneously broken that. So most
likely due to an unfortunate variable name it switched the code to sending
the CSR address. In our case it caused the protocol malfunction so the
read operation always failed with the turnaround bit always been driven to
one by PHY instead of zero. Fix that by getting back the correct
behaviour: sending MMD address command right after the regular address
command.
Fixes: 002dd3de09 ("net: mdio: mdio-bitbang: Separate C22 and C45 transactions")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e94b590abf ]
802.1X PAE frames are link-local frames, therefore they must be trapped to
the CPU port. Currently, the MT753X switches treat 802.1X PAE frames as
regular multicast frames, therefore flooding them to user ports. To fix
this, set 802.1X PAE frames to be trapped to the CPU port(s).
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 348c976be0 ]
The field 'virtual router' was extended to 12 bits in Spectrum-4.
Therefore, the element 'MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB' needs 3 bits for
Spectrum < 4 and 4 bits for Spectrum >= 4.
The elements are stored in an internal storage scratchpad. Currently, the
MSB is defined there as 3 bits. It means that for Spectrum-4, only 2K VRFs
can be used for multicast routing, as the highest bit is not really used by
the driver. Fix the definition of 'VIRT_ROUTER_MSB' to use 4 bits. Adjust
the definitions of 'virtual router' field in the blocks accordingly - use
'_avoid_size_check' for Spectrum-2 instead of for Spectrum-4. Fix the mask
in parse function to use 4 bits.
Fixes: 6d5d8ebb88 ("mlxsw: Rename virtual router flex key element")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/79bed2b70f6b9ed58d4df02e9798a23da648015b.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dc63b9cfd ]
The two most significant bits of the "local_port" field in the SSPR
register are always cleared since they are overwritten by the deprecated
and overlapping "sub_port" field.
On systems with more than 255 local ports (e.g., Spectrum-4), this
results in the firmware maintaining invalid mappings between system port
and local port. Specifically, two different systems ports (0x1 and
0x101) point to the same local port (0x1), which eventually leads to
firmware errors.
Fix by removing the deprecated "sub_port" field.
Fixes: fd24b29a1b ("mlxsw: reg: Align existing registers to use extended local_port field")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/9b909a3033c8d3d6f67f237306bef4411c5e6ae4.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc2de151ab ]
Currently, in Spectrum-2 and above, time stamps are extracted from the CQE
into the time stamp fields in 'struct mlxsw_skb_cb', only when the CQE
time stamp type is UTC. The time stamps are read directly from the CQE and
software can get the time stamp in UTC format using CQEv2.
From Spectrum-4, the time stamps that are read from the CQE are allowed
to be also from MIRROR_UTC type.
Therefore, we get a warning [1] from the driver that the time stamp fields
were not set, when LLDP control packet is sent.
Allow the time stamp type to be MIRROR_UTC and set the time stamp in this
case as well.
[1]
WARNING: CPU: 11 PID: 0 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1409 mlxsw_sp2_ptp_hwtstamp_fill+0x1f/0x70 [mlxsw_spectrum]
[...]
Call Trace:
<IRQ>
mlxsw_sp2_ptp_receive+0x3c/0x80 [mlxsw_spectrum]
mlxsw_core_skb_receive+0x119/0x190 [mlxsw_core]
mlxsw_pci_cq_tasklet+0x3c9/0x780 [mlxsw_pci]
tasklet_action_common.constprop.0+0x9f/0x110
__do_softirq+0xbb/0x296
irq_exit_rcu+0x79/0xa0
common_interrupt+0x86/0xa0
</IRQ>
<TASK>
Fixes: 4735402173 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/bcef4d044ef608a4e258d33a7ec0ecd91f480db5.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 043d5f68d0 ]
There are two network devices(veth1 and veth3) in ns1, and ipvlan1 with
L3S mode and ipvlan2 with L2 mode are created based on them as
figure (1). In this case, ipvlan_register_nf_hook() will be called to
register nf hook which is needed by ipvlans in L3S mode in ns1 and value
of ipvl_nf_hook_refcnt is set to 1.
(1)
ns1 ns2
------------ ------------
veth1--ipvlan1 (L3S)
veth3--ipvlan2 (L2)
(2)
ns1 ns2
------------ ------------
veth1--ipvlan1 (L3S)
ipvlan2 (L2) veth3
| |
|------->-------->--------->--------
migrate
When veth3 migrates from ns1 to ns2 as figure (2), veth3 will register in
ns2 and calls call_netdevice_notifiers with NETDEV_REGISTER event:
dev_change_net_namespace
call_netdevice_notifiers
ipvlan_device_event
ipvlan_migrate_l3s_hook
ipvlan_register_nf_hook(newnet) (I)
ipvlan_unregister_nf_hook(oldnet) (II)
In function ipvlan_migrate_l3s_hook(), ipvl_nf_hook_refcnt in ns1 is not 0
since veth1 with ipvlan1 still in ns1, (I) and (II) will be called to
register nf_hook in ns2 and unregister nf_hook in ns1. As a result,
ipvl_nf_hook_refcnt in ns1 is decreased incorrectly and this in ns2
is increased incorrectly. When the second net namespace is removed, a
reference count leak warning in ipvlan_ns_exit() will be triggered.
This patch add a check before ipvlan_migrate_l3s_hook() is called. The
warning can be triggered as follows:
$ ip netns add ns1
$ ip netns add ns2
$ ip netns exec ns1 ip link add veth1 type veth peer name veth2
$ ip netns exec ns1 ip link add veth3 type veth peer name veth4
$ ip netns exec ns1 ip link add ipv1 link veth1 type ipvlan mode l3s
$ ip netns exec ns1 ip link add ipv2 link veth3 type ipvlan mode l2
$ ip netns exec ns1 ip link set veth3 netns ns2
$ ip net del ns2
Fixes: 3133822f5a ("ipvlan: use pernet operations and restrict l3s hooks to master netns")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230817145449.141827-1-luwei32@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d44036cad3 ]
The blamed commit resolved a bug where frames would still get stuck at
egress, even though they're smaller than the maxSDU[tc], because the
driver did not take into account the extra 33 ns that the queue system
needs for scheduling the frame.
It now takes that into account, but the arithmetic that we perform in
vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on
64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS
may become a very large integer if gate_len_ns < 33 ns.
In practice, this means that we've introduced a regression where all
traffic class gates which are permanently closed will not get detected
by the driver, and we won't enable oversize frame dropping for them.
Before:
mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
After:
mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
Fixes: 11afdc6526 ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eecb91b9f9 ]
Kmemleak report a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128):
comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
hex dump (first 32 bytes):
e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
backtrace:
[<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
[<000000007df90faa>] graph_trace_open+0xb0/0x344
[<00000000737524cd>] __tracing_open+0x450/0xb10
[<0000000098043327>] tracing_open+0x1a0/0x2a0
[<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
[<000000004015bcd6>] vfs_open+0x98/0xd0
[<000000002b5f60c9>] do_open+0x520/0x8d0
[<00000000376c7820>] path_openat+0x1c0/0x3e0
[<00000000336a54b5>] do_filp_open+0x14c/0x324
[<000000002802df13>] do_sys_openat2+0x2c4/0x530
[<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
[<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
[<00000000313647bf>] do_el0_svc+0xac/0xec
[<000000002ef1c651>] el0_svc+0x20/0x30
[<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
[<000000000c309c35>] el0_sync+0x160/0x180
The root cause is descripted as follows:
__tracing_open() { // 1. File 'trace' is being opened;
...
*iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is
// currently set;
...
iter->trace->open(iter); // 3. Call graph_trace_open() here,
// and memory are allocated in it;
...
}
s_start() { // 4. The opened file is being read;
...
*iter->trace = *tr->current_trace; // 5. If tracer is switched to
// 'nop' or others, then memory
// in step 3 are leaked!!!
...
}
To fix it, in s_start(), close tracer before switching then reopen the
new tracer after switching. And some tracers like 'wakeup' may not update
'iter->private' in some cases when reopen, then it should be cleared
to avoid being mistakenly closed again.
Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
Fixes: d7350c3f45 ("tracing/core: make the read callbacks reentrants")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4d6b54381 ]
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the
number of entries to field_size.
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 887f92e09e ]
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b71645d6af ]
Trace ring buffer can no longer record anything after executing
following commands at the shell prompt:
# cd /sys/kernel/tracing
# cat tracing_cpumask
fff
# echo 0 > tracing_cpumask
# echo 1 > snapshot
# echo fff > tracing_cpumask
# echo 1 > tracing_on
# echo "hello world" > trace_marker
-bash: echo: write error: Bad file descriptor
The root cause is that:
1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers
in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask());
2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
with 'tr->max_buffer.buffer', then the 'record_disabled' became 0
(see update_max_tr());
3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1;
Then array_buffer and max_buffer are both unavailable due to value of
'record_disabled' is not 0.
To fix it, enable or disable both array_buffer and max_buffer at the same
time in tracing_set_cpumask().
Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Fixes: 71babb2705 ("tracing: change CPU ring buffer state from tracing_cpumask")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 609a1bcd7b ]
When the code to use the PTP HW clock was added, it didn't update
the Kconfig entry for the PTP dependency, leading to build errors,
so update the Kconfig entry to depend on PTP_1588_CLOCK_OPTIONAL.
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_init':
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294: undefined reference to `ptp_clock_register'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294:(.text+0xce8): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_register'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301: undefined reference to `ptp_clock_index'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301:(.text+0xd18): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_remove':
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315: undefined reference to `ptp_clock_index'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315:(.text+0xe80): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319: undefined reference to `ptp_clock_unregister'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319:(.text+0xeac): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_unregister'
Fixes: 1595ecce1c ("wifi: iwlwifi: mvm: add support for PTP HW clock (PHC)")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202308110447.4QSJHmFH-lkp@intel.com/
Cc: Krishnanand Prabhu <krishnanand.prabhu@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Gregory Greenman <gregory.greenman@intel.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230812052947.22913-1-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee8b94c851 ]
Got kmemleak errors with the following ltp can_filter testcase:
for ((i=1; i<=100; i++))
do
./can_filter &
sleep 0.1
done
==============================================================
[<00000000db4a4943>] can_rx_register+0x147/0x360 [can]
[<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw]
[<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0
[<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70
[<00000000fd468496>] do_syscall_64+0x33/0x40
[<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
It's a bug in the concurrent scenario of unregister_netdevice_many()
and raw_release() as following:
cpu0 cpu1
unregister_netdevice_many(can_dev)
unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this
net_set_todo(can_dev)
raw_release(can_socket)
dev = dev_get_by_index(, ro->ifindex); // dev == NULL
if (dev) { // receivers in dev_rcv_lists not free because dev is NULL
raw_disable_allfilters(, dev, );
dev_put(dev);
}
...
ro->bound = 0;
...
call_netdevice_notifiers(NETDEV_UNREGISTER, )
raw_notify(, NETDEV_UNREGISTER, )
if (ro->bound) // invalid because ro->bound has been set 0
raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed
Add a net_device pointer member in struct raw_sock to record bound
can_dev, and use rtnl_lock to serialize raw_socket members between
raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use
ro->dev to decide whether to free receivers in dev_rcv_lists.
Fixes: 8d0caedb75 ("can: bcm/raw/isotp: use per module netdevice notifier")
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46f881b5b1 ]
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 40613da52b ]
When using ACPI PCI hotplug, hotplugging a device with large BARs may fail
if bridge windows programmed by firmware are not large enough.
Reproducer:
$ qemu-kvm -monitor stdio -M q35 -m 4G \
-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=on \
-device id=rp1,pcie-root-port,bus=pcie.0,chassis=4 \
disk_image
wait till linux guest boots, then hotplug device:
(qemu) device_add qxl,bus=rp1
hotplug on guest side fails with:
pci 0000:01:00.0: [1b36:0100] type 00 class 0x038000
pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x03ffffff]
pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x03ffffff]
pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 0x1c: [io 0x0000-0x001f]
pci 0000:01:00.0: BAR 0: no space for [mem size 0x04000000]
pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x04000000]
pci 0000:01:00.0: BAR 1: no space for [mem size 0x04000000]
pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x04000000]
pci 0000:01:00.0: BAR 2: assigned [mem 0xfe800000-0xfe801fff]
pci 0000:01:00.0: BAR 3: assigned [io 0x1000-0x101f]
qxl 0000:01:00.0: enabling device (0000 -> 0003)
Unable to create vram_mapping
qxl: probe of 0000:01:00.0 failed with error -12
However when using native PCIe hotplug
'-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off'
it works fine, since kernel attempts to reassign unused resources.
Use the same machinery as native PCIe hotplug to (re)assign resources.
Link: https://lore.kernel.org/r/20230424191557.2464760-1-imammedo@redhat.com
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 895cedc179 ]
On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-
unmapping the Receive buffers, but rpcrdma_post_recvs() neglected
to remap them after a new connection had been established. The
result was immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR.
Fixes: 671c450b6f ("xprtrdma: Fix oops in Receive handler after device removal")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4e89f1a6d ]
Another highly rare error case when a page allocating loop (inside
__nfs4_get_acl_uncached, this time) is not properly unwound on error.
Since pages array is allocated being uninitialized, need to free only
lower array indices. NULL checks were useful before commit 62a1573fcf
("NFSv4 fix acl retrieval over krb5i/krb5p mounts") when the array had
been initialized to zero on stack.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 62a1573fcf ("NFSv4 fix acl retrieval over krb5i/krb5p mounts")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e3733fd2b ]
There is a slight issue with error handling code inside
nfs42_proc_getxattr(). If page allocating loop fails then we free the
failing page array element which is NULL but __free_page() can't deal with
NULL args.
Found by Linux Verification Center (linuxtesting.org).
Fixes: a1f26739cc ("NFSv4.2: improve page handling for GETXATTR")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e4dd0d3a2f upstream.
In the real workload, I encountered an issue which could cause the RTO
timer to retransmit the skb per 1ms with linear option enabled. The amount
of lost-retransmitted skbs can go up to 1000+ instantly.
The root cause is that if the icsk_rto happens to be zero in the 6th round
(which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero
due to the changed calculation method in tcp_retransmit_timer() as follows:
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
Above line could be converted to
icsk->icsk_rto = min(0 << 1, TCP_RTO_MAX) = 0
Therefore, the timer expires so quickly without any doubt.
I read through the RFC 6298 and found that the RTO value can be rounded
up to a certain value, in Linux, say TCP_RTO_MIN as default, which is
regarded as the lower bound in this patch as suggested by Eric.
Fixes: 36e31b0af5 ("net: TCP thin linear timeouts")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ffd6f0442 upstream.
This can clean up all irq warnings because of unbalanced
amdgpu_irq_get/put when unplugging/unbinding device, and leave
irq count decrease in each ip fini function.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 812a05256d upstream.
The vangogh driver just gained a link time dependency that now causes
randconfig builds to fail:
x86_64-linux-ld: sound/soc/amd/vangogh/pci-acp5x.o: in function `snd_acp5x_probe':
pci-acp5x.c:(.text+0xbb): undefined reference to `snd_amd_acp_find_config'
Fixes: e89f45edb7 ("ASoC: amd: vangogh: Add check for acp config flags in vangogh platform")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230602124447.863476-1-arnd@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1740b1ab2 upstream.
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.
[ 203.349571] [drm] Fence fallback timer expired on ring sdma0
[ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0
For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7b7d9e8ae upstream.
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry. This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.
To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.
commit 4b31b92b14 ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified power gating flow so that if called
in s0ix that it ensured that GFXOFF wasn't put in work queue but
instead processed immediately.
This is dead code due to commit 10cb67eb8a ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code. Remove that dead code.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dce6d8f985 upstream.
mmc_add_host() may return error, if we ignore its return value,
1. the memory allocated in mmc_alloc_host() will be leaked
2. null-ptr-deref will happen when calling mmc_remove_host()
in remove function spmmc_drv_remove() because deleting not
added device.
Fix this by checking the return value of mmc_add_host(). Moreover,
I fixed the error handling path of spmmc_drv_probe() to clean up.
Fixes: 4e268fed8b ("mmc: Add mmc driver for Sunplus SP7021")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Link: https://lore.kernel.org/r/20230622090233.188539-1-harperchen1110@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b430d4ac9 upstream.
For a completed request, after the mmc_blk_mq_complete_rq(mq, req)
function is executed, the bitmap_tags corresponding to the
request will be cleared, that is, the request will be regarded as
idle. If the request is acquired by a different type of process at
this time, the issue_type of the request may change. It further
caused the value of mq->in_flight[issue_type] to be abnormal,
and a large number of requests could not be sent.
p1: p2:
mmc_blk_mq_complete_rq
blk_mq_free_request
blk_mq_get_request
blk_mq_rq_ctx_init
mmc_blk_mq_dec_in_flight
mmc_issue_type(mq, req)
This strategy can ensure the consistency of issue_type
before and after executing mmc_blk_mq_complete_rq.
Fixes: 81196976ed ("mmc: block: Add blk-mq support")
Cc: stable@vger.kernel.org
Signed-off-by: Yibin Ding <yibin.ding@unisoc.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20230802023023.1318134-1-yunlong.xing@unisoc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c984ff1423 upstream.
blk_crypto_profile_init() calls lockdep_register_key(), which warns and
does not register if the provided memory is a static object.
blk-crypto-fallback currently has a static blk_crypto_profile and calls
blk_crypto_profile_init() thereupon, resulting in the warning and
failure to register.
Fortunately it is simple enough to use a dynamically allocated profile
and make lockdep function correctly.
Fixes: 2fb48d88e7 ("blk-crypto: use dynamic lock class for blk_crypto_profile::lock")
Cc: stable@vger.kernel.org
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230817141615.15387-1-sweettea-kernel@dorminy.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f43f549cd upstream.
When the value of ZT is set via ptrace we don't disable traps for SME.
This means that when a the task has never used SME before then the value
set via ptrace will never be seen by the target task since it will
trigger a SME access trap which will flush the register state.
Disable SME traps when setting ZT, this means we also need to allocate
storage for SVE if it is not already allocated, for the benefit of
streaming SVE.
Fixes: f90b529bcb ("arm64/sme: Implement ZT0 ptrace support")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Link: https://lore.kernel.org/r/20230816-arm64-zt-ptrace-first-use-v2-1-00aa82847e28@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d0a8d2fba upstream.
When we use NT_ARM_SSVE to either enable streaming mode or change the
vector length for a process we do not currently do anything to ensure that
there is storage allocated for the SME specific register state. If the
task had not previously used SME or we changed the vector length then
the task will not have had TIF_SME set or backing storage for ZA/ZT
allocated, resulting in inconsistent register sizes when saving state
and spurious traps which flush the newly set register state.
We should set TIF_SME to disable traps and ensure that storage is
allocated for ZA and ZT if it is not already allocated. This requires
modifying sme_alloc() to make the flush of any existing register state
optional so we don't disturb existing state for ZA and ZT.
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 5.19.x
Link: https://lore.kernel.org/r/20230810-arm64-fix-ptrace-race-v1-1-a5361fad2bd6@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebceec271e upstream.
This patch fixes an issue affecting the Wifi/Bluetooth connectivity on
ROCK Pi 4 boards. Commit f471b1b2db ("arm64: dts: rockchip: Fix Bluetooth
on ROCK Pi 4 boards") introduced a problem with the clock configuration.
Specifically, the clock-names property of the sdio-pwrseq node was not
updated to 'lpo', causing the driver to wait indefinitely for the wrong clock
signal 'ext_clock' instead of the expected one 'lpo'. This prevented the proper
initialization of Wifi/Bluetooth chip on ROCK Pi 4 boards.
To address this, this patch updates the clock-names property of the
sdio-pwrseq node to "lpo" to align with the changes made to the bluetooth node.
This patch has been tested on ROCK Pi 4B.
Fixes: f471b1b2db ("arm64: dts: rockchip: Fix Bluetooth on ROCK Pi 4 boards")
Cc: stable@vger.kernel.org
Signed-off-by: Yogesh Hegde <yogi.kernel@gmail.com>
Link: https://lore.kernel.org/r/ZLbATQRjOl09aLAp@zephyrusG14
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c507ce90e upstream.
Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data
for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and
VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands.
According to the VirtIO standard, "Field reserved MUST contain zeroes.
It is defined to make the structure to match the layout of
virtio_net_rss_config structure, defined in 5.1.6.5.7.".
Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq`
field in struct virtio_net_ctrl_rss, which corresponds to the
`reserved` field in struct virtio_net_hash_config, is not zeroed,
thereby violating the VirtIO standard.
This patch solves this problem by zeroing this field in
virtnet_init_default_rss().
Cc: Andrew Melnychenko <andrew@daynix.com>
Cc: stable@vger.kernel.org
Fixes: c7114b1249 ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230810110405.25558-1-yin31149@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 270d73e650 upstream.
Commit abdb1742a3 removed code that clears ctx->username when sec=none, so attempting
to mount with '-o sec=none' now fails with -EACCES. Fix it by adding that logic to the
parsing of the 'sec' option, as well as checking if the mount is using null auth before
setting the username when parsing the 'user' option.
Fixes: abdb1742a3 ("cifs: get rid of mount options string parsing")
Cc: stable@vger.kernel.org
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a894c8737 upstream.
For the TLB_PTLOCK checks we used an optimization to store the spc
register into the spinlock to unlock it. This optimization works as
long as the lightweight spinlock checks (CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK)
aren't enabled, because they really check if the lock word is zero or
__ARCH_SPIN_LOCK_UNLOCKED_VAL and abort with a kernel crash
("Spinlock was trashed") otherwise.
Drop that optimization to make it possible to activate both checks
at the same time.
Noticed-by: Sam James <sam@gentoo.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Sam James <sam@gentoo.org>
Cc: stable@vger.kernel.org # v6.4+
Fixes: 15e64ef652 ("parisc: Add lightweight spinlock checks")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69513dd669 upstream.
Under the current code, when cifs_readpage_worker is called, the call
contract is that the callee should unlock the page. This is documented
in the read_folio section of Documentation/filesystems/vfs.rst as:
> The filesystem should unlock the folio once the read has completed,
> whether it was successful or not.
Without this change, when fscache is in use and cache hit occurs during
a read, the page lock is leaked, producing the following stack on
subsequent reads (via mmap) to the page:
$ cat /proc/3890/task/12864/stack
[<0>] folio_wait_bit_common+0x124/0x350
[<0>] filemap_read_folio+0xad/0xf0
[<0>] filemap_fault+0x8b1/0xab0
[<0>] __do_fault+0x39/0x150
[<0>] do_fault+0x25c/0x3e0
[<0>] __handle_mm_fault+0x6ca/0xc70
[<0>] handle_mm_fault+0xe9/0x350
[<0>] do_user_addr_fault+0x225/0x6c0
[<0>] exc_page_fault+0x84/0x1b0
[<0>] asm_exc_page_fault+0x27/0x30
This requires a reboot to resolve; it is a deadlock.
Note however that the call to cifs_readpage_from_fscache does mark the
page clean, but does not free the folio lock. This happens in
__cifs_readpage_from_fscache on success. Releasing the lock at that
point however is not appropriate as cifs_readahead also calls
cifs_readpage_from_fscache and *does* unconditionally release the lock
after its return. This change therefore effectively makes
cifs_readpage_worker work like cifs_readahead.
Signed-off-by: Russell Harmon <russ@har.mn>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dfe2aeb226 ]
Unloading a hardware specific 8250 driver can produce error "Unable to
handle kernel paging request at virtual address" about ten seconds after
unloading the driver. This happens on uart_hangup() calling
uart_change_pm().
Turns out commit 04e82793f0 ("serial: 8250: Reinit port->pm on port
specific driver unbind") was only a partial fix. If the hardware specific
driver has initialized port->pm function, we need to clear port->pm too.
Just reinitializing port->ops does not do this. Otherwise serial8250_pm()
will call port->pm() instead of serial8250_do_pm().
Fixes: 04e82793f0 ("serial: 8250: Reinit port->pm on port specific driver unbind")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230804131553.52927-1-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b05b99390 ]
It was reported that the riscv kernel hangs while executing the test
in [1].
Indeed, the test hangs when trying to write a buffer to a file. The
problem is that the riscv implementation of raw_copy_from_user() does not
return the correct number of bytes not written when an exception happens
and is fixed up, instead it always returns the initial size to copy,
even if some bytes were actually copied.
generic_perform_write() pre-faults the user pages and bails out if nothing
can be written, otherwise it will access the userspace buffer: here the
riscv implementation keeps returning it was not able to copy any byte
though the pre-faulting indicates otherwise. So generic_perform_write()
keeps retrying to access the user memory and ends up in an infinite
loop.
Note that before the commit mentioned in [1] that introduced this
regression, it worked because generic_perform_write() would bail out if
only one byte could not be written.
So fix this by returning the number of bytes effectively not written in
__asm_copy_[to|from]_user() and __clear_user(), as it is expected.
Link: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/ [1]
Fixes: ebcbd75e39 ("riscv: Fix the bug in memory access fixup code")
Reported-by: Bo YU <tsu.yubo@gmail.com>
Closes: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/#t
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Closes: https://lore.kernel.org/linux-riscv/ZNOnCakhwIeue3yr@aurel32.net/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Link: https://lore.kernel.org/r/20230811150604.1621784-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79bc3f85c5 ]
The instructions c.jr and c.jalr must have rs1 != 0, but
riscv_insn_is_c_jr() and riscv_insn_is_c_jalr() do not check for this. So,
riscv_insn_is_c_jr() can match a reserved encoding, while
riscv_insn_is_c_jalr() can match the c.ebreak instruction.
Rewrite them with check for rs1 != 0.
Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Fixes: ec5f908775 ("RISC-V: Move riscv_insn_is_* macros into a common header")
Link: https://lore.kernel.org/r/20230731183925.152145-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52449c17bd ]
When we test seccomp with 6.4 kernel, we found errno has wrong value.
If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf5058, we will
get ENOSYS instead. We got same result with commit 9c2598d435 ("riscv:
entry: Save a0 prior syscall_enter_from_user_mode()").
After analysing code, we think that regs->a0 = -ENOSYS should only be
executed when syscall != -1. In __seccomp_filter, when seccomp rejected
this syscall with specified errno, they will set a0 to return number as
syscall ABI, and then return -1. This return number is finally pass as
return number of syscall_enter_from_user_mode, and then is compared with
NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
is always executed. It covered a0 set by seccomp, so we always get
ENOSYS when match seccomp RET_ERRNO rule.
Fixes: f0bddf5058 ("riscv: entry: Convert to generic entry")
Reported-by: Felix Yan <felixonmars@archlinux.org>
Co-developed-by: Ruizhe Pan <c141028@gmail.com>
Signed-off-by: Ruizhe Pan <c141028@gmail.com>
Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
Tested-by: Felix Yan <felixonmars@archlinux.org>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230801141607.435192-1-CoelacanthusHex@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46cdff2369 ]
Set spec->en_3kpull_low default to true.
Then fillback ALC236 and ALC257 to false.
Additional note: this addresses a regression caused by the previous
fix 69ea4c9d02 ("ALSA: hda/realtek - remove 3k pull low procedure").
The previous workaround was applied too widely without necessity,
which resulted in the pop noise at PM again. This patch corrects the
condition and restores the old behavior for the devices that don't
suffer from the original problem.
Fixes: 69ea4c9d02 ("ALSA: hda/realtek - remove 3k pull low procedure")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217732
Link: https://lore.kernel.org/r/01e212a538fc407ca6edd10b81ff7b05@realtek.com
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9757300d27 ]
SA8775 and newer target have added support for an increased number of
interrupt targets. To implement this change, the intr_target field, which
is used to configure the interrupt target in the interrupt configuration
register is increased from 3 bits to 4 bits.
In accordance to these updates, a new intr_target_width member is
introduced in msm_pingroup structure. This member stores the value of
width of intr_target field in the interrupt configuration register. This
value is used to dynamically calculate and generate mask for setting the
intr_target field. By default, this mask is set to 3 bit wide, to ensure
backward compatibility with the older targets.
Fixes: 4b6b185599 ("pinctrl: qcom: add the tlmm driver sa8775p platforms")
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8775p-ride
Signed-off-by: Ninad Naik <quic_ninanaik@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20230809100634.3961-1-quic_ninanaik@quicinc.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1f848f121 ]
When the tdm lane mask is computed, the driver currently fills the 1st lane
before moving on to the next. If the stream has less channels than the
lanes can accommodate, slots will be disabled on the last lanes.
Unfortunately, the HW distribute channels in a different way. It distribute
channels in pair on each lanes before moving on the next slots.
This difference leads to problems if a device has an interface with more
than 1 lane and with more than 2 slots per lane.
For example: a playback interface with 2 lanes and 4 slots each (total 8
slots - zero based numbering)
- Playing a 8ch stream:
- All slots activated by the driver
- channel #2 will be played on lane #1 - slot #0 following HW placement
- Playing a 4ch stream:
- Lane #1 disabled by the driver
- channel #2 will be played on lane #0 - slot #2
This behaviour is obviously not desirable.
Change the way slots are activated on the TDM lanes to follow what the HW
does and make sure each channel always get mapped to the same slot/lane.
Fixes: 1a11d88f49 ("ASoC: meson: add tdm formatter base driver")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230809171931.1244502-1-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78e869dd8b ]
Although the memory map of i.MX93 reference manual rev. 2 claims that
analog top has start address of 0x44480000 and end address of 0x4448ffff,
this overlaps with TMU memory area starting at 0x44482000, as stated in
section 73.6.1.
As PLL configuration registers start at addresses up to 0x44481400, as used
by clk-imx93, reduce the anatop size to 0x2000, so exclude the TMU area
but keep all PLL registers inside.
Fixes: ec8b5b5058 ("arm64: dts: freescale: Add i.MX93 dtsi support")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a2b96e42a ]
If the tuning step is not set, the tuning step is set to 1.
For some sd cards, the following Tuning timeout will occur.
Tuning failed, falling back to fixed sampling clock
So set the default tuning step. This refers to the NXP vendor's
commit below:
https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/
arch/arm/boot/dts/imx6sx.dtsi#L1108-L1109
Fixes: 1e336aa0c0 ("mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f02b53375e ]
The CSI1 PHY reference clock is limited to 125 MHz according to:
i.MX 8M Mini Applications Processor Reference Manual, Rev. 3, 11/2020
Table 5-1. Clock Root Table (continued) / page 307
Slice Index n = 123 .
Currently the IMX8MM_CLK_CSI1_PHY_REF clock is configured to be
fed directly from 1 GHz PLL2 , which overclocks them. Instead, drop
the configuration altogether, which defaults the clock to 24 MHz REF
clock input, which for the PHY reference clock is just fine.
Based on a patch from Marek Vasut for the imx8mn.
Fixes: e523b7c54c ("arm64: dts: imx8mm: Add CSI nodes")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be18293e47 ]
If the tuning step is not set, the tuning step is set to 1.
For some sd cards, the following Tuning timeout will occur.
Tuning failed, falling back to fixed sampling clock
mmc0: Tuning failed, falling back to fixed sampling clock
So set the default tuning step. This refers to the NXP vendor's
commit below:
https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/
arch/arm/boot/dts/imx7s.dtsi#L1216-L1217
Fixes: 1e336aa0c0 ("mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e9f5cd85f1 ]
Currently the dtbs_check generates warnings like this:
$nodename:0: 'dma-apbh@110000' does not match '^dma-controller(@.*)?$'
So fix all affected dma-apbh node names.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Stable-dep-of: be18293e47 ("ARM: dts: imx: Set default tuning step for imx7d usdhc")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 762b700982 ]
RTC interrupt level should be set to "LOW". This was revealed by the
introduction of commit:
f181987ef4 ("rtc: m41t80: use IRQ flags obtained from fwnode")
which changed the way IRQ type is obtained.
Signed-off-by: Andrej Picej <andrej.picej@norik.com>
Reviewed-by: Stefan Riedmüller <s.riedmueller@phytec.de>
Fixes: 800d595151 ("ARM: dts: imx6: Add initial support for phyBOARD-Mira")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2bd1d2dd80 ]
There is some instablity with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This ends up resulting in some block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:
[ 289.171014] mmc1: running CQE recovery
[ 290.048972] mmc1: running CQE recovery
[ 290.054834] mmc1: running CQE recovery
[ 290.060817] mmc1: running CQE recovery
[ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
[ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
[ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
[ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
[ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
[ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
[ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
[ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
[ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
[ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
[ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
[ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297
Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.
Fixes: 246450344d ("arm64: dts: rockchip: rk3399: Radxa ROCK 4C+")
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Link: https://lore.kernel.org/r/20230705144255.115299-3-chris.obbard@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cee572756a ]
There is some instablity with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This ends up resulting in some block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:
[ 289.171014] mmc1: running CQE recovery
[ 290.048972] mmc1: running CQE recovery
[ 290.054834] mmc1: running CQE recovery
[ 290.060817] mmc1: running CQE recovery
[ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
[ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
[ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
[ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
[ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
[ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
[ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
[ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
[ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
[ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
[ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
[ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297
Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.
While we are here, set the eMMC maximum clock frequency to 1.5MHz to
follow the ROCK 4C+.
Fixes: 1b5715c602 ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Tested-By: Folker Schwesinger <dev@folker-schwesinger.de>
Link: https://lore.kernel.org/r/20230705144255.115299-2-chris.obbard@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 798f1df86e ]
The commit 3a786086c6 ("arm64: dts: qcom: Add missing "-thermal"
suffix for thermal zones") renamed the thermal zone in the pm8150l.dtsi
file to comply with the schema. However this resulted in a clash with
the RB5 board file, which already contained the pm8150l-thermal zone for
the on-board sensor. This resulted in the board file definition
overriding the thermal zone defined in the PMIC include file (and thus
the on-die PMIC temp alarm was not probing at all).
Rename the thermal zone in qcom/qrb5165-rb5.dts to remove this override.
Fixes: 3a786086c6 ("arm64: dts: qcom: Add missing "-thermal" suffix for thermal zones")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230613131224.666668-1-dmitry.baryshkov@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34539b442b ]
The am335x devices started producing boot errors for resetting musb module
in because of subtle timing changes:
Unhandled fault: external abort on non-linefetch (0x1008)
...
sysc_poll_reset_sysconfig from sysc_reset+0x109/0x12
sysc_reset from sysc_probe+0xa99/0xeb0
...
The fix is to flush posted write after enable before reset during
probe. Note that some devices also need to specify the delay after enable
with ti,sysc-delay-us, but this is not needed for musb on am335x based on
my tests.
Reported-by: kernelci.org bot <bot@kernelci.org>
Closes: https://storage.kernelci.org/next/master/next-20230614/arm/multi_v7_defconfig+CONFIG_THUMB2_KERNEL=y/gcc-10/lab-cip/baseline-beaglebone-black.html
Fixes: 596e795569 ("bus: ti-sysc: Add support for software reset")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2eb9625a3a ]
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through
PCI PM (power management) interface. However this NIC hardware
does not support PCI PM suspend/resume operations so system wide
suspend/resume leads to bad MFW (management firmware) state which
causes various follow-up errors in driver when communicating with
the device/firmware afterwards.
To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding
system to go into suspended/standby mode.
Without this fix device/firmware does not recover unless system
is power cycled.
Fixes: 2950219d87 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d0c88e84e ]
The status of global socket memory pressure is updated when:
a) __sk_mem_raise_allocated():
enter: sk_memory_allocated(sk) > sysctl_mem[1]
leave: sk_memory_allocated(sk) <= sysctl_mem[0]
b) __sk_mem_reduce_allocated():
leave: sk_under_memory_pressure(sk) &&
sk_memory_allocated(sk) < sysctl_mem[0]
So the conditions of leaving global pressure are inconstant, which
may lead to the situation that one pressured net-memcg prevents the
global pressure from being cleared when there is indeed no global
pressure, thus the global constrains are still in effect unexpectedly
on the other sockets.
This patch fixes this by ignoring the net-memcg's pressure when
deciding whether should leave global memory pressure.
Fixes: e1aab161e0 ("socket: initial cgroup code.")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20230816091226.1542-1-wuyun.abel@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e16ca7fb9f ]
When offloading a TC encap action, the action information for the
hardware might not be "ready": if there's currently no neighbour entry
available for the destination address, we can't construct the Ethernet
header to prepend to the packet. In this case, we still offload the
flow rule, but with its action-set-list ID pointing at a "fallback"
action which simply delivers the packet to its default destination (as
though no flow rule had matched), thus allowing software TC to handle
it. Later, when we receive a neighbouring update that allows us to
construct the encap header, the rule will become "ready" and we will
update its action-set-list ID in hardware to point at the actual
offloaded actions.
This patch sets up these fallback ASLs, but does not yet use them.
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23d775f12d ]
If the switch is reset during active EEPROM transactions, as in
just after an SoC reset after power up, the I2C bus transaction
may be cut short leaving the EEPROM internal I2C state machine
in the wrong state. When the switch is reset again, the bad
state machine state may result in data being read from the wrong
memory location causing the switch to enter unexpected mode
rendering it inoperational.
Fixes: a3dcb3e7e7 ("net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset")
Signed-off-by: Alfred Lee <l00g33k@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230815001323.24739-1-l00g33k@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34a79876d9 ]
Before this fix, running high rate traffic through XDP_REDIRECT
with multibuf could overrun the fifo used to release the
xdp frames after tx completion. This resulted in corrupted data
being consumed on the free side.
The culplirt was a miscalculation of the fifo size: the maximum ratio
between fifo entries / data segments was incorrect. This ratio serves to
calculate the max fifo size for a full sq where each packet uses the
worst case number of entries in the fifo.
This patch fixes the formula and names the constant. It also makes sure
that future values will use a power of 2 number of entries for the fifo
mask to work.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Fixes: 3f734b8c59 ("net/mlx5e: XDP, Use multiple single-entry objects in xdpi_fifo")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 751969e5b1 ]
Return an error if a field's mask is neither full nor empty. When a mask
is only partial the field is not being used for rule programming but it
gives a wrong impression it is used. Fix by returning an error on any
partial mask to make it clear they are not supported.
The ip_ver assignment is moved earlier in code to allow using it in
iavf_validate_fdir_fltr_masks.
Fixes: 527691bf06 ("iavf: Support IPv4 Flow Director filters")
Fixes: e90cbc257a ("iavf: Support IPv6 Flow Director filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a552bfa16b ]
Recent changes in net-next (commit 759ab1edb5 ("net: store netdevs
in an xarray")) refactored the handling of pre-assigned ifindexes
and let syzbot surface a latent problem in ovs. ovs does not validate
ifindex, making it possible to create netdev ports with negative
ifindex values. It's easy to repro with YNL:
$ ./cli.py --spec netlink/specs/ovs_datapath.yaml \
--do new \
--json '{"upcall-pid": 1, "name":"my-dp"}'
$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
--do new \
--json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'
$ ip link show
-65536: some-port0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 7a:48:21:ad:0b:fb brd ff:ff:ff:ff:ff:ff
...
Validate the inputs. Now the second command correctly returns:
$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
--do new \
--json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'
lib.ynl.NlError: Netlink error: Numerical result out of range
nl_len = 108 (92) nl_flags = 0x300 nl_type = 2
error: -34 extack: {'msg': 'integer out of range', 'unknown': [[type:4 len:36] b'\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x03\x00\xff\xff\xff\x7f\x00\x00\x00\x00\x08\x00\x01\x00\x08\x00\x00\x00'], 'bad-attr': '.ifindex'}
Accept 0 since it used to be silently ignored.
Fixes: 54c4ef34c4 ("openvswitch: allow specifying ifindex of new interfaces")
Reported-by: syzbot+7456b5dcf65111553320@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230814203840.2908710-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dafcbce071 ]
Similar to commit 01f4fd2708 ("bonding: Fix incorrect deletion of
ETH_P_8021AD protocol vid from slaves"), we can trigger BUG_ON(!vlan_info)
in unregister_vlan_dev() with the following testcase:
# ip netns add ns1
# ip netns exec ns1 ip link add team1 type team
# ip netns exec ns1 ip link add team_slave type veth peer veth2
# ip netns exec ns1 ip link set team_slave master team1
# ip netns exec ns1 ip link add link team_slave name team_slave.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link add link team1 name team1.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link set team_slave nomaster
# ip netns del ns1
Add S-VLAN tag related features support to team driver. So the team driver
will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89 ("net: vlan: add 802.1ad support")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230814032301.2804971-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23185c6aed ]
Do not allow to insert elements from datapath to objects maps.
Fixes: 8aeff920dc ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a33d8b73d ]
Netlink event path is missing a synchronization point with GC
transactions. Add GC sequence number update to netns release path and
netlink event path, any GC transaction losing race will be discarded.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9bfab6d23a ]
In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and
SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout
value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300
msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state.
As Paolo Valerio noticed, this might cause unwanted expiration of the ct
entry. In my test, with 1s tc netem delay set on the NAT path, after the
SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND
state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is
sent back from the peer, the sctp ct entry has expired and been deleted,
and then the SHUTDOWN_ACK has to be dropped.
Also, it is confusing these two sysctl options always show 0 due to all
timeout values using sec as unit:
net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0
net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0
This patch fixes it by also using 3 secs for sctp shutdown send and recv
state in sctp conntrack, which is also RTO.initial value in SCTP protocol.
Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV
was probably used for a rare scenario where SHUTDOWN is sent on 1st path
but SHUTDOWN_ACK is replied on 2nd path, then a new connection started
immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV
to CLOSE when receiving INIT in the ORIGINAL direction.
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7845914f45 ]
nftables selftests fail:
run-tests.sh testcases/sets/0044interval_overlap_0
Expected: 0-2 . 0-3, got:
W: [FAILED] ./testcases/sets/0044interval_overlap_0: got 1
Insertion must ignore duplicate but expired entries.
Moreover, there is a strange asymmetry in nft_pipapo_activate:
It refetches the current element, whereas the other ->activate callbacks
(bitmap, hash, rhash, rbtree) use elem->priv.
Same for .remove: other set implementations take elem->priv,
nft_pipapo_remove fetches elem->priv, then does a relookup,
remove this.
I suspect this was the reason for the change that prompted the
removal of the expired check in pipapo_get() in the first place,
but skipping exired elements there makes no sense to me, this helper
is used for normal get requests, insertions (duplicate check)
and deactivate callback.
In first two cases expired elements must be skipped.
For ->deactivate(), this gets called for DELSETELEM, so it
seems to me that expired elements should be skipped as well, i.e.
delete request should fail with -ENOENT error.
Fixes: 24138933b9 ("netfilter: nf_tables: don't skip expired elements during walk")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90e5b3462e ]
When flushing, individual set elements are disabled in the next
generation via the ->flush callback.
Catchall elements are not disabled. This is incorrect and may lead to
double-deactivations of catchall elements which then results in memory
leaks:
WARNING: CPU: 1 PID: 3300 at include/net/netfilter/nf_tables.h:1172 nft_map_deactivate+0x549/0x730
CPU: 1 PID: 3300 Comm: nft Not tainted 6.5.0-rc5+ #60
RIP: 0010:nft_map_deactivate+0x549/0x730
[..]
? nft_map_deactivate+0x549/0x730
nf_tables_delset+0xb66/0xeb0
(the warn is due to nft_use_dec() detecting underflow).
Fixes: aaa31047a6 ("netfilter: nftables: add catch-all set element support")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96d3c1cade ]
The encode_dma() function has some validation on in_trans->size but it
would be more clear to move those checks to find_and_map_user_pages().
The encode_dma() had two checks:
if (in_trans->addr + in_trans->size < in_trans->addr || !in_trans->size)
return -EINVAL;
The in_trans->addr variable is the starting address. The in_trans->size
variable is the total size of the transfer. The transfer can occur in
parts and the resources->xferred_dma_size tracks how many bytes we have
already transferred.
This patch introduces a new variable "remaining" which represents the
amount we want to transfer (in_trans->size) minus the amount we have
already transferred (resources->xferred_dma_size).
I have modified the check for if in_trans->size is zero to instead check
if in_trans->size is less than resources->xferred_dma_size. If we have
already transferred more bytes than in_trans->size then there are negative
bytes remaining which doesn't make sense. If there are zero bytes
remaining to be copied, just return success.
The check in encode_dma() checked that "addr + size" could not overflow
and barring a driver bug that should work, but it's easier to check if
we do this in parts. First check that "in_trans->addr +
resources->xferred_dma_size" is safe. Then check that "xfer_start_addr +
remaining" is safe.
My final concern was that we are dealing with u64 values but on 32bit
systems the kmalloc() function will truncate the sizes to 32 bits. So
I calculated "total = in_trans->size + offset_in_page(xfer_start_addr);"
and returned -EINVAL if it were >= SIZE_MAX. This will not affect 64bit
systems.
Fixes: 129776ac2e ("accel/qaic: Add control path")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/24d3348b-25ac-4c1b-b171-9dae7c43e4e0@moroto.mountain
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 758c910781 ]
If it fails to get the devices's MAC address, octep_probe exits while
leaving the delayed work intr_poll_task queued. When the work later
runs, it's a use after free.
Move the cancelation of intr_poll_task from octep_remove into
octep_device_cleanup. This does not change anything in the octep_remove
flow, but octep_device_cleanup is called also in the octep_probe error
path, where the cancelation is needed.
Note that the cancelation of ctrl_mbox_task has to follow
intr_poll_task's, because the ctrl_mbox_task may be queued by
intr_poll_task.
Fixes: 24d4333233 ("octeon_ep: poll for control messages")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-5-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28458c8000 ]
tx_timeout_task is canceled too early when removing the driver. Nothing
prevents .ndo_tx_timeout from triggering and queuing the work again.
Better cancel it after the netdev is unregistered.
It's harmless for octep_tx_timeout_task to run in the window between the
unregistration and cancelation, because it checks netif_running.
Fixes: 862cd659a6 ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-3-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c461e394d ]
On Zynq UltraScale+ MPSoC ubuntu platform when systemctl issues suspend,
network manager bring down the interface and goes into suspend. When it
wakes up it again enables the interface.
This leads to xilinx-psgtr "PLL lock timeout" on interface bringup, as
the power management controller power down the entire FPD (including
SERDES) if none of the FPD devices are in use and serdes is not
initialized on resume.
$ sudo rtcwake -m no -s 120 -v
$ sudo systemctl suspend <this does ifconfig eth1 down>
$ ifconfig eth1 up
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
phy phy-fd400000.phy.0: phy poweron failed --> -110
macb driver is called in this way:
1. macb_close: Stop network interface. In this function, it
reset MACB IP and disables PHY and network interface.
2. macb_suspend: It is called in kernel suspend flow. But because
network interface has been disabled(netif_running(ndev) is
false), it does nothing and returns directly;
3. System goes into suspend state. Some time later, system is
waken up by RTC wakeup device;
4. macb_resume: It does nothing because network interface has
been disabled;
5. macb_open: It is called to enable network interface again. ethernet
interface is initialized in this API but serdes which is power-off
by PMUFW during FPD-off suspend is not initialized again and so
we hit GT PLL lock issue on open.
To resolve this PLL timeout issue always do PS GTR initialization
when ethernet device is configured as non-wakeup source.
Fixes: f22bd29ba1 ("net: macb: Fix ZynqMP SGMII non-wakeup source resume failure")
Fixes: 8b73fa3ae0 ("net: macb: Added ZynqMP-specific initialization")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1691414091-2260697-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5598c9bfdb ]
This should be done before the soft min/max frequencies are restored.
When we disable the "Ignore efficient frequency" flag, GuC does not
actually bring the requested freq down to RPn.
Specifically, this scenario-
- ignore efficient freq set to true
- reduce min to RPn (from efficient)
- suspend
- resume (includes GuC load, restore soft min/max, restore efficient freq)
- validate min freq has been resored to RPn
This will fail if we didn't first restore(disable, in this case) efficient
freq flag before setting the soft min frequency.
v2: Bring the min freq down to RPn when we disable efficient freq (Rodrigo)
Also made the change to set the min softlimit to RPn at init. Otherwise, we
were storing RPe there.
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8736
Fixes: 55f9720dbf ("drm/i915/guc/slpc: Provide sysfs for efficient freq")
Fixes: 95ccf312a1 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency")
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230726010044.3280402-1-vinay.belgaumkar@intel.com
(cherry picked from commit 28e671114f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 855067defa ]
This test verifies whether the encapsulated packets have the correct
configured TTL. It does so by sending ICMP packets through the test
topology and mirroring them to a gretap netdevice. On a busy host
however, more than just the test ICMP packets may end up flowing
through the topology, get mirrored, and counted. This leads to
potential spurious failures as the test observes much more mirrored
packets than the sent test packets, and assumes a bug.
Fix this by tightening up the mirror action match. Change it from
matchall to a flower classifier matching on ICMP packets specifically.
Fixes: 45315673e0 ("selftests: forwarding: Test changes in mirror-to-gretap")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc941e548b ]
Uwe reports:
"Most PHYs signal WoL using an interrupt. So disabling interrupts [at
shutdown] breaks WoL at least on PHYs covered by the marvell driver."
Discussing with Ioana, the problem which was trying to be solved was:
"The board in question is a LS1021ATSN which has two AR8031 PHYs that
share an interrupt line. In case only one of the PHYs is probed and
there are pending interrupts on the PHY#2 an IRQ storm will happen
since there is no entity to clear the interrupt from PHY#2's registers.
PHY#1's driver will get stuck in .handle_interrupt() indefinitely."
Further confirmation that "the two AR8031 PHYs are on the same MDIO
bus."
With WoL using interrupts to wake the system, in such a case, the
system will begin booting with an asserted interrupt. Thus, we need to
cope with an interrupt asserted during boot.
Solve this instead by disabling interrupts during PHY probe. This will
ensure in Ioana's situation that both PHYs of the same type sharing an
interrupt line on a common MDIO bus will have their interrupt outputs
disabled when the driver probes the device, but before we hook in any
interrupt handlers - thus avoiding the interrupt storm.
A better fix would be for platform firmware to disable the interrupting
devices at source during boot, before control is handed to the kernel.
Fixes: e2f016cf77 ("net: phy: add a shutdown procedure")
Link: 20230804071757.383971-1-u.kleine-koenig@pengutronix.de
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 829c6524d6 ]
The reference of pdev->dev is taken by of_find_device_by_node, so
it should be released when not need anymore.
Fixes: 7dc54d3b8d ("net: pcs: add Renesas MII converter driver")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51b813176f ]
Commit 25266128fe ("virtio-net: fix race between set queues and
probe") tries to fix the race between set queues and probe by calling
_virtnet_set_queues() before DRIVER_OK is set. This violates virtio
spec. Fixing this by setting queues after virtio_device_ready().
Note that rtnl needs to be held for userspace requests to change the
number of queues. So we are serialized in this way.
Fixes: 25266128fe ("virtio-net: fix race between set queues and probe")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3ec2b5d87 ]
In destruction flow, the assignment of NULL to xso->dev
caused to skip of xfrm_dev_state_free() call, which was
called in xfrm_state_put(to_put) routine.
Instead of open-coded variant of xfrm_dev_state_delete() and
xfrm_dev_state_free(), let's use them directly.
Fixes: f8a70afafc ("xfrm: add TX datapath support for IPsec packet offload mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 982c3aca8b ]
The policy memory was released but not HW driver data. Add
call to xfrm_dev_policy_delete(), so drivers will have a chance
to release their resources.
Fixes: 919e43fad5 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e2424708d ]
The previous commit 4e484b3e96 ("xfrm: rate limit SA mapping change
message to user space") added one additional attribute named
XFRMA_MTIMER_THRESH and described its type at compat_policy
(net/xfrm/xfrm_compat.c).
However, the author forgot to also describe the nla_policy at
xfrma_policy (net/xfrm/xfrm_user.c). Hence, this suppose NLA_U32 (4
bytes) value can be faked as empty (0 bytes) by a malicious user, which
leads to 4 bytes overflow read and heap information leak when parsing
nlattrs.
To exploit this, one malicious user can spray the SLUB objects and then
leverage this 4 bytes OOB read to leak the heap data into
x->mapping_maxage (see xfrm_update_ae_params(...)), and leak it to
userspace via copy_to_user_state_extra(...).
The above bug is assigned CVE-2023-3773. To fix it, this commit just
completes the nla_policy description for XFRMA_MTIMER_THRESH, which
enforces the length check and avoids such OOB read.
Fixes: 4e484b3e96 ("xfrm: rate limit SA mapping change message to user space")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6018a26627 ]
When ip_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ip_vti device sends IPv6 packets.
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fd41f1ba6 ]
When ipv6_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ipv6_vti device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
vti6_tnl_xmit+0x3e6/0x1ee0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
Allocated by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
netlink_sendmsg+0x9b1/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2b/0x40
____kasan_slab_free+0x160/0x1c0
slab_free_freelist_hook+0x11b/0x220
kmem_cache_free+0xf0/0x490
skb_free_head+0x17f/0x1b0
skb_release_data+0x59c/0x850
consume_skb+0xd2/0x170
netlink_unicast+0x54f/0x7f0
netlink_sendmsg+0x926/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff88802e08ed00
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 194 bytes inside of
freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80)
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53223f2ed1 ]
When the xfrm device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when the xfrm device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff8881111458ef by task swapper/3/0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
xfrmi_xmit+0x173/0x1ca0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:intel_idle_hlt+0x23/0x30
Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
cpuidle_enter_state+0xd3/0x6f0
cpuidle_enter+0x4e/0xa0
do_idle+0x2fe/0x3c0
cpu_startup_entry+0x18/0x20
start_secondary+0x200/0x290
secondary_startup_64_no_verify+0x167/0x16b
</TASK>
Allocated by task 939:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
inet6_ifa_notify+0x118/0x230
__ipv6_ifa_notify+0x177/0xbe0
addrconf_dad_completed+0x133/0xe00
addrconf_dad_work+0x764/0x1390
process_one_work+0xa32/0x16f0
worker_thread+0x67d/0x10c0
kthread+0x344/0x440
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888111145800
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 239 bytes inside of
freed 640-byte region [ffff888111145800, ffff888111145a80)
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 57010b8ece ]
After the elimination of inner modes, a couple of warnings that
were previously unreachable can now be triggered by malformed
inbound packets.
Fix this by:
1. Moving the setting of skb->protocol into the decap functions.
2. Returning -EINVAL when unexpected protocol is seen.
Reported-by: Maciej Żenczykowski<maze@google.com>
Fixes: 5f24f41e8e ("xfrm: Remove inner/outer modes from input path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1e0e61d61 ]
According to all consumers code of attrs[XFRMA_SEC_CTX], like
* verify_sec_ctx_len(), convert to xfrm_user_sec_ctx*
* xfrm_state_construct(), call security_xfrm_state_alloc whose prototype
is int security_xfrm_state_alloc(.., struct xfrm_user_sec_ctx *sec_ctx);
* copy_from_user_sec_ctx(), convert to xfrm_user_sec_ctx *
...
It seems that the expected parsing result for XFRMA_SEC_CTX should be
structure xfrm_user_sec_ctx, and the current xfrm_sec_ctx is confusing
and misleading (Luckily, they happen to have same size 8 bytes).
This commit amend the policy structure to xfrm_user_sec_ctx to avoid
ambiguity.
Fixes: cf5cb79f69 ("[XFRM] netlink: Establish an attribute policy")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75065a8929 ]
When running xfrm_state_walk_init(), the xfrm_address_filter being used
is okay to have a splen/dplen that equals to sizeof(xfrm_address_t)<<3.
This commit replaces >= to > to make sure the boundary checking is
correct.
Fixes: 37bd22420f ("af_key: pfkey_dump needs parameter validation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 833fd800bf upstream.
The kprobes optimization check can_optimize() calls
insn_is_indirect_jump() to detect indirect jump instructions in
a target function. If any is found, creating an optprobe is disallowed
in the function because the jump could be from a jump table and could
potentially land in the middle of the target optprobe.
With retpolines, insn_is_indirect_jump() additionally looks for calls to
indirect thunks which the compiler potentially used to replace original
jumps. This extra check is however unnecessary because jump tables are
disabled when the kernel is built with retpolines. The same is currently
the case with IBT.
Based on this observation, remove the logic to look for calls to
indirect thunks and skip the check for indirect jumps altogether if the
kernel is built with retpolines or IBT. Remove subsequently the symbols
__indirect_thunk_start and __indirect_thunk_end which are no longer
needed.
Dropping this logic indirectly fixes a problem where the range
[__indirect_thunk_start, __indirect_thunk_end] wrongly included also the
return thunk. It caused that machines which used the return thunk as
a mitigation and didn't have it patched by any alternative ended up not
being able to use optprobes in any regular function.
Fixes: 0b53c374b9 ("x86/retpoline: Use -mfunction-return")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230711091952.27944-3-petr.pavlu@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79cd2a1122 upstream.
The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk
sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows:
.text {
[...]
TEXT_TEXT
[...]
__indirect_thunk_start = .;
*(.text.__x86.*)
__indirect_thunk_end = .;
[...]
}
Macro TEXT_TEXT references TEXT_MAIN which normally expands to only
".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes
".text .text.[0-9a-zA-Z_]*" which wrongly matches also the thunk
sections. The output layout is then different than expected. For
instance, the currently defined range [__indirect_thunk_start,
__indirect_thunk_end] becomes empty.
Prevent the problem by using ".." as the first separator, for example,
".text..__x86.indirect_thunk". This pattern is utilized by other
explicit section names which start with one of the standard prefixes,
such as ".text" or ".data", and that need to be individually selected in
the linker script.
[ nathan: Fix conflicts with SRSO and fold in fix issue brought up by
Andrew Cooper in post-review:
https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ]
Fixes: dc5723b02e ("kbuild: add support for Clang LTO")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f58d6fbcb7 upstream.
Initially, it was thought that doing an innocuous division in the #DE
handler would take care to prevent any leaking of old data from the
divider but by the time the fault is raised, the speculation has already
advanced too far and such data could already have been used by younger
operations.
Therefore, do the innocuous division on every exit to userspace so that
userspace doesn't see any potentially old data from integer divisions in
kernel space.
Do the same before VMRUN too, to protect host data from leaking into the
guest too.
Fixes: 77245f1c3c ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba5ca5e5e6 upstream.
Use LEA instead of ADD when adjusting %rsp in srso_safe_ret{,_alias}()
so as to avoid clobbering flags. Drop one of the INT3 instructions to
account for the LEA consuming one more byte than the ADD.
KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of each call is a small blob of code that performs fast
emulation by executing the target instruction with fixed operands.
E.g. to emulate ADC, fastop() invokes adcb_al_dl():
adcb_al_dl:
<+0>: adc %dl,%al
<+2>: jmp <__x86_return_thunk>
A major motivation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of the call. fastop() collects
the RFLAGS result by pushing RFLAGS onto the stack and popping them back
into a variable (held in %rdi in this case):
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
<+71>: mov 0xc0(%r8),%rdx
<+78>: mov 0x100(%r8),%rcx
<+85>: push %rdi
<+86>: popf
<+87>: call *%rsi
<+89>: nop
<+90>: nop
<+91>: nop
<+92>: pushf
<+93>: pop %rdi
and then propagating the arithmetic flags into the vCPU's emulator state:
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
<+64>: and $0xfffffffffffff72a,%r9
<+94>: and $0x8d5,%edi
<+109>: or %rdi,%r9
<+122>: mov %r9,0x10(%r8)
The failures can be most easily reproduced by running the "emulator"
test in KVM-Unit-Tests.
If you're feeling a bit of deja vu, see commit b63f20a778
("x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386").
In addition, this breaks booting of clang-compiled guest on
a gcc-compiled host where the host contains the %rsp-modifying SRSO
mitigations.
[ bp: Massage commit message, extend, remove addresses. ]
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Closes: https://lore.kernel.org/all/de474347-122d-54cd-eabf-9dcc95ab9eae@amd.com
Reported-by: Srikanth Aithal <sraithal@amd.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20230810013334.GA5354@dev-arch.thelio-3990X/
Link: https://lore.kernel.org/r/20230811155255.250835-1-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5409730962 upstream.
Christian reported spurious module load crashes after some of Song's
module memory layout patches.
Turns out that if the very last instruction on the very last page of the
module is a 'JMP __x86_return_thunk' then __static_call_fixup() will
trip a fault and die.
And while the module rework made this slightly more likely to happen,
it's always been possible.
Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Reported-by: Christian Bricart <christian@bricart.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbf4600877 upstream.
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3 ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 864bcaa38e upstream.
Similar to how it doesn't make sense to have UNTRAIN_RET have two
untrain calls, it also doesn't make sense for VMEXIT to have an extra
IBPB call.
This cures VMEXIT doing potentially unret+IBPB or double IBPB.
Also, the (SEV) VMEXIT case seems to have been overlooked.
Redefine the meaning of the synthetic IBPB flags to:
- ENTRY_IBPB -- issue IBPB on entry (was: entry + VMEXIT)
- IBPB_ON_VMEXIT -- issue IBPB on VMEXIT
And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
the previous behaviour and issues IBPB on entry+VMEXIT.
The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.
Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
check IBPB_ON_VMEXIT.
All this avoids having the VMEXIT case having to check both ENTRY_IBPB
and IBPB_ON_VMEXIT and simplifies the alternatives.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7c25c441e upstream.
Since there can only be one active return_thunk, there only needs be
one (matching) untrain_ret. It fundamentally doesn't make sense to
allow multiple untrain_ret at the same time.
Fold all the 3 different untrain methods into a single (temporary)
helper stub.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d43490d0ab upstream.
Use the existing configurable return thunk. There is absolute no
justification for having created this __x86_return_thunk alternative.
To clarify, the whole thing looks like:
Zen3/4 does:
srso_alias_untrain_ret:
nop2
lfence
jmp srso_alias_return_thunk
int3
srso_alias_safe_ret: // aliasses srso_alias_untrain_ret just so
add $8, %rsp
ret
int3
srso_alias_return_thunk:
call srso_alias_safe_ret
ud2
While Zen1/2 does:
srso_untrain_ret:
movabs $foo, %rax
lfence
call srso_safe_ret (jmp srso_return_thunk ?)
int3
srso_safe_ret: // embedded in movabs instruction
add $8,%rsp
ret
int3
srso_return_thunk:
call srso_safe_ret
ud2
While retbleed does:
zen_untrain_ret:
test $0xcc, %bl
lfence
jmp zen_return_thunk
int3
zen_return_thunk: // embedded in the test instruction
ret
int3
Where Zen1/2 flush the BTB entry using the instruction decoder trick
(test,movabs) Zen3/4 use BTB aliasing. SRSO adds a return sequence
(srso_safe_ret()) which forces the function return instruction to
speculate into a trap (UD2). This RET will then mispredict and
execution will continue at the return site read from the top of the
stack.
Pick one of three options at boot (evey function can only ever return
once).
[ bp: Fixup commit message uarch details and add them in a comment in
the code too. Add a comment about the srso_select_mitigation()
dependency on retbleed_select_mitigation(). Add moar ifdeffery for
32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
32-bit alternatives needing the symbol. ]
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ae68b26c3 upstream.
Objtool --rethunk does two things:
- it collects all (tail) call's of __x86_return_thunk and places them
into .return_sites. These are typically compiler generated, but
RET also emits this same.
- it fudges the validation of the __x86_return_thunk symbol; because
this symbol is inside another instruction, it can't actually find
the instruction pointed to by the symbol offset and gets upset.
Because these two things pertained to the same symbol, there was no
pressing need to separate these two separate things.
However, alas, along comes SRSO and more crazy things to deal with
appeared.
The SRSO patch itself added the following symbol names to identify as
rethunk:
'srso_untrain_ret', 'srso_safe_ret' and '__ret'
Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
new similarly embedded return thunk, and 'srso_untrain_ret' is
completely unrelated to anything the above does (and was only included
because of that INT3 vs UD2 issue fixed previous).
Clear things up by adding a second category for the embedded instruction
thing.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af023ef335 upstream.
vmlinux.o: warning: objtool: srso_untrain_ret() falls through to next function __x86_return_skl()
vmlinux.o: warning: objtool: __x86_return_thunk() falls through to next function __x86_return_skl()
This is because these functions (can) end with CALL, which objtool
does not consider a terminating instruction. Therefore, replace the
INT3 instruction (which is a non-fatal trap) with UD2 (which is a
fatal-trap).
This indicates execution will not continue past this point.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.637802730@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77f6711900 upstream.
Commit
fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
reimplemented __x86_return_thunk with a mix of SYM_FUNC_START and
SYM_CODE_END, this is not a sane combination.
Since nothing should ever actually 'CALL' this, make it consistently
CODE.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.571027074@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69f035c480 upstream.
In the I2C_FUNC_SMBUS_BLOCK_DATA case, the invalid length byte value
(outside of 1-32) of the SMBus block data response from the Slave device
is not correctly handled by the I2C Designware driver.
In case IC_EMPTYFIFO_HOLD_MASTER_EN==1, which cannot be detected
from the registers, the Master can be disabled only if the STOP bit
is set. Without STOP bit set, the Master remains active, holding the bus
until receiving a block data response length. This hangs the bus and
is unrecoverable.
Avoid this by issuing another dump read to reach the stop condition when
an invalid length byte is received.
Cc: stable@vger.kernel.org
Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/20230726080001.337353-3-tamnguyenchi@os.amperecomputing.com
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09c3717c3a upstream.
bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone
as we submit bios for writes. Every time we add a page to the bio, we
decrement those bytes from len_to_oe_boundary, and then we submit the
bio if we happen to hit zero.
Most of the time, len_to_oe_boundary gets set to U32_MAX.
submit_extent_page() adds pages into our bio, and the size of the bio
ends up limited by:
- Are we contiguous on disk?
- Does bio_add_page() allow us to stuff more in?
- is len_to_oe_boundary > 0?
The len_to_oe_boundary math starts with U32_MAX, which isn't page or
sector aligned, and subtracts from it until it hits zero. In the
non-zoned case, the last IO we submit before we hit zero is going to be
unaligned, triggering BUGs.
This is hard to trigger because bio_add_page() isn't going to make a bio
of U32_MAX size unless you give it a perfect set of pages and fully
contiguous extents on disk. We can hit it pretty reliably while making
large swapfiles during provisioning because the machine is freshly
booted, mostly idle, and the disk is freshly formatted. It's also
possible to trigger with reads when read_ahead_kb is set to 4GB.
The code has been clean up and shifted around a few times, but this flaw
has been lurking since the counter was added. I think the commit
24e6c80822 ("btrfs: simplify main loop in submit_extent_page") ended
up exposing the bug.
The fix used here is to skip doing math on len_to_oe_boundary unless
we've changed it from the default U32_MAX value. bio_add_page() is the
real limit we want, and there's no reason to do extra math when block
layer is doing it for us.
Sample reproducer, note you'll need to change the path to the bdi and
device:
SUBVOL=/btrfs/swapvol
SWAPFILE=$SUBVOL/swapfile
SZMB=8192
mkfs.btrfs -f /dev/vdb
mount /dev/vdb /btrfs
btrfs subvol create $SUBVOL
chattr +C $SUBVOL
dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
sync
echo 4 > /proc/sys/vm/drop_caches
echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb
while true; do
echo 1 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/drop_caches
dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
done
Fixes: 24e6c80822 ("btrfs: simplify main loop in submit_extent_page")
CC: stable@vger.kernel.org # 6.4+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b471965fdb upstream.
Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list)
reported a few of the test cases failing.
The failure scenario can be summarized and simplified as follows:
$ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb1 /dev/sdb2 :0
$ btrfstune -m /dev/sdb1 :0
$ wipefs -a /dev/sdb1 :0
$ mount -o degraded /dev/sdb2 /btrfs :0
$ btrfs replace start -B -f -r 1 /dev/sdb1 /btrfs :1
STDERR:
ERROR: ioctl(DEV_REPLACE_START) failed on "/btrfs": Input/output error
[11290.583502] BTRFS warning (device sdb2): tree block 22036480 mirror 2 has bad fsid, has 99835c32-49f0-4668-9e66-dc277a96b4a6 want da40350c-33ac-4872-92a8-4948ed8c04d0
[11290.586580] BTRFS error (device sdb2): unable to fix up (regular) error at logical 22020096 on dev /dev/sdb8 physical 1048576
As above, the replace is failing because we are verifying the header with
fs_devices::fsid instead of fs_devices::metadata_uuid, despite the
metadata_uuid actually being present.
To fix this, use fs_devices::metadata_uuid. We copy fsid into
fs_devices::metadata_uuid if there is no metadata_uuid, so its fine.
Fixes: a3ddbaebc7 ("btrfs: scrub: introduce a helper to verify one metadata block")
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29eefa6d0d upstream.
Pausing and canceling balance can race to interrupt balance lead to BUG_ON
panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance
does not take this race scenario into account.
However, the race condition has no other side effects. We can fix that.
Reproducing it with panic trace like this:
kernel BUG at fs/btrfs/volumes.c:4618!
RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0
Call Trace:
<TASK>
? do_nanosleep+0x60/0x120
? hrtimer_nanosleep+0xb7/0x1a0
? sched_core_clone_cookie+0x70/0x70
btrfs_ioctl_balance_ctl+0x55/0x70
btrfs_ioctl+0xa46/0xd20
__x64_sys_ioctl+0x7d/0xa0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Race scenario as follows:
> mutex_unlock(&fs_info->balance_mutex);
> --------------------
> .......issue pause and cancel req in another thread
> --------------------
> ret = __btrfs_balance(fs_info);
>
> mutex_lock(&fs_info->balance_mutex);
> if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
> btrfs_info(fs_info, "balance: paused");
> btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
> }
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c962098ca4 upstream.
In production we were seeing a variety of WARN_ON()'s in the extent_map
code, specifically in btrfs_drop_extent_map_range() when we have to call
add_extent_mapping() for our second split.
Consider the following extent map layout
PINNED
[0 16K) [32K, 48K)
and then we call btrfs_drop_extent_map_range for [0, 36K), with
skip_pinned == true. The initial loop will have
start = 0
end = 36K
len = 36K
we will find the [0, 16k) extent, but since we are pinned we will skip
it, which has this code
start = em_end;
if (end != (u64)-1)
len = start + len - em_end;
em_end here is 16K, so now the values are
start = 16K
len = 16K + 36K - 16K = 36K
len should instead be 20K. This is a problem when we find the next
extent at [32K, 48K), we need to split this extent to leave [36K, 48k),
however the code for the split looks like this
split->start = start + len;
split->len = em_end - (start + len);
In this case we have
em_end = 48K
split->start = 16K + 36K // this should be 16K + 20K
split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
and now we have an invalid extent_map in the tree that potentially
overlaps other entries in the extent map. Even in the non-overlapping
case we will have split->start set improperly, which will cause problems
with any block related calculations.
We don't actually need len in this loop, we can simply use end as our
end point, and only adjust start up when we find a pinned extent we need
to skip.
Adjust the logic to do this, which keeps us from inserting an invalid
extent map.
We only skip_pinned in the relocation case, so this is relatively rare,
except in the case where you are running relocation a lot, which can
happen with auto relocation on.
Fixes: 55ef689900 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b378f6ad4 upstream.
The readdir implementation currently processes always up to the last index
it finds. This however can result in an infinite loop if the directory has
a large number of entries such that they won't all fit in the given buffer
passed to the readdir callback, that is, dir_emit() returns a non-zero
value. Because in that case readdir() will be called again and if in the
meanwhile new directory entries were added and we still can't put all the
remaining entries in the buffer, we keep repeating this over and over.
The following C program and test script reproduce the problem:
$ cat /mnt/readdir_prog.c
#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
DIR *dir = opendir(".");
struct dirent *dd;
while ((dd = readdir(dir))) {
printf("%s\n", dd->d_name);
rename(dd->d_name, "TEMPFILE");
rename("TEMPFILE", dd->d_name);
}
closedir(dir);
}
$ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV &> /dev/null
#mkfs.xfs -f $DEV &> /dev/null
#mkfs.ext4 -F $DEV &> /dev/null
mount $DEV $MNT
mkdir $MNT/testdir
for ((i = 1; i <= 2000; i++)); do
echo -n > $MNT/testdir/file_$i
done
cd $MNT/testdir
/mnt/readdir_prog
cd /mnt
umount $MNT
This behaviour is surprising to applications and it's unlike ext4, xfs,
tmpfs, vfat and other filesystems, which always finish. In this case where
new entries were added due to renames, some file names may be reported
more than once, but this varies according to each filesystem - for example
ext4 never reported the same file more than once while xfs reports the
first 13 file names twice.
So change our readdir implementation to track the last index number when
opendir() is called and then make readdir() never process beyond that
index number. This gives the same behaviour as ext4.
Reported-by: Rob Landley <rob@landley.net>
Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 282069845a upstream.
Do not read the data register to clear the error flags for lpuart32
platforms, the additional read may cause the receive FIFO underflow
since the DMA has already read the data register.
Actually all lpuart32 platforms support write 1 to clear those error
bits, let's use this method to better clear the error flags.
Fixes: 42b68768e5 ("serial: fsl_lpuart: DMA support for 32-bit variant")
Cc: stable <stable@kernel.org>
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://lore.kernel.org/r/20230801022304.24251-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c4f8333b5 upstream.
In commit 9b9c8195f3 ("tty: n_gsm: fix UAF in gsm_cleanup_mux"), the UAF
problem is not completely fixed. There is a race condition in
gsm_cleanup_mux(), which caused this UAF.
The UAF problem is triggered by the following race:
task[5046] task[5054]
----------------------- -----------------------
gsm_cleanup_mux();
dlci = gsm->dlci[0];
mutex_lock(&gsm->mutex);
gsm_cleanup_mux();
dlci = gsm->dlci[0]; //Didn't take the lock
gsm_dlci_release(gsm->dlci[i]);
gsm->dlci[i] = NULL;
mutex_unlock(&gsm->mutex);
mutex_lock(&gsm->mutex);
dlci->dead = true; //UAF
Fix it by assigning values after mutex_lock().
Link: https://syzkaller.appspot.com/text?tag=CrashReport&x=176188b5a80000
Cc: stable <stable@kernel.org>
Fixes: 9b9c8195f3 ("tty: n_gsm: fix UAF in gsm_cleanup_mux")
Fixes: aa371e96f0 ("tty: n_gsm: fix restart handling via CLD command")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Co-developed-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Link: https://lore.kernel.org/r/20230811031121.153237-1-yiyang13@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b38f6ddc9 upstream.
We recently had problems where a network namespace was deleted
causing hard to debug reconnect problems. To help deal with
configuration issues like this it is useful to dump the network
namespace to better debug what happened.
So add this to information displayed in /proc/fs/cifs/DebugData for
the server (and channels if mounted with multichannel). For example:
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0 Net namespace: 4026531840
This can be easily compared with what is displayed for the
processes on the system. For example /proc/1/ns/net in this case
showed the same thing (see below), and we can see that the namespace
is still valid in this example.
'net:[4026531840]'
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d6ba607d6 upstream.
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa max vqp attr to avoid
such bugs.
Fixes: ad69dd0bf2 ("vdpa: Introduce query of device config layout")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3003e1b54 upstream.
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa queue index attr to avoid
such bugs.
Fixes: 13b00b1356 ("vdpa: Add support for querying vendor statistics")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernelorg
Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79c8651587 upstream.
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa features attr to avoid
such bugs.
Fixes: 90fea5a800 ("vdpa: device feature provisioning")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f3175979e upstream.
With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
/proc/powerpc/rtas/firmware_update interface to prepare a system
firmware update yields a BUG():
kernel BUG at mm/usercopy.c:102!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2
Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
NIP: c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000
REGS: c0000000148c76a0 TRAP: 0700 Not tainted (6.5.0-rc3+)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002242 XER: 0000000c
CFAR: c0000000001fbd34 IRQMASK: 0
[ ... GPRs omitted ... ]
NIP usercopy_abort+0xa0/0xb0
LR usercopy_abort+0x9c/0xb0
Call Trace:
usercopy_abort+0x9c/0xb0 (unreliable)
__check_heap_object+0x1b4/0x1d0
__check_object_size+0x2d0/0x380
rtas_flash_write+0xe4/0x250
proc_reg_write+0xfc/0x160
vfs_write+0xfc/0x4e0
ksys_write+0x90/0x160
system_call_exception+0x178/0x320
system_call_common+0x160/0x2c4
The blocks of the firmware image are copied directly from user memory
to objects allocated from flash_block_cache, so flash_block_cache must
be created using kmem_cache_create_usercopy() to mark it safe for user
access.
Fixes: 6d07d1cd30 ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[mpe: Trim and indent oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8329d0c735 upstream.
In the multi-core JPEG encoder/decoder setup, the driver for the
individual cores references the parent device's platform driver data.
However, in the parent driver, this is only set at the end of the probe
function, way later than devm_of_platform_populate(), which triggers
the probe of the cores. This causes a kernel splat in the sub-device
probe function.
Move platform_set_drvdata() to before devm_of_platform_populate() to
fix this.
Fixes: 934e8bccac ("mtk-jpegenc: support jpegenc multi-hardware")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0872b2c0ab upstream.
in mmphw_probe(), check the return value of clk_prepare_enable()
and return the error code if clk_prepare_enable() returns an
unexpected value.
Fixes: d63028c389 ("video: mmp display controller support")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27ec43c77b upstream.
Tegra processors prior to Tegra186 used APB DMA for I2C requiring
CONFIG_TEGRA20_APB_DMA=y while Tegra186 and later use GPC DMA requiring
CONFIG_TEGRA186_GPC_DMA=y.
The check for if the processor uses APB DMA is inverted and so the wrong
DMA config options are checked.
This means if CONFIG_TEGRA20_APB_DMA=y but CONFIG_TEGRA186_GPC_DMA=n
with a Tegra186 or later processor the driver will incorrectly think DMA is
enabled and attempt to request DMA channels that will never be availible,
leaving the driver in a perpetual EPROBE_DEFER state.
Fixes: 48cb6356fa ("i2c: tegra: Add GPCDMA support")
Signed-off-by: Parker Newman <pnewman@connecttech.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Link: https://lore.kernel.org/r/fcfcf9b3-c8c4-9b34-2ff8-cd60a3d490bd@connecttech.com
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4caf4cb1ea upstream.
iproc_i2c_rd_reg() and iproc_i2c_wr_reg() are called from both
interrupt context (e.g. bcm_iproc_i2c_isr) and process context
(e.g. bcm_iproc_i2c_suspend). Therefore, interrupts should be
disabled to avoid potential deadlock. To prevent this scenario,
use spin_lock_irqsave().
Fixes: 9a10387280 ("i2c: iproc: add NIC I2C support")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8f5f849ff ]
With deferred close we can have closes that race with lease breaks,
and so with the current checks for whether to send the lease response,
oplock_response(), this can mean that an unmount (kill_sb) can occur
just before we were checking if the tcon->ses is valid. See below:
[Fri Aug 4 04:12:50 2023] RIP: 0010:cifs_oplock_break+0x1f7/0x5b0 [cifs]
[Fri Aug 4 04:12:50 2023] Code: 7d a8 48 8b 7d c0 c0 e9 02 48 89 45 b8 41 89 cf e8 3e f5 ff ff 4c 89 f7 41 83 e7 01 e8 82 b3 03 f2 49 8b 45 50 48 85 c0 74 5e <48> 83 78 60 00 74 57 45 84 ff 75 52 48 8b 43 98 48 83 eb 68 48 39
[Fri Aug 4 04:12:50 2023] RSP: 0018:ffffb30607ddbdf8 EFLAGS: 00010206
[Fri Aug 4 04:12:50 2023] RAX: 632d223d32612022 RBX: ffff97136944b1e0 RCX: 0000000080100009
[Fri Aug 4 04:12:50 2023] RDX: 0000000000000001 RSI: 0000000080100009 RDI: ffff97136944b188
[Fri Aug 4 04:12:50 2023] RBP: ffffb30607ddbe58 R08: 0000000000000001 R09: ffffffffc08e0900
[Fri Aug 4 04:12:50 2023] R10: 0000000000000001 R11: 000000000000000f R12: ffff97136944b138
[Fri Aug 4 04:12:50 2023] R13: ffff97149147c000 R14: ffff97136944b188 R15: 0000000000000000
[Fri Aug 4 04:12:50 2023] FS: 0000000000000000(0000) GS:ffff9714f7c00000(0000) knlGS:0000000000000000
[Fri Aug 4 04:12:50 2023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Aug 4 04:12:50 2023] CR2: 00007fd8de9c7590 CR3: 000000011228e000 CR4: 0000000000350ef0
[Fri Aug 4 04:12:50 2023] Call Trace:
[Fri Aug 4 04:12:50 2023] <TASK>
[Fri Aug 4 04:12:50 2023] process_one_work+0x225/0x3d0
[Fri Aug 4 04:12:50 2023] worker_thread+0x4d/0x3e0
[Fri Aug 4 04:12:50 2023] ? process_one_work+0x3d0/0x3d0
[Fri Aug 4 04:12:50 2023] kthread+0x12a/0x150
[Fri Aug 4 04:12:50 2023] ? set_kthread_struct+0x50/0x50
[Fri Aug 4 04:12:50 2023] ret_from_fork+0x22/0x30
[Fri Aug 4 04:12:50 2023] </TASK>
To fix this change the ordering of the checks before sending the oplock_response
to first check if the openFileList is empty.
Fixes: da787d5b74 ("SMB3: Do not send lease break acknowledgment if all file handles have been closed")
Suggested-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad03a0f44c ]
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after
the control virtqueue ASID iotlb has been populated. The control vq
iotlb must not be cleared, since it will not be populated again.
So call the ASID aware destroy function which makes sure that the
right vq resource is destroyed.
Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ee811009a ]
The mr->initialized flag is shared between the control vq and data vq
part of the mr init/uninit. But if the control vq and data vq get placed
in different ASIDs, it can happen that initializing the control vq will
prevent the data vq mr from being initialized.
This patch consolidates the control and data vq init parts into their
own init functions. The mr->initialized will now be used for the data vq
only. The control vq currently doesn't need a flag.
The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
split into data and control vq functions which are now also ASID aware.
Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55c91fedd0 ]
vm_dev has a separate lifecycle because it has a 'struct device'
embedded. Thus, having a release callback for it is correct.
Allocating the vm_dev struct with devres totally breaks this protection,
though. Instead of waiting for the vm_dev release callback, the memory
is freed when the platform_device is removed. Resulting in a
use-after-free when finally the callback is to be called.
To easily see the problem, compile the kernel with
CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs.
The fix is easy, don't use devres in this case.
Found during my research about object lifetime problems.
Fixes: 7eb781b1bb ("virtio_mmio: add cleanup for virtio_mmio_probe")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0657b20c5a ]
If a task creates a new block group and that block group becomes unused
before we finish its creation, at btrfs_create_pending_block_groups(),
then when btrfs_mark_bg_unused() is called against the block group, we
assume that the block group is currently in the list of block groups to
reclaim, and we move it out of the list of new block groups and into the
list of unused block groups. This has two consequences:
1) We move it out of the list of new block groups associated to the
current transaction. So the block group creation is not finished and
if we attempt to delete the bg because it's unused, we will not find
the block group item in the extent tree (or the new block group tree),
its device extent items in the device tree etc, resulting in the
deletion to fail due to the missing items;
2) We don't increment the reference count on the block group when we
move it to the list of unused block groups, because we assumed the
block group was on the list of block groups to reclaim, and in that
case it already has the correct reference count. However the block
group was on the list of new block groups, in which case no extra
reference was taken because it's local to the current task. This
later results in doing an extra reference count decrement when
removing the block group from the unused list, eventually leading the
reference count to 0.
This second case was caught when running generic/297 from fstests, which
produced the following assertion failure and stack trace:
[589.559] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4299
[589.559] ------------[ cut here ]------------
[589.559] kernel BUG at fs/btrfs/block-group.c:4299!
[589.560] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[589.560] CPU: 8 PID: 2819134 Comm: umount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[589.560] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[589.560] RIP: 0010:btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.561] Code: 68 62 da c0 (...)
[589.561] RSP: 0018:ffffa55a8c3b3d98 EFLAGS: 00010246
[589.561] RAX: 0000000000000058 RBX: ffff8f030d7f2000 RCX: 0000000000000000
[589.562] RDX: 0000000000000000 RSI: ffffffff953f0878 RDI: 00000000ffffffff
[589.562] RBP: ffff8f030d7f2088 R08: 0000000000000000 R09: ffffa55a8c3b3c50
[589.562] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8f05850b4c00
[589.562] R13: ffff8f030d7f2090 R14: ffff8f05850b4cd8 R15: dead000000000100
[589.563] FS: 00007f497fd2e840(0000) GS:ffff8f09dfc00000(0000) knlGS:0000000000000000
[589.563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[589.563] CR2: 00007f497ff8ec10 CR3: 0000000271472006 CR4: 0000000000370ee0
[589.563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[589.564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[589.564] Call Trace:
[589.564] <TASK>
[589.565] ? __die_body+0x1b/0x60
[589.565] ? die+0x39/0x60
[589.565] ? do_trap+0xeb/0x110
[589.565] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? do_error_trap+0x6a/0x90
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? exc_invalid_op+0x4e/0x70
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? asm_exc_invalid_op+0x16/0x20
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] close_ctree+0x35d/0x560 [btrfs]
[589.568] ? fsnotify_sb_delete+0x13e/0x1d0
[589.568] ? dispose_list+0x3a/0x50
[589.568] ? evict_inodes+0x151/0x1a0
[589.568] generic_shutdown_super+0x73/0x1a0
[589.569] kill_anon_super+0x14/0x30
[589.569] btrfs_kill_super+0x12/0x20 [btrfs]
[589.569] deactivate_locked_super+0x2e/0x70
[589.569] cleanup_mnt+0x104/0x160
[589.570] task_work_run+0x56/0x90
[589.570] exit_to_user_mode_prepare+0x160/0x170
[589.570] syscall_exit_to_user_mode+0x22/0x50
[589.570] ? __x64_sys_umount+0x12/0x20
[589.571] do_syscall_64+0x48/0x90
[589.571] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[589.571] RIP: 0033:0x7f497ff0a567
[589.571] Code: af 98 0e (...)
[589.572] RSP: 002b:00007ffc98347358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[589.572] RAX: 0000000000000000 RBX: 00007f49800b8264 RCX: 00007f497ff0a567
[589.572] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000557f558abfa0
[589.573] RBP: 0000557f558a6ba0 R08: 0000000000000000 R09: 00007ffc98346100
[589.573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[589.573] R13: 0000557f558abfa0 R14: 0000557f558a6cb0 R15: 0000557f558a6dd0
[589.573] </TASK>
[589.574] Modules linked in: dm_snapshot dm_thin_pool (...)
[589.576] ---[ end trace 0000000000000000 ]---
Fix this by adding a runtime flag to the block group to tell that the
block group is still in the list of new block groups, and therefore it
should not be moved to the list of unused block groups, at
btrfs_mark_bg_unused(), until the flag is cleared, when we finish the
creation of the block group at btrfs_create_pending_block_groups().
Fixes: a9f189716c ("btrfs: move out now unused BG from the reclaim list")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9f189716c ]
An unused block group is easy to remove to free up space and should be
reclaimed fast. Such block group can often already be a target of the
reclaim process. As we check list_empty(&bg->bg_list), we keep it in the
reclaim list. That block group is never reclaimed until the file system
is filled e.g. up to 75%.
Instead, we can move unused block group to the unused list and delete it
fast.
Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1f0a9816f ]
In function ‘fortify_memcpy_chk’,
inlined from ‘get_conn_info_complete’ at net/bluetooth/mgmt.c:7281:2:
include/linux/fortify-string.h:592:25: error: call to
‘__read_overflow2_field’ declared with attribute warning: detected read
beyond size of field (2nd parameter); maybe use struct_group()?
[-Werror=attribute-warning]
592 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
This is due to the wrong member is used for memcpy(). Use correct one.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5251605f4d ]
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GZ301V series which uses an SPI connected Cirrus amp.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230706223323.30871-2-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33d7c9c3bf ]
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG G614J series which uses an SPI connected Cirrus amp.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230704044619.19343-5-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9abc77fb14 ]
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GA402X series which uses an I2C connected Cirrus amp.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230704044619.19343-3-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8cc87c055d ]
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GV601V series which uses an I2C connected Cirrus amp.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230704044619.19343-2-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f4a08fed4 ]
The variable codec->regmap is often protected by the lock
codec->regmap_lock when is accessed. However, it is accessed without
holding the lock when is accessed in snd_hdac_regmap_sync():
if (codec->regmap)
In my opinion, this may be a harmful race, because if codec->regmap is
set to NULL right after the condition is checked, a null-pointer
dereference can occur in the called function regcache_sync():
map->lock(map->lock_arg); --> Line 360 in drivers/base/regmap/regcache.c
To fix this possible null-pointer dereference caused by data race, the
mutex_lock coverage is extended to protect the if statement as well as the
function call to regcache_sync().
[ Note: the lack of the regmap_lock itself is harmless for the current
codec driver implementations, as snd_hdac_regmap_sync() is only for
PM runtime resume that is prohibited during the codec probe.
But the change makes the whole code more consistent, so it's merged
as is -- tiwai ]
Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Link: https://lore.kernel.org/r/20230703031016.1184711-1-islituo@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97498cd610 ]
In a previous commit 2681631c29 ("fs/ntfs3: Add null pointer check to
attr_load_runs_vcn"), ni can be NULL in attr_load_runs_vcn(), and thus it
should be checked before being used.
However, in the call stack of this commit, mft_ni in mi_read() is
aliased with ni in attr_load_runs_vcn(), and it is also used in
mi_read() at two places:
mi_read()
rw_lock = &mft_ni->file.run_lock -> No check
attr_load_runs_vcn(mft_ni, ...)
ni (namely mft_ni) is checked in the previous commit
attr_load_runs_vcn(..., &mft_ni->file.run) -> No check
Thus, to avoid possible null-pointer dereferences, the related checks
should be added.
These bugs are reported by a static analysis tool implemented by myself,
and they are found by extending a known bug fixed in the previous commit.
Thus, they could be theoretical bugs.
Signed-off-by: Jia-Ju Bai <baijiaju@buaa.edu.cn>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b0da5c549 ]
When the msgs are corrupted we need to dump them and then it will
be easier to dig what has happened and where the issue is.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2bfa94408 ]
Intel Barlow Ridge discrete USB4 host router has the same limitation as
the previous generations so make sure the USB3 bandwidth limitation
quirk is applied to Barlow Ridge too.
Signed-off-by: Gil Fine <gil.fine@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f14a21066 ]
Intel Barlow Ridge is the first USB4 v2 controller from Intel. The
controller exposes standard USB4 PCI class ID in typical configurations,
however there is a way to configure it so that it uses a special class
ID to allow using s different driver than the Windows inbox one. For
this reason add the Barlow Ridge PCI ID to the Linux driver too so that
the driver can attach regardless of the class ID.
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6fa0a72cbb ]
Some fields such as gt_logd_secs of the struct gfs2_tune are accessed
without holding the lock gt_spin in gfs2_show_options():
val = sdp->sd_tune.gt_logd_secs;
if (val != 30)
seq_printf(s, ",commit=%d", val);
And thus can cause data races when gfs2_show_options() and other functions
such as gfs2_reconfigure() are concurrently executed:
spin_lock(>->gt_spin);
gt->gt_logd_secs = newargs->ar_commit;
To fix these possible data races, the lock sdp->sd_tune.gt_spin is
acquired before accessing the fields of gfs2_tune and released after these
accesses.
Further changes by Andreas:
- Don't hold the spin lock over the seq_printf operations.
Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53d061c19d ]
USB PHY DPDM wakeup bit is enabled by default, when USB wakeup
is not required(/sys/.../wakeup is disabled), this bit should be
disabled, otherwise we will have unexpected wakeup if do USB device
connect/disconnect while system sleep.
This bit can be enabled for both host and device mode.
Signed-off-by: Li Jun <jun.li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Message-ID: <20230517081907.3410465-3-xu.yang_2@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ac37fbdad ]
As we use bvalid for vbus wakeup source, to save power when
suspend, turn off the vbus comparator for imx7d and imx8mm.
Below is this bit description from RM of iMX8MM
"VBUS Valid Comparator Enable:
This signal controls the USB OTG PHY VBUS Valid comparator which
indicates whether the voltage on the USB_OTG*_VBUS pin is below
the VBUS Valid threshold. The VBUS Valid threshold is nominally
4.75V on this USB PHY. The VBUS Valid threshold can be adjusted
using the USBNC_OTGn_PHY_CFG1[OTGTUNE0] bit field. Status of the
VBUS Valid comparator, when it is enabled, is reported on the
USBNC_OTGn_PHY_STATUS[VBUS_VLD] bit.
When OTGDISABLE0 (USBNC_USB_OTGx_PHY_CFG2[10])is set to 1'b0 and
DRVVBUS0 is set to 1'b1, the Bandgap circuitry and VBUS Valid
comparator are powered, even in Suspend or Sleep mode.
DRVVBUS0 should be reset to 1'b0 when the internal VBUS Valid comparator
is not required, to reduce quiescent current in Suspend or Sleep mode.
- 0 The VBUS Valid comparator is disabled
- 1 The VBUS Valid comparator is enabled"
Signed-off-by: Li Jun <jun.li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Message-ID: <20230517081907.3410465-2-xu.yang_2@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a4776205b ]
The XHCI_PLAT quirk was only needed to ensure non-PCI xHC host avoided
setting up MSI interrupts in generic xhci codepaths.
The MSI setup code is now moved to PCI specific xhci-pci.c file so
the quirk is no longer needed.
Remove setting the XHCI_PLAT quirk for HiSilocon SoC xHC, NVIDIA Tegra xHC,
MediaTek xHC, the generic xhci-plat driver, and the checks for XHCI_PLAT
in xhci-pci.c MSI setup code.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Message-ID: <20230602144009.1225632-5-mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1402ba08ab ]
According to the USB4 retimer guide the correct order is immediately
after sending ENUMERATE_RETIMERS so update the code to follow this.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3df55cd773 ]
If pdev is NULL, then it is still dereferenced.
This fixes this smatch warning:
drivers/media/platform/mediatek/vpu/mtk_vpu.c:570 vpu_load_firmware() warn: address of NULL pointer 'pdev'
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Yunfei Dong <yunfei.dong@mediatek.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c3ff12a92b ]
ISOC transfers expect a certain cadence of requests being queued. Not
keeping up with the expected rate of requests results in missed ISOC
transfers (EXDEV). The application layer may or may not produce video
frames to match this expectation, so uvc gadget driver must handle cases
where the application is not queuing up buffers fast enough to fulfill
ISOC requirements.
Currently, uvc gadget driver waits for new video buffer to become available
before queuing up usb requests. With this patch the gadget driver queues up
0 length usb requests whenever there are no video buffers available. The
USB controller's complete callback is used as the limiter for how quickly
the 0 length packets will be queued. Video buffers are still queued as
soon as they become available.
Link: https://lore.kernel.org/CAMHf4WKbi6KBPQztj9FA4kPvESc1fVKrC8G73-cs6tTeQby9=w@mail.gmail.com/
Signed-off-by: Avichal Rakesh <arakesh@google.com>
Link: https://lore.kernel.org/r/20230508231103.1621375-1-arakesh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e599046994 ]
When serial console over USB is enabled, gs_console_connect
queues gs_console_work, where it acquires the spinlock and
queues the usb request, and this request goes to gadget layer.
Now consider a situation where gadget layer prints something
to dmesg, this will eventually call gs_console_write() which
requires cons->lock. And this causes spinlock recursion. Avoid
this by excluding usb_ep_queue from the spinlock.
spin_lock_irqsave //needs cons->lock
gs_console_write
.
.
_printk
__warn_printk
dev_warn/pr_err
.
.
[USB Gadget Layer]
.
.
usb_ep_queue
gs_console_work
__gs_console_push // acquires cons->lock
process_one_work
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Link: https://lore.kernel.org/r/1683638872-6885-1-git-send-email-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5b7eb477c ]
From the experiments with camera sensors using SGRBG10_1X10/3280x2464 and
SRGGB10_1X10/3280x2464 formats, it becomes clear that on sdm845 and sm8250
VFE outputs the lines padded to a length multiple of 16 bytes. As in the
current driver the value of the bpl_alignment is set to 8 bytes, the frames
captured in formats with the bytes-per-line value being not a multiple of
16 get corrupted.
Set the bpl_alignment of the camss video output device to 16 for sdm845 and
sm8250 to fix that.
Signed-off-by: Andrey Konovalov <andrey.konovalov@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Acked-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56b5c3e67b ]
Getting below error when using KCSAN to check the driver. Adding lock to
protect parameter num_rdy when getting the value with function:
v4l2_m2m_num_src_bufs_ready/v4l2_m2m_num_dst_bufs_ready.
kworker/u16:3: [name:report&]BUG: KCSAN: data-race in v4l2_m2m_buf_queue
kworker/u16:3: [name:report&]
kworker/u16:3: [name:report&]read-write to 0xffffff8105f35b94 of 1 bytes by task 20865 on cpu 7:
kworker/u16:3: v4l2_m2m_buf_queue+0xd8/0x10c
Signed-off-by: Pina Chen <pina.chen@mediatek.com>
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6bd6cd29c9 ]
Returning early from stm32_usart_serial_remove() results in a resource
leak as several cleanup functions are not called. The driver core ignores
the return value and there is no possibility to clean up later.
uart_remove_one_port() only returns non-zero if there is some
inconsistency (i.e. stm32_usart_driver.state[port->line].uart_port == NULL).
This should never happen, and even if it does it's a bad idea to exit
early in the remove callback without cleaning up.
This prepares changing the prototype of struct platform_driver::remove to
return void. See commit 5c5a7680e6 ("platform: Provide a remove callback
that returns no value") for further details about this quest.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512173810.131447-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99f280700b ]
Don't collect exiting session in smb2_reconnect_server(), because it
will be released soon.
Note that the exiting session will stay in server->smb_ses_list until
it complete the cifs_free_ipc() and logoff() and then delete itself
from the list.
Signed-off-by: Winston Wen <wentao@uniontech.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8635e8df47 ]
This reverts commit cead61a671.
It exported __stack_smash_handler and __guard, while they may not be
defined by anyone.
The code *declares* __stack_smash_handler and __guard. It does not
create weak symbols. If no external library is linked, they are left
undefined, but yet exported.
If a loadable module tries to access non-existing symbols, bad things
(a page fault, NULL pointer dereference, etc.) will happen. So, the
current code is wrong and dangerous.
If the code were written as follows, it would *define* them as weak
symbols so modules would be able to get access to them.
void (*__stack_smash_handler)(void *) __attribute__((weak));
EXPORT_SYMBOL(__stack_smash_handler);
long __guard __attribute__((weak));
EXPORT_SYMBOL(__guard);
In fact, modpost forbids exporting undefined symbols. It shows an error
message if it detects such a mistake.
ERROR: modpost: "..." [...] was exported without definition
Unfortunately, it is checked only when the code is built as modular.
The problem described above has been unnoticed for a long time because
arch/um/os-Linux/user_syms.c is always built-in.
With a planned change in Kbuild, exporting undefined symbols will always
result in a build error instead of a run-time error. It is a good thing,
but we need to fix the breakage in advance.
One fix is to define weak symbols as shown above. An alternative is to
export them conditionally as follows:
#ifdef CONFIG_STACKPROTECTOR
extern void __stack_smash_handler(void *);
EXPORT_SYMBOL(__stack_smash_handler);
external long __guard;
EXPORT_SYMBOL(__guard);
#endif
This is what other architectures do; EXPORT_SYMBOL(__stack_chk_guard)
is guarded by #ifdef CONFIG_STACKPROTECTOR.
However, adding the #ifdef guard is not sensible because UML cannot
enable the stack-protector in the first place! (Please note UML does
not select HAVE_STACKPROTECTOR in Kconfig.)
So, the code is already broken (and unused) in multiple ways.
Just remove.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30f90f3c1c ]
[Why]
Hardware implements root clock gating by utilizing the DPP DTO registers
with a special case of DTO enabled, phase = 0, modulo = 1. This
conflicts with our policy to always update the DPPDTO for cases where
it's expected to be disabled.
The pipes unexpectedly enter a higher power state than expected because
of this programming flow.
[How]
Guard the upper layers of HWSS against this hardware quirk with
programming the register with an internal state flag in DCCG.
While technically acting as global state for the DCCG, HWSS shouldn't be
expected to understand the hardware quirk for having DTO disabled
causing more power than DTO enabled with this specific setting.
This also prevents sequencing errors from occuring in the future if
we have to program DPP DTO in multiple locations.
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e60ab4eb3 ]
[Description]
- Previously we wanted to apply extra 60us of prefetch for min DCFCLK
(200Mhz), but DCFCLK can be calculated to be 201Mhz which underflows
also without the extra prefetch
- Instead, apply the the extra 60us prefetch for any DCFCLK freq <=
300Mhz
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a31e8b89b ]
[Why]
Calls to dcn20_adjust_freesync_v_startup are no longer
needed as of dcn3+ and can cause underflow in some cases
[How]
Move calls to dcn20_adjust_freesync_v_startup up into
validate_bandwidth for dcn2.x
Reviewed-by: Jun Lei <jun.lei@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 187916e6ed ]
When using cpu to update page tables, vm update fences are unused.
Install stub fence into these fence pointers instead of NULL
to avoid NULL dereference when calling dma_fence_wait() on them.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66419036f6 ]
An Interrupt Remapping Table (IRT) stores interrupt remapping configuration
for each device. In a normal operation, the AMD IOMMU caches the table
to optimize subsequent data accesses. This requires the IOMMU driver to
invalidate IRT whenever it updates the table. The invalidation process
includes issuing an INVALIDATE_INTERRUPT_TABLE command following by
a COMPLETION_WAIT command.
However, there are cases in which the IRT is updated at a high rate.
For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every
vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large
amount of vcpus and VFIO PCI pass-through devices, the invalidation
process could potentially become a performance bottleneck.
Introducing a new kernel boot option:
amd_iommu=irtcachedis
which disables IRTE caching by setting the IRTCachedis bit in each IOMMU
Control register, and bypass the IRT invalidation process.
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48aea8b445 ]
Adds the USB and Bluetooth IDs for the Logitech G915 TKL keyboard, for device detection
For this device, this provides battery reporting on top of hid-generic
Reviewed-by: Bastien Nocera <hadess@hadess.net>
Signed-off-by: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7607f12ba7 ]
In the beginning, commit 18eeef46d3 ("HID: i2c-hid: goodix: Tie the
reset line to true state of the regulator") introduced a change to tie
the reset line of the Goodix touchscreen to the state of the regulator
to fix a power leakage issue in suspend.
After some time, the change was deemed unnecessary and was reverted in
commit 557e05fa9f ("HID: i2c-hid: goodix: Stop tying the reset line to
the regulator") due to difficulties in managing regulator notifiers for
designs like Evoker, which provides a second power rail to touchscreen.
However, the revert caused a power regression on another Chromebook
device Steelix in the field, which has a dedicated always-on regulator
for touchscreen and was covered by the workaround in the first commit.
To address both cases, this patch adds the support for the new
"goodix,no-reset-during-suspend" property in the driver:
- When set to true, the driver does not assert the reset GPIO during
power-down.
Instead, the GPIO will be asserted during power-up to ensure the
touchscreen always has a clean start and consistent behavior after
resuming.
This is for designs with a dedicated always-on regulator.
- When set to false or unset, the driver uses the original control flow
and asserts GPIO and disables regulators normally.
This is for the two-regulator and shared-regulator designs.
Signed-off-by: Fei Shao <fshao@chromium.org>
Suggested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Jeff LaBundy <jeff@labundy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 359ed24a0d ]
We observed that on Chromebook device Steelix, if Goodix GT7375P
touchscreen is powered in suspend (because, for example, it connects to
an always-on regulator) and with the reset GPIO asserted, it will
introduce about 14mW power leakage.
To address that, we add this property to skip reset during suspend.
If it's set, the driver will stop asserting the reset GPIO during
power-down. Refer to the comments in the driver for details.
Signed-off-by: Fei Shao <fshao@chromium.org>
Suggested-by: Jeff LaBundy <jeff@labundy.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Reviewed-by: Jeff LaBundy <jeff@labundy.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 314a7ffd7c ]
This commit fixes a memory leak caused when clearing the user_mappings
info when a new context is opened immediately after user_mapping is
captured and a hard reset is performed.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8b9cea584 ]
Currently upon a heartbeat failure, we don't know if the failure
is due to firmware hang or due to a bad PCI link. Hence, we
are reading a PCI config space register with a known value (vendor ID)
so we will know which of the two possibilities caused the heartbeat
failure.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f751b99255 ]
The functionality described in Commit 61bef9e68d ("ASoC: SOF: Intel: hda: enforce exclusion between HDaudio and SoundWire")
does not seem to be properly implemented with two issues that need to
be corrected.
a) The test used is incorrect when DisplayAudio codecs are not supported.
b) Conversely when only Display Audio codecs can be found, we do want
to start the SoundWire links, if any. That will help add the relevant
topologies and machine descriptors, and identify cases where the
SoundWire information in ACPI needs to be modified with a quirk.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/20230606222529.57156-2-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b407460ee9 ]
It is considered good practice to call cpu_relax() in busy loops, see
Documentation/process/volatile-considered-harmful.rst. This can not
only lower CPU power consumption or yield to a hyperthreaded twin
processor, but also allows an architecture to mitigate hardware issues
(e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the
architecture-specific cpu_relax() implementation.
In addition, cpu_relax() is also a compiler barrier. It is not
immediately obvious that the @op argument "function" will result in an
actual function call (e.g. in case of inlining).
Where a function call is a C sequence point, this is lost on inlining.
Therefore, with agressive enough optimization it might be possible for
the compiler to hoist the:
(val) = op(args);
"load" out of the loop because it doesn't see the value changing. The
addition of cpu_relax() would inhibit this.
As the iopoll helpers lack calls to cpu_relax(), people are sometimes
reluctant to use them, and may fall back to open-coded polling loops
(including cpu_relax() calls) instead.
Fix this by adding calls to cpu_relax() to the iopoll helpers:
- For the non-atomic case, it is sufficient to call cpu_relax() in
case of a zero sleep-between-reads value, as a call to
usleep_range() is a safe barrier otherwise. However, it doesn't
hurt to add the call regardless, for simplicity, and for similarity
with the atomic case below.
- For the atomic case, cpu_relax() must be called regardless of the
sleep-between-reads value, as there is no guarantee all
architecture-specific implementations of udelay() handle this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/45c87bec3397fdd704376807f0eec5cc71be440f.1685692810.git.geert+renesas@glider.be
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d14bd943f ]
Fix USB-related warnings in prtrvt, prtvt7, prti6q and prtwd2 device trees
by disabling unused usbphynop1 and usbphynop2 USB PHYs and providing proper
configuration for the over-current detection. This fixes the following
warnings with the current kernel:
usb_phy_generic usbphynop1: dummy supplies not allowed for exclusive requests
usb_phy_generic usbphynop2: dummy supplies not allowed for exclusive requests
imx_usb 2184200.usb: No over current polarity defined
By the way, fix over-current detection on usbotg port for prtvt7, prti6q
and prtwd2 boards. Only prtrvt do not have OC on USB OTG port.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e89f45edb7 ]
We have SOF and generic ACP support enabled for Vangogh platform
on some machines. Since we have same PCI id used for probing,
add check for machine configuration flag to avoid conflict with
newer pci drivers. Such machine flag has been initialized via
dmi match on few Vangogh based machines. If no flag is
specified probe and register older platform device.
Signed-off-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com>
Link: https://lore.kernel.org/r/20230530110802.674939-1-venkataprasad.potturu@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 608f1b0dbd ]
Each time we go through dsp_work() it does a devm_kasprintf() to
allocate memory to hold the part name string. It's not strictly a memory
leak because devm will free it all if the driver is removed. But we keep
allocating more and more memory to hold the same string.
Move the allocation so that it is performed after the version and
secured state information is gathered and handle allocation errors.
Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/Message-Id: <20230518150250.1121006-2-rf@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f38129bb08 ]
This reverts commit 80c6d6804f.
The orignal commit was intended as a workaround to prevent underflow and
flickering when using one normal monitor and the other high refresh rate
monitor (> 120Hz).
This patch is being reverted in favour of a software solution to enable
SubVP+DRR
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87c2213e85 ]
The type of size is unsigned int, if size is 0x40000000, there will
be an integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: hackyzh002 <hackyzh002@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0138250150 ]
The following call trace is observed when removing the amdgpu driver, which
is caused by that BOs allocated for psp are not freed until removing.
[61811.450562] RIP: 0010:amddrm_buddy_fini.cold+0x29/0x47 [amddrm_buddy]
[61811.450577] Call Trace:
[61811.450577] <TASK>
[61811.450579] amdgpu_vram_mgr_fini+0x135/0x1c0 [amdgpu]
[61811.450728] amdgpu_ttm_fini+0x207/0x290 [amdgpu]
[61811.450870] amdgpu_bo_fini+0x27/0xa0 [amdgpu]
[61811.451012] gmc_v9_0_sw_fini+0x4a/0x60 [amdgpu]
[61811.451166] amdgpu_device_fini_sw+0x117/0x520 [amdgpu]
[61811.451306] amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
[61811.451447] devm_drm_dev_init_release+0x4d/0x80 [drm]
[61811.451466] devm_action_release+0x15/0x20
[61811.451469] release_nodes+0x40/0xb0
[61811.451471] devres_release_all+0x9b/0xd0
[61811.451473] __device_release_driver+0x1bb/0x2a0
[61811.451476] driver_detach+0xf3/0x140
[61811.451479] bus_remove_driver+0x6c/0xf0
[61811.451481] driver_unregister+0x31/0x60
[61811.451483] pci_unregister_driver+0x40/0x90
[61811.451486] amdgpu_exit+0x15/0x447 [amdgpu]
For smu v13_0_2, if the GPU supports xgmi, refer to
commit f5c7e77970 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2"),
it will run gpu recover in AMDGPU_RESET_FOR_DEVICE_REMOVE mode when removing,
which makes all devices in hive list have hw reset but no resume except the
basic ip blocks, then other ip blocks will not call .hw_fini according to
ip_block.status.hw.
Since psp_free_shared_bufs just includes some software operations, so move
it to psp_sw_fini.
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96c7c2f4d5 ]
It already happend a few times that patches slipped through which
implemented access to an entity through a job that was already removed
from the entities queue. Since jobs and entities might have different
lifecycles, this can potentially cause UAF bugs.
In order to make it obvious that a jobs entity pointer shouldn't be
accessed after drm_sched_entity_pop_job() was called successfully, set
the jobs entity pointer to NULL once the job is removed from the entity
queue.
Moreover, debugging a potential NULL pointer dereference is way easier
than potentially corrupted memory through a UAF.
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://lore.kernel.org/r/20230418100453.4433-1-dakr@redhat.com
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 128c1ca030 ]
[Why&How]
- Implement interface to program DTBCLK DTO’s
according to reference DTBCLK returned by PMFW
- This is required because DTO programming
requires exact DTBCLK reference freq or it could
result in underflow
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e58f30246c ]
In commit 7beecaf7d5 ("net: phy: at803x: improve the WOL feature"), it
seems not correct to use a wol_en bit in a 1588 Control Register which is
only available on AR8031/AR8033(share the same phy_id) to determine if WoL
is enabled. Change it back to use AT803X_INTR_ENABLE_WOL for determining
the WoL status which is applicable on all chips supporting wol. Also update
the at803x_set_wol() function to only update the 1588 register on chips
having it. After this change, disabling wol at probe from commit
d7cd5e06c9 ("net: phy: at803x: disable WOL at probe") is no longer
needed. Change it to just disable the WoL bit in 1588 register for
AR8031/AR8033 to be aligned with AT803X_INTR_ENABLE_WOL in probe.
Fixes: 7beecaf7d5 ("net: phy: at803x: improve the WOL feature")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Viorel Suman <viorel.suman@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 988e8d90b3 ]
Use devm_regulator_get_enable_optional() instead of hand writing it. It
saves some line of code.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: e58f30246c ("net: phy: at803x: fix the wol setting functions")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 6ccbd7fd47 upstream.
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization.
Commit c5a130325f ("ACPI/APEI: Add parameter check before error
injection") exported page_is_ram(), hence the __init annotation should
be removed.
This fixes the modpost warning in ARCH=alpha builds:
WARNING: modpost: vmlinux: page_is_ram: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL.
Fixes: c5a130325f ("ACPI/APEI: Add parameter check before error injection")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cd0302be5 upstream.
The ACPI device CSC3556 is a Cirrus Logic CS35L56 mono amplifier which
is used in multiples, and can be connected either to I2C or SPI.
There will be multiple instances under the same Device() node. Add it
to ignore_serial_bus_ids and handle it in the serial-multi-instantiate
driver.
There can be a 5th I2cSerialBusV2, but this is an alias address and doesn't
represent a real device. Ignore this by having a dummy 5th entry in the
serial-multi-instantiate instance list with the name of a non-existent
driver, on the same pattern as done for bsg2150.
Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230728111345.7224-1-rf@opensource.cirrus.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 676b7c5eca upstream.
The current code assumes that the CSC3551(multiple cs35l41) always have
its interrupt pin connected to GPIO thus the IRQ can be acquired with
acpi_dev_gpio_irq_get. However on some newer laptop models this is no
longer the case as they have the CSC3551's interrupt pin connected to
APIC. This causes smi_i2c_probe to fail on these machines.
To support these machines, a new macro IRQ_RESOURCE_AUTO was introduced
for cs35l41 smi_node, and smi_get_irq function was modified so it tries
to get GPIO irq resource first and if failed, tries to get
APIC irq resource for cs35l41.
This patch affects only the cs35l41's probing and brings no negative
influence on machines that indeed have the cs35l41's interrupt pin
connected to GPIO.
Signed-off-by: David Xu <xuwd1@hotmail.com>
Link: https://lore.kernel.org/r/SY4P282MB18350CD8288687B87FFD2243E037A@SY4P282MB1835.AUSP282.PROD.OUTLOOK.COM
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b6aa6610d upstream.
The lenovo-ymc driver is causing the keyboard + touchpad to stop working
on some regular laptop models such as the Lenovo ThinkBook 13s G2 ITL 20V9.
The problem is that there are YMC WMI GUID methods in the ACPI tables
of these laptops, despite them not being Yogas and lenovo-ymc loading
causes libinput to see a SW_TABLET_MODE switch with state 1.
This in turn causes libinput to ignore events from the builtin keyboard
and touchpad, since it filters those out for a Yoga in tablet mode.
Similar issues with false-positive SW_TABLET_MODE=1 reporting have
been seen with the intel-hid driver.
Copy the intel-hid driver approach to fix this and only bind to the WMI
device on machines where the DMI chassis-type indicates the machine
is a convertible.
Add a 'force' module parameter to allow overriding the chassis-type check
so that users can easily test if the YMC interface works on models which
report an unexpected chassis-type.
Fixes: e82882cdd2 ("platform/x86: Add driver for Yoga Tablet Mode switch")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229373
Cc: André Apitzsch <git@apitzsch.eu>
Cc: stable@vger.kernel.org
Tested-by: Andrew Kallmeyer <kallmeyeras@gmail.com>
Tested-by: Gergő Köteles <soyer@irl.hu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230812144818.383230-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a66d59b5f upstream.
The msi-ec driver fails to build for me (gcc 7.5):
CC [M] drivers/platform/x86/msi-ec.o
drivers/platform/x86/msi-ec.c:72:6: error: initializer element is not constant
{ SM_ECO_NAME, 0xc2 },
^~~~~~~~~~~
drivers/platform/x86/msi-ec.c:72:6: note: (near initialization for ‘CONF0.shift_mode.modes[0].name’)
drivers/platform/x86/msi-ec.c:73:6: error: initializer element is not constant
{ SM_COMFORT_NAME, 0xc1 },
^~~~~~~~~~~~~~~
drivers/platform/x86/msi-ec.c:73:6: note: (near initialization for ‘CONF0.shift_mode.modes[1].name’)
drivers/platform/x86/msi-ec.c:74:6: error: initializer element is not constant
{ SM_SPORT_NAME, 0xc0 },
^~~~~~~~~~~~~
drivers/platform/x86/msi-ec.c:74:6: note: (near initialization for ‘CONF0.shift_mode.modes[2].name’)
(...)
Don't try to be smart, just use defines for the constant strings. The
compiler will recognize it's the same string and will store it only
once in the data section anyway.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 392cacf2aa ("platform/x86: Add new msi-ec driver")
Cc: stable@vger.kernel.org
Cc: Nikita Kravets <teackot@gmail.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <markgross@kernel.org>
Link: https://lore.kernel.org/r/20230805101010.54d49e91@endymion.delvare
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef222f551e upstream.
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However the hardware does not support PCI PM
suspend/resume operations so system wide suspend/resume leads to bad MFW
(management firmware) state which causes various follow-up errors in driver
when communicating with the device/firmware.
To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding system
to go into suspended/standby mode.
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1516ee035d upstream.
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However the hardware does not support PCI PM
suspend/resume operations so system wide suspend/resume leads to bad MFW
(management firmware) state which causes various follow-up errors in driver
when communicating with the device/firmware.
To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding system
to go into suspended/standby mode.
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-2-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 175544ad48 upstream.
Hyper-V provides the ability to connect Fibre Channel LUNs to the host
system and present them in a guest VM as a SCSI device. I/O to the vFC
device is handled by the storvsc driver. The storvsc driver includes a
partial integration with the FC transport implemented in the generic
portion of the Linux SCSI subsystem so that FC attributes can be displayed
in /sys. However, the partial integration means that some aspects of vFC
don't work properly. Unfortunately, a full and correct integration isn't
practical because of limitations in what Hyper-V provides to the guest.
In particular, in the context of Hyper-V storvsc, the FC transport timeout
function fc_eh_timed_out() causes a kernel panic because it can't find the
rport and dereferences a NULL pointer. The original patch that added the
call from storvsc_eh_timed_out() to fc_eh_timed_out() is faulty in this
regard.
In many cases a timeout is due to a transient condition, so the situation
can be improved by just continuing to wait like with other I/O requests
issued by storvsc, and avoiding the guaranteed panic. For a permanent
failure, continuing to wait may result in a hung thread instead of a panic,
which again may be better.
So fix the panic by removing the storvsc call to fc_eh_timed_out(). This
allows storvsc to keep waiting for a response. The change has been tested
by users who experienced a panic in fc_eh_timed_out() due to transient
timeouts, and it solves their problem.
In the future we may want to deprecate the vFC functionality in storvsc
since it can't be fully fixed. But it has current users for whom it is
working well enough, so it should probably stay for a while longer.
Fixes: 3930d73098 ("scsi: storvsc: use default I/O timeout handler for FC devices")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690606764-79669-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9426d3cef5 upstream.
(lightly modified commit message mostly by Linus Torvalds)
The parsing code for /proc/scsi/scsi is disgusting and broken. We should
have just used 'sscanf()' or something simple like that, but the logic may
actually predate our kernel sscanf library routine for all I know. It
certainly predates both git and BK histories.
And we can't change it to be something sane like that now, because the
string matching at the start is done case-insensitively, and the separator
parsing between numbers isn't done at all, so *any* separator will work,
including a possible terminating NUL character.
This interface is root-only, and entirely for legacy use, so there is
absolutely no point in trying to tighten up the parsing. Because any
separator has traditionally worked, it's entirely possible that people have
used random characters rather than the suggested space.
So don't bother to try to pretty it up, and let's just make a minimal patch
that can be back-ported and we can forget about this whole sorry thing for
another two decades.
Just make it at least not read past the end of the supplied data.
Link: https://lore.kernel.org/linux-scsi/b570f5fe-cb7c-863a-6ed9-f6774c219b88@cybernetics.com/
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin K Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92fb94b69c upstream.
We set cache_block_group_error if btrfs_cache_block_group() returns an
error, this is because we could end up not finding space to allocate and
mistakenly return -ENOSPC, and which could then abort the transaction
with the incorrect errno, and in the case of ENOSPC result in a
WARN_ON() that will trip up tests like generic/475.
However there's the case where multiple threads can be racing, one
thread gets the proper error, and the other thread doesn't actually call
btrfs_cache_block_group(), it instead sees ->cached ==
BTRFS_CACHE_ERROR. Again the result is the same, we fail to allocate
our space and return -ENOSPC. Instead we need to set
cache_block_group_error to -EIO in this case to make sure that if we do
not make our allocation we get the appropriate error returned back to
the caller.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ebcd021c9 upstream.
[BUG]
Syzbot reported a crash that an ASSERT() got triggered inside
prepare_to_merge().
That ASSERT() makes sure the reloc tree is properly pointed back by its
subvolume tree.
[CAUSE]
After more debugging output, it turns out we had an invalid reloc tree:
BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17
Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM,
QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree.
But reloc trees can only exist for subvolumes, as for non-subvolume
trees, we just COW the involved tree block, no need to create a reloc
tree since those tree blocks won't be shared with other trees.
Only subvolumes tree can share tree blocks with other trees (thus they
have BTRFS_ROOT_SHAREABLE flag).
Thus this new debug output proves my previous assumption that corrupted
on-disk data can trigger that ASSERT().
[FIX]
Besides the dedicated fix and the graceful exit, also let tree-checker to
check such root keys, to make sure reloc trees can only exist for subvolumes.
CC: stable@vger.kernel.org # 5.15+
Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05d7ce5045 upstream.
[BUG]
Syzbot reported a crash that an ASSERT() got triggered inside
prepare_to_merge().
[CAUSE]
The root cause of the triggered ASSERT() is we can have a race between
quota tree creation and relocation.
This leads us to create a duplicated quota tree in the
btrfs_read_fs_root() path, and since it's treated as fs tree, it would
have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.
The bug itself is fixed by a dedicated patch for it, but this already
taught us the ASSERT() is not something straightforward for
developers.
[ENHANCEMENT]
Instead of using an ASSERT(), let's handle it gracefully and output
extra info about the mismatch reloc roots to help debug.
Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside
merge_reloc_roots() later.
Also replace those ASSERT(0)s with WARN_ON()s.
CC: stable@vger.kernel.org # 5.15+
Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12b2d64e59 upstream.
When the call to btrfs_reloc_clone_csums in cow_file_range returns an
error, we jump to the out_unlock label with the extent_reserved variable
set to false. The cleanup at the label will then call
extent_clear_unlock_delalloc on the range from start to end. But we've
already added cur_alloc_size to start before the jump, so there might no
range be left from the newly incremented start to end. Move the check for
'start < end' so that it is reached by also for the !extent_reserved case.
CC: stable@vger.kernel.org # 6.1+
Fixes: a315e68f6e ("Btrfs: fix invalid attempt to free reserved space on failure to cow range")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c25699871 upstream.
__extent_writepage could have started on more pages than the one it was
called for. This happens regularly for zoned file systems, and in theory
could happen for compressed I/O if the worker thread was executed very
quickly. For such pages extent_write_cache_pages waits for writeback
to complete before moving on to the next page, which is highly inefficient
as it blocks the flusher thread.
Port over the PageDirty check that was added to write_cache_pages in
commit 515f4a037f ("mm: write_cache_pages optimise page cleaning") to
fix this.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit effa24f689 upstream.
extent_write_cache_pages stops writing pages as soon as nr_to_write hits
zero. That is the right thing for opportunistic writeback, but incorrect
for data integrity writeback, which needs to ensure that no dirty pages
are left in the range. Thus only stop the writeback for WB_SYNC_NONE
if nr_to_write hits 0.
This is a port of write_cache_pages changes in commit 05fe478dd0
("mm: write_cache_pages integrity fix").
Note that I've only trigger the problem with other changes to the btrfs
writeback code, but this condition seems worthwhile fixing anyway.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ updated comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc1f91b923 upstream.
Recently we've been having mysterious hangs while running generic/475 on
the CI system. This turned out to be something like this:
Task 1
dmsetup suspend --nolockfs
-> __dm_suspend
-> dm_wait_for_completion
-> dm_wait_for_bios_completion
-> Unable to complete because of IO's on a plug in Task 2
Task 2
wb_workfn
-> wb_writeback
-> blk_start_plug
-> writeback_sb_inodes
-> Infinite loop unable to make an allocation
Task 3
cache_block_group
->read_extent_buffer_pages
->Waiting for IO to complete that can't be submitted because Task 1
suspended the DM device
The problem here is that we need Task 2 to be scheduled completely for
the blk plug to flush. Normally this would happen, we normally wait for
the block group caching to finish (Task 3), and this schedule would
result in the block plug flushing.
However if there's enough free space available from the current caching
to satisfy the allocation we won't actually wait for the caching to
complete. This check however just checks that we have enough space, not
that we can make the allocation. In this particular case we were trying
to allocate 9MiB, and we had 10MiB of free space, but we didn't have
9MiB of contiguous space to allocate, and thus the allocation failed and
we looped.
We specifically don't cycle through the FFE loop until we stop finding
cached block groups because we don't want to allocate new block groups
just because we're caching, so we short circuit the normal loop once we
hit LOOP_CACHING_WAIT and we found a caching block group.
This is normally fine, except in this particular case where the caching
thread can't make progress because the DM device has been suspended.
Fix this by not only waiting for free space to >= the amount of space we
want to allocate, but also that we make some progress in caching from
the time we start waiting. This will keep us from busy looping when the
caching is taking a while but still theoretically has enough space for
us to allocate from, and fixes this particular case by forcing us to
actually sleep and wait for forward progress, which will flush the plug.
With this fix we're no longer hanging with generic/475.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a78d5db9c upstream.
Simulated chips use a mutex for synchronization in driver callbacks so
they must not be called from interrupt context. Set the can_sleep field
of the GPIO chip to true to force users to only use threaded irqs.
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6db541ae27 upstream.
If a login request fails, the recovery process should be protected
against parallel resets. It is a known issue that freeing and
registering CRQ's in quick succession can result in a failover CRQ from
the VIOS. Processing a failover during login recovery is dangerous for
two reasons:
1. This will result in two parallel initialization processes, this can
cause serious issues during login.
2. It is possible that the failover CRQ is received but never executed.
We get notified of a pending failover through a transport event CRQ.
The reset is not performed until a INIT CRQ request is received.
Previously, if CRQ init fails during login recovery, then the ibmvnic
irq is freed and the login process returned error. If failover_pending
is true (a transport event was received), then the ibmvnic device
would never be able to process the reset since it cannot receive the
CRQ_INIT request due to the irq being freed. This leaved the device
in a inoperable state.
Therefore, the login failure recovery process must be hardened against
these possible issues. Possible failovers (due to quick CRQ free and
init) must be avoided and any issues during re-initialization should be
dealt with instead of being propagated up the stack. This logic is
similar to that of ibmvnic_probe().
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-5-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 23cc5f6674 upstream.
Perform a partial reset before sending a login request if any of the
following are true:
1. If a previous request times out. This can be dangerous because the
VIOS could still receive the old login request at any point after
the timeout. Therefore, it is best to re-register the CRQ's and
sub-CRQ's before retrying.
2. If the previous request returns an error that is not described in
PAPR. PAPR provides procedures if the login returns with partial
success or aborted return codes (section L.5.1) but other values
do not have a defined procedure. Previously, these conditions
just returned error from the login function rather than trying
to resolve the issue.
This can cause further issues since most callers of the login
function are not prepared to handle an error when logging in. This
improper cleanup can lead to the device being permanently DOWN'd.
For example, if the VIOS believes that the device is already logged
in then it will return INVALID_STATE (-7). If we never re-register
CRQ's then it will always think that the device is already logged
in. This leaves the device inoperable.
The partial reset involves freeing the sub-CRQs, freeing the CRQ then
registering and initializing a new CRQ and sub-CRQs. This essentially
restarts all communication with VIOS to allow for a fresh login attempt
that will be unhindered by any previous failed attempts.
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-4-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d78a671eb8 upstream.
Rather than leaving the DMA unmapping of the login buffers to the
login response handler, move this work into the login release functions.
Previously, these functions were only used for freeing the allocated
buffers. This could lead to issues if there are more than one
outstanding login buffer requests, which is possible if a login request
times out.
If a login request times out, then there is another call to send login.
The send login function makes a call to the login buffer release
function. In the past, this freed the buffers but did not DMA unmap.
Therefore, the VIOS could still write to the old login (now freed)
buffer. It is for this reason that it is a good idea to leave the DMA
unmap call to the login buffers release function.
Since the login buffer release functions now handle DMA unmapping,
remove the duplicate DMA unmapping in handle_login_rsp().
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-3-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db17ba719b upstream.
Ensure that all offsets in a login response buffer are within the size
of the allocated response buffer. Any offsets or lengths that surpass
the allocation are likely the result of an incomplete response buffer.
In these cases, a full reset is necessary.
When attempting to login, the ibmvnic device will allocate a response
buffer and pass a reference to the VIOS. The VIOS will then send the
ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled
with data. If the ibmvnic device does not get a response in 20 seconds,
the old buffer is freed and a new login request is sent. With 2
outstanding requests, any LOGIN_RSP CRQ's could be for the older
login request. If this is the case then the login response buffer (which
is for the newer login request) could be incomplete and contain invalid
data. Therefore, we must enforce strict sanity checks on the response
buffer values.
Testing has shown that the `off_rxadd_buff_size` value is filled in last
by the VIOS and will be the smoking gun for these circumstances.
Until VIOS can implement a mechanism for tracking outstanding response
buffers and a method for mapping a LOGIN_RSP CRQ to a particular login
response buffer, the best ibmvnic can do in this situation is perform a
full reset.
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aab8e1a200 upstream.
Handling pci errors should fully teardown and load back auxiliary
devices, same as done through mlx5 health recovery flow.
Fixes: 72ed5d5624 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2dc2b3922d upstream.
When querying eswitch functions 0 is a valid number of host VFs. After
introducing ARM SRIOV falling through to getting the max value from PCI
results in using the total VFs allowed on the ARM for the host.
Fixes: 86eec50bea ("net/mlx5: Support querying max VFs from device");
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8bfe1e19fb upstream.
Fixing wrong calculation of the modify hdr pattern size,
where the previously calculated number would not be enough
to accommodate the required number of actions.
Fixes: da5d0027d6 ("net/mlx5: DR, Add cache for modify header pattern")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac5da544a3 upstream.
The flow rule can be splited, and the extra post_act rules are added
to post_act table. It's possible to trigger memleak when the rule
forwards packets from internal port and over tunnel, in the case that,
for example, CT 'new' state offload is allowed. As int_port object is
assigned to the flow attribute of post_act rule, and its refcnt is
incremented by mlx5e_tc_int_port_get(), but mlx5e_tc_int_port_put() is
not called, the refcnt is never decremented, then int_port is never
freed.
The kmemleak reports the following error:
unreferenced object 0xffff888128204b80 (size 64):
comm "handler20", pid 50121, jiffies 4296973009 (age 642.932s)
hex dump (first 32 bytes):
01 00 00 00 19 00 00 00 03 f0 00 00 04 00 00 00 ................
98 77 67 41 81 88 ff ff 98 77 67 41 81 88 ff ff .wgA.....wgA....
backtrace:
[<00000000e992680d>] kmalloc_trace+0x27/0x120
[<000000009e945a98>] mlx5e_tc_int_port_get+0x3f3/0xe20 [mlx5_core]
[<0000000035a537f0>] mlx5e_tc_add_fdb_flow+0x473/0xcf0 [mlx5_core]
[<0000000070c2cec6>] __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
[<000000005cc84048>] mlx5e_configure_flower+0xd40/0x4c40 [mlx5_core]
[<000000004f8a2031>] mlx5e_rep_indr_offload.isra.0+0x10e/0x1c0 [mlx5_core]
[<000000007df797dc>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
[<0000000016c15cc3>] tc_setup_cb_add+0x1cf/0x410
[<00000000a63305b4>] fl_hw_replace_filter+0x38f/0x670 [cls_flower]
[<000000008bc9e77c>] fl_change+0x1fd5/0x4430 [cls_flower]
[<00000000e7f766e4>] tc_new_tfilter+0x867/0x2010
[<00000000e101c0ef>] rtnetlink_rcv_msg+0x6fc/0x9f0
[<00000000e1111d44>] netlink_rcv_skb+0x12c/0x360
[<0000000082dd6c8b>] netlink_unicast+0x438/0x710
[<00000000fc568f70>] netlink_sendmsg+0x794/0xc50
[<0000000016e92590>] sock_sendmsg+0xc5/0x190
So fix this by moving int_port cleanup code to the flow attribute
free helper, which is used by all the attribute free cases.
Fixes: 8300f22526 ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 863676fe1a upstream.
Disabling IDXD device doesn't reset Page Request Service (PRS)
disable flag to its initial value 0. This may cause user confusion
because once PRS is disabled user will see PRS still remains the
previous setting (i.e. disabled) via sysfs interface even after the
device is disabled.
To eliminate user confusion, reset PRS disable flag to ensure that
the PRS flag bit reflects correct state after the device is disabled.
Additionally, simplify the code by setting wq->flags to 0, which clears
all flag bits, including any future additions.
Fixes: f2dc327131 ("dmaengine: idxd: add per wq PRS disable")
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230712193505.3440752-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a46781c89 upstream.
When 'mcf_edma' is allocated, some space is allocated for a
flexible array at the end of the struct. 'chans' item are allocated, that is
to say 'pdata->dma_channels'.
Then, this number of item is stored in 'mcf_edma->n_chans'.
A few lines later, if 'mcf_edma->n_chans' is 0, then a default value of 64
is set.
This ends to no space allocated by devm_kzalloc() because chans was 0, but
64 items are read and/or written in some not allocated memory.
Change the logic to define a default value before allocating the memory.
Fixes: e7a3ff92ea ("dmaengine: fsl-edma: add ColdFire mcf5441x edma support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/f55d914407c900828f6fad3ea5fa791a5f17b9a4.1685172449.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e3d20617b upstream.
hns3_dbg_fill_content()/hclge_dbg_fill_content() is aim to integrate some
items to a string for content, and we add '\n' and '\0' in the last
two bytes of content.
strscpy() will add '\0' in the last byte of destination buffer(one of
items), it result in finishing content print ahead of schedule and some
dump content truncation.
One Error log shows as below:
cat mac_list/uc
UC MAC_LIST:
Expected:
UC MAC_LIST:
FUNC_ID MAC_ADDR STATE
pf 00:2b:19:05:03:00 ACTIVE
The destination buffer is length-bounded and not required to be
NUL-terminated, so just change strscpy() to memcpy() to fix it.
Fixes: 1cf3d5567f ("net: hns3: fix strncpy() not using dest-buf length as length issue")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20230809020902.1941471-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8743aeff5b upstream.
A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.
The nexthop bucket dump callback always returns a positive number if
nexthop buckets were filled in the provided skb, even if the dump is
complete. This means that a dump will span at least two recvmsg() calls
as long as nexthop buckets are present. In the last recvmsg() call the
dump callback will not fill in any nexthop buckets because the previous
call indicated that the dump should restart from the last dumped nexthop
ID plus one.
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 2
# strace -e sendto,recvmsg -s 5 ip nexthop bucket
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
id 10 index 0 idle_time 6.66 nhid 1
id 10 index 1 idle_time 6.66 nhid 1
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
+++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
# ip nexthop bucket
id 4294967295 index 0 idle_time 5.55 nhid 1
id 4294967295 index 1 idle_time 5.55 nhid 1
id 4294967295 index 0 idle_time 5.55 nhid 1
id 4294967295 index 1 idle_time 5.55 nhid 1
[...]
Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
# strace -e sendto,recvmsg -s 5 ip nexthop bucket
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
id 4294967295 index 0 idle_time 6.61 nhid 1
id 4294967295 index 1 idle_time 6.61 nhid 1
+++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic_res
[...]
TEST: Maximum nexthop ID dump [FAIL]
[...]
And passes after it:
# ./fib_nexthops.sh -t basic_res
[...]
TEST: Maximum nexthop ID dump [ OK ]
[...]
Fixes: 8a1bbabb03 ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f10d3d9df4 upstream.
rtm_dump_nexthop_bucket_nh() is used to dump nexthop buckets belonging
to a specific resilient nexthop group. The function returns a positive
return code (the skb length) upon both success and failure.
The above behavior is problematic. When a complete nexthop bucket dump
is requested, the function that walks the different nexthops treats the
non-zero return code as an error. This causes buckets belonging to
different resilient nexthop groups to be dumped using different buffers
even if they can all fit in the same buffer:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 1
# ip nexthop add id 20 group 1 type resilient buckets 1
# strace -e recvmsg -s 0 ip nexthop bucket
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
id 10 index 0 idle_time 10.27 nhid 1
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
id 20 index 0 idle_time 6.44 nhid 1
[...]
Fix by only returning a non-zero return code when an error occurred and
restarting the dump from the bucket index we failed to fill in. This
allows buckets belonging to different resilient nexthop groups to be
dumped using the same buffer:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 1
# ip nexthop add id 20 group 1 type resilient buckets 1
# strace -e recvmsg -s 0 ip nexthop bucket
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
id 10 index 0 idle_time 30.21 nhid 1
id 20 index 0 idle_time 26.7 nhid 1
[...]
While this change is more of a performance improvement change than an
actual bug fix, it is a prerequisite for a subsequent patch that does
fix a bug.
Fixes: 8a1bbabb03 ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 913f60cacd upstream.
A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.
The nexthop dump callback always returns a positive number if nexthops
were filled in the provided skb, even if the dump is complete. This
means that a dump will span at least two recvmsg() calls as long as
nexthops are present. In the last recvmsg() call the dump callback will
not fill in any nexthops because the previous call indicated that the
dump should restart from the last dumped nexthop ID plus one.
# ip nexthop add id 1 blackhole
# strace -e sendto,recvmsg -s 5 ip nexthop
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394315, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 1], {nla_len=4, nla_type=NHA_BLACKHOLE}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
id 1 blackhole
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
+++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:
# ip nexthop add id $((2**32-1)) blackhole
# ip nexthop
id 4294967295 blackhole
id 4294967295 blackhole
[...]
Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOP response:
# ip nexthop add id $((2**32-1)) blackhole
# strace -e sendto,recvmsg -s 5 ip nexthop
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394080, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 56
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 4294967295], {nla_len=4, nla_type=NHA_BLACKHOLE}]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 56
id 4294967295 blackhole
+++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic
[...]
TEST: Maximum nexthop ID dump [FAIL]
[...]
And passes after it:
# ./fib_nexthops.sh -t basic
[...]
TEST: Maximum nexthop ID dump [ OK ]
[...]
Fixes: ab84be7e54 ("net: Initial nexthop code")
Reported-by: Petr Machata <petrm@nvidia.com>
Closes: https://lore.kernel.org/netdev/87sf91enuf.fsf@nvidia.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0168042a2 upstream.
The workaround implemented in commit 3222b5b613 ("net: enetc:
initialize RFS/RSS memories for unused ports too") is no longer
effective after commit 6fffbc7ae1 ("PCI: Honor firmware's device
disabled status"). Thus, it has introduced a regression and we see AER
errors being reported again:
$ ip link set sw2p0 up && dhclient -i sw2p0 && ip addr show sw2p0
fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode
fsl_enetc 0000:00:00.2 eno2: Link is Up - 2.5Gbps/Full - flow control rx/tx
mscc_felix 0000:00:00.5 swp2: configuring for fixed/sgmii link mode
mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control off
sja1105 spi2.2 sw2p0: configuring for phy/rgmii-id link mode
sja1105 spi2.2 sw2p0: Link is Up - 1Gbps/Full - flow control off
pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
pcieport 0000:00:1f.0: AER: can't find device of ID0000
Rob's suggestion is to reimplement the enetc driver workaround as a
PCI fixup, and to modify the PCI core to run the fixups for all PCI
functions. This change handles the first part.
We refactor the common code in enetc_psi_create() and enetc_psi_destroy(),
and use the PCI fixup only for those functions for which enetc_pf_probe()
won't get called. This avoids some work being done twice for the PFs
which are enabled.
Fixes: 6fffbc7ae1 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac6257a3ae upstream.
When externel_lb and reset are executed together, a deadlock may
occur:
[ 3147.217009] INFO: task kworker/u321:0:7 blocked for more than 120 seconds.
[ 3147.230483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3147.238999] task:kworker/u321:0 state:D stack: 0 pid: 7 ppid: 2 flags:0x00000008
[ 3147.248045] Workqueue: hclge hclge_service_task [hclge]
[ 3147.253957] Call trace:
[ 3147.257093] __switch_to+0x7c/0xbc
[ 3147.261183] __schedule+0x338/0x6f0
[ 3147.265357] schedule+0x50/0xe0
[ 3147.269185] schedule_preempt_disabled+0x18/0x24
[ 3147.274488] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 3147.279880] __mutex_lock_slowpath+0x1c/0x30
[ 3147.284839] mutex_lock+0x50/0x60
[ 3147.288841] rtnl_lock+0x20/0x2c
[ 3147.292759] hclge_reset_prepare+0x68/0x90 [hclge]
[ 3147.298239] hclge_reset_subtask+0x88/0xe0 [hclge]
[ 3147.303718] hclge_reset_service_task+0x84/0x120 [hclge]
[ 3147.309718] hclge_service_task+0x2c/0x70 [hclge]
[ 3147.315109] process_one_work+0x1d0/0x490
[ 3147.319805] worker_thread+0x158/0x3d0
[ 3147.324240] kthread+0x108/0x13c
[ 3147.328154] ret_from_fork+0x10/0x18
In externel_lb process, the hns3 driver call napi_disable()
first, then the reset happen, then the restore process of the
externel_lb will fail, and will not call napi_enable(). When
doing externel_lb again, napi_disable() will be double call,
cause a deadlock of rtnl_lock().
This patch use the HNS3_NIC_STATE_DOWN state to protect the
calling of napi_disable() and napi_enable() in externel_lb
process, just as the usage in ndo_stop() and ndo_start().
Fixes: 04b6ba1435 ("net: hns3: add support for external loopback test")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230807113452.474224-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a94c16a2fd upstream.
When the tagging protocol in current use is "ocelot-8021q" and we unbind
the driver, we see this splat:
$ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
mscc_felix 0000:00:00.5 swp0: left promiscuous mode
sja1105 spi2.0: Link is Down
DSA: tree 1 torn down
mscc_felix 0000:00:00.5 swp2: left promiscuous mode
sja1105 spi2.2: Link is Down
DSA: tree 3 torn down
fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
mscc_felix 0000:00:00.5: Link is Down
------------[ cut here ]------------
RTNL: assertion failed at net/dsa/tag_8021q.c (409)
WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
Modules linked in:
CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
pc : dsa_tag_8021q_unregister+0x12c/0x1a0
lr : dsa_tag_8021q_unregister+0x12c/0x1a0
Call trace:
dsa_tag_8021q_unregister+0x12c/0x1a0
felix_tag_8021q_teardown+0x130/0x150
felix_teardown+0x3c/0xd8
dsa_tree_teardown_switches+0xbc/0xe0
dsa_unregister_switch+0x168/0x260
felix_pci_remove+0x30/0x60
pci_device_remove+0x4c/0x100
device_release_driver_internal+0x188/0x288
device_links_unbind_consumers+0xfc/0x138
device_release_driver_internal+0xe0/0x288
device_driver_detach+0x24/0x38
unbind_store+0xd8/0x108
drv_attr_store+0x30/0x50
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
RTNL: assertion failed at net/8021q/vlan_core.c (376)
WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771
pc : vlan_vid_del+0x1b8/0x1f0
lr : vlan_vid_del+0x1b8/0x1f0
dsa_tag_8021q_unregister+0x8c/0x1a0
felix_tag_8021q_teardown+0x130/0x150
felix_teardown+0x3c/0xd8
dsa_tree_teardown_switches+0xbc/0xe0
dsa_unregister_switch+0x168/0x260
felix_pci_remove+0x30/0x60
pci_device_remove+0x4c/0x100
device_release_driver_internal+0x188/0x288
device_links_unbind_consumers+0xfc/0x138
device_release_driver_internal+0xe0/0x288
device_driver_detach+0x24/0x38
unbind_store+0xd8/0x108
drv_attr_store+0x30/0x50
DSA: tree 0 torn down
This was somewhat not so easy to spot, because "ocelot-8021q" is not the
default tagging protocol, and thus, not everyone who tests the unbinding
path may have switched to it beforehand. The default
felix_tag_npi_teardown() does not require rtnl_lock() to be held.
Fixes: 7c83a7c539 ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230803134253.2711124-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b47808f22 upstream.
TLS records end with a 16B tag. For TLS device offload we only
need to make space for this tag in the stream, the device will
generate and replace it with the actual calculated tag.
Long time ago the code would just re-reference the head frag
which mostly worked but was suboptimal because it prevented TCP
from combining the record into a single skb frag. I'm not sure
if it was correct as the first frag may be shorter than the tag.
The commit under fixes tried to replace that with using the page
frag and if the allocation failed rolling back the data, if record
was long enough. It achieves better fragment coalescing but is
also buggy.
We don't roll back the iterator, so unless we're at the end of
send we'll skip the data we designated as tag and start the
next record as if the rollback never happened.
There's also the possibility that the record was constructed
with MSG_MORE and the data came from a different syscall and
we already told the user space that we "got it".
Allocate a single dummy page and use it as fallback.
Found by code inspection, and proven by forcing allocation
failures.
Fixes: e7b159a48b ("net/tls: remove the record tail optimization")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 186b169cf1 upstream.
Fixing the ODP registration flow to set the iova correctly.
The calculation in ib_umem_num_dma_blocks() function assumes the iova of
the umem is set correctly.
When iova is not set, the calculation in ib_umem_num_dma_blocks() is
equivalent to length/page_size, which is true only when memory is aligned.
For unaligned memory, iova must be set for the ALIGN() in the
ib_umem_num_dma_blocks() to take effect and return a correct value.
mlx5_ib uses ib_umem_num_dma_blocks() to decide the mkey size to use for
the MR. Without this fix, when registering unaligned ODP MR, a wrong
size mkey might be chosen and this might cause the UMR to fail.
UMR would fail over insufficient size to update the mkey translation:
infiniband mlx5_0: dump_cqe:273:(pid 0): dump error cqe
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 0f 00 78 06 25 00 00 58 00 da ac d2
infiniband mlx5_0: mlx5_ib_post_send_wait:806:(pid 20311): reg umr
failed (6)
infiniband mlx5_0: pagefault_real_mr:661:(pid 20311): Failed to update
mkey page tables
Fixes: f0093fb1a7 ("RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr")
Fixes: a665aca89a ("RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()")
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/3d4be7ca2155bf239dd8c00a2d25974a92c26ab8.1689757344.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fb1d8eb23 upstream.
Add fdir_fltr_lock locking in unprotected places.
The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which
iterates over all filters and looks for a duplicate. The filter can be
removed from list and freed from memory at the same time it's being
compared. All other places where filters are deleted are already
protected with spinlock.
The remaining changes protect adapter->fdir_active_fltr variable so now
all its uses are under a spinlock.
Fixes: 527691bf06 ("iavf: Support IPv4 Flow Director filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1c936e9af upstream.
In case rhashtable_lookup_insert_fast() fails inside vxlan_vni_add(), the
allocated percpu vni stats are not freed on the error path.
Introduce vxlan_vni_free() which would work as a nice wrapper to free
vxlan_vni_node resources properly.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4095e0e132 ("drivers: vxlan: vnifilter: per vni stats")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01f4fd2708 upstream.
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with
following testcase:
# ip netns add ns1
# ip netns exec ns1 ip link add bond0 type bond mode 0
# ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
# ip netns exec ns1 ip link set bond_slave_1 master bond0
# ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link set bond_slave_1 nomaster
# ip netns del ns1
The logical analysis of the problem is as follows:
1. create ETH_P_8021AD protocol vlan10 for bond_slave_1:
register_vlan_dev()
vlan_vid_add()
vlan_info_alloc()
__vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1
2. create ETH_P_8021AD protocol bond0_vlan10 for bond0:
register_vlan_dev()
vlan_vid_add()
__vlan_vid_add()
vlan_add_rx_filter_info()
if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER
return 0;
if (netif_device_present(dev))
return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called
// The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid.
3. detach bond_slave_1 from bond0:
__bond_release_one()
vlan_vids_del_by_dev()
list_for_each_entry(vid_info, &vlan_info->vid_list, list)
vlan_vid_del(dev, vid_info->proto, vid_info->vid);
// bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted.
// bond_slave_1->vlan_info will be assigned NULL.
4. delete vlan10 during delete ns1:
default_device_exit_batch()
dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10
vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1
BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!!
Add S-VLAN tag related features support to bond driver. So the bond driver
will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89 ("net: vlan: add 802.1ad support")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85c2c79a07 upstream.
Fix a refcount underflow problem reported by syzbot that can happen
when a system is running out of memory. If xp_alloc_tx_descs() fails,
and it can only fail due to not having enough memory, then the error
path is triggered. In this error path, the refcount of the pool is
decremented as it has incremented before. However, the reference to
the pool in the socket was not nulled. This means that when the socket
is closed later, the socket teardown logic will think that there is a
pool attached to the socket and try to decrease the refcount again,
leading to a refcount underflow.
I chose this fix as it involved adding just a single line. Another
option would have been to move xp_get_pool() and the assignment of
xs->pool to after the if-statement and using xs_umem->pool instead of
xs->pool in the whole if-statement resulting in somewhat simpler code,
but this would have led to much more churn in the code base perhaps
making it harder to backport.
Fixes: ba3beec2ec ("xsk: Fix possible crash when multiple sockets are created")
Reported-by: syzbot+8ada0057e69293a05fd4@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230809142843.13944-1-magnus.karlsson@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a7ac3d205 upstream.
If we try to emit an icmp error in response to a nonliner skb, we get
BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
[..]
kasan_report+0x105/0x140
ip_compute_csum+0x134/0x220
iptunnel_pmtud_build_icmp+0x554/0x1020
skb_tunnel_check_pmtu+0x513/0xb80
vxlan_xmit_one+0x139e/0x2ef0
vxlan_xmit+0x1867/0x2760
dev_hard_start_xmit+0x1ee/0x4f0
br_dev_queue_push_xmit+0x4d1/0x660
[..]
ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
After this change, splat is gone and iperf3 is no longer stuck.
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a8c251cff upstream.
The blamed commit has broken probing on
arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi when &enetc_port0
(PCI function 0) has status = "disabled".
Background: pci_scan_slot() has logic to say that if the function 0 of a
device is absent, the entire device is absent and we can skip the other
functions entirely. Traditionally, this has meant that
pci_bus_read_dev_vendor_id() returns an error code for that function.
However, since the blamed commit, there is an extra confounding
condition: function 0 of the device exists and has a valid vendor id,
but it is disabled in the device tree. In that case, pci_scan_slot()
would incorrectly skip the entire device instead of just that function.
In the case of NXP LS1028A, status = "disabled" does not mean that the
PCI function's config space is not available for reading. It is, but the
Ethernet port is just not functionally useful with a particular SerDes
protocol configuration (0x9999) due to pinmuxing constraints of the Soc.
So, pci_scan_slot() skips all other functions on the ENETC ECAM
(enetc_port1, enetc_port2, enetc_mdio_pf3 etc) when just enetc_port0 had
to not be probed.
There is an additional regression introduced by the change, caused by
its fundamental premise. The enetc driver needs to run code for all PCI
functions, regardless of whether they're enabled or not in the device
tree. That is no longer possible if the driver's probe function is no
longer called. But Rob recommends that we move the of_device_is_available()
detection to dev->match_driver, and this makes the PCI fixups still run
on all functions, while just probing drivers for those functions that
are enabled. So, a separate change in the enetc driver will have to move
the workarounds to a PCI fixup.
Fixes: 6fffbc7ae1 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30c3c4a449 upstream.
Tuning of the effective buffer size through setsockopts was working for
SMC traffic only but not for TCP fall-back connections even before
commit 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock and
make them tunable"). That change made it apparent that TCP fall-back
connections would use net.smc.[rw]mem as buffer size instead of
net.ipv4_tcp_[rw]mem.
Amend the code that copies attributes between the (TCP) clcsock and the
SMC socket and adjust buffer sizes appropriately:
- Copy over sk_userlocks so that both sockets agree on whether tuning
via setsockopt is active.
- When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with
setsockopt. Otherwise, use the sysctl value for TCP/IPv4.
- Likewise, use either values from setsockopt or from sysctl for SMC
(duplicated) on successful SMC connect.
In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that
is taken care of by the attribute copy.
Fixes: 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock and make them tunable")
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 833bac7ec3 upstream.
Commit 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock
and make them tunable") introduced the net.smc.rmem and net.smc.wmem
sysctls to specify the size of buffers to be used for SMC type
connections. This created a regression for users that specified the
buffer size via setsockopt() as the effective buffer size was now
doubled.
Re-introduce the division by 2 in the SMC buffer create code and level
this out by duplicating the net.smc.[rw]mem values used for initializing
sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both
methods (setsockopt or sysctl) the effective buffer size that they
expect.
Initialize net.smc.[rw]mem from its own constant of 64kB, respectively.
Internal performance tests show that this value is a good compromise
between throughput/latency and memory consumption. Also, this decouples
it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the
module for SMC protocol was loaded. Check that no more than INT_MAX / 2
is assigned to net.smc.[rw]mem, in order to avoid any overflow condition
when that is doubled for use in sk_sndbuf or sk_rcvbuf.
While at it, drop the confusing sk_buf_size variable from
__smc_buf_create and name "compressed" buffer size variables more
consistently.
Background:
Before the commit mentioned above, SMC's buffer allocator in
__smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf
value as initial value to search for appropriate buffers. If the search
resorted to using a bigger buffer when all buffers of the specified
size were busy, the duplicate of the used effective buffer size is
stored back to sk_rcvbuf/sk_sndbuf.
When available, buffers of exactly the size that a user had specified as
input to setsockopt() were used, despite setsockopt()'s documentation in
"man 7 socket" talking of a mandatory duplication:
[...]
SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for book‐
keeping overhead) when it is set using setsockopt(2),
and this doubled value is returned by getsockopt(2).
The default value is set by the
/proc/sys/net/core/wmem_default file and the maximum
allowed value is set by the /proc/sys/net/core/wmem_max
file. The minimum (doubled) value for this option is
2048.
[...]
Fixes: 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock and make them tunable")
Co-developed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52417a95ff upstream.
ionic_start_queues_reconfig returns an error code if txrx_init fails.
Handle this error code in the relevant places.
This fixes a corner case where the device could get left in a detached
state if the CMB reconfig fails and the attempt to clean up the mess
also fails. Note that calling netif_device_attach when the netdev is
already attached does not lead to unexpected behavior.
Change goto name "errout" to "err_out" to maintain consistency across
goto statements.
Fixes: 40bc471dc7 ("ionic: add tx/rx-push support with device Component Memory Buffers")
Fixes: 6f7d6f0fd7 ("ionic: pull reset_queues into tx_timeout handler")
Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32d0a49d36 upstream.
syzbot/KCSAN reported data-races in macsec whenever dev->stats fields
are updated.
It appears all of these updates can happen from multiple cpus.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1696ec8654 upstream.
When booting a kernel with CONFIG_MISDN_DSP=y and CONFIG_CFI_CLANG=y,
there is a failure when dsp_cmx_send() is called indirectly from
call_timer_fn():
[ 0.371412] CFI failure at call_timer_fn+0x2f/0x150 (target: dsp_cmx_send+0x0/0x530; expected type: 0x92ada1e9)
The function pointer prototype that call_timer_fn() expects is
void (*fn)(struct timer_list *)
whereas dsp_cmx_send() has a parameter type of 'void *', which causes
the control flow integrity checks to fail because the parameter types do
not match.
Change dsp_cmx_send()'s parameter type to be 'struct timer_list' to
match the expected prototype. The argument is unused anyways, so this
has no functional change, aside from avoiding the CFI failure.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202308020936.58787e6c-oliver.sang@intel.com
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Fixes: e313ac12eb ("mISDN: Convert timers to use timer_setup()")
Link: https://lore.kernel.org/r/20230802-fix-dsp_cmx_send-cfi-failure-v1-1-2f2e79b0178d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56b930dcd8 upstream.
Add a 200ms delay after sending a ctrl report to Quadro,
Octo, D5 Next and Aquaero to give them enough time to
process the request and save the data to memory. Otherwise,
under heavier userspace loads where multiple sysfs entries
are usually set in quick succession, a new ctrl report could
be requested from the device while it's still processing the
previous one and fail with -EPIPE. The delay is only applied
if two ctrl report operations are near each other in time.
Reported by a user on Github [1] and tested by both of us.
[1] https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/82
Fixes: 752b927951 ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Octo")
Signed-off-by: Aleksa Savic <savicaleksa83@gmail.com>
Link: https://lore.kernel.org/r/20230807172004.456968-1-savicaleksa83@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 809e4dc71a upstream.
strp_done is only called when psock->progs.stream_parser is not NULL,
but stream_parser was set to NULL by sk_psock_stop_strp(), called
by sk_psock_drop() earlier. So, strp_done can never be called.
Introduce SK_PSOCK_RX_ENABLED to mark whether there is strp on psock.
Change the condition for calling strp_done from judging whether
stream_parser is set to judging whether this flag is set. This flag is
only set once when strp_init() succeeds, and will never be cleared later.
Fixes: c0d95d3380 ("bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230804073740.194770-3-xukuohai@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d14eea09ed upstream.
Syzkaller reported the following issue:
=======================================
Too BIG xdp->frame_sz = 131072
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
...
Call Trace:
<TASK>
bpf_prog_4add87e5301a4105+0x1a/0x1c
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run_xdp include/linux/filter.h:775 [inline]
bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
netif_receive_generic_xdp net/core/dev.c:4807 [inline]
do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
call_write_iter include/linux/fs.h:1871 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x650/0xe40 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe
("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But Jesper
Dangaard Brouer <jbrouer@redhat.com> noted that after introducing the
xdp_init_buff() which all XDP driver use - it's safe to remove this
check. The original intend was to catch cases where XDP drivers have
not been updated to use xdp.frame_sz, but that is not longer a concern
(since xdp_init_buff).
Running the initial syzkaller repro it was discovered that the
contiguous physical memory allocation is used for both xdp paths in
tun_get_user(), e.g. tun_build_skb() and tun_alloc_skb(). It was also
stated by Jesper Dangaard Brouer <jbrouer@redhat.com> that XDP can
work on higher order pages, as long as this is contiguous physical
memory (e.g. a page).
Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/
Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
Link: https://lore.kernel.org/all/20230725155403.796-1-andrew.kanner@gmail.com/T/
Fixes: 43b5169d83 ("net, xdp: Introduce xdp_init_buff utility routine")
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230803190316.2380231-1-andrew.kanner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e8670610b upstream.
The test relies on 'nc' being the netcat version from the nmap project.
While this seems to be the case on Fedora, it is not the case on Ubuntu,
resulting in failures such as [1].
Fix by explicitly using the 'ncat' utility from the nmap project and the
skip the test in case it is not installed.
[1]
# timeout set to 0
# selftests: net/forwarding: tc_actions.sh
# TEST: gact drop and ok (skip_hw) [ OK ]
# TEST: mirred egress flower redirect (skip_hw) [ OK ]
# TEST: mirred egress flower mirror (skip_hw) [ OK ]
# TEST: mirred egress matchall mirror (skip_hw) [ OK ]
# TEST: mirred_egress_to_ingress (skip_hw) [ OK ]
# nc: invalid option -- '-'
# usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl]
# [-m minttl] [-O length] [-P proxy_username] [-p source_port]
# [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit]
# [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]]
# [destination] [port]
# nc: invalid option -- '-'
# usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl]
# [-m minttl] [-O length] [-P proxy_username] [-p source_port]
# [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit]
# [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]]
# [destination] [port]
# TEST: mirred_egress_to_ingress_tcp (skip_hw) [FAIL]
# server output check failed
# INFO: Could not test offloaded functionality
not ok 80 selftests: net/forwarding: tc_actions.sh # exit=1
Fixes: ca22da2fbd ("act_mirred: use the backlog for nested calls to mirred ingress")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-12-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0529883ad1 upstream.
The default timeout for selftests is 45 seconds, but it is not enough
for forwarding selftests which can takes minutes to finish depending on
the number of tests cases:
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests
TAP version 13
1..102
# timeout set to 45
# selftests: net/forwarding: bridge_igmp.sh
# TEST: IGMPv2 report 239.10.10.10 [ OK ]
# TEST: IGMPv2 leave 239.10.10.10 [ OK ]
# TEST: IGMPv3 report 239.10.10.10 is_include [ OK ]
# TEST: IGMPv3 report 239.10.10.10 include -> allow [ OK ]
#
not ok 1 selftests: net/forwarding: bridge_igmp.sh # TIMEOUT 45 seconds
Fix by switching off the timeout and setting it to 0. A similar change
was done for BPF selftests in commit 6fc5916cc2 ("selftests: bpf:
Switch off timeout").
Fixes: 81573b18f2 ("selftests/net/forwarding: add Makefile to install tests")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/8d149f8c-818e-d141-a0ce-a6bae606bc22@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d72c83b1e4 upstream.
As explained in [1], the forwarding selftests are meant to be run with
either physical loopbacks or veth pairs. The interfaces are expected to
be specified in a user-provided forwarding.config file or as command
line arguments. By default, this file is not present and the tests fail:
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests
[...]
TAP version 13
1..102
# timeout set to 45
# selftests: net/forwarding: bridge_igmp.sh
# Command line is not complete. Try option "help"
# Failed to create netif
not ok 1 selftests: net/forwarding: bridge_igmp.sh # exit=1
[...]
Fix by skipping a test if interfaces are not provided either via the
configuration file or command line arguments.
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests
[...]
TAP version 13
1..102
# timeout set to 45
# selftests: net/forwarding: bridge_igmp.sh
# SKIP: Cannot create interface. Name not specified
ok 1 selftests: net/forwarding: bridge_igmp.sh # SKIP
[1] tools/testing/selftests/net/forwarding/README
Fixes: 81573b18f2 ("selftests/net/forwarding: add Makefile to install tests")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/856d454e-f83c-20cf-e166-6dc06cbc1543@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5ad9aae13 upstream.
Commit 3bcbc20942 ("selftests/rseq: Play nice with binaries statically
linked against glibc 2.35+") which is now in Linus' tree introduced uses
of __weak but did nothing to ensure that a definition is provided for it
resulting in build failures for the rseq tests:
rseq.c:41:1: error: unknown type name '__weak'
__weak ptrdiff_t __rseq_offset;
^
rseq.c:41:17: error: expected ';' after top level declarator
__weak ptrdiff_t __rseq_offset;
^
;
rseq.c:42:1: error: unknown type name '__weak'
__weak unsigned int __rseq_size;
^
rseq.c:43:1: error: unknown type name '__weak'
__weak unsigned int __rseq_flags;
Fix this by using the definition from tools/include compiler.h.
Fixes: 3bcbc20942 ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
Signed-off-by: Mark Brown <broonie@kernel.org>
Message-Id: <20230804-kselftest-rseq-build-v1-1-015830b66aa9@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90f0074cd9 upstream.
BPF CI has reported the following failure:
Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir
Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir
./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected
vsock_unix_redir_connectible:FAIL:1506
./test_progs:vsock_unix_redir_connectible:1506: ingress: write: Transport endpoint is not connected
vsock_unix_redir_connectible:FAIL:1506
./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected
vsock_unix_redir_connectible:FAIL:1506
./test_progs:vsock_unix_redir_connectible:1514: ingress: recv() err, errno=11
vsock_unix_redir_connectible:FAIL:1514
./test_progs:vsock_unix_redir_connectible:1518: ingress: vsock socket map failed, a != b
vsock_unix_redir_connectible:FAIL:1518
./test_progs:vsock_unix_redir_connectible:1525: ingress: want pass count 1, have 0
It’s because the recv(... MSG_DONTWAIT) syscall in the test case is
called before the queued work sk_psock_backlog() in the kernel finishes
executing. So the data to be read is still queued in psock->ingress_skb
and cannot be read by the user program. Therefore, the non-blocking
recv() reads nothing and reports an EAGAIN error.
So replace recv(... MSG_DONTWAIT) with xrecv_nonblock(), which calls
select() to wait for data to be readable or timeout before calls recv().
Fixes: d61bd8c1fd ("selftests/bpf: add a test case for vsock sockmap")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20230804073740.194770-4-xukuohai@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c718ca0e99 upstream.
When running in protected mode, the hyp stub is disabled after pKVM is
initialized, meaning the host cannot enable/disable the hyp at
runtime. As such, kvm_arm_hardware_enabled is always 1 after
initialization, and kvm_arch_hardware_enable() never enables the vgic
maintenance irq or timer irqs.
Unconditionally enable/disable the vgic + timer irqs in the respective
calls, instead relying on the percpu bookkeeping in the generic code
to keep track of which cpus have the interrupts unmasked.
Fixes: 466d27e48d ("KVM: arm64: Simplify the CPUHP logic")
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20230719175400.647154-1-rananta@google.com
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2a6996990 upstream.
Commit 813665564b ("iio: core: Convert to use firmware node handle
instead of OF node") switched the kind of nodes to use for label
retrieval in device registration. Probably an unwanted change in that
commit was that if the device has no parent then NULL pointer is
accessed. This is what happens in the stock IIO dummy driver when a
new entry is created in configfs:
# mkdir /sys/kernel/config/iio/devices/dummy/foo
BUG: kernel NULL pointer dereference, address: ...
...
Call Trace:
__iio_device_register
iio_dummy_probe
Since there seems to be no reason to make a parent device of an IIO
dummy device mandatory, let’s prevent the invalid memory access in
__iio_device_register when the parent device is NULL. With this
change, the IIO dummy driver works fine with configfs.
Fixes: 813665564b ("iio: core: Convert to use firmware node handle instead of OF node")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Link: https://lore.kernel.org/r/20230719083208.88149-1-mzamazal@redhat.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c92db30304 upstream.
Set on the NFT_SET_ELEM_DEAD_BIT flag on this element, instead of
performing element removal which might race with an ongoing transaction.
Enable gc when dynamic flag is set on since dynset deletion requires
garbage collection after this patch.
Fixes: d0a8d877da ("netfilter: nft_dynset: support for element deletion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6c383b8c3 upstream.
Use the GC transaction API to replace the old and buggy gc API and the
busy mark approach.
No set elements are removed from async garbage collection anymore,
instead the _DEAD bit is set on so the set element is not visible from
lookup path anymore. Async GC enqueues transaction work that might be
aborted and retried later.
rbtree and pipapo set backends does not set on the _DEAD bit from the
sync GC path since this runs in control plane path where mutex is held.
In this case, set elements are deactivated, removed and then released
via RCU callback, sync GC never fails.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f68718b34 upstream.
The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.
The major caveat here is that the nft transaction mutex is not held.
This causes a race between control plane and GC when they attempt to
delete the same element.
We cannot grab the netlink mutex from the work queue, because the
control plane has to wait for the GC work queue in case the set is to be
removed, so we get following deadlock:
cpu 1 cpu2
GC work transaction comes in , lock nft mutex
`acquire nft mutex // BLOCKS
transaction asks to remove the set
set destruction calls cancel_work_sync()
cancel_work_sync will now block forever, because it is waiting for the
mutex the caller already owns.
This patch adds a new API that deals with garbage collection in two
steps:
1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT
so they are not visible via lookup. Annotate current GC sequence in
the GC transaction. Enqueue GC transaction work as soon as it is
full. If ruleset is updated, then GC transaction is aborted and
retried later.
2) GC work grabs the mutex. If GC sequence has changed then this GC
transaction lost race with control plane, abort it as it contains
stale references to objects and let GC try again later. If the
ruleset is intact, then this GC transaction deactivates and removes
the elements and it uses call_rcu() to destroy elements.
Note that no elements are removed from GC lockless path, the _DEAD bit
is set and pointers are collected. GC catchall does not remove the
elements anymore too. There is a new set->dead flag that is set on to
abort the GC transaction to deal with set->ops->destroy() path which
removes the remaining elements in the set from commit_release, where no
mutex is held.
To deal with GC when mutex is held, which allows safe deactivate and
removal, add sync GC API which releases the set element object via
call_rcu(). This is used by rbtree and pipapo backends which also
perform garbage collection from control plane path.
Since element removal from sets can happen from control plane and
element garbage collection/timeout, it is necessary to keep the set
structure alive until all elements have been deactivated and destroyed.
We cannot do a cancel_work_sync or flush_work in nft_set_destroy because
its called with the transaction mutex held, but the aforementioned async
work queue might be blocked on the very mutex that nft_set_destroy()
callchain is sitting on.
This gives us the choice of ABBA deadlock or UaF.
To avoid both, add set->refs refcount_t member. The GC API can then
increment the set refcount and release it once the elements have been
free'd.
Set backends are adapted to use the GC transaction API in a follow up
patch entitled:
("netfilter: nf_tables: use gc transaction API in set backends")
This is joint work with Florian Westphal.
Fixes: cfed7e1b1f ("netfilter: nf_tables: add set garbage collection helpers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24138933b9 upstream.
There is an asymmetry between commit/abort and preparation phase if the
following conditions are met:
1. set is a verdict map ("1.2.3.4 : jump foo")
2. timeouts are enabled
In this case, following sequence is problematic:
1. element E in set S refers to chain C
2. userspace requests removal of set S
3. kernel does a set walk to decrement chain->use count for all elements
from preparation phase
4. kernel does another set walk to remove elements from the commit phase
(or another walk to do a chain->use increment for all elements from
abort phase)
If E has already expired in 1), it will be ignored during list walk, so its use count
won't have been changed.
Then, when set is culled, ->destroy callback will zap the element via
nf_tables_set_elem_destroy(), but this function is only safe for
elements that have been deactivated earlier from the preparation phase:
lack of earlier deactivate removes the element but leaks the chain use
count, which results in a WARN splat when the chain gets removed later,
plus a leak of the nft_chain structure.
Update pipapo_get() not to skip expired elements, otherwise flush
command reports bogus ENOENT errors.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bee6cf1a80 upstream.
Tao Liu reported a boot hang on an Intel Atom machine due to an unmapped
EFI config table. The reason being that the CC blob which contains the
CPUID page for AMD SNP guests is parsed for before even checking
whether the machine runs on AMD hardware.
Usually that's not a problem on !AMD hw - it simply won't find the CC
blob's GUID and return. However, if any parts of the config table
pointers array is not mapped, the kernel will #PF very early in the
decompressor stage without any opportunity to recover.
Therefore, do a superficial CPUID check before poking for the CC blob.
This will fix the current issue on real hardware. It would also work as
a guest on a non-lying hypervisor.
For the lying hypervisor, the check is done again, *after* parsing the
CC blob as the real CPUID page will be present then.
Clear the #VC handler in case SEV-{ES,SNP} hasn't been detected, as
a precaution.
Fixes: c01fce9cef ("x86/compressed: Add SEV-SNP feature detection/setup")
Reported-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tao Liu <ltao@redhat.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230601072043.24439-1-ltao@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b8b1aa90c upstream.
Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR
VMAs are placed above the 47-bit border:
8000001a9000-8000001ad000 r--p 00000000 00:00 0 [vvar]
8000001ad000-8000001af000 r-xp 00000000 00:00 0 [vdso]
This might confuse users who are not aware of 5-level paging and expect
all userspace addresses to be under the 47-bit border.
So far problem has only been triggered with ASLR disabled, although it
may also occur with ASLR enabled if the layout is randomized in a just
right way.
The problem happens due to custom placement for the VMAs in the VDSO
code: vdso_addr() tries to place them above the stack and checks the
result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to
the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW
instead.
Fixes: b569bab78d ("x86/mm: Prepare to expose larger address space to userspace")
Reported-by: Yingcong Wu <yingcong.wu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a5ccd61cf upstream.
When connecting to some DisplayPort partners, the initial status update
after entering DisplayPort Alt Mode notifies that the DFP_D/UFP_D is not in
the connected state. This leads to sending a configure message that keeps
the device in USB mode. The port partner then sets DFP_D/UFP_D to the
connected state and HPD to high in the same Attention message. Currently,
the HPD signal is dropped in order to handle configuration.
This patch saves changes to the HPD signal when the device chooses to
configure during dp_altmode_status_update, and invokes sysfs_notify if
necessary for HPD after configuring.
Fixes: 0e3bb7d689 ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230726020903.1409072-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4270d2b484 upstream.
Do not transition to SNK_UNATTACHED state when receiving vsafe0v event
while in SNK_HARD_RESET_WAIT_VBUS. Ignore VBUS off events as well as
in some platforms VBUS off can be signalled more than once.
[143515.364753] Requesting mux state 1, usb-role 2, orientation 2
[143515.365520] pending state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_SINK_ON @ 650 ms [rev3 HARD_RESET]
[143515.632281] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_HARD_RESET_SINK_OFF, polarity 1, disconnected]
[143515.637214] VBUS on
[143515.664985] VBUS off
[143515.664992] state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_WAIT_VBUS [rev3 HARD_RESET]
[143515.665564] VBUS VSAFE0V
[143515.665566] state change SNK_HARD_RESET_WAIT_VBUS -> SNK_UNATTACHED [rev3 HARD_RESET]
Fixes: 28b43d3d74 ("usb: typec: tcpm: Introduce vsafe0v for vbus")
Cc: <stable@vger.kernel.org>
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230712085722.1414743-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65dadb2bee upstream.
Avichal Rakesh reported a kernel panic that occurred when the UVC
gadget driver was removed from a gadget's configuration. The panic
involves a somewhat complicated interaction between the kernel driver
and a userspace component (as described in the Link tag below), but
the analysis did make one thing clear: The Gadget core should
accomodate gadget drivers calling usb_gadget_deactivate() as part of
their unbind procedure.
Currently this doesn't work. gadget_unbind_driver() calls
driver->unbind() while holding the udc->connect_lock mutex, and
usb_gadget_deactivate() attempts to acquire that mutex, which will
result in a deadlock.
The simple fix is for gadget_unbind_driver() to release the mutex when
invoking the ->unbind() callback. There is no particular reason for
it to be holding the mutex at that time, and the mutex isn't held
while the ->bind() callback is invoked. So we'll drop the mutex
before performing the unbind callback and reacquire it afterward.
We'll also add a couple of comments to usb_gadget_activate() and
usb_gadget_deactivate(). Because they run in process context they
must not be called from a gadget driver's ->disconnect() callback,
which (according to the kerneldoc for struct usb_gadget_driver in
include/linux/usb/gadget.h) may run in interrupt context. This may
help prevent similar bugs from arising in the future.
Reported-and-tested-by: Avichal Rakesh <arakesh@google.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 286d9975a8 ("usb: gadget: udc: core: Prevent soft_connect_store() race")
Link: https://lore.kernel.org/linux-usb/4d7aa3f4-22d9-9f5a-3d70-1bd7148ff4ba@google.com/
Cc: Badhri Jagan Sridharan <badhri@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/48b2f1f1-0639-46bf-bbfc-98cb05a24914@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ddaa6a274 upstream.
If dwc3 is runtime suspended we defer processing the event buffer
until resume, by setting the pending_events flag. Set this flag before
triggering resume to avoid race with the runtime resume callback.
While handling the pending events, in addition to checking the event
buffer we also need to process it. Handle this by explicitly calling
dwc3_thread_interrupt(). Also balance the runtime pm get() operation
that triggered this processing.
Cc: stable@vger.kernel.org
Fixes: fc8bb91bc8 ("usb: dwc3: implement runtime PM")
Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230801192658.19275-1-quic_eserrao@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6ff6e7a9d upstream.
Syzbot got KMSAN to complain about access to an uninitialized value in
the alauda subdriver of usb-storage:
BUG: KMSAN: uninit-value in alauda_transport+0x462/0x57f0
drivers/usb/storage/alauda.c:1137
CPU: 0 PID: 12279 Comm: usb-storage Not tainted 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x191/0x1f0 lib/dump_stack.c:113
kmsan_report+0x13a/0x2b0 mm/kmsan/kmsan_report.c:108
__msan_warning+0x73/0xe0 mm/kmsan/kmsan_instr.c:250
alauda_check_media+0x344/0x3310 drivers/usb/storage/alauda.c:460
The problem is that alauda_check_media() doesn't verify that its USB
transfer succeeded before trying to use the received data. What
should happen if the transfer fails isn't entirely clear, but a
reasonably conservative approach is to pretend that no media is
present.
A similar problem exists in a usb_stor_dbg() call in
alauda_get_media_status(). In this case, when an error occurs the
call is redundant, because usb_stor_ctrl_transfer() already will print
a debugging message.
Finally, unrelated to the uninitialized memory access, is the fact
that alauda_check_media() performs DMA to a buffer on the stack.
Fortunately usb-storage provides a general purpose DMA-able buffer for
uses like this. We'll use it instead.
Reported-and-tested-by: syzbot+e7d46eb426883fb97efd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000007d25ff059457342d@google.com/T/
Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: e80b0fade0 ("[PATCH] USB Storage: add alauda support")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/693d5d5e-f09b-42d0-8ed9-1f96cd30bcce@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 596a5123cc upstream.
The memory allocated in tb_queue_dp_bandwidth_request() needs to be
released once the request is handled to avoid leaking it.
Fixes: 6ce3563520 ("thunderbolt: Add support for DisplayPort bandwidth allocation mode")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a41e19cc0d upstream.
The affected lines were resulting in a NULL pointer dereference on our
platform because the device tree contained the following list of
compatible strings:
power-sensor@40 {
compatible = "ti,ina232", "ti,ina231";
...
};
Since the driver doesn't declare a compatible string "ti,ina232", the OF
matching succeeds on "ti,ina231". But the I2C device ID info is
populated via the first compatible string, cf. modalias population in
of_i2c_get_board_info(). Since there is no "ina232" entry in the legacy
I2C device ID table either, the struct i2c_device_id *id pointer in the
probe function is NULL.
Fix this by using the already populated type variable instead, which
points to the proper driver data. Since the name is also wanted, add a
generic one to the ina2xx_config table.
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Fixes: c43a102e67 ("iio: ina2xx: add support for TI INA2xx Power Monitors")
Link: https://lore.kernel.org/r/20230619141239.2257392-1-alvin@pqrs.dk
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bc471b6c3 upstream.
AC excitation enable feature exposed to user on AD7192, allowing a bit
which should be 0 to be set. This feature is specific only to AD7195. AC
excitation attribute moved accordingly.
In the AD7195 documentation, the AC excitation enable bit is on position
22 in the Configuration register. ACX macro changed to match correct
register and bit.
Note that the fix tag is for the commit that moved the driver out of
staging.
Fixes: b581f748cc ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Alisa Roman <alisa.roman@analog.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20230614155242.160296-1-alisa.roman@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 507397d19b upstream.
The regulator_get_voltage() function returns negative error codes.
This function saves it to an unsigned int and then does some range
checking and, since the error code falls outside the correct range,
it returns -EINVAL.
Beyond the messiness, this is bad because the regulator_get_voltage()
function can return -EPROBE_DEFER and it's important to propagate that
back properly so it can be handled.
Fixes: da35a7b526 ("iio: frequency: admv1013: add support for ADMV1013")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/ce75aac3-2aba-4435-8419-02e59fdd862b@moroto.mountain
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f29623e4a5 upstream.
If unpoison_memory() fails to clear page hwpoisoned flag, return value ret
is expected to be -EBUSY. But when get_hwpoison_page() returns 1 and
fails to clear page hwpoisoned flag due to races, return value will be
unexpected 1 leading to users being confused. And there's a code smell
that the variable "ret" is used not only to save the return value of
unpoison_memory(), but also the return value from get_hwpoison_page().
Make a further cleanup by using another auto-variable solely to save the
return value of get_hwpoison_page() as suggested by Naoya.
Link: https://lkml.kernel.org/r/20230727115643.639741-3-linmiaohe@huawei.com
Fixes: bf181c5825 ("mm/hwpoison: fix unpoison_memory()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65294de30c upstream.
A missing break in kms_tests leads to kselftest hang when the parameter -s
is used.
In current code flow because of missing break in -s, -t parses args
spilled from -s and as -t accepts only valid values as 0,1 so any arg in
-s >1 or <0, gets in ksm_test failure
This went undetected since, before the addition of option -t, the next
case -M would immediately break out of the switch statement but that is no
longer the case
Add the missing break statement.
----Before----
./ksm_tests -H -s 100
Invalid merge type
----After----
./ksm_tests -H -s 100
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.401732682 s
Average speed: 248.922 MiB/s
Link: https://lkml.kernel.org/r/20230728163952.4634-1-ayush.jain3@amd.com
Fixes: 07115fcc15 ("selftests/mm: add new selftests for KSM")
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f1fc67f2c upstream.
damos_new_filter() is not initializing the list field of newly allocated
filter object. However, DAMON sysfs interface and DAMON_RECLAIM are not
initializing it after calling damos_new_filter(). As a result, accessing
uninitialized memory is possible. Actually, adding multiple DAMOS filters
via DAMON sysfs interface caused NULL pointer dereferencing. Initialize
the field just after the allocation from damos_new_filter().
Link: https://lkml.kernel.org/r/20230729203733.38949-2-sj@kernel.org
Fixes: 98def236f6 ("mm/damon/core: implement damos filter")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32c877191e upstream.
Patch series "Fix hugetlb free path race with memory errors".
In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
HWPOISON hugepages" the race window was discovered.
https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/
Freeing a hugetlb page back to low level memory allocators is performed
in two steps.
1) Under hugetlb lock, remove page from hugetlb lists and clear destructor
2) Outside lock, allocate vmemmap if necessary and call low level free
Between these two steps, the hugetlb page will appear as a normal
compound page. However, vmemmap for tail pages could be missing.
If a memory error occurs at this time, we could try to update page
flags non-existant page structs.
A much more detailed description is in the first patch.
The first patch addresses the race window. However, it adds a
hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
free operation. This could lead to slowdowns if one is freeing a large
number of hugetlb pages.
The second path optimizes the update_and_free_pages_bulk routine to only
take the lock once in bulk operations.
The second patch is technically not a bug fix, but includes a Fixes tag
and Cc stable to avoid a performance regression. It can be combined with
the first, but was done separately make reviewing easier.
This patch (of 2):
Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
lists, get a ref on the page and remove hugetlb destructor. This
all must be done under the hugetlb lock. After this call, the page
can be treated as a normal compound page or a collection of base
size pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
needed and the free routine of the underlying allocator is called
on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats
the old hugetlb page as a normal compound page or collection of base
pages. It will then try to SetPageHWPoison(page) on the page with an
error. If the page with error is a tail page without vmemmap, a write
error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked
free (by hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.
Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com
Fixes: ad2fa3717b ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8654743a0 upstream.
During unmount process of nilfs2, nothing holds nilfs_root structure after
nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously,
nilfs_evict_inode() could cause use-after-free read for nilfs_root if
inodes are left in "garbage_list" and released by nilfs_dispose_list at
the end of nilfs_detach_log_writer(), and this bug was fixed by commit
9b5a04ac3a ("nilfs2: fix use-after-free bug of nilfs_root in
nilfs_evict_inode()").
However, it turned out that there is another possibility of UAF in the
call path where mark_inode_dirty_sync() is called from iput():
nilfs_detach_log_writer()
nilfs_dispose_list()
iput()
mark_inode_dirty_sync()
__mark_inode_dirty()
nilfs_dirty_inode()
__nilfs_mark_inode_dirty()
nilfs_load_inode_block() --> causes UAF of nilfs_root struct
This can happen after commit 0ae45f63d4 ("vfs: add support for a
lazytime mount option"), which changed iput() to call
mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME
flag and i_nlink is non-zero.
This issue appears after commit 28a65b49eb ("nilfs2: do not write dirty
data after degenerating to read-only") when using the syzbot reproducer,
but the issue has potentially existed before.
Fix this issue by adding a "purging flag" to the nilfs structure, setting
that flag while disposing the "garbage_list" and checking it in
__nilfs_mark_inode_dirty().
Unlike commit 9b5a04ac3a ("nilfs2: fix use-after-free bug of nilfs_root
in nilfs_evict_inode()"), this patch does not rely on ns_writer to
determine whether to skip operations, so as not to break recovery on
mount. The nilfs_salvage_orphan_logs routine dirties the buffer of
salvaged data before attaching the log writer, so changing
__nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL
will cause recovery write to fail. The purpose of using the cleanup-only
flag is to allow for narrowing of such conditions.
Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org> # 4.0+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12acb348fa upstream.
A switch from OSI to PC mode is only possible if all CPUs other than the
calling one are OFF, either through a call to CPU_OFF or not yet booted.
Currently OSI mode is enabled before power domains are created. In cases
where CPUidle states are not using hierarchical CPU topology the bail out
path tries to switch back to PC mode which gets denied by firmware since
other CPUs are online at this point and creates inconsistent state as
firmware is in OSI mode and Linux in PC mode.
This change moves enabling OSI mode after power domains are created,
this would makes sure that hierarchical CPU topology is used before
switching firmware to OSI mode.
Cc: stable@vger.kernel.org
Fixes: 70c179b498 ("cpuidle: psci: Allow PM domain to be initialized even if no OSI mode")
Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4060dad25 upstream.
Currently we use the drm_dp_dpcd_read_caps() helper in the DRM side of
nouveau in order to read the DPCD of a DP connector, which makes sure we do
the right thing and also check for extended DPCD caps. However, it turns
out we're not currently doing this on the nvkm side since we don't have
access to the drm_dp_aux structure there - which means that the DRM side of
the driver and the NVKM side can end up with different DPCD capabilities
for the same connector.
Ideally in order to fix this, we just want to use the
drm_dp_read_dpcd_caps() helper in nouveau. That's not currently possible
though, and is going to depend on having a bunch of the DP code moved out
of nvkm and into the DRM side of things as part of the GSP enablement work.
Until then however, let's workaround this problem by porting a copy of
drm_dp_read_dpcd_caps() into NVKM - which should fix this issue.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/211
Link: https://patchwork.freedesktop.org/patch/msgid/20230728225858.350581-1-lyude@redhat.com
(cherry picked from commit cc4adf3a73 in drm-misc-next)
Cc: <stable@vger.kernel.org> # 6.3+
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 688b419c57 upstream.
The Samsung PM9B1 512G SSD found in some Lenovo Yoga 7 14ARB7 laptop units
reports eui as 0001000200030004 when resuming from s2idle, causing the
device to be removed with this error in dmesg:
nvme nvme0: identifiers changed for nsid 1
To fix this, add a quirk to ignore namespace identifiers for this device.
Signed-off-by: August Wikerfors <git@augustwikerfors.se>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29b434d1e4 upstream.
Move start_freeze into nvme_rdma_configure_io_queues(), and there is
at least two benefits:
1) fix unbalanced freeze and unfreeze, since re-connection work may
fail or be broken by removal
2) IO during error recovery can be failfast quickly because nvme fabrics
unquiesces queues after teardown.
One side-effect is that !mpath request may timeout during connecting
because of queue topo change, but that looks not one big deal:
1) same problem exists with current code base
2) compared with !mpath, mpath use case is dominant
Fixes: 9f98772ba3 ("nvme-rdma: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99dc264014 upstream.
Move start_freeze into nvme_tcp_configure_io_queues(), and there is
at least two benefits:
1) fix unbalanced freeze and unfreeze, since re-connection work may
fail or be broken by removal
2) IO during error recovery can be failfast quickly because nvme fabrics
unquiesces queues after teardown.
One side-effect is that !mpath request may timeout during connecting
because of queue topo change, but that looks not one big deal:
1) same problem exists with current code base
2) compared with !mpath, mpath use case is dominant
Fixes: 2875b0aeca ("nvme-tcp: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2402048bc upstream.
I'm looking to enable -Wmissing-variable-declarations behind W=1. 0day
bot spotted the following instance in ARCH=riscv builds:
arch/riscv/mm/init.c:276:7: warning: no previous extern declaration
for non-static variable 'trampoline_pg_dir'
[-Wmissing-variable-declarations]
276 | pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
| ^
arch/riscv/mm/init.c:276:1: note: declare 'static' if the variable is
not intended to be used outside of this translation unit
276 | pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
| ^
arch/riscv/mm/init.c:279:7: warning: no previous extern declaration
for non-static variable 'early_pg_dir'
[-Wmissing-variable-declarations]
279 | pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
| ^
arch/riscv/mm/init.c:279:1: note: declare 'static' if the variable is
not intended to be used outside of this translation unit
279 | pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
| ^
These symbols are referenced by more than one translation unit, so make
sure they're both declared and include the correct header for their
declarations. Finally, sort the list of includes to help keep them tidy.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/llvm/202308081000.tTL1ElTr-lkp@intel.com/
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230808-riscv_static-v2-1-2a1e2d2c7a4f@google.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0f4b7879f upstream.
The lightweight spinlock checks verify that a spinlock has either value
0 (spinlock locked) and that not any other bits than in
__ARCH_SPIN_LOCK_UNLOCKED_VAL is set.
This breaks the current LWS code, which writes the address of the lock
into the lock word to unlock it, which was an optimization to save one
assembler instruction.
Fix it by making spinlock_types.h accessible for asm code, change the
LWS spinlock-unlocking code to write __ARCH_SPIN_LOCK_UNLOCKED_VAL into
the lock word, and add some missing lightweight spinlock checks to the
LWS path. Finally, make the spinlock checks dependend on DEBUG_KERNEL.
Noticed-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v6.4+
Fixes: 15e64ef652 ("parisc: Add lightweight spinlock checks")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56675f8b9f upstream.
The changes from commit 32832a407a ("io_uring: Fix io_uring mmap() by
using architecture-provided get_unmapped_area()") to the parisc
implementation of get_unmapped_area() broke glibc's locale-gen
executable when running on parisc.
This patch reverts those architecture-specific changes, and instead
adjusts in io_uring_mmu_get_unmapped_area() the pgoff offset which is
then given to parisc's get_unmapped_area() function. This is much
cleaner than the previous approach, and we still will get a coherent
addresss.
This patch has no effect on other architectures (SHM_COLOUR is only
defined on parisc), and the liburing testcase stil passes on parisc.
Cc: stable@vger.kernel.org # 6.4
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Fixes: 32832a407a ("io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()")
Fixes: d808459b2e ("io_uring: Adjust mapping wrt architecture aliasing requirements")
Link: https://lore.kernel.org/r/ZNEyGV0jyI8kOOfz@p100
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95848dcb9d upstream.
Commit af8b04c637 ("zram: simplify bvec iteration in
__zram_make_request") changed the bio iteration in zram to rely on the
implicit capping to page boundaries in bio_for_each_segment. But it
failed to care for the fact zram not only care about the page alignment
of the bio payload, but also the page alignment into the device. For
buffered I/O and swap those are the same, but for direct I/O or kernel
internal I/O like XFS log buffer writes they can differ.
Fix this by open coding bio_for_each_segment and limiting the bvec len
so that it never crosses over a page alignment boundary in the device
in addition to the payload boundary already taken care of by
bio_iter_iovec.
Cc: stable@vger.kernel.org
Fixes: af8b04c637 ("zram: simplify bvec iteration in __zram_make_request")
Reported-by: Dusty Mabe <dusty@dustymabe.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Link: https://lore.kernel.org/r/20230805055537.147835-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56fec0051a upstream.
The PCSpecialist Elimina Pro 16 M laptop model is a Zen laptop which
needs to use the MADT IRQ settings override and which does not have
an INT_SRC_OVR entry for IRQ 1 in its MADT.
So this model needs a DMI quirk to enable the MADT IRQ settings override
to fix its keyboard not working.
Fixes: a9c4a912b7 ("ACPI: resource: Remove "Zen" specific match and quirks")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394#c18
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9728ac2211 upstream.
All the cases, were the DSDT IRQ settings should be used instead of
the MADT override, are for IRQ 1 or 12, the PS/2 kbd resp. mouse IRQs.
Simplify things by always honering the override for other legacy IRQs
(for non DMI quirked cases).
This allows removing the DMI quirks to honor the override for
some non i8042 IRQs on some AMD ZEN based Lenovo models.
Fixes: a9c4a912b7 ("ACPI: resource: Remove "Zen" specific match and quirks")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7dfeda6fd upstream.
When unloading the MANA driver, mana_dealloc_queues() waits for the MANA
hardware to complete any inflight packets and set the pending send count
to zero. But if the hardware has failed, mana_dealloc_queues()
could wait forever.
Fix this by adding a timeout to the wait. Set the timeout to 120 seconds,
which is a somewhat arbitrary value that is more than long enough for
functional hardware to complete any sends.
Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Link: https://lore.kernel.org/r/1691576525-24271-1-git-send-email-schakrabarti@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96891e90d1 upstream.
A couple of hardware registers need to be set to reflect which
interrupts have been allocated to the device. Each register is 32-bit
wide and can receive four 8-bit values. If we provide any other interrupt
number than four, the irq_num variable will never be 0 within the while
check and the while block will loop forever.
There is an easy way to prevent this: just break the for loop
when we reach "irq_num == 0", which anyway means all interrupts have
been processed.
Cc: stable@vger.kernel.org
Fixes: 17ce252266 ("dmaengine: xilinx: xdma: Add xilinx xdma driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20230731101442.792514-2-miquel.raynal@bootlin.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 511b90e392 upstream.
Despite commit 0ad529d9fd ("mptcp: fix possible divide by zero in
recvmsg()"), the mptcp protocol is still prone to a race between
disconnect() (or shutdown) and accept.
The root cause is that the mentioned commit checks the msk-level
flag, but mptcp_stream_accept() does acquire the msk-level lock,
as it can rely directly on the first subflow lock.
As reported by Christoph than can lead to a race where an msk
socket is accepted after that mptcp_subflow_queue_clean() releases
the listener socket lock and just before it takes destructive
actions leading to the following splat:
BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330
Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89
RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300
RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a
RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020
R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880
FS: 00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
do_accept+0x1ae/0x260 net/socket.c:1872
__sys_accept4+0x9b/0x110 net/socket.c:1913
__do_sys_accept4 net/socket.c:1954 [inline]
__se_sys_accept4 net/socket.c:1951 [inline]
__x64_sys_accept4+0x20/0x30 net/socket.c:1951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Address the issue by temporary removing the pending request socket
from the accept queue, so that racing accept() can't touch them.
After depleting the msk - the ssk still exists, as plain TCP sockets,
re-insert them into the accept queue, so that later inet_csk_listen_stop()
will complete the tcp socket disposal.
Fixes: 2a6a870e44 ("mptcp: stops worker on unaccepted sockets at listener close")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-4-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8c101ae39 upstream.
mptcp_join 'implicit EP' test currently fails when using ip mptcp:
$ ./mptcp_join.sh -iI
<snip>
001 implicit EP creation[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
Error: too many addresses or duplicate one: -22.
ID change is prevented[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
modif is allowed[fail] expected '10.0.2.2 10.0.2.2 id 1 signal' found '10.0.2.2 id 1 signal '
This happens because of two reasons:
- iproute v6.3.0 does not support the implicit flag, fixed with
iproute2-next commit 3a2535a41854 ("mptcp: add support for implicit
flag")
- pm_nl_check_endpoint wrongly expects the ip address to be repeated two
times in iproute output, and does not account for a final whitespace
in it.
This fixes the issue trimming the whitespace in the output string and
removing the double address in the expected string.
Fixes: 69c6ce7b6e ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-2-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aaf2123a5c upstream.
mptcp_join 'delete and re-add' test fails when using ip mptcp:
$ ./mptcp_join.sh -iI
<snip>
002 delete and re-add before delete[ ok ]
mptcp_info subflows=1 [ ok ]
Error: argument "ADDRESS" is wrong: invalid for non-zero id address
after delete[fail] got 2:2 subflows expected 1
This happens because endpoint delete includes an ip address while id is
not 0, contrary to what is indicated in the ip mptcp man page:
"When used with the delete id operation, an IFADDR is only included when
the ID is 0."
This fixes the issue using the $addr variable in pm_nl_del_endpoint()
only when id is 0.
Fixes: 34aa6e3bcc ("selftests: mptcp: add ip mptcp wrappers")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-1-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 048c796beb upstream.
The upcoming (and nearly finalized):
https://datatracker.ietf.org/doc/draft-collink-6man-pio-pflag/
will update the IPv6 RA to include a new flag in the PIO field,
which will serve as a hint to perform DHCPv6-PD.
As we don't want DHCPv6 related logic inside the kernel, this piece of
information needs to be exposed to userspace. The simplest option is to
simply expose the entire PIO through the already existing mechanism.
Even without this new flag, the already existing PIO R (router address)
flag (from RFC6275) cannot AFAICT be handled entirely in kernel,
and provides useful information that should be exposed to userspace
(the router's global address, for use by Mobile IPv6).
Also cc'ing stable@ for inclusion in LTS, as while technically this is
not quite a bugfix, and instead more of a feature, it is absolutely
trivial and the alternative is manually cherrypicking into all Android
Common Kernel trees - and I know Greg will ask for it to be sent in via
LTS instead...
Cc: Jen Linkova <furry@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Cc: stable@vger.kernel.org
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230807102533.1147559-1-maze@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46622219aa upstream.
In the allowedips self-test, nodes are inserted into the tree, but it
generated an even amount of nodes, but for checking maximum node depth,
there is of course the root node, which makes the total number
necessarily odd. With two few nodes added, it never triggered the
maximum depth check like it should have. So, add 129 nodes instead of
128 nodes, and do so with a more straightforward scheme, starting with
all the bits set, and shifting over one each time. Then increase the
maximum depth to 129, and choose a better name for that variable to
make it clear that it represents depth as opposed to bits.
Cc: stable@vger.kernel.org
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6311071a05 upstream.
nl80211_parse_mbssid_elems() uses a u8 variable num_elems to count the
number of MBSSID elements in the nested netlink attribute attrs, which can
lead to an integer overflow if a user of the nl80211 interface specifies
256 or more elements in the corresponding attribute in userspace. The
integer overflow can lead to a heap buffer overflow as num_elems determines
the size of the trailing array in elems, and this array is thereafter
written to for each element in attrs.
Note that this vulnerability only affects devices with the
wiphy->mbssid_max_interfaces member set for the wireless physical device
struct in the device driver, and can only be triggered by a process with
CAP_NET_ADMIN capabilities.
Fix this by checking for a maximum of 255 elements in attrs.
Cc: stable@vger.kernel.org
Fixes: dc1e3cb8da ("nl80211: MBSSID and EMA support in AP mode")
Signed-off-by: Keith Yeo <keithyjy@gmail.com>
Link: https://lore.kernel.org/r/20230731034719.77206-1-keithyjy@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7588dbcebc upstream.
A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger
a double fetch race condition vulnerability and invoke the VMGEXIT
handler recursively.
sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then
fetches the exit code using ghcb_get_sw_exit_code(). Soon after,
sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB
page is shared with the guest, the guest is able to quickly swap the
values with another vCPU and hence bypass the validation. One vmexit code
that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT;
if sev_handle_vmgexit() observes it in the second fetch, the call
to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again
recursively.
To avoid the race, always fetch the GHCB data from the places where
sev_es_sync_from_ghcb stores it.
Exploiting recursions on linux kernel has been proven feasible
in the past, but the impact is mitigated by stack guard pages
(CONFIG_VMAP_STACK). Still, if an attacker manages to call the handler
multiple times, they can theoretically trigger a stack overflow and
cause a denial-of-service, or potentially guest-to-host escape in kernel
configurations without stack guard pages.
Note that winning the race reliably in every iteration is very tricky
due to the very tight window of the fetches; depending on the compiler
settings, they are often consecutive because of optimization and inlining.
Tested by booting an SEV-ES RHEL9 guest.
Fixes: CVE-2023-4155
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Reported-by: Andy Nguyen <theflow@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e15a0ddc3 upstream.
Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities.
To avoid them, we would like to always snapshot the fields that are read in
sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns.
This means:
- invoking sev_es_sync_from_ghcb() before any GHCB access, including before
sev_es_validate_vmgexit()
- snapshotting all fields including the valid bitmap and the sw_scratch field,
which are currently not caching anywhere.
The valid bitmap is the first thing to be copied out of the GHCB; then,
further accesses will use the copy in svm->sev_es.
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79ed288cef upstream.
There are multiple smb2_ea_info buffers in FILE_FULL_EA_INFORMATION request
from client. ksmbd find next smb2_ea_info using ->NextEntryOffset of
current smb2_ea_info. ksmbd need to validate buffer length Before
accessing the next ea. ksmbd should check buffer length using buf_len,
not next variable. next is the start offset of current ea that got from
previous ea.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cacc6e2293 upstream.
The same checks are repeated in three places to decide whether to use
hwrng. Consolidate these into a helper.
Also this fixes a case that one of them was missing a check in the
cleanup path.
Fixes: 554b841d47 ("tpm: Disable RNG for all AMD fTPMs")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e117e7adc6 upstream.
The Lenovo ThinkStation P620 suffers from an irq storm issue like various
other Lenovo machines, so add an entry for it to tpm_tis_dmi_table and
force polling.
It is worth noting that 481c2d1462 (tpm,tpm_tis: Disable interrupts after
1000 unhandled IRQs) does not seem to fix the problem on this machine, but
setting 'tpm_tis.interrupts=0' on the kernel command line does.
[jarkko@kernel.org: truncated the commit ID in the description to 12
characters]
Cc: stable@vger.kernel.org # v6.4+
Fixes: e644b2f498 ("tpm, tpm_tis: Enable interrupt test")
Signed-off-by: Jonathan McDowell <noodles@meta.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 554b841d47 upstream.
The TPM RNG functionality is not necessary for entropy when the CPU
already supports the RDRAND instruction. The TPM RNG functionality
was previously disabled on a subset of AMD fTPM series, but reports
continue to show problems on some systems causing stutter root caused
to TPM RNG functionality.
Expand disabling TPM RNG use for all AMD fTPMs whether they have versions
that claim to have fixed or not. To accomplish this, move the detection
into part of the TPM CRB registration and add a flag indicating that
the TPM should opt-out of registration to hwrng.
Cc: stable@vger.kernel.org # 6.1.y+
Fixes: b006c439d5 ("hwrng: core - start hwrng kthread also for untrusted sources")
Fixes: f1324bbc40 ("tpm: disable hwrng for fTPM on some AMD designs")
Reported-by: daniil.stas@posteo.net
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719
Reported-by: bitlord0xff@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77245f1c3c upstream.
Under certain circumstances, an integer division by 0 which faults, can
leave stale quotient data from a previous division operation on Zen1
microarchitectures.
Do a dummy division 0/1 before returning from the #DE exception handler
in order to avoid any leaks of potentially sensitive data.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3bcbc20942 ]
To allow running rseq and KVM's rseq selftests as statically linked
binaries, initialize the various "trampoline" pointers to point directly
at the expect glibc symbols, and skip the dlysm() lookups if the rseq
size is non-zero, i.e. the binary is statically linked *and* the libc
registered its own rseq.
Define weak versions of the symbols so as not to break linking against
libc versions that don't support rseq in any capacity.
The KVM selftests in particular are often statically linked so that they
can be run on targets with very limited runtime environments, i.e. test
machines.
Fixes: 233e667e1a ("selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35")
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721223352.2333911-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit db3b5cb64a upstream.
Use the generic term fw_reserved_memory for FW reserve region. This
region may also hold discovery TMR in addition to other reserve
regions. This region size could be larger than discovery tmr size, hence
don't change the discovery tmr size based on this.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This change fixes reading IP discovery from debugfs.
It needed to be hand modified because GC 9.4.3 support isn't
introduced in older kernels until 228ce17643 ("drm/amdgpu: Handle
VRAM dependencies on GFXIP9.4.3")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2748
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 71c8f9cf26 ]
gcc gets confused when -ftrivial-auto-var-init=pattern is used on sparse
bit fields such as 'struct spi_mem_op', which caused the previous false
positive warning about an uninitialized variable:
drivers/mtd/spi-nor/spansion.c: error: 'op' is used uninitialized [-Werror=uninitialized]
In fact, the variable is fully initialized and gcc does not see it being
used, so the warning is entirely bogus. The problem appears to be
a misoptimization in the initialization of single bit fields when the
rest of the bytes are not initialized.
A previous workaround added another initialization, which ended up
shutting up the warning in spansion.c, though it apparently still happens
in other files as reported by Peter Foley in the gcc bugzilla. The
workaround of adding a fake initialization seems particularly bad
because it would set values that can never be correct but prevent the
compiler from warning about actually missing initializations.
Revert the broken workaround and instead pad the structure to only
have bitfields that add up to full bytes, which should avoid this
behavior in all drivers.
I also filed a new bug against gcc with what I found, so this can
hopefully be addressed in future gcc releases. At the moment, only
gcc-12 and gcc-13 are affected.
Cc: Peter Foley <pefoley2@pefoley.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110743
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108402
Link: https://godbolt.org/z/efMMsG1Kx
Fixes: 420c4495b5 ("mtd: spi-nor: spansion: make sure local struct does not contain garbage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230719190045.4007391-1-arnd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1eb8d61ac5 ]
This reverts commit 860690a93e.
On the MT8183, the SSPM related clocks were removed claiming a lack of
usage. This however causes some issues when the driver was converted to
the new simple-probe mechanism. This mechanism allocates enough space
for all the clocks defined in the clock driver, not the highest index
in the DT binding. This leads to out-of-bound writes if their are holes
in the DT binding or the driver (due to deprecated or unimplemented
clocks). These errors can go unnoticed and cause memory corruption,
leading to crashes in unrelated areas, or nothing at all. KASAN will
detect them.
Add the SSPM related clocks back to the MT8183 clock driver to fully
implement the DT binding. The SSPM clocks are for the power management
co-processor, and should never be turned off. They are marked as such.
Fixes: 3f37ba7cc3 ("clk: mediatek: mt8183: Convert all remaining clocks to common probe")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://lore.kernel.org/r/20230719074251.1219089-1-wenst@chromium.org
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea690ad78d ]
Currently, read/write_page_hwecc() and read/write_page_raw() are not
aligned: there is a mismatch in the OOB bytes which are not
read/written at the same offset in both cases (raw vs. hwecc).
This is a real problem when relying on the presence of the Page
Addresses (PA) when using the NAND chip as a boot device, as the
BootROM expects additional data in the OOB area at specific locations.
Rockchip boot blocks are written per 4 x 512 byte sectors per page.
Each page with boot blocks must have a page address (PA) pointer in OOB
to the next page. Pages are written in a pattern depending on the NAND chip ID.
Generate boot block page address and pattern for hwecc in user space
and copy PA data to/from the already reserved last 4 bytes before ECC
in the chip->oob_poi data layout.
Align the different helpers. This change breaks existing jffs2 users.
Fixes: 058e0e847d ("mtd: rawnand: rockchip: NFC driver for RK3308, RK2928 and others")
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/5e782c08-862b-51ae-47ff-3299940928ca@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0ca3b92b7 ]
Rockchip boot blocks are written per 4 x 512 byte sectors per page.
Each page with boot blocks must have a page address (PA) pointer in OOB
to the next page.
The currently advertised free OOB area starts at offset 6, like
if 4 PA bytes were located right after the BBM. This is wrong as the
PA bytes are located right before the ECC bytes.
Fix the layout by allowing access to all bytes between the BBM and the
PA bytes instead of reserving 4 bytes right after the BBM.
This change breaks existing jffs2 users.
Fixes: 058e0e847d ("mtd: rawnand: rockchip: NFC driver for RK3308, RK2928 and others")
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/d202f12d-188c-20e8-f2c2-9cc874ad4d22@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a6ec83786a upstream.
syzbot reports below bug:
BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000
CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
print_report mm/kasan/report.c:462 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:572
f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
f2fs_truncate include/linux/fs.h:825 [inline]
f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
notify_change+0xb2c/0x1180 fs/attr.c:483
do_truncate+0x143/0x200 fs/open.c:66
handle_truncate fs/namei.c:3295 [inline]
do_open fs/namei.c:3640 [inline]
path_openat+0x2083/0x2750 fs/namei.c:3791
do_filp_open+0x1ba/0x410 fs/namei.c:3818
do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_creat fs/open.c:1448 [inline]
__se_sys_creat fs/open.c:1442 [inline]
__x64_sys_creat+0xcd/0x120 fs/open.c:1442
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is, inodeA references inodeB via inodeB's ino, once inodeA
is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's
node page, it traverse mapping data from node->i.i_addr[0] to
node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access.
This patch fixes to add sanity check on dnode page in truncate_dnode(),
so that, it can help to avoid triggering such issue, and once it encounters
such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE
error into superblock, later fsck can detect such issue and try repairing.
Also, it removes f2fs_truncate_data_blocks() for cleanup due to the
function has only one caller, and uses f2fs_truncate_data_blocks_range()
instead.
Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8ccbd2191 upstream.
At add_new_free_space() we have these BUG_ON()'s that are there to deal
with any failure to add free space to the in memory free space cache.
Such failures are mostly -ENOMEM that should be very rare. However there's
no need to have these BUG_ON()'s, we can just return any error to the
caller and all callers and their upper call chain are already dealing with
errors.
So just make add_new_free_space() return any errors, while removing the
BUG_ON()'s, and returning the total amount of added free space to an
optional u64 pointer argument.
Reported-by: syzbot+3ba856e07b7127889d8c@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000e9cb8305ff4e8327@google.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c541dce86c upstream.
The reconfigure / remount code takes a lot of effort to protect
filesystem's reconfiguration code from racing writes on remounting
read-only. However during remounting read-only filesystem to read-write
mode userspace writes can start immediately once we clear SB_RDONLY
flag. This is inconvenient for example for ext4 because we need to do
some writes to the filesystem (such as preparation of quota files)
before we can take userspace writes so we are clearing SB_RDONLY flag
before we are fully ready to accept userpace writes and syzbot has found
a way to exploit this [1]. Also as far as I'm reading the code
the filesystem remount code was protected from racing writes in the
legacy mount path by the mount's MNT_READONLY flag so this is relatively
new problem. It is actually fairly easy to protect remount read-write
from racing writes using sb->s_readonly_remount flag so let's just do
that instead of having to workaround these races in the filesystem code.
[1] https://lore.kernel.org/all/00000000000006a0df05f6667499@google.com/T/
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230615113848.8439-1-jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 797964253d upstream.
In commit 20ea1e7d13 ("file: always lock position for
FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because
pidfd_getfd() could get a reference to the file even when it didn't have
an elevated file count due to threading of other sharing cases.
But Mateusz Guzik reports that the extra locking is actually measurable,
so let's re-introduce the optimization, and only force the locking for
directory traversal.
Directories need the lock for correctness reasons, while regular files
only need it for "POSIX semantics". Since pidfd_getfd() is about
debuggers etc special things that are _way_ outside of POSIX, we can
relax the rules for that case.
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9ffa069e0 upstream.
After merging the net-next tree, today's linux-next build (sparc64
defconfig) failed like this:
drivers/net/ethernet/sun/sunvnet_common.c: In function 'vnet_handle_offloads':
drivers/net/ethernet/sun/sunvnet_common.c:1277:16: error: implicit declaration of function 'skb_gso_segment'; did you mean 'skb_gso_reset'? [-Werror=implicit-function-declaration]
1277 | segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
| ^~~~~~~~~~~~~~~
| skb_gso_reset
drivers/net/ethernet/sun/sunvnet_common.c:1277:14: warning: assignment to 'struct sk_buff *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
1277 | segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
| ^
Fixes: d457a0e329 ("net: move gso declarations and functions to their own files")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230613164639.164b2991@canb.auug.org.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a337b64f0d upstream.
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading do infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to
significant (up to 10 times) increase of SLAB utilization by i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d858 ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 946e047a3d)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 045aecdfcb upstream.
Systems which implement SME without also implementing SVE are
architecturally valid but were not initially supported by the kernel,
unfortunately we missed one issue in the ptrace code.
The SVE register setting code is shared between SVE and streaming mode
SVE. When we set full SVE register state we currently enable TIF_SVE
unconditionally, in the case where streaming SVE is being configured on a
system that supports vanilla SVE this is not an issue since we always
initialise enough state for both vector lengths but on a system which only
support SME it will result in us attempting to restore the SVE vector
length after having set streaming SVE registers.
Fix this by making the enabling of SVE conditional on setting SVE vector
state. If we set streaming SVE state and SVE was not already enabled this
will result in a SVE access trap on next use of normal SVE, this will cause
us to flush our register state but this is fine since the only way to
trigger a SVE access trap would be to exit streaming mode which will cause
the in register state to be flushed anyway.
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-1-49df214bfb3e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9bb40b7f7 upstream.
When setting SME vector lengths we clear TIF_SME to reenable SME traps,
doing a reallocation of the backing storage on next use. We do this using
clear_thread_flag() which operates on the current thread, meaning that when
setting the vector length via ptrace we may both not force traps for the
target task and force a spurious flush of any SME state that the tracing
task may have.
Clear the flag in the target task.
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-tif-sme-v1-1-88312fd6fbfd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11260c3d60 upstream.
Customer reported that they couldn't mount their DFS link that was
seen by the client as a DFS interlink -- special form of DFS link
where its single target may point to a different DFS namespace -- and
it turned out that it was just a regular DFS link where its referral
header flags missed the StorageServers bit thus making the client
think it couldn't tree connect to target directly without requiring
further referrals.
When the DFS link referral header flags misses the StoraServers bit
and its target doesn't respond to any referrals, then tree connect to
it.
Fixes: a1c0d00572 ("cifs: share dfs connections and supers")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d62cc390c2 upstream.
We received report [1] of kernel crash, which is caused by
using nesting protection without disabled preemption.
The bpf_event_output can be called by programs executed by
bpf_prog_run_array_cg function that disabled migration but
keeps preemption enabled.
This can cause task to be preempted by another one inside the
nesting protection and lead eventually to two tasks using same
perf_sample_data buffer and cause crashes like:
BUG: kernel NULL pointer dereference, address: 0000000000000001
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
...
? perf_output_sample+0x12a/0x9a0
? finish_task_switch.isra.0+0x81/0x280
? perf_event_output+0x66/0xa0
? bpf_event_output+0x13a/0x190
? bpf_event_output_data+0x22/0x40
? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
? xa_load+0x87/0xe0
? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
? release_sock+0x3e/0x90
? sk_setsockopt+0x1a1/0x12f0
? udp_pre_connect+0x36/0x50
? inet_dgram_connect+0x93/0xa0
? __sys_connect+0xb4/0xe0
? udp_setsockopt+0x27/0x40
? __pfx_udp_push_pending_frames+0x10/0x10
? __sys_setsockopt+0xdf/0x1a0
? __x64_sys_connect+0xf/0x20
? do_syscall_64+0x3a/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixing this by disabling preemption in bpf_event_output.
[1] https://github.com/cilium/cilium/issues/26756
Cc: stable@vger.kernel.org
Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
Closes: https://github.com/cilium/cilium/issues/26756
Fixes: 2a916f2f54 ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d01e07fd1 upstream.
Due to rbd_try_acquire_lock() effectively swallowing all but
EBLOCKLISTED error from rbd_try_lock() ("request lock anyway") and
rbd_request_lock() returning ETIMEDOUT error not only for an actual
notify timeout but also when the lock owner doesn't respond, a busy
loop inside of rbd_acquire_lock() between rbd_try_acquire_lock() and
rbd_request_lock() is possible.
Requesting the lock on EBUSY error (returned by get_lock_owner_info()
if an incompatible lock or invalid lock owner is detected) makes very
little sense. The same goes for ETIMEDOUT error (might pop up pretty
much anywhere if osd_request_timeout option is set) and many others.
Just fail I/O requests on rbd_dev->acquiring_list immediately on any
error from rbd_try_lock().
Cc: stable@vger.kernel.org # 588159009d: rbd: retrieve and check lock owner twice before blocklisting
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5ace2a776 upstream.
On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug in that there's not an ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.
A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
with ENDBR. The VM will boot and run without IBT.
If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690001476-98594-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c9241f3ce upstream.
Commit 66b2c338ad initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/tapX" device node's owner UID. Per original
commit 86741ec254 ("net: core: Add a UID field to struct sock.",
2016-11-04), that's wrong: the idea is to cache the UID of the userspace
process that creates the socket. Commit 86741ec254 mentions socket() and
accept(); with "tap", the action that creates the socket is
open("/dev/tapX").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/tapX" will be owned by root, so in practice, commit 66b2c338ad has
no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 66b2c338ad ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bc3047374 upstream.
Commit a096ccca6e initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/net/tun" device node's owner UID. Per
original commit 86741ec254 ("net: core: Add a UID field to struct
sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
userspace process that creates the socket. Commit 86741ec254 mentions
socket() and accept(); with "tun", the action that creates the socket is
open("/dev/net/tun").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e
has no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: a096ccca6e ("tun: tun_chr_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2c67a3e60 upstream.
The nesting protection in bpf_perf_event_output relies on disabled
preemption, which is guaranteed for kprobes and tracepoints.
However bpf_perf_event_output can be also called from uprobes context
through bpf_prog_run_array_sleepable function which disables migration,
but keeps preemption enabled.
This can cause task to be preempted by another one inside the nesting
protection and lead eventually to two tasks using same perf_sample_data
buffer and cause crashes like:
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle page fault for address: ffffffff82be3eea
...
Call Trace:
? __die+0x1f/0x70
? page_fault_oops+0x176/0x4d0
? exc_page_fault+0x132/0x230
? asm_exc_page_fault+0x22/0x30
? perf_output_sample+0x12b/0x910
? perf_event_output+0xd0/0x1d0
? bpf_perf_event_output+0x162/0x1d0
? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
? __uprobe_perf_func+0x12b/0x540
? uprobe_dispatcher+0x2c4/0x430
? uprobe_notify_resume+0x2da/0xce0
? atomic_notifier_call_chain+0x7b/0x110
? exit_to_user_mode_prepare+0x13e/0x290
? irqentry_exit_to_user_mode+0x5/0x30
? asm_exc_int3+0x35/0x40
Fixing this by disabling preemption in bpf_perf_event_output.
Cc: stable@vger.kernel.org
Fixes: 8c7dcb84e3 ("bpf: implement sleepable uprobes by chaining gps")
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d42334578e upstream.
exfat_extract_uni_name copies characters from a given file name entry into
the 'uniname' variable. This variable is actually defined on the stack of
the exfat_readdir() function. According to the definition of
the 'exfat_uni_name' type, the file name should be limited 255 characters
(+ null teminator space), but the exfat_get_uniname_from_ext_entry()
function can write more characters because there is no check if filename
entries exceeds max filename length. This patch add the check not to copy
filename characters when exceeding max filename length.
Cc: stable@vger.kernel.org
Cc: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reported-by: Maxim Suhanov <dfirblog@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit daf60d6cca upstream.
The call stack shown below is a scenario in the Linux 4.19 kernel.
Allocating memory failed where exfat fs use kmalloc_array due to
system memory fragmentation, while the u-disk was inserted without
recognition.
Devices such as u-disk using the exfat file system are pluggable and
may be insert into the system at any time.
However, long-term running systems cannot guarantee the continuity of
physical memory. Therefore, it's necessary to address this issue.
Binder:2632_6: page allocation failure: order:4,
mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Call trace:
[242178.097582] dump_backtrace+0x0/0x4
[242178.097589] dump_stack+0xf4/0x134
[242178.097598] warn_alloc+0xd8/0x144
[242178.097603] __alloc_pages_nodemask+0x1364/0x1384
[242178.097608] kmalloc_order+0x2c/0x510
[242178.097612] kmalloc_order_trace+0x40/0x16c
[242178.097618] __kmalloc+0x360/0x408
[242178.097624] load_alloc_bitmap+0x160/0x284
[242178.097628] exfat_fill_super+0xa3c/0xe7c
[242178.097635] mount_bdev+0x2e8/0x3a0
[242178.097638] exfat_fs_mount+0x40/0x50
[242178.097643] mount_fs+0x138/0x2e8
[242178.097649] vfs_kern_mount+0x90/0x270
[242178.097655] do_mount+0x798/0x173c
[242178.097659] ksys_mount+0x114/0x1ac
[242178.097665] __arm64_sys_mount+0x24/0x34
[242178.097671] el0_svc_common+0xb8/0x1b8
[242178.097676] el0_svc_handler+0x74/0x90
[242178.097681] el0_svc+0x8/0x340
By analyzing the exfat code,we found that continuous physical memory
is not required here,so kvmalloc_array is used can solve this problem.
Cc: stable@vger.kernel.org
Signed-off-by: gaoming <gaoming20@hihonor.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6e2843230 upstream.
If the cluster becomes unavailable, ceph_osdc_notify() may hang even
with osd_request_timeout option set because linger_notify_finish_wait()
waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD
request in flight -- it's completely asynchronous.
Introduce an additional timeout, derived from the specified notify
timeout. While at it, switch both waits to killable which is more
correct.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 010c1e1c57 upstream.
The Hyper-V host is queried to get the max transfer size that it supports,
and this value is used to set max_sectors for the synthetic SCSI
controller. However, this max transfer size may be too large for virtual
Fibre Channel devices, which are limited to 512 Kbytes. If a larger
transfer size is used with a vFC device, Hyper-V always returns an error,
and storvsc logs a message like this where the SRB status and SCSI status
are both zero:
hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001
Add logic to limit the max transfer size to 512 Kbytes for vFC devices.
Fixes: 1d3e098078 ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1689887102-32806-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e658519890 upstream.
Storage devices are free to send RSCNs, e.g. for internal state changes. If
this happens on all connected paths, zfcp risks temporarily losing all
paths at the same time. This has strong requirements on multipath
configuration such as "no_path_retry queue".
Avoid such situations by deferring fc_rport blocking until after the ADISC
response, when any actual state change of the remote port became clear.
The already existing port recovery triggers explicitly block the fc_rport.
The triggers are: on ADISC reject or timeout (typical cable pull case), and
on ADISC indicating that the remote port has changed its WWPN or
the port is meanwhile no longer open.
As a side effect, this also removes a confusing direct function call to
another work item function zfcp_scsi_rport_work() instead of scheduling
that other work item. It was probably done that way to have the rport block
side effect immediate and synchronous to the caller.
Fixes: a2fa0aede0 ("[SCSI] zfcp: Block FC transport rports early on errors")
Cc: stable@vger.kernel.org #v2.6.30+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Link: https://lore.kernel.org/r/20230724145156.3920244-1-maier@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3d8aa84bb upstream.
Currently the rust allocator simply passes the size of the type Layout
to krealloc(), and in theory the alignment requirement from the type
Layout may be larger than the guarantee provided by SLAB, which means
the allocated object is mis-aligned.
Fix this by adjusting the allocation size to the nearest power of two,
which SLAB always guarantees a size-aligned allocation. And because Rust
guarantees that the original size must be a multiple of alignment and
the alignment must be a power of two, then the alignment requirement is
satisfied.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
Signed-off-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org # v6.1+
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 247b365dc8 ("rust: add `kernel` crate")
Link: https://github.com/Rust-for-Linux/linux/issues/974
Link: https://lore.kernel.org/r/20230730012905.643822-2-boqun.feng@gmail.com
[ Applied rewording of comment as discussed in the mailing list. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ddf251fa2b ]
Whenever tcpm_new() reclaims an old entry, tcpm_suck_dst()
would overwrite data that could be read from tcp_fastopen_cache_get()
or tcp_metrics_fill_info().
We need to acquire fastopen_seqlock to maintain consistency.
For newly allocated objects, tcpm_new() can switch to kzalloc()
to avoid an extra fastopen_seqlock acquisition.
Fixes: 1fe4c481ba ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c4d04f6b4 ]
tm->tcpm_vals[] values can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this,
and force use of tcp_metric_get() and tcp_metric_set()
Fixes: 51c5d0c4b1 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6638094d7 ]
Because v4 and v6 families use separate inetpeer trees (respectively
net->ipv4.peers and net->ipv6.peers), inetpeer_addr_cmp(a, b) assumes
a & b share the same family.
tcp_metrics use a common hash table, where entries can have different
families.
We must therefore make sure to not call inetpeer_addr_cmp()
if the families do not match.
Fixes: d39d14ffa2 ("net: Add helper function to compare inetpeer addresses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b755c25fbc ]
When both supported and previous version have the same major version,
and the firmwares are missing, the driver ends in a loop requesting the
same (previous) version over and over again:
[ 76.327413] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 76.339802] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.352162] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.364502] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.376848] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.389183] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.401522] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.413860] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.426199] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
...
Fix this by inverting the check to that we aren't yet at the previous
version, and also check the minor version.
This also catches the case where both versions are the same, as it was
after commit bb5dbf2cc6 ("net: marvell: prestera: add firmware v4.0
support").
With this fix applied:
[ 88.499622] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 88.511995] Prestera DX 0000:01:00.0: failed to request previous firmware: mrvl/prestera/mvsw_prestera_fw-v4.0.img
[ 88.522403] Prestera DX: probe of 0000:01:00.0 failed with error -2
Fixes: 47f26018a4 ("net: marvell: prestera: try to load previous fw version")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Acked-by: Elad Nachman <enachman@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Taras Chornyi <taras.chornyi@plvision.eu>
Link: https://lore.kernel.org/r/20230802092357.163944-1-jonas.gorski@bisdn.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c635ca45a7 ]
In the cited commit, new type of FS_TYPE_PRIO_CHAINS fs_prio was added
to support multiple parallel namespaces for multi-chains. And we skip
all the flow tables under the fs_node of this type unconditionally,
when searching for the next or previous flow table to connect for a
new table.
As this search function is also used for find new root table when the
old one is being deleted, it will skip the entire FS_TYPE_PRIO_CHAINS
fs_node next to the old root. However, new root table should be chosen
from it if there is any table in it. Fix it by skipping only the flow
tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the
closest FT for a fs_node.
Besides, complete the connecting from FTs of previous priority of prio
because there should be multiple prevs after this fs_prio type is
introduced. And also the next FT should be chosen from the first flow
table next to the prio in the same FS_TYPE_PRIO_CHAINS fs_prio, if
this prio is the first child.
Fixes: 328edb499f ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cfef80d4c ]
dev_close() and dev_open() are issued to change the interface state to DOWN
or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g its
Ipv6 addresses and routes. We don't want this in cases of device recovery
(triggered by hardware or software) or when the qeth device is set
offline.
Setting a qeth device offline or online and device recovery actions call
netif_device_detach() and/or netif_device_attach(). That will reset or
set the LOWER_UP indication i.e. change the dev->state Bit
__LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and
still preserves the interface settings that are handled by the network
stack.
Don't call dev_open() nor dev_close() from the qeth device driver. Let the
network stack handle this.
Fixes: d4560150cb ("s390/qeth: call dev_close() during recovery")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31d49ba033 ]
The dcbnl_bcn_setcfg uses erroneous policy to parse tb[DCB_ATTR_BCN],
which is introduced in commit 859ee3c438 ("DCB: Add support for DCB
BCN"). Please see the comment in below code
static int dcbnl_bcn_setcfg(...)
{
...
ret = nla_parse_nested_deprecated(..., dcbnl_pfc_up_nest, .. )
// !!! dcbnl_pfc_up_nest for attributes
// DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_ALL in enum dcbnl_pfc_up_attrs
...
for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) {
// !!! DCB_BCN_ATTR_RP_0 to DCB_BCN_ATTR_RP_7 in enum dcbnl_bcn_attrs
...
value_byte = nla_get_u8(data[i]);
...
}
...
for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) {
// !!! DCB_BCN_ATTR_BCNA_0 to DCB_BCN_ATTR_RI in enum dcbnl_bcn_attrs
...
value_int = nla_get_u32(data[i]);
...
}
...
}
That is, the nla_parse_nested_deprecated uses dcbnl_pfc_up_nest
attributes to parse nlattr defined in dcbnl_pfc_up_attrs. But the
following access code fetch each nlattr as dcbnl_bcn_attrs attributes.
By looking up the associated nla_policy for dcbnl_bcn_attrs. We can find
the beginning part of these two policies are "same".
static const struct nla_policy dcbnl_pfc_up_nest[...] = {
[DCB_PFC_UP_ATTR_0] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_1] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_2] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_3] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_4] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_5] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_6] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_7] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_ALL] = {.type = NLA_FLAG},
};
static const struct nla_policy dcbnl_bcn_nest[...] = {
[DCB_BCN_ATTR_RP_0] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_1] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_2] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_3] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_4] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_5] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_6] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_7] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_ALL] = {.type = NLA_FLAG},
// from here is somewhat different
[DCB_BCN_ATTR_BCNA_0] = {.type = NLA_U32},
...
[DCB_BCN_ATTR_ALL] = {.type = NLA_FLAG},
};
Therefore, the current code is buggy and this
nla_parse_nested_deprecated could overflow the dcbnl_pfc_up_nest and use
the adjacent nla_policy to parse attributes from DCB_BCN_ATTR_BCNA_0.
Hence use the correct policy dcbnl_bcn_nest to parse the nested
tb[DCB_ATTR_BCN] TLV.
Fixes: 859ee3c438 ("DCB: Add support for DCB BCN")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230801013248.87240-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3bb7759a9 ]
As documented in acd7aaf51b ("netsec: ignore 'phy-mode' device
property on ACPI systems") the SocioNext SynQuacer platform ships with
firmware defining the PHY mode as RGMII even though the physical
configuration of the PHY is for TX and RX delays. Since bbc4d71d63
("net: phy: realtek: fix rtl8211e rx/tx delay config") this has caused
misconfiguration of the PHY, rendering the network unusable.
This was worked around for ACPI by ignoring the phy-mode property but
the system is also used with DT. For DT instead if we're running on a
SynQuacer force a working PHY mode, as well as the standard EDK2
firmware with DT there are also some of these systems that use u-boot
and might not initialise the PHY if not netbooting. Newer firmware
imagaes for at least EDK2 are available from Linaro so print a warning
when doing this.
Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230731-synquacer-net-v3-1-944be5f06428@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 13d2618b48 ]
Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC
allocation later in sk_psock_init_link on PREEMPT_RT kernels, since
GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist
patchset notes for details).
This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to
BUG (sleeping function called from invalid context) on RT kernels.
preempt_disable was introduced together with lock_sk and rcu_read_lock
in commit 99ba2b5aba ("bpf: sockhash, disallow bpf_tcp_close and update
in parallel"), probably to match disabled migration of BPF programs, and
is no longer necessary.
Remove preempt_disable to fix BUG in sock_map_update_common on RT.
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/
Fixes: 99ba2b5aba ("bpf: sockhash, disallow bpf_tcp_close and update in parallel")
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230728064411.305576-1-tglozar@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b80b829e9e ]
When route4_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: 1109c00547 ("net: sched: RCU cls_route")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76e42ae831 ]
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: e35a8ee599 ("net: sched: fw use RCU")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3044b16e7c ]
When u32_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: de5df63228 ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Reported-by: valis <sec@valis.email>
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c62b75cd1 ]
The following warning was reported when running xdp_redirect_cpu with
both skb-mode and stress-mode enabled:
------------[ cut here ]------------
Incorrect XDP memory type (-2128176192) usage
WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
Modules linked in:
CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events __cpu_map_entry_free
RIP: 0010:__xdp_return+0x1e4/0x4a0
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
? __xdp_return+0x1e4/0x4a0
......
xdp_return_frame+0x4d/0x150
__cpu_map_entry_free+0xf9/0x230
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The reason for the warning is twofold. One is due to the kthread
cpu_map_kthread_run() is stopped prematurely. Another one is
__cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in
ptr_ring as XDP frames.
Prematurely-stopped kthread will be fixed by the preceding patch and
ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But
as the comments in __cpu_map_ring_cleanup() said, handling and freeing
skbs in ptr_ring as well to "catch any broken behaviour gracefully".
Fixes: 11941f8a85 ("bpf: cpumap: Implement generic cpumap")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 640a604585 ]
The following warning was reported when running stress-mode enabled
xdp_redirect_cpu with some RT threads:
------------[ cut here ]------------
WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135
CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events cpu_map_kthread_stop
RIP: 0010:put_cpu_map_entry+0xda/0x220
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
......
? put_cpu_map_entry+0xda/0x220
cpu_map_kthread_stop+0x41/0x60
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The root cause is the same as commit 4369016497 ("bpf: cpumap: Fix memory
leak in cpu_map_update_elem"). The kthread is stopped prematurely by
kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call
cpu_map_kthread_run() at all but XDP program has already queued some
frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks
the ptr_ring, it will find it was not emptied and report a warning.
An alternative fix is to use __cpu_map_ring_cleanup() to drop these
pending frames or skbs when kthread_stop() returns -EINTR, but it may
confuse the user, because these frames or skbs have been handled
correctly by XDP program. So instead of dropping these frames or skbs,
just make sure the per-cpu kthread is running before
__cpu_map_entry_alloc() returns.
After apply the fix, the error handle for kthread_stop() will be
unnecessary because it will always return 0, so just remove it.
Fixes: 6710e11269 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c3eba1c5e ]
This allows to do more centralized decisions later on, and generally
makes it very explicit which maps are privileged and which are not
(e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants,
as opposed to unprivileged HASH and HASH_PERCPU; now this is explicit
and easy to verify).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230613223533.3689589-4-andrii@kernel.org
Stable-dep-of: 640a604585 ("bpf, cpumap: Make sure kthread is running before map update returns")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22db41226b ]
Currently find_and_alloc_map() performs two separate functions: some
argument sanity checking and partial map creation workflow hanling.
Neither of those functions are self-sufficient and are augmented by
further checks and initialization logic in the caller (map_create()
function). So unify all the sanity checks, permission checks, and
creation and initialization logic in one linear piece of code in
map_create() instead. This also make it easier to further enhance
permission checks and keep them located in one place.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230613223533.3689589-3-andrii@kernel.org
Stable-dep-of: 640a604585 ("bpf, cpumap: Make sure kthread is running before map update returns")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d28635abc ]
Make each bpf() syscall command a bit more self-contained, making it
easier to further enhance it. We move sysctl_unprivileged_bpf_disabled
handling down to map_create() and bpf_prog_load(), two special commands
in this regard.
Also swap the order of checks, calling bpf_capable() only if
sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit
messages.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230613223533.3689589-2-andrii@kernel.org
Stable-dep-of: 640a604585 ("bpf, cpumap: Make sure kthread is running before map update returns")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 611e1b016c ]
The two mbox-related mutexes are destroyed in octep_ctrl_mbox_uninit(),
but the corresponding mutex_init calls were missing.
A "DEBUG_LOCKS_WARN_ON(lock->magic != lock)" warning was emitted with
CONFIG_DEBUG_MUTEXES on.
Initialize the two mutexes in octep_ctrl_mbox_init().
Fixes: 577f0d1b1c ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230729151516.24153-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 37b61cda9c ]
Similarly to other recently fixed drivers make sure we don't
try to access XDP or page pool APIs when NAPI budget is 0.
NAPI budget of 0 may mean that we are in netpoll.
This may result in running software IRQs in hard IRQ context,
leading to deadlocks or crashes.
To make sure bnapi->tx_pkts don't get wiped without handling
the events, move clearing the field into the handler itself.
Remember to clear tx_pkts after reset (bnxt_enable_napi())
as it's technically possible that netpoll will accumulate
some tx_pkts and then a reset will happen, leaving tx_pkts
out of sync with reality.
Fixes: 322b87ca55 ("bnxt_en: add page_pool support")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230728205020.2784844-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e7417c188 ]
The timer dev->stat_monitor can schedule the delayed work dev->wq and
the delayed work dev->wq can also arm the dev->stat_monitor timer.
When the device is detaching, the net_device will be deallocated. but
the net_device private data could still be dereferenced in delayed work
or timer handler. As a result, the UAF bugs will happen.
One racy situation is shown below:
(Thread 1) | (Thread 2)
lan78xx_stat_monitor() |
... | lan78xx_disconnect()
lan78xx_defer_kevent() | ...
... | cancel_delayed_work_sync(&dev->wq);
schedule_delayed_work() | ...
(wait some time) | free_netdev(net); //free net_device
lan78xx_delayedwork() |
//use net_device private data |
dev-> //use |
Although we use cancel_delayed_work_sync() to cancel the delayed work
in lan78xx_disconnect(), it could still be scheduled in timer handler
lan78xx_stat_monitor().
Another racy situation is shown below:
(Thread 1) | (Thread 2)
lan78xx_delayedwork |
mod_timer() | lan78xx_disconnect()
| cancel_delayed_work_sync()
(wait some time) | if (timer_pending(&dev->stat_monitor))
| del_timer_sync(&dev->stat_monitor);
lan78xx_stat_monitor() | ...
lan78xx_defer_kevent() | free_netdev(net); //free
//use net_device private data|
dev-> //use |
Although we use del_timer_sync() to delete the timer, the function
timer_pending() returns 0 when the timer is activated. As a result,
the del_timer_sync() will not be executed and the timer could be
re-armed.
In order to mitigate this bug, We use timer_shutdown_sync() to shutdown
the timer and then use cancel_delayed_work_sync() to cancel the delayed
work. As a result, the net_device could be deallocated safely.
What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in
lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE
in lan78xx_stat_monitor(). So this patch put the set_bit() behind
timer_shutdown_sync().
Fixes: 77dfff5bb7 ("lan78xx: Fix race condition in disconnect handling")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e739718444 ]
syzkaller found zero division error [0] in div_s64_rem() called from
get_cycle_time_elapsed(), where sched->cycle_time is the divisor.
We have tests in parse_taprio_schedule() so that cycle_time will never
be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed().
The problem is that the types of divisor are different; cycle_time is
s64, but the argument of div_s64_rem() is s32.
syzkaller fed this input and 0x100000000 is cast to s32 to be 0.
@TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000}
We use s64 for cycle_time to cast it to ktime_t, so let's keep it and
set max for cycle_time.
While at it, we prevent overflow in setup_txtime() and add another
test in parse_taprio_schedule() to check if cycle_time overflows.
Also, we add a new tdc test case for this issue.
[0]:
divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline]
RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline]
RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344
Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10
RSP: 0018:ffffc90000acf260 EFLAGS: 00010206
RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000
RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934
R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800
R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
<TASK>
get_packet_txtime net/sched/sch_taprio.c:508 [inline]
taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577
taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658
dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732
__dev_xmit_skb net/core/dev.c:3821 [inline]
__dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169
dev_queue_xmit include/linux/netdevice.h:3088 [inline]
neigh_resolve_output net/core/neighbour.c:1552 [inline]
neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532
neigh_output include/net/neighbour.h:544 [inline]
ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135
__ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196
ip6_finish_output net/ipv6/ip6_output.c:207 [inline]
NF_HOOK_COND include/linux/netfilter.h:292 [inline]
ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228
dst_output include/net/dst.h:458 [inline]
NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303
ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508
ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666
addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175
process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597
worker_thread+0x60f/0x1240 kernel/workqueue.c:2748
kthread+0x2fe/0x3f0 kernel/kthread.c:389
ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
</TASK>
Modules linked in:
Fixes: 4cfd5779bd ("taprio: Add support for txtime-assist mode")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Co-developed-by: Eric Dumazet <edumazet@google.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bf43be799 ]
sk_getsockopt() runs locklessly. This means sk->sk_priority
can be read while other threads are changing its value.
Other reads also happen without socket lock being held.
Add missing annotations where needed.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e5f0d2dd3c ]
In a prior commit I forgot that sk_getsockopt() reads
sk->sk_ll_usec without holding a lock.
Fixes: 0dbffbb533 ("net: annotate data race around sk_ll_usec")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11695c6e96 ]
sk_getsockopt() runs locklessly, thus we need to annotate the read
of sk->sk_peek_off.
While we are at it, add corresponding annotations to sk_set_peek_off()
and unix_set_peek_off().
Fixes: b9bb53f383 ("sock: convert sk_peek_offset functions to WRITE_ONCE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c5b4d69c3 ]
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800 ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4b5532530 ]
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvbuf locklessly.
Fixes: ebb3b78db7 ("tcp: annotate sk->sk_rcvbuf lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 74bc084327 ]
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_sndbuf locklessly.
Fixes: e292f05e0d ("tcp: annotate sk->sk_sndbuf lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6d12bdb43 ]
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvlowat locklessly.
Fixes: eac66402d1 ("net: annotate sk->sk_rcvlowat lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea7f45ef77 ]
sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate
can be read while other threads are changing its value.
Fixes: 62748f32d5 ("net: introduce SO_MAX_PACING_RATE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c76a032889 ]
sk_getsockopt() runs locklessly. This means sk->sk_txrehash
can be read while other threads are changing its value.
Other locations were handled in commit cb6cd2cec7
("tcp: Change SYN ACK retransmit behaviour to account for rehash")
Fixes: 26859240e4 ("txhash: Add socket option to control TX hash rethink behavior")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe11fdcb42 ]
sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem
can be read while other threads are changing its value.
Add missing annotations where they are needed.
Fixes: 2bb2f5fb21 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7938cd1543 ]
This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
`udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
device from CB. The fix changes it to fetch the device from `skb->dev`.
l3mdev case requires special attention since it has a master and a slave
device.
Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e346e231b4 ]
Here we've got to a situation when tasklet called usleep_range() in PTT
acquire logic, thus welcome to the "scheduling while atomic" BUG().
BUG: scheduling while atomic: swapper/24/0/0x00000100
[<ffffffffb41c6199>] schedule+0x29/0x70
[<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
[<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
[<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
[<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
[<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
[<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
[<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
qed_mcp_send_protocol_stats
case MFW_DRV_MSG_GET_LAN_STATS:
case MFW_DRV_MSG_GET_FCOE_STATS:
case MFW_DRV_MSG_GET_ISCSI_STATS:
case MFW_DRV_MSG_GET_RDMA_STATS:
[<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
qed_int_assertion
qed_int_attentions
[<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
[<ffffffffb3aa7623>] tasklet_action+0x83/0x140
[<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
[<ffffffffb41d560c>] call_softirq+0x1c/0x30
[<ffffffffb3a30645>] do_softirq+0x65/0xa0
[<ffffffffb3aa78d5>] irq_exit+0x105/0x110
[<ffffffffb41d8996>] do_IRQ+0x56/0xf0
Fix this by making caller to provide the context whether it could be in
atomic context flow or not when getting stats from QED driver.
QED driver based on the context provided decide to schedule out or not
when acquiring the PTT BAR window.
We faced the BUG_ON() while getting vport stats, but according to the
code same issue could happen for fcoe and iscsi statistics as well, so
fixing them too.
Fixes: 6c75424612 ("qed: Add support for NCSI statistics.")
Fixes: 1e128c8129 ("qed: Add support for hardware offloaded FCoE.")
Fixes: 2f2b2614e8 ("qed: Provide iSCSI statistics to management")
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: David Miller <davem@davemloft.net>
Cc: Manish Chopra <manishc@marvell.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56c6be35fc ]
As &hc->lock is acquired by both timer _hfcpci_softirq() and hardirq
hfcpci_int(), the timer should disable irq before lock acquisition
otherwise deadlock could happen if the timmer is preemtped by the hadr irq.
Possible deadlock scenario:
hfcpci_softirq() (timer)
-> _hfcpci_softirq()
-> spin_lock(&hc->lock);
<irq interruption>
-> hfcpci_int()
-> spin_lock(&hc->lock); (deadlock here)
This flaw was found by an experimental static analysis tool I am developing
for irq-related deadlock.
The tentative patch fixes the potential deadlock by spin_lock_irq()
in timer.
Fixes: b36b654a7e ("mISDN: Create /sys/class/mISDN")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e68409db99 ]
A match entry is uniquely identified with an "address" or "path" in the
form of: hashtable ID(12b):bucketid(8b):nodeid(12b).
When creating table match entries all of hash table id, bucket id and
node (match entry id) are needed to be either specified by the user or
reasonable in-kernel defaults are used. The in-kernel default for a table id is
0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there
was none for a nodeid i.e. the code assumed that the user passed the correct
nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what
was used. But nodeid of 0 is reserved for identifying the table. This is not
a problem until we dump. The dump code notices that the nodeid is zero and
assumes it is referencing a table and therefore references table struct
tc_u_hnode instead of what was created i.e match entry struct tc_u_knode.
Ming does an equivalent of:
tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok
Essentially specifying a table id 0, bucketid 1 and nodeid of zero
Tableid 0 is remapped to the default of 0x800.
Bucketid 1 is ignored and defaults to 0x00.
Nodeid was assumed to be what Ming passed - 0x000
dumping before fix shows:
~$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591
Note that the last line reports a table instead of a match entry
(you can tell this because it says "ht divisor...").
As a result of reporting the wrong data type (misinterpretting of struct
tc_u_knode as being struct tc_u_hnode) the divisor is reported with value
of -30591. Ming identified this as part of the heap address
(physmap_base is 0xffff8880 (-30591 - 1)).
The fix is to ensure that when table entry matches are added and no
nodeid is specified (i.e nodeid == 0) then we get the next available
nodeid from the table's pool.
After the fix, this is what the dump shows:
$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
match 0a000001/ffffffff at 12
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d73ef2d69c ]
There are totally 9 ndo_bridge_setlink handlers in the current kernel,
which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.
By investigating the code, we find that 1-7 parse and use nlattr
IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
length 0) to be viewed as a 2 byte integer.
To avoid such issues, also for other ndo_bridge_setlink handlers in the
future. This patch adds the nla_len check in rtnl_bridge_setlink and
does an early error return if length mismatches. To make it works, the
break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
this nla_for_each_nested iterates every attribute.
Fixes: b1edc14a3f ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Fixes: 51616018dd ("i40e: Add support for getlink, setlink ndo ops")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bcc29b7f5a ]
The nla_for_each_nested parsing in function bpf_sk_storage_diag_alloc
does not check the length of the nested attribute. This can lead to an
out-of-attribute read and allow a malformed nlattr (e.g., length 0) to
be viewed as a 4 byte integer.
This patch adds an additional check when the nlattr is getting counted.
This makes sure the latter nla_get_u32 can access the attributes with
the correct length.
Fixes: 1ed4d92458 ("bpf: INET_DIAG support in bpf_sk_storage")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230725023330.422856-1-linma@zju.edu.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61eab651f6 ]
The cited commit sets ft prio to fs_base_prio. But if
ignore_flow_level it not supported, ft prio must be set based on
tc filter prio. Otherwise, all the ft prio are the same on the same
chain. It is invalid if ignore_flow_level is not supported.
Fix it by setting ft prio based on tc filter prio and setting
fs_base_prio to 0 for fdb.
Fixes: 8e80e56480 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e4cf1dd2c ]
There are DEK objects cached in DEK pool after kTLS is used, and they
are freed only in mlx5e_ktls_cleanup().
mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to
free mdev resources, including protection domain (PD). However, PD is
still referenced by the cached DEK objects in this case, because
profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called
after mlx5e_suspend() during devlink reload. So the following FW
syndrome is generated:
mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed,
status bad resource state(0x9), syndrome (0xef0c8a), err(-22)
To avoid this syndrome, move DEK pool destruction to
mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And
move pool creation to mlx5e_ktls_init_tx() for symmetry.
Fixes: f741db1a51 ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39646d9bcd ]
When the regular rq is reactivated after the XSK socket is closed
it could be reading stale cqes which eventually corrupts the rq.
This leads to no more traffic being received on the regular rq and a
crash on the next close or deactivation of the rq.
Kal Cuttler Conely reported this issue as a crash on the release
path when the xdpsock sample program is stopped (killed) and restarted
in sequence while traffic is running.
This patch flushes all cqes when during the rq flush. The cqe flushing
is done in the reset state of the rq. mlx5e_rq_to_ready code is moved
into the flush function to allow for this.
Fixes: 082a9edf12 ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory")
Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93a331939d ]
The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:
PID: 1063722 TASK: ffffa062ca5d0000 CPU: 13 COMMAND: "handler8"
#0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
#1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
#2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
#3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
#4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
#5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
#6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
#7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
#8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
#9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
#10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
#11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
#12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
#13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
#14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
#15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
#16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0
[1031218.028143] wait_for_completion+0x24/0x30
[1031218.028589] mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256] mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885] process_one_work+0x24e/0x510
Actually no need to hold encap tbl lock if there is no encap action.
Fix it by checking if encap action exists or not before holding
encap tbl lock.
Fixes: 37c3b9fa7c ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0507f2c8be ]
Currently, whenever a user is setting migratable port fn attr, the
driver is always turn migratable capability on.
Fix it by honor the user input
Fixes: e5b9642a33 ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6cf0b6097 ]
The memory pointed to by the priv->rx_res pointer is not freed in the error
path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing
the memory in the error path, thereby making the error path identical to
mlx5e_cleanup_rep_rx().
Fixes: af8bbf7300 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5dd77585dd ]
when mlx5_cmd_exec failed in mlx5dr_cmd_create_reformat_ctx, the memory
pointed by 'in' is not released, which will cause memory leak. Move memory
release after mlx5_cmd_exec.
Fixes: 1d9186476e ("net/mlx5: DR, Add direct rule command utilities")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aeb660171b ]
In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g
memory is successfully allocated but the 'in' memory fails to be
allocated, the memory pointed to by ft->g is released once. And in function
macsec_fs_tx_create(), macsec_fs_tx_destroy() is called to release the
memory pointed to by ft->g again. This will cause double free problem.
Fixes: e467b283ff ("net/mlx5e: Add MACsec TX steering rules")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94c43de735 ]
When handling deduplicated compressed data, there can be multiple
decompressed extents pointing to the same compressed data in one shot.
In such cases, the bvecs which belong to the longest extent will be
selected as the primary bvecs for real decompressors to decode and the
other duplicated bvecs will be directly copied from the primary bvecs.
Previously, only relative offsets of the longest extent were checked to
decompress the primary bvecs. On rare occasions, it can be incorrect
if there are several extents with the same start relative offset.
As a result, some short bvecs could be selected for decompression and
then cause data corruption.
For example, as Shijie Sun reported off-list, considering the following
extents of a file:
117: 903345.. 915250 | 11905 : 385024.. 389120 | 4096
...
119: 919729.. 930323 | 10594 : 385024.. 389120 | 4096
...
124: 968881.. 980786 | 11905 : 385024.. 389120 | 4096
The start relative offset is the same: 2225, but extent 119 (919729..
930323) is shorter than the others.
Let's restrict the bvec length in addition to the start offset if bvecs
are not full.
Reported-by: Shijie Sun <sunshijie@xiaomi.com>
Fixes: 5c2a64252c ("erofs: introduce partial-referenced pclusters")
Tested-by Shijie Sun <sunshijie@xiaomi.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230719065459.60083-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c02cc576e ]
Commit 9fb6c9b3fe ("s390/sthyi: add cache to store hypervisor info")
added cache handling for store hypervisor info. This also changed the
possible return code for sthyi_fill().
Instead of only returning a condition code like the sthyi instruction would
do, it can now also return a negative error value (-ENOMEM). handle_styhi()
was not changed accordingly. In case of an error, the negative error value
would incorrectly injected into the guest PSW.
Add proper error handling to prevent this, and update the comment which
describes the possible return values of sthyi_fill().
Fixes: 9fb6c9b3fe ("s390/sthyi: add cache to store hypervisor info")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20230727182939.2050744-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79e8328e5a ]
Compiling big-endian targets with Clang produces the diagnostic:
fs/namei.c:2173:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
} while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
fs/namei.c:2173:13: note: cast one or both operands to int to silence this warning
It appears that when has_zero was introduced, two definitions were
produced with different signatures (in particular different return
types).
Looking at the usage in hash_name() in fs/namei.c, I suspect that
has_zero() is meant to be invoked twice per while loop iteration; using
logical-or would not update `bdata` when `a` did not have zeros. So I
think it's preferred to always return an unsigned long rather than a
bool than update the while loop in hash_name() to use a logical-or
rather than bitwise-or.
[ Also changed powerpc version to do the same - Linus ]
Link: https://github.com/ClangBuiltLinux/linux/issues/1832
Link: https://lore.kernel.org/lkml/20230801-bitwise-v1-1-799bec468dc4@google.com/
Fixes: 36126f8f2e ("word-at-a-time: make the interfaces truly generic")
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1ff11d7ad ]
SCMI transport based on SMC can optionally use an additional IRQ to
signal message completion. The associated interrupt handler is currently
allocated using devres but on shutdown the core SCMI stack will call
.chan_free() well before any managed cleanup is invoked by devres.
As a consequence, the arrival of a late reply to an in-flight pending
transaction could still trigger the interrupt handler well after the
SCMI core has cleaned up the channels, with unpleasant results.
Inhibit further message processing on the IRQ path by explicitly freeing
the IRQ inside .chan_free() callback itself.
Fixes: dd820ee21d ("firmware: arm_scmi: Augment SMC/HVC to allow optional interrupt")
Reported-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230719173533.2739319-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53cab4d871 ]
The blk-ctrl device is deliberately placed outside of the GPC power
domain as it needs to control the power sequencing of the blk-ctrl
domains together with the GPC domains.
Clock runtime PM works by operating on the clock parent device, which
doesn't translate into the neccessary GPC power domain action if the
clk parent is not part of the GPC power domain. Use the bus_power_device
as the parent for the clock to trigger the proper GPC domain actions on
clock runtime power management.
Fixes: 2cbee26e5d ("soc: imx: imx8mp-blk-ctrl: expose high performance PLL clock")
Reported-by: Yannic Moog <Y.Moog@phytec.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Yannic Moog <y.moog@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c486762fb1 ]
The SK-IMX53 board, bearing i.MX536A CPU, is not stable when running at
1.2 GHz (default iMX53 maximum). The SoC is only rated up to 800 MHz.
Disable 1.2 GHz and 1 GHz frequencies.
Fixes: 0b8576d844 ("ARM: dts: imx: Add support for SK-iMX53 board")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2356d198d2 ]
When building with Clang, and when KASAN and GCOV_PROFILE_ALL are both
enabled, the test fails to build [1]:
>> lib/test_bitmap.c:920:2: error: call to '__compiletime_assert_239' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(res)
BUILD_BUG_ON(!__builtin_constant_p(res));
^
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^
include/linux/compiler_types.h:352:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:340:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:333:4: note: expanded from macro '__compiletime_assert'
prefix ## suffix(); \
^
<scratch space>:185:1: note: expanded from here
__compiletime_assert_239
Originally it was attributed to s390, which now looks seemingly wrong. The
issue is not related to bitmap code itself, but it breaks build for a given
configuration.
Disabling the const_eval test under that config may potentially hide other
bugs. Instead, workaround it by disabling GCOV for the test_bitmap unless
the compiler will get fixed.
[1] https://github.com/ClangBuiltLinux/linux/issues/1874
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307171254.yFcH97ej-lkp@intel.com/
Fixes: dc34d50366 ("lib: test_bitmap: add compile-time optimization/evaluations assertions")
Co-developed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b27bfc5103 ]
Set VPU G2 clock to 300MHz like described in documentation.
This fixes pixels error occurring with large resolution ( >= 2560x1600)
HEVC test stream when using the postprocessor to produce NV12.
Fixes: 4ac7e4a812 ("arm64: dts: imx8mq: Enable both G1 and G2 VPU's with vpu-blk-ctrl")
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 253be5b53c ]
For SOMs with an onboard PHY, the RESET_N pull-up resistor is
currently deactivated in the pinmux configuration. When the pinmux
code selects the GPIO function for this pin, with a default direction
of input, this prevents the RESET_N pin from being taken to the proper
3.3V level (deasserted), and this results in the PHY being not
detected since it is held in reset.
Taken from RESET_N pin description in ADIN13000 datasheet:
This pin requires a 1K pull-up resistor to AVDD_3P3.
Activate the pull-up resistor to fix the issue.
Fixes: ade0176dd8 ("arm64: dts: imx8mn-var-som: Add Variscite VAR-SOM-MX8MN System on Module")
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7a0b57524 ]
The GW7904 does not connect the VDD_MIPI power rails thus MIPI is
disabled. However we must also disable disp_blk_ctrl as it uses the
pgc_mipi power domain and without it being disabled imx8m-blk-ctrl will
fail to probe:
imx8m-blk-ctrl 32e28000.blk-ctrl: error -ETIMEDOUT: failed to attach
power domain "mipi-dsi"
imx8m-blk-ctrl: probe of 32e28000.blk-ctrl failed with error -110
Fixes: b999bdaf05 ("arm64: dts: imx: Add i.mx8mm Gateworks gw7904 dts support")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e7d3c5e13 ]
The GW7903 does not connect the VDD_MIPI power rails thus MIPI is
disabled. However we must also disable disp_blk_ctrl as it uses the
pgc_mipi power domain and without it being disabled imx8m-blk-ctrl will
fail to probe:
imx8m-blk-ctrl 32e28000.blk-ctrl: error -ETIMEDOUT: failed to attach power domain "mipi-dsi"
imx8m-blk-ctrl: probe of 32e28000.blk-ctrl failed with error -110
Fixes: a72ba91e5b ("arm64: dts: imx: Add i.mx8mm Gateworks gw7903 dts support")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 309a15cb16 upstream
To work around MMU-700 erratum 2812531 we need to ensure that certain
sequences of commands cannot be issued without an intervening sync. In
practice this falls out of our current command-batching machinery
anyway - each batch only contains a single type of invalidation command,
and ends with a sync. The only exception is when a batch is sufficiently
large to need issuing across multiple command queue slots, wherein the
earlier slots will not contain a sync and thus may in theory interleave
with another batch being issued in parallel to create an affected
sequence across the slot boundary.
Since MMU-700 supports range invalidate commands and thus we will prefer
to use them (which also happens to avoid conditions for other errata),
I'm not entirely sure it's even possible for a single high-level
invalidate call to generate a batch of more than 63 commands, but for
the sake of robustness and documentation, wire up an option to enforce
that a sync is always inserted for every slot issued.
The other aspect is that the relative order of DVM commands cannot be
controlled, so DVM cannot be used. Again that is already the status quo,
but since we have at least defined ARM_SMMU_FEAT_BTM, we can explicitly
disable it for documentation purposes even if it's not wired up anywhere
yet.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/330221cdfd0003cd51b6c04e7ff3566741ad8374.1683731256.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 657b514695 upstream.
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't
be called in the VMA-locked page fault path by ensuring that
vma->anon_vma is set.
However, this check happens before the VMA is locked, which means a
concurrent move_vma() can concurrently call unlink_anon_vmas(), which
disassociates the VMA's anon_vma.
This means we can get UAF in the following scenario:
THREAD 1 THREAD 2
======== ========
<page fault>
lock_vma_under_rcu()
rcu_read_lock()
mas_walk()
check vma->anon_vma
mremap() syscall
move_vma()
vma_start_write()
unlink_anon_vmas()
<syscall end>
handle_mm_fault()
__handle_mm_fault()
handle_pte_fault()
do_pte_missing()
do_anonymous_page()
anon_vma_prepare()
__anon_vma_prepare()
find_mergeable_anon_vma()
mas_walk() [looks up VMA X]
munmap() syscall (deletes VMA X)
reusable_anon_vma() [called on freed VMA X]
This is a security bug if you can hit it, although an attacker would
have to win two races at once where the first race window is only a few
instructions wide.
This patch is based on some previous discussion with Linus Torvalds on
the security list.
Cc: stable@vger.kernel.org
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-tree-only change.
Due to the way the GDS and SRSO patches flowed into the stable tree, it
was a 50% chance that the merge of the which value GDS and SRSO should
be. Of course, I lost that bet, and chose the opposite of what Linus
chose in commit 64094e7e31 ("Merge tag 'gds-for-linus-2023-08-01' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
Fix this up by switching the values to match what is now in Linus's tree
as that is the correct value to mirror.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 534fc31d09 upstream.
It is possible that a guest can send a packet that contains a head + 18
slots and yet has a len <= XEN_NETBACK_TX_COPY_LEN. This causes nr_slots
to underflow in xenvif_get_requests() which then causes the subsequent
loop's termination condition to be wrong, causing a buffer overrun of
queue->tx_map_ops.
Rework the code to account for the extra frag_overflow slots.
This is CVE-2023-34319 / XSA-432.
Fixes: ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a15d83488 upstream.
The SBPB bit in MSR_IA32_PRED_CMD is supported only after a microcode
patch has been applied so set X86_FEATURE_SBPB only then. Otherwise,
guests would attempt to set that bit and #GP on the MSR write.
While at it, make SMT detection more robust as some guests - depending
on how and what CPUID leafs their report - lead to cpu_smt_control
getting set to CPU_SMT_NOT_SUPPORTED but SRSO_NO should be set for any
guest incarnation where one simply cannot do SMT, for whatever reason.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit: d893832d0e
Add the option to flush IBPB only on VMEXIT in order to protect from
malicious guests but one otherwise trusts the software that runs on the
hypervisor.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit: 233d6f68b9
Add the option to mitigate using IBPB on a kernel entry. Pull in the
Retbleed alternative so that the IBPB call from there can be used. Also,
if Retbleed mitigation is done using IBPB, the same mitigation can and
must be used here.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit: 1b5277c0ea
Add support for the CPUID flag which denotes that the CPU is not
affected by SRSO.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit: 79113e4060
Add support for the synthetic CPUID flag which "if this bit is 1,
it indicates that MSR 49h (PRED_CMD) bit 0 (IBPB) flushes all branch
type predictions from the CPU branch predictor."
This flag is there so that this capability in guests can be detected
easily (otherwise one would have to track microcode revisions which is
impossible for guests).
It is also needed only for Zen3 and -4. The other two (Zen1 and -2)
always flush branch type predictions by default.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit: fb3bd914b3
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit: 0e52740ffd
There was never a doubt in my mind that they would not fit into a single
u32 eventually.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a9567ac5e upstream.
Moving mem_encrypt_init() broke the AMD_MEM_ENCRYPT=n because the
declaration of that function was under #ifdef CONFIG_AMD_MEM_ENCRYPT and
the obvious placement for the inline stub was the #else path.
This is a leftover of commit 20f07a044a ("x86/sev: Move common memory
encryption code to mem_encrypt.c") which made mem_encrypt_init() depend on
X86_MEM_ENCRYPT without moving the prototype. That did not fail back then
because there was no stub inline as the core init code had a weak function.
Move both the declaration and the stub out of the CONFIG_AMD_MEM_ENCRYPT
section and guard it with CONFIG_X86_MEM_ENCRYPT.
Fixes: 439e17576e ("init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/oe-kbuild-all/202306170247.eQtCJPE8-lkp@intel.com/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81ac7e5d74 upstream
Gather Data Sampling (GDS) is a transient execution attack using
gather instructions from the AVX2 and AVX512 extensions. This attack
allows malicious code to infer data that was previously stored in
vector registers. Systems that are not vulnerable to GDS will set the
GDS_NO bit of the IA32_ARCH_CAPABILITIES MSR. This is useful for VM
guests that may think they are on vulnerable systems that are, in
fact, not affected. Guests that are running on affected hosts where
the mitigation is enabled are protected as if they were running
on an unaffected system.
On all hosts that are not affected or that are mitigated, set the
GDS_NO bit.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53cf5797f1 upstream
Gather Data Sampling (GDS) is mitigated in microcode. However, on
systems that haven't received the updated microcode, disabling AVX
can act as a mitigation. Add a Kconfig option that uses the microcode
mitigation if available and disables AVX otherwise. Setting this
option has no effect on systems not affected by GDS. This is the
equivalent of setting gather_data_sampling=force.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 553a5c03e9 upstream
The Gather Data Sampling (GDS) vulnerability allows malicious software
to infer stale data previously stored in vector registers. This may
include sensitive data such as cryptographic keys. GDS is mitigated in
microcode, and systems with up-to-date microcode are protected by
default. However, any affected system that is running with older
microcode will still be vulnerable to GDS attacks.
Since the gather instructions used by the attacker are part of the
AVX2 and AVX512 extensions, disabling these extensions prevents gather
instructions from being executed, thereby mitigating the system from
GDS. Disabling AVX2 is sufficient, but we don't have the granularity
to do this. The XCR0[2] disables AVX, with no option to just disable
AVX2.
Add a kernel parameter gather_data_sampling=force that will enable the
microcode mitigation if available, otherwise it will disable AVX on
affected systems.
This option will be ignored if cmdline mitigations=off.
This is a *big* hammer. It is known to break buggy userspace that
uses incomplete, buggy AVX enumeration. Unfortunately, such userspace
does exist in the wild:
https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html
[ dhansen: add some more ominous warnings about disabling AVX ]
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8974eb5882 upstream
Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.
Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.
This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.
Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.
The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:
/sys/devices/system/cpu/vulnerabilities/gather_data_sampling
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b81fac906a upstream
Initializing the FPU during the early boot process is a pointless
exercise. Early boot is convoluted and fragile enough.
Nothing requires that the FPU is set up early. It has to be initialized
before fork_init() because the task_struct size depends on the FPU register
buffer size.
Move the initialization to arch_cpu_finalize_init() which is the perfect
place to do so.
No functional change.
This allows to remove quite some of the custom early command line parsing,
but that's subject to the next installment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.902376621@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9df9d2f047 upstream
X86 is reworking the boot process so that initializations which are not
required during early boot can be moved into the late boot process and out
of the fragile and restricted initial boot phase.
arch_cpu_finalize_init() is the obvious place to do such initializations,
but arch_cpu_finalize_init() is invoked too late in start_kernel() e.g. for
initializing the FPU completely. fork_init() requires that the FPU is
initialized as the size of task_struct on X86 depends on the size of the
required FPU register buffer.
Fortunately none of the init calls between calibrate_delay() and
arch_cpu_finalize_init() is relevant for the functionality of
arch_cpu_finalize_init().
Invoke it right after calibrate_delay() where everything which is relevant
for arch_cpu_finalize_init() has been set up already.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/r/20230613224545.612182854@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7725acaa4f upstream
check_bugs() has become a dumping ground for all sorts of activities to
finalize the CPU initialization before running the rest of the init code.
Most are empty, a few do actual bug checks, some do alternative patching
and some cobble a CPU advertisement string together....
Aside of that the current implementation requires duplicated function
declaration and mostly empty header files for them.
Provide a new function arch_cpu_finalize_init(). Provide a generic
declaration if CONFIG_ARCH_HAS_CPU_FINALIZE_INIT is selected and a stub
inline otherwise.
This requires a temporary #ifdef in start_kernel() which will be removed
along with check_bugs() once the architectures are converted over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224544.957805717@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c21e066f9 upstream.
mbind() calls down into vma_replace_policy() without taking the per-VMA
locks, replaces the VMA's vma->vm_policy pointer, and frees the old
policy. That's bad; a concurrent page fault might still be using the
old policy (in vma_alloc_folio()), resulting in use-after-free.
Normally this will manifest as a use-after-free read first, but it can
result in memory corruption, including because vma_alloc_folio() can
call mpol_cond_put() on the freed policy, which conditionally changes
the policy's refcount member.
This bug is specific to CONFIG_NUMA, but it does also affect non-NUMA
systems as long as the kernel was built with CONFIG_NUMA.
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1f02b9575 upstream.
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8ab9f7b64 upstream.
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 588159009d upstream.
An attempt to acquire exclusive lock can race with the current lock
owner closing the image:
1. lock is held by client123, rbd_lock() returns -EBUSY
2. get_lock_owner_info() returns client123 instance details
3. client123 closes the image, lock is released
4. find_watcher() returns 0 as there is no matching watcher anymore
5. client123 instance gets erroneously blocklisted
Particularly impacted is mirror snapshot scheduler in snapshot-based
mirroring since it happens to open and close images a lot (images are
opened only for as long as it takes to take the next mirror snapshot,
the same client instance is used for all images).
To reduce the potential for erroneous blocklisting, retrieve the lock
owner again after find_watcher() returns 0. If it's still there, make
sure it matches the previously detected lock owner.
Cc: stable@vger.kernel.org # f38cb9d9c2: rbd: make get_lock_owner_info() return a single locker or NULL
Cc: stable@vger.kernel.org # 8ff2c64c97: rbd: harden get_lock_owner_info() a bit
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ff2c64c97 upstream.
- we want the exclusive lock type, so test for it directly
- use sscanf() to actually parse the lock cookie and avoid admitting
invalid handles
- bail if locker has a blank address
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f38cb9d9c2 upstream.
Make the "num_lockers can be only 0 or 1" assumption explicit and
simplify the API by getting rid of output parameters in preparation
for calling get_lock_owner_info() twice before blocklisting.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e4ab7b4c8 upstream.
When using the cleaner policy to decommission the cache, there is
never any writeback started from the cache as it is constantly delayed
due to normal I/O keeping the device busy. Meaning @idle=false was
always being passed to clean_target_met()
Fix this by adding a specific 'cleaner' flag that is set when the
cleaner policy is configured. This flag serves to always allow the
cleaner's writeback work to be queued until the cache is
decommissioned (even if the cache isn't idle).
Reported-by: David Jeffery <djeffery@redhat.com>
Fixes: b29d4986d0 ("dm cache: significant rework to leverage dm-bio-prison-v2")
Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac4436a5b2 upstream.
Since commit 3d439b1a2a ("thermal/core: Alloc-copy-free the thermal
zone parameters structure"), thermal_zone_device_register() allocates
a copy of the tzp argument and frees it when unregistering, so
thermal_of_zone_register() now ends up leaking its original tzp and
double-freeing the tzp copy. Fix this by locating tzp on stack instead.
Fixes: 3d439b1a2a ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: 6.4+ <stable@vger.kernel.org> # 6.4+: 8bcbb18c61d6: thermal: core: constify params in thermal_zone_device_register
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8527beb120 upstream.
The decision whether to enable a wake irq during suspend can not be done
based on the runtime PM state directly as a driver may use wake irqs
without implementing runtime PM. Such drivers specifically leave the
state set to the default 'suspended' and the wake irq is thus never
enabled at suspend.
Add a new wake irq flag to track whether a dedicated wake irq has been
enabled at runtime suspend and therefore must not be enabled at system
suspend.
Note that pm_runtime_enabled() can not be used as runtime PM is always
disabled during late suspend.
Fixes: 69728051f5 ("PM / wakeirq: Fix unbalanced IRQ enable for wakeirq")
Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05d881b85b upstream.
As part of fixing the allocation of the buffer for SVE state when changing
SME vector length we introduced an immediate reallocation of the SVE state,
this is also done when changing the SVE vector length for consistency.
Unfortunately this reallocation is done prior to writing the new vector
length to the task struct, meaning the allocation is done with the old
vector length and can lead to memory corruption due to an undersized buffer
being used.
Move the update of the vector length before the allocation to ensure that
the new vector length is taken into account.
For some reason this isn't triggering any problems when running tests on
the arm64 fixes branch (even after repeated tries) but is triggering
issues very often after merge into mainline.
Fixes: d4d5be94a8 ("arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230726-arm64-fix-sme-fix-v1-1-7752ec58af27@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 856d8e3c63 upstream.
The DASD driver has certain types of requests that might be rejected by
the storage server or z/VM because they are not supported. Since the
missing support of the command is not a real issue there is no user
visible kernel error message for this.
For copy pair setups there is a specific error that IO is not allowed on
secondary devices. This error case is explicitly handled and an error
message is printed.
The code checking for the error did use a bitwise 'and' that is used to
check for specific bits. But in this case the whole sense byte has to
match.
This leads to the problem that the copy pair related error message is
erroneously printed for other error cases that are usually not reported.
This might heavily confuse users and lead to follow on actions that might
disrupt application processing.
Fix by checking the sense byte for the exact value and not single bits.
Cc: stable@vger.kernel.org # 6.1+
Fixes: 1fca631a11 ("s390/dasd: suppress generic error messages for PPRC secondary devices")
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Link: https://lore.kernel.org/r/20230721193647.3889634-5-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05f1d8ed03 upstream.
Quiesce and resume are functions that tell the DASD driver to stop/resume
issuing I/Os to a specific DASD.
On resume dasd_schedule_block_bh() is called to kick handling of IO
requests again. This does unfortunately not cover internal requests which
are used for path verification for example.
This could lead to a hanging device when a path event or anything else
that triggers internal requests occurs on a quiesced device.
Fix by also calling dasd_schedule_device_bh() which triggers handling of
internal requests on resume.
Fixes: 8e09f21574 ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Link: https://lore.kernel.org/r/20230721193647.3889634-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 350cd9b959 upstream.
There was an invalidate_inode_pages2 added to readonly mmap path
that is unnecessary since that path is only entered when writeback
cache is disabled on mount.
Cc: stable@vger.kernel.org
Fixes: 1543b4c507 ("fs/9p: remove writeback fid and fix per-file modes")
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09430aba3a upstream.
There were two flags (s_flags and s_cache) which had incorrect signed
type in the parameters of the file cache mode helper function.
Cc: stable@vger.kernel.org
Fixes: 1543b4c507 ("fs/9p: remove writeback fid and fix per-file modes")
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eee4a119e9 upstream.
retval from filemap_fdatawrite was immediately overwritten by the
following p9_fid_put: preserve any error in fdatawrite if there
was any first.
This fixes the following scan-build warning:
fs/9p/vfs_dir.c:220:4: warning: Value stored to 'retval' is never read [deadcode.DeadStores]
retval = filemap_fdatawrite(inode->i_mapping);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 89c58cb395 ("fs/9p: fix error reporting in v9fs_dir_release")
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de0e30bee8 upstream.
Currently nettrace does not work on LoongArch due to missing
bpf_probe_read{,str}() support, with the error message:
ERROR: failed to load kprobe-based eBPF
ERROR: failed to load kprobe-based bpf
According to commit 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{,
str}() only to archs where they work"), we only need to select
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE to add said support,
because LoongArch does have non-overlapping address ranges for kernel
and userspace.
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c40d6b3249 upstream.
The soundwire subsystem uses two completion structures that allow
drivers to wait for soundwire device to become enumerated on the bus and
initialised by their drivers, respectively.
The code implementing the signalling is currently broken as it does not
signal all current and future waiters and also uses the wrong
reinitialisation function, which can potentially lead to memory
corruption if there are still waiters on the queue.
Not signalling future waiters specifically breaks sound card probe
deferrals as codec drivers can not tell that the soundwire device is
already attached when being reprobed. Some codec runtime PM
implementations suffer from similar problems as waiting for enumeration
during resume can also timeout despite the device already having been
enumerated.
Fixes: fb9469e54f ("soundwire: bus: fix race condition with enumeration_complete signaling")
Fixes: a90def0681 ("soundwire: bus: fix race condition with initialization_complete signaling")
Cc: stable@vger.kernel.org # 5.7
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Rander Wang <rander.wang@linux.intel.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20230705123018.30903-2-johan+linaro@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfd739f182 upstream.
The qca8k switch doesn't support using 0 as VID and require a default
VID to be always set. MDB add/del function doesn't currently handle
this and are currently setting the default VID.
Fix this by correctly handling this corner case and internally use the
default VID for VID 0 case.
Fixes: ba8f870dfa ("net: dsa: qca8k: add support for mdb_add/del")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae70dcb9d9 upstream.
On deleting an MDB entry for a port, fdb_search_and_del is used.
An FDB entry can't be modified so it needs to be deleted and readded
again with the new portmap (and the port deleted as requested)
We use the SEARCH operator to search the entry to edit by vid and mac
address and then we check the aging if we actually found an entry.
Currently the code suffer from a bug where the searched fdb entry is
never read again with the found values (if found) resulting in the code
always returning -EINVAL as aging was always 0.
Fix this by correctly read the fdb entry after it was searched.
Fixes: ba8f870dfa ("net: dsa: qca8k: add support for mdb_add/del")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80248d4160 upstream.
On inserting a mdb entry, fdb_search_and_insert is used to add a port to
the qca8k target entry in the FDB db.
A FDB entry can't be modified so it needs to be removed and insert again
with the new values.
To detect if an entry already exist, the SEARCH operation is used and we
check the aging of the entry. If the entry is not 0, the entry exist and
we proceed to delete it.
Current code have 2 main problem:
- The condition to check if the FDB entry exist is wrong and should be
the opposite.
- When a FDB entry doesn't exist, aging was never actually set to the
STATIC value resulting in allocating an invalid entry.
Fix both problem by adding aging support to the function, calling the
function with STATIC as aging by default and finally by correct the
condition to check if the entry actually exist.
Fixes: ba8f870dfa ("net: dsa: qca8k: add support for mdb_add/del")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c39dd025d upstream.
The qca8xxx switch supports 2 way to write reg values, a slow way using
mdio and a fast way by sending specially crafted mgmt packet to
read/write reg.
The fast way can support up to 32 bytes of data as eth packet are used
to send/receive.
This correctly works for almost the entire regmap of the switch but with
the use of some kernel selftests for dsa drivers it was found a funny
and interesting hw defect/limitation.
For some specific reg, bulk write won't work and will result in writing
only part of the requested regs resulting in half data written. This was
especially hard to track and discover due to the total strangeness of
the problem and also by the specific regs where this occurs.
This occurs in the specific regs of the ATU table, where multiple entry
needs to be written to compose the entire entry.
It was discovered that with a bulk write of 12 bytes on
QCA8K_REG_ATU_DATA0 only QCA8K_REG_ATU_DATA0 and QCA8K_REG_ATU_DATA2
were written, but QCA8K_REG_ATU_DATA1 was always zero.
Tcpdump was used to make sure the specially crafted packet was correct
and this was confirmed.
The problem was hard to track as the lack of QCA8K_REG_ATU_DATA1
resulted in an entry somehow possible as the first bytes of the mac
address are set in QCA8K_REG_ATU_DATA0 and the entry type is set in
QCA8K_REG_ATU_DATA2.
Funlly enough writing QCA8K_REG_ATU_DATA1 results in the same problem
with QCA8K_REG_ATU_DATA2 empty and QCA8K_REG_ATU_DATA1 and
QCA8K_REG_ATU_FUNC correctly written.
A speculation on the problem might be that there are some kind of
indirection internally when accessing these regs and they can't be
accessed all together, due to the fact that it's really a table mapped
somewhere in the switch SRAM.
Even more funny is the fact that every other reg was tested with all
kind of combination and they are not affected by this problem. Read
operation was also tested and always worked so it's not affected by this
problem.
The problem is not present if we limit writing a single reg at times.
To handle this hardware defect, enable use_single_write so that bulk
api can correctly split the write in multiple different operation
effectively reverting to a non-bulk write.
Cc: Mark Brown <broonie@kernel.org>
Fixes: c766e077d9 ("net: dsa: qca8k: convert to regmap read/write API")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e11ec2b868 upstream.
Last year, the code that manages GSI channel transactions switched
from using spinlock-protected linked lists to using indexes into the
ring buffer used for a channel. Recently, Google reported seeing
transaction reference count underflows occasionally during shutdown.
Doug Anderson found a way to reproduce the issue reliably, and
bisected the issue to the commit that eliminated the linked lists
and the lock. The root cause was ultimately determined to be
related to unused transactions being committed as part of the modem
shutdown cleanup activity. Unused transactions are not normally
expected (except in error cases).
The modem uses some ranges of IPA-resident memory, and whenever it
shuts down we zero those ranges. In ipa_filter_reset_table() a
transaction is allocated to zero modem filter table entries. If
hashing is not supported, hashed table memory should not be zeroed.
But currently nothing prevents that, and the result is an unused
transaction. Something similar occurs when we zero routing table
entries for the modem.
By preventing any attempt to clear hashed tables when hashing is not
supported, the reference count underflow is avoided in this case.
Note that there likely remains an issue with properly freeing unused
transactions (if they occur due to errors). This patch addresses
only the underflows that Google originally reported.
Cc: <stable@vger.kernel.org> # 6.1.x
Fixes: d338ae28d8 ("net: ipa: kill all other transaction lists")
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230724224055.1688854-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c04e989484 upstream.
When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list. Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first. However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first. As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.
To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable. The default is still
10, but it can be overridden via a module parameter.
This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.
Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f7853c3424 ]
Henry reported that rt_mutex_adjust_prio_check() has an ordering
problem and puts the lie to the comment in [7]. Sharing the sort key
between lock->waiters and owner->pi_waiters *does* create problems,
since unlike what the comment claims, holding [L] is insufficient.
Notably, consider:
A
/ \
M1 M2
| |
B C
That is, task A owns both M1 and M2, B and C block on them. In this
case a concurrent chain walk (B & C) will modify their resp. sort keys
in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L]
is meaningless, they're different Ls.
This then gives rise to a race condition between [7] and [11], where
the requeue of pi_waiters will observe an inconsistent tree order.
B C
(holds M1->wait_lock, (holds M2->wait_lock,
holds B->pi_lock) holds A->pi_lock)
[7]
waiter_update_prio();
...
[8]
raw_spin_unlock(B->pi_lock);
...
[10]
raw_spin_lock(A->pi_lock);
[11]
rt_mutex_enqueue_pi();
// observes inconsistent A->pi_waiters
// tree order
Fixing this means either extending the range of the owner lock from
[10-13] to [6-13], with the immediate problem that this means [6-8]
hold both blocked and owner locks, or duplicating the sort key.
Since the locking in chain walk is horrible enough without having to
consider pi_lock nesting rules, duplicate the sort key instead.
By giving each tree their own sort key, the above race becomes
harmless, if C sees B at the old location, then B will correct things
(if they need correcting) when it walks up the chain and reaches A.
Fixes: fb00aca474 ("rtmutex: Turn the plist into an rb-tree")
Reported-by: Henry Wu <triangletrap12@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Henry Wu <triangletrap12@gmail.com>
Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55ad248573 ]
The irq to block mapping is fixed, and interrupts from the first block
will always be routed to the first parent IRQ. But the parent interrupts
themselves can be routed to any available CPU.
This is used by the bootloader to map the first parent interrupt to the
boot CPU, regardless wether the boot CPU is the first one or the second
one.
When booting from the second CPU, the assumption that the first block's
IRQ is mapped to the first CPU breaks, and the system hangs because
interrupts do not get routed correctly.
Fix this by passing the appropriate bcm6434_l1_cpu to the interrupt
handler instead of the chip itself, so the handler always has the right
block.
Fixes: c7c42ec2ba ("irqchips/bmips: Add bcm6345-l1 interrupt controller")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230629072620.62527-1-jonas.gorski@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 513253f8c2 upstream.
recv_data either returns the number of received bytes, or a negative value
representing an error code. Adding the return value directly to the total
number of received bytes therefore looks a little weird, since it might add
a negative error code to a sum of bytes.
The following check for size < expected usually makes the function return
ETIME in that case, so it does not cause too many problems in practice. But
to make the code look cleaner and because the caller might still be
interested in the original error code, explicitly check for the presence of
an error code and pass that through.
Cc: stable@vger.kernel.org
Fixes: cb5354253a ("[PATCH] tpm: spacing cleanups 2")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b57a4322b upstream.
Since commit 74d7970feb ("ksmbd: fix racy issue from using ->d_parent and
->d_name"), ksmbd can not lookup cross mount points. If last component is
a cross mount point during path lookup, check if it is crossed to follow it
down. And allow path lookup to cross a mount point when a crossmnt
parameter is set to 'yes' in smb.conf.
Cc: stable@vger.kernel.org
Fixes: 74d7970feb ("ksmbd: fix racy issue from using ->d_parent and ->d_name")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f75546f58a upstream.
If the client is calling TEST_STATEID, then it is because some event
occurred that requires it to check all the stateids for validity and
call FREE_STATEID on the ones that have been revoked. In this case,
either the stateid exists in the list of stateids associated with that
nfs4_client, in which case it should be tested, or it does not. There
are no additional conditions to be considered.
Reported-by: "Frank Ch. Eigler" <fche@redhat.com>
Fixes: 7df302f75e ("NFSD: TEST_STATEID should not return NFS4ERR_STALE_STATEID")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20ea1e7d13 upstream.
The pidfd_getfd() system call allows a caller with ptrace_may_access()
abilities on another process to steal a file descriptor from this
process. This system call is used by debuggers, container runtimes,
system call supervisors, networking proxies etc. So while it is a
special interest system call it is used in common tools.
That ability ends up breaking our long-time optimization in fdget_pos(),
which "knew" that if we had exclusive access to the file descriptor
nobody else could access it, and we didn't need the lock for the file
position.
That check for file_count(file) was always fairly subtle - it depended
on __fdget() not incrementing the file count for single-threaded
processes and thus included that as part of the rule - but it did mean
that we didn't need to take the lock in all those traditional unix
process contexts.
So it's sad to see this go, and I'd love to have some way to re-instate
the optimization. At the same time, the lock obviously isn't ever
contended in the case we optimized, so all we were optimizing away is
the atomics and the cacheline dirtying. Let's see if anybody even
notices that the optimization is gone.
Link: https://lore.kernel.org/linux-fsdevel/20230724-vfs-fdget_pos-v1-1-a4abfd7103f3@kernel.org/
Fixes: 8649c322f7 ("pid: Implement pidfd_getfd syscall")
Cc: stable@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ba2e83334 upstream.
AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
Therefore, the threshold_bank structure for bank 4, and its threshold_block
structures, will be initialized once at boot time. And the kobject for the
shared bank will be added to each of the CPUs that share it. Furthermore,
the threshold_blocks for the shared bank will be added again to the bank's
kobject. These additions will increase the refcount for the bank's kobject.
For example, a shared bank with two blocks and shared across two CPUs will
be set up like this:
CPU0 init
bank create and add; bank refcount = 1; threshold_create_bank()
block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
CPU1 init
bank add; bank refcount = 3; threshold_create_bank()
block 0 add; bank refcount = 4; __threshold_add_blocks()
block 1 add; bank refcount = 5; __threshold_add_blocks()
Currently in threshold_remove_bank(), if the bank is shared then
__threshold_remove_blocks() is called. Here the shared bank's kobject and
the bank's blocks' kobjects are deleted. This is done on the first call
even while the structures are still shared. Subsequent calls from other
CPUs that share the structures will attempt to delete the kobjects.
During kobject_del(), kobject->sd is removed. If the kobject is not part of
a kset with default_groups, then subsequent kobject_del() calls seem safe
even with kobject->sd == NULL.
Originally, the AMD MCA thresholding structures did not use default_groups.
And so the above behavior was not apparent.
However, a recent change implemented default_groups for the thresholding
structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
code path. In this case, the first kobject_del() may succeed and remove
kobject->sd. But subsequent kobject_del() calls will give a WARNing in
kernfs_remove_by_name_ns() since kobject->sd == NULL.
Use kobject_put() on the shared bank's kobject when "removing" blocks. This
decrements the bank's refcount while keeping kobjects enabled until the
bank is no longer shared. At that point, kobject_put() will be called on
the blocks which drives their refcount to 0 and deletes them and also
decrementing the bank's refcount. And finally kobject_put() will be called
on the bank driving its refcount to 0 and deleting it.
The same example above:
CPU1 shutdown
bank is shared; bank refcount = 5; threshold_remove_bank()
block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
CPU0 shutdown
bank is no longer shared; bank refcount = 3; threshold_remove_bank()
block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
put bank; bank refcount = 0; threshold_remove_bank()
Fixes: 7f99cb5e60 ("x86/CPU/AMD: Use default_groups in kobj_type")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/alpine.LRH.2.02.2205301145540.25840@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b28ff3a7d7 upstream.
btrfs_attach_transaction_barrier() is used to get a handle pointing to the
current running transaction if the transaction has not started its commit
yet (its state is < TRANS_STATE_COMMIT_START). If the transaction commit
has started, then we wait for the transaction to commit and finish before
returning - however we completely ignore if the transaction was aborted
due to some error during its commit, we simply return ERR_PT(-ENOENT),
which makes the caller assume everything is fine and no errors happened.
This could make an fsync return success (0) to user space when in fact we
had a transaction abort and the target inode changes were therefore not
persisted.
Fix this by checking for the return value from btrfs_wait_for_commit(),
and if it returned an error, return it back to the caller.
Fixes: d4edf39bd5 ("Btrfs: fix uncompleted transaction")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf7ecbe987 upstream.
At btrfs_wait_for_commit() we wait for a transaction to finish and then
always return 0 (success) without checking if it was aborted, in which
case the transaction didn't happen due to some critical error. Fix this
by checking if the transaction was aborted.
Fixes: 462045928b ("Btrfs: add START_SYNC, WAIT_SYNC ioctls")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dbfc14fc7 upstream.
When using the block group tree feature, this tree is a critical tree just
like the extent, csum and free space trees, and just like them it uses the
delayed refs block reserve.
So take into account the block group tree, and its current size, when
calculating the size for the global reserve.
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95ca6599a5 upstream.
The zoned mode need to reset a zone before using it. We rely on btrfs's
original discard functionality (discarding unused block group range) to do
the resetting.
While the commit 63a7cb1307 ("btrfs: auto enable discard=async when
possible") made the discard done in an async manner, a zoned reset do not
need to be async, as it is fast enough.
Even worth, delaying zone rests prevents using those zones again. So, let's
disable async discard on the zoned mode.
Fixes: 63a7cb1307 ("btrfs: auto enable discard=async when possible")
CC: stable@vger.kernel.org # 6.3+
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update message text ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e146503ac6 upstream.
Industrial processor i3255 supports temperatures -40 deg celcius
to 105 deg Celcius. The current implementation of k10temp_read_temp
rounds off any negative temperatures to '0'. To fix this,
the following changes have been made.
A flag 'disp_negative' is added to struct k10temp_data to support
AMD i3255 processors. Flag 'disp_negative' is set if 3255 processor
is found during k10temp_probe. Flag 'disp_negative' is used to
determine whether to round off negative temperatures to '0' in
k10temp_read_temp.
Signed-off-by: Baskaran Kannan <Baski.Kannan@amd.com>
Link: https://lore.kernel.org/r/20230727162159.1056136-1-Baski.Kannan@amd.com
Fixes: aef17ca127 ("hwmon: (k10temp) Only apply temperature offset if result is positive")
Cc: stable@vger.kernel.org
[groeck: Fixed multi-line comment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c1897ae4b upstream.
The kernel security team does NOT assign CVEs, so document that properly
and provide the "if you want one, ask MITRE for it" response that we
give on a weekly basis in the document, so we don't have to constantly
say it to everyone who asks.
Link: https://lore.kernel.org/r/2023063022-retouch-kerosene-7e4a@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fee0915e6 upstream.
Because the linux-distros group forces reporters to release information
about reported bugs, and they impose arbitrary deadlines in having those
bugs fixed despite not actually being kernel developers, the kernel
security team recommends not interacting with them at all as this just
causes confusion and the early-release of reported security problems.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/2023063020-throat-pantyhose-f110@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 288b4fa179 upstream.
This reverts commit 18fc7c435b.
The reverted commit was based on static analysis and a misunderstanding
of how PTR_ERR() and NULLs are supposed to work. When a function
returns both pointer errors and NULL then normally the NULL means
"continue operating without a feature because it was deliberately
turned off". The NULL should not be treated as a failure. If a driver
cannot work when that feature is disabled then the KConfig should
enforce that the function cannot return NULL. We should not need to
test for it.
In this code, the patch means that certain tegra_xusb_probe() will
fail if the firmware supports power-domains but CONFIG_PM is disabled.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 18fc7c435b ("usb: xhci: tegra: Fix error check")
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/8baace8d-fb4b-41a4-ad5f-848ae643a23b@moroto.mountain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dc162e223 upstream.
The Focusrite Scarlett audio device does not behave correctly during
resumes. Below is what happens during every resume (captured with
Beagle 5000):
<Suspend>
<Resume>
<Reset>/<Chirp J>/<Tiny J>
<Reset/Target disconnected>
<High Speed>
The Scarlett disconnects and is enumerated again.
However from time to time it drops completely off the USB bus during
resume. Below is captured occurrence of such an event:
<Suspend>
<Resume>
<Reset>/<Chirp J>/<Tiny J>
<Reset>/<Chirp K>/<Tiny K>
<High Speed>
<Corrupted packet>
<Reset/Target disconnected>
To fix the condition a user has to unplug and plug the device again.
With USB_QUIRK_RESET_RESUME applied ("usbcore.quirks=1235:8211:b")
for the Scarlett audio device the issue still reproduces.
Applying USB_QUIRK_DISCONNECT_SUSPEND ("usbcore.quirks=1235:8211:m")
fixed the issue and the Scarlett audio device didn't drop off the USB
bus for ~5000 suspend/resume cycles where originally issue reproduced in
~100 or less suspend/resume cycles.
Signed-off-by: Łukasz Bartosik <lb@semihalf.com>
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/20230724112911.1802577-1-lb@semihalf.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e835c0a4e2 upstream.
Commit c4a5153e87 ("usb: dwc3: core: Power-off core/PHYs on
system_suspend in host mode") replaces check for HOST only dr_mode with
current_dr_role. But during booting, the current_dr_role isn't
initialized, thus the device side reset is always issued even if dwc3
was configured as host-only. What's more, on some platforms with host
only dwc3, aways issuing device side reset by accessing device register
block can cause kernel panic.
Fixes: c4a5153e87 ("usb: dwc3: core: Power-off core/PHYs on system_suspend in host mode")
Cc: stable <stable@kernel.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20230627162018.739-1-jszhang@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b32b8f2b95 upstream.
Hardware based on the Bay Trail / BYT SoCs require an external ULPI phy for
USB device-mode. The phy chip usually has its 'reset' and 'chip select'
lines connected to GPIOs described by ACPI fwnodes in the DSDT table.
Because of hardware with missing ACPI resources for the 'reset' and 'chip
select' GPIOs commit 5741022cbd ("usb: dwc3: pci: Add GPIO lookup table
on platforms without ACPI GPIO resources") introduced a fallback
gpiod_lookup_table with hard-coded mappings for Bay Trail devices.
However there are existing Bay Trail based devices, like the National
Instruments cRIO-903x series, where the phy chip has its 'reset' and
'chip-select' lines always asserted in hardware via resistor pull-ups. On
this hardware the phy chip is always enabled and the ACPI dsdt table is
missing information not only for the 'chip-select' and 'reset' lines but
also for the BYT GPIO controller itself "INT33FC".
With the introduction of the gpiod_lookup_table initializing the USB
device-mode on these hardware now errors out. The error comes from the
gpiod_get_optional() calls in dwc3_pci_quirks() which will now return an
-ENOENT error due to the missing ACPI entry for the INT33FC gpio controller
used in the aforementioned table.
This hardware used to work before because gpiod_get_optional() will return
NULL instead of -ENOENT if no GPIO has been assigned to the requested
function. The dwc3_pci_quirks() code for setting the 'cs' and 'reset' GPIOs
was then skipped (due to the NULL return). This is the correct behavior in
cases where the phy chip is hardwired and there are no GPIOs to control.
Since the gpiod_lookup_table relies on the presence of INT33FC fwnode
in ACPI tables only add the table if we know the entry for the INT33FC
gpio controller is present. This allows Bay Trail based devices with
hardwired dwc3 ULPI phys to continue working.
Fixes: 5741022cbd ("usb: dwc3: pci: Add GPIO lookup table on platforms without ACPI GPIO resources")
Cc: stable <stable@kernel.org>
Signed-off-by: Gratian Crisan <gratian.crisan@ni.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230726184555.218091-2-gratian.crisan@ni.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 734ae15ab9 upstream.
This reverts commit b138e23d3d.
AutoRetry has been found to sometimes cause controller freezes when
communicating with buggy USB devices.
This controller feature allows the controller in host mode to send
non-terminating/burst retry ACKs instead of terminating retry ACKs
to devices when a transaction error (CRC error or overflow) occurs.
Unfortunately, if the USB device continues to respond with a CRC error,
the controller will not complete endpoint-related commands while it
keeps trying to auto-retry. [3] The xHCI driver will notice this once
it tries to abort the transfer using a Stop Endpoint command and
does not receive a completion in time. [1]
This situation is reported to dmesg:
[sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00
xhci-hcd: xHCI host not responding to stop endpoint command
xhci-hcd: xHCI host controller not responding, assume dead
xhci-hcd: HC died; cleaning up
Some users observed this problem on an Odroid HC2 with the JMS578
USB3-to-SATA bridge. The issue can be triggered by starting
a read-heavy workload on an attached SSD. After a while, the host
controller would die and the SSD would disappear from the system. [1]
Further analysis by Synopsys determined that controller revisions
other than the one in Odroid HC2 are also affected by this.
The recommended solution was to disable AutoRetry altogether.
This change does not have a noticeable performance impact. [2]
Revert the enablement commit. This will keep the AutoRetry bit in
the default state configured during SoC design [2].
Fixes: b138e23d3d ("usb: dwc3: core: Enable AutoRetry feature in the controller")
Link: https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gmail.com/ [1]
Link: https://lore.kernel.org/r/20230711214834.kyr6ulync32d4ktk@synopsys.com/ [2]
Link: https://lore.kernel.org/r/20230712225518.2smu7wse6djc7l5o@synopsys.com/ [3]
Cc: stable@vger.kernel.org
Cc: Mauro Ribeiro <mauro.ribeiro@hardkernel.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Jakub Vanek <linuxtardis@gmail.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20230714122419.27741-1-linuxtardis@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8a2da6ec2 upstream.
After an initial link up the CAN device is in ERROR-ACTIVE mode. Due
to a missing CAN_STATE_STOPPED in gs_can_close() it doesn't change to
STOPPED after a link down:
| ip link set dev can0 up
| ip link set dev can0 down
| ip --details link show can0
| 13: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10
| link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
| can state ERROR-ACTIVE restart-ms 1000
Add missing assignment of CAN_STATE_STOPPED in gs_can_close().
Cc: stable@vger.kernel.org
Fixes: d08e973a77 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://lore.kernel.org/all/20230718-gs_usb-fix-can-state-v1-1-f19738ae2c23@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd92c8a1f9 upstream.
Add the device and product ID for this CAN bus interface / license
dongle. The device is usable either directly from user space or can be
attached to a kernel CAN interface with slcan_attach.
Reported-by: Kaufmann Automotive GmbH <info@kaufmann-automotive.ch>
Tested-by: Kaufmann Automotive GmbH <info@kaufmann-automotive.ch>
Signed-off-by: Oliver Neukum <oneukum@suse.com>
[ johan: amend commit message and move entries in sort order ]
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4dd8752a14 upstream.
The runtime PM state should not be changed by drivers that do not
implement runtime PM even if it happens to work around a bug in PM core.
With the wake irq arming now fixed, drop the bogus runtime PM state
update which left the device in active state (and could potentially
prevent a parent device from suspending).
Fixes: f3974413cf ("tty: serial: qcom_geni_serial: Wakeup IRQ cleanup")
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26a0652cb4 upstream.
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode. The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).
Opportunistically fix a benign typo in the prototype for is_valid_cr4().
Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4abd73520 upstream.
Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only
if KVM itself is not configured to utilize unrestricted guests, i.e. don't
stuff CR0/CR4 for a restricted L2 that is running as the guest of an
unrestricted L1. Any attempt to VM-Enter a restricted guest with invalid
CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should
never observe a restricted L2 with incompatible CR0/CR4, since nested
VM-Enter from L1 should have failed.
And if KVM does observe an active, restricted L2 with incompatible state,
e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail
does more harm than good, as KVM will often neglect to undo the side
effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus
the damage can easily spill over to L1. On the other hand, letting
VM-Enter fail due to bad guest state is more likely to contain the damage
to L2 as KVM relies on hardware to perform most guest state consistency
checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into
L1 irrespective of (un)restricted guest behavior.
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Fixes: bddd82d19e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f4fc01af5b upstream.
The legacy gadget driver omitted calling usb_gadget_check_config()
to ensure that the USB device controller (UDC) has adequate resources,
including sufficient endpoint numbers and types, to support the given
configuration.
Previously, usb_add_config() was solely invoked by the legacy gadget
driver. Adds the necessary usb_gadget_check_config() after the bind()
operation to fix the issue.
Fixes: dce49449e0 ("usb: cdns3: allocate TX FIFO size according to composite EP number")
Cc: stable <stable@kernel.org>
Reported-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20230707230015.494999-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8291be6b5 upstream.
This reverts commit f08aa7c80d.
The reverted commit was based on static analysis and a misunderstanding
of how PTR_ERR() and NULLs are supposed to work. When a function
returns both pointer errors and NULL then normally the NULL means
"continue operating without a feature because it was deliberately
turned off". The NULL should not be treated as a failure. If a driver
cannot work when that feature is disabled then the KConfig should
enforce that the function cannot return NULL. We should not need to
test for it.
In this driver, the bug means that probe cannot succeed when CONFIG_PM
is disabled.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f08aa7c80d ("usb: gadget: tegra-xudc: Fix error check in tegra_xudc_powerdomain_init()")
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/ZKQoBa84U/ykEh3C@moroto
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dea499781a ]
Warning happened in trace_buffered_event_disable() at
WARN_ON_ONCE(!trace_buffered_event_ref)
Call Trace:
? __warn+0xa5/0x1b0
? trace_buffered_event_disable+0x189/0x1b0
__ftrace_event_enable_disable+0x19e/0x3e0
free_probe_data+0x3b/0xa0
unregister_ftrace_function_probe_func+0x6b8/0x800
event_enable_func+0x2f0/0x3d0
ftrace_process_regex.isra.0+0x12d/0x1b0
ftrace_filter_write+0xe6/0x140
vfs_write+0x1c9/0x6f0
[...]
The cause of the warning is in __ftrace_event_enable_disable(),
trace_buffered_event_enable() was called once while
trace_buffered_event_disable() was called twice.
Reproduction script show as below, for analysis, see the comments:
```
#!/bin/bash
cd /sys/kernel/tracing/
# 1. Register a 'disable_event' command, then:
# 1) SOFT_DISABLED_BIT was set;
# 2) trace_buffered_event_enable() was called first time;
echo 'cmdline_proc_show:disable_event:initcall:initcall_finish' > \
set_ftrace_filter
# 2. Enable the event registered, then:
# 1) SOFT_DISABLED_BIT was cleared;
# 2) trace_buffered_event_disable() was called first time;
echo 1 > events/initcall/initcall_finish/enable
# 3. Try to call into cmdline_proc_show(), then SOFT_DISABLED_BIT was
# set again!!!
cat /proc/cmdline
# 4. Unregister the 'disable_event' command, then:
# 1) SOFT_DISABLED_BIT was cleared again;
# 2) trace_buffered_event_disable() was called second time!!!
echo '!cmdline_proc_show:disable_event:initcall:initcall_finish' > \
set_ftrace_filter
```
To fix it, IIUC, we can change to call trace_buffered_event_enable() at
fist time soft-mode enabled, and call trace_buffered_event_disable() at
last time soft-mode disabled.
Link: https://lore.kernel.org/linux-trace-kernel/20230726095804.920457-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes: 0fc1b09ff1 ("tracing: Use temp buffer when filtering events")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d093282b0 ]
When pages are removed in rb_remove_pages(), 'cpu_buffer->read' is set
to 0 in order to make sure any read iterators reset themselves. However,
this will mess 'entries' stating, see following steps:
# cd /sys/kernel/tracing/
# 1. Enlarge ring buffer prepare for later reducing:
# echo 20 > per_cpu/cpu0/buffer_size_kb
# 2. Write a log into ring buffer of cpu0:
# taskset -c 0 echo "hello1" > trace_marker
# 3. Read the log:
# cat per_cpu/cpu0/trace_pipe
<...>-332 [000] ..... 62.406844: tracing_mark_write: hello1
# 4. Stop reading and see the stats, now 0 entries, and 1 event readed:
# cat per_cpu/cpu0/stats
entries: 0
[...]
read events: 1
# 5. Reduce the ring buffer
# echo 7 > per_cpu/cpu0/buffer_size_kb
# 6. Now entries became unexpected 1 because actually no entries!!!
# cat per_cpu/cpu0/stats
entries: 1
[...]
read events: 0
To fix it, introduce 'page_removed' field to count total removed pages
since last reset, then use it to let read iterators reset themselves
instead of changing the 'read' pointer.
Link: https://lore.kernel.org/linux-trace-kernel/20230724054040.3489499-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Fixes: 83f40318da ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3fc2febb0f ]
The global function triggers a warning because of the missing prototype
drivers/ata/pata_ns87415.c:263:6: warning: no previous prototype for 'ns87560_tf_read' [-Wmissing-prototypes]
263 | void ns87560_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
There are no other references to this, so just make it static.
Fixes: c4b5b7b6c4 ("pata_ns87415: Initial cut at 87415/87560 IDE support")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 253e5df8b8 ]
The noswap mount option is surely not one of the three options for sizing:
move its description down.
The huge= mount option does not accept numeric values: those are just in
an internal enum. Delete those numbers, and follow the manpage text more
closely (but there's not yet any fadvise() or fcntl() which applies here).
/sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and
barely relevant to mounting a tmpfs: just refer to transhuge.rst (while
still using the words deny and force, to help as informal reminders).
[rdunlap@infradead.org: fixup Docs table for huge mount options]
Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: d0f5a85442 ("shmem: update documentation")
Fixes: 2c6efe9cf2 ("shmem: add support to ignore swap")
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99f98a7c0d ]
syzkaller found a race where IOMMUFD_DESTROY increments the refcount:
obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY);
if (IS_ERR(obj))
return PTR_ERR(obj);
iommufd_ref_to_users(obj);
/* See iommufd_ref_to_users() */
if (!iommufd_object_destroy_user(ucmd->ictx, obj))
As part of the sequence to join the two existing primitives together.
Allowing the refcount the be elevated without holding the destroy_rwsem
violates the assumption that all temporary refcount elevations are
protected by destroy_rwsem. Racing IOMMUFD_DESTROY with
iommufd_object_destroy_user() will cause spurious failures:
WARNING: CPU: 0 PID: 3076 at drivers/iommu/iommufd/device.c:477 iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:478
Modules linked in:
CPU: 0 PID: 3076 Comm: syz-executor.0 Not tainted 6.3.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
RIP: 0010:iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:477
Code: e8 3d 4e 00 00 84 c0 74 01 c3 0f 0b c3 0f 1f 44 00 00 f3 0f 1e fa 48 89 fe 48 8b bf a8 00 00 00 e8 1d 4e 00 00 84 c0 74 01 c3 <0f> 0b c3 0f 1f 44 00 00 41 57 41 56 41 55 4c 8d ae d0 00 00 00 41
RSP: 0018:ffffc90003067e08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888109ea0300 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
RBP: 0000000000000004 R08: 0000000000000000 R09: ffff88810bbb3500
R10: ffff88810bbb3e48 R11: 0000000000000000 R12: ffffc90003067e88
R13: ffffc90003067ea8 R14: ffff888101249800 R15: 00000000fffffffe
FS: 00007ff7254fe6c0(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555557262da8 CR3: 000000010a6fd000 CR4: 0000000000350ef0
Call Trace:
<TASK>
iommufd_test_create_access drivers/iommu/iommufd/selftest.c:596 [inline]
iommufd_test+0x71c/0xcf0 drivers/iommu/iommufd/selftest.c:813
iommufd_fops_ioctl+0x10f/0x1b0 drivers/iommu/iommufd/main.c:337
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x84/0xc0 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The solution is to not increment the refcount on the IOMMUFD_DESTROY path
at all. Instead use the xa_lock to serialize everything. The refcount
check == 1 and xa_erase can be done under a single critical region. This
avoids the need for any refcount incrementing.
It has the downside that if userspace races destroy with other operations
it will get an EBUSY instead of waiting, but this is kind of racing is
already dangerous.
Fixes: 2ff4bed7fe ("iommufd: File descriptor, context, kconfig and makefiles")
Link: https://lore.kernel.org/r/2-v1-85aacb2af554+bc-iommufd_syz3_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: syzbot+7574ebfe589049630608@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53e7d08f6d ]
In ublk_ctrl_start_dev(), if wait_for_completion_interruptible() is
interrupted by signal, queues aren't setup successfully yet, so we
have to fail UBLK_CMD_START_DEV, otherwise kernel oops can be triggered.
Reported by German when working on qemu-storage-deamon which requires
single thread ublk daemon.
Fixes: 71f28f3136 ("ublk_drv: add io_uring based userspace block driver")
Reported-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230726144502.566785-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b5d0ddcb3 ]
A fence id of zero is expected to be invalid, and is not removed from
the fence_idr table. If userspace is requesting to specify the fence
id with the FENCE_SN_IN flag, we need to reject a zero fence id value.
Fixes: 17154addc5 ("drm/msm: Add MSM_SUBMIT_FENCE_SN_IN")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/549180/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c01aebeef3 ]
If the second call to amdgpu_bo_create_kernel() fails, the memory
allocated from the first call should be cleared. If the third call
fails, the memory from the second call should be cleared.
Fixes: b95b539168 ("drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d5fff8982 ]
__md_stop_writes() and __md_stop() will modify many fields that are
protected by 'reconfig_mutex', and all the callers will grab
'reconfig_mutex' except for md_stop().
Also, update md_stop() to make certain 'reconfig_mutex' is held using
lockdep_assert_held().
Fixes: 9d09e663d5 ("dm: raid456 basic support")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e74c874eab ]
There are four equivalent goto tags in raid_ctr(), clean them up to
use just one.
There is no functional change and this is preparation to fix
raid_ctr()'s unprotected md_stop().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Stable-dep-of: 7d5fff8982 ("dm raid: protect md_stop() with 'reconfig_mutex'")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bae3028799 ]
In the error paths 'bad_stripe_cache' and 'bad_check_reshape',
'reconfig_mutex' is still held after raid_ctr() returns.
Fixes: 9dbd1aa3a8 ("dm raid: add reshaping support to the target")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1982655821 ]
The NTLMSSP_NEGOTIATE_VERSION flag only needs to be sent during
the NTLMSSP NEGOTIATE (not the AUTH) request, so filter it out for
NTLMSSP AUTH requests. See MS-NLMP 2.2.1.3
This fixes a problem found by the gssntlmssp server.
Link: https://github.com/gssapi/gss-ntlmssp/issues/95
Fixes: 52d005337b ("smb3: send NTLMSSP version information")
Acked-by: Roy Shterman <roy.shterman@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4cf67d3cc9 ]
KASAN and KFENCE detected an user-after-free in the CXL driver. This
happens in the cxl_decoder_add() fail path. KASAN prints the following
error:
BUG: KASAN: slab-use-after-free in cxl_parse_cfmws (drivers/cxl/acpi.c:299)
This happens in cxl_parse_cfmws(), where put_device() is called,
releasing cxld, which is accessed later.
Use the local variables in the dev_err() instead of pointing to the
released memory. Since the dev_err() is printing a resource, change the open
coded print format to use the %pr format specifier.
Fixes: e50fe01e1f ("cxl/core: Drop ->platform_res attribute for root decoders")
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cd0787f08 ]
In an error path where the submit is free'd without the job being run,
the hw_fence pointer is simply a kzalloc'd block of memory. In this
case we should just kfree() it, rather than trying to decrement it's
reference count. Fortunately we can tell that this is the case by
checking for a zero refcount, since if the job was run, the submit would
be holding a reference to the hw_fence.
Fixes: f94e6a51e1 ("drm/msm: Pre-allocate hw_fence")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/547088/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29900bf351 ]
Driver unload hits a hang during stress testing of load/unload.
stack trace snippet -
tasklet_kill at ffffffff9aabb8b2
bnxt_qplib_nq_stop_irq at ffffffffc0a805fb [bnxt_re]
bnxt_qplib_disable_nq at ffffffffc0a80c5b [bnxt_re]
bnxt_re_dev_uninit at ffffffffc0a67d15 [bnxt_re]
bnxt_re_remove_device at ffffffffc0a6af1d [bnxt_re]
tasklet_kill can hang if the tasklet is scheduled after it is disabled.
Modified the sequences to disable the interrupt first and synchronize
irq before disabling the tasklet.
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1689322969-25402-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65288a22dd ]
Whenever there is a fast path IO and create/destroy resources from
the slow path is happening in parallel, we may notice high latency
of slow path command completion.
Introduces a shadow queue depth to prevent the outstanding requests
to the FW. Driver will not allow more than #RCFW_CMD_NON_BLOCKING_SHADOW_QD
non-blocking commands to the Firmware.
Shadow queue depth is a soft limit only for non-blocking
commands. Blocking commands will be posted to the firmware
as long as there is a free slot.
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1686308514-11996-8-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Stable-dep-of: 29900bf351 ("RDMA/bnxt_re: Fix hang during driver unload")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5bbc65512 ]
HW may generate completions that indicates QP is destroyed.
Driver should not be scheduling any more completion handlers
for this QP, after the QP is destroyed. Since CQs are active
during the QP destroy, driver may still schedule completion
handlers. This can cause a race where the destroy_cq and poll_cq
running simultaneously.
Snippet of kernel panic while doing bnxt_re driver load unload in loop.
This indicates a poll after the CQ is freed.
[77786.481636] Call Trace:
[77786.481640] <TASK>
[77786.481644] bnxt_re_poll_cq+0x14a/0x620 [bnxt_re]
[77786.481658] ? kvm_clock_read+0x14/0x30
[77786.481693] __ib_process_cq+0x57/0x190 [ib_core]
[77786.481728] ib_cq_poll_work+0x26/0x80 [ib_core]
[77786.481761] process_one_work+0x1e5/0x3f0
[77786.481768] worker_thread+0x50/0x3a0
[77786.481785] ? __pfx_worker_thread+0x10/0x10
[77786.481790] kthread+0xe2/0x110
[77786.481794] ? __pfx_kthread+0x10/0x10
[77786.481797] ret_from_fork+0x2c/0x50
To avoid this, complete all completion handlers before returning the
destroy QP. If free_cq is called soon after destroy_qp, IB stack
will cancel the CQ work before invoking the destroy_cq verb and
this will prevent any race mentioned.
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1689322969-25402-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc52aadbc1 ]
Commit 21c2fe94ab ("RDMA/mthca: Combine special QP struct with mthca QP")
introduced a new struct mthca_sqp which doesn't contain struct mthca_qp
any longer. Placing a pointer of this new struct into qptable leads
to crashes, because mthca_poll_one() expects a qp pointer. Fix this
by putting the correct pointer into qptable.
Fixes: 21c2fe94ab ("RDMA/mthca: Combine special QP struct with mthca QP")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Link: https://lore.kernel.org/r/20230713141658.9426-1-tbogendoerfer@suse.de
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e15863015 ]
8d037973d4 ("RDMA/core: Refactor rdma_bind_addr") intoduces as regression
on irdma devices on certain tests which uses rdma CM, such as cmtime.
No connections can be established with the MAD QP experiences a fatal
error on the active side.
The cma destination address is not updated with the dst_addr when ULP
on active side calls rdma_bind_addr followed by rdma_resolve_addr.
The id_priv state is 'bound' in resolve_prepare_src and update is skipped.
This leaves the dgid passed into irdma driver to create an Address Handle
(AH) for the MAD QP at 0. The create AH descriptor as well as the ARP cache
entry is invalid and HW throws an asynchronous events as result.
[ 1207.656888] resolve_prepare_src caller: ucma_resolve_addr+0xff/0x170 [rdma_ucm] daddr=200.0.4.28 id_priv->state=7
[....]
[ 1207.680362] ice 0000:07:00.1 rocep7s0f1: caller: irdma_create_ah+0x3e/0x70 [irdma] ah_id=0 arp_idx=0 dest_ip=0.0.0.0
destMAC=00:00:64:ca:b7:52 ipvalid=1 raw=0000:0000:0000:0000:0000:ffff:0000:0000
[ 1207.682077] ice 0000:07:00.1 rocep7s0f1: abnormal ae_id = 0x401 bool qp=1 qp_id = 1, ae_src=5
[ 1207.691657] infiniband rocep7s0f1: Fatal error (1) on MAD QP (1)
Fix this by updating the CMA destination address when the ULP calls
a resolve address with the CM state already bound.
Fixes: 8d037973d4 ("RDMA/core: Refactor rdma_bind_addr")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230712234133.1343-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4984eb5145 ]
On code inspection, there are many instances in the driver where
CEQE and AEQE fields written to by HW are read without guaranteeing
that the polarity bit has been read and checked first.
Add a read barrier to avoid reordering of loads on the CEQE/AEQE fields
prior to checking the polarity bit.
Fixes: 3f49d68425 ("RDMA/irdma: Implement HW Admin Queue OPs")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230711175253.1289-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95f41d8781 ]
The commit in Fixes has introduced some "enum p9_session_flags" values
larger than a char.
Such values are stored in "v9fs_session_info->flags" which is a char only.
Turn it into an int so that the "enum p9_session_flags" values can fit in
it.
Fixes: 6deffc8924 ("fs/9p: Add new mount modes")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de52e17326 ]
If tipc_link_bc_create() fails inside tipc_node_create() for a newly
allocated tipc node then we should stop its tipc crypto and free the
resources allocated with a call to tipc_crypto_start().
As the node ref is initialized to one to that point, just put the ref on
tipc_link_bc_create() error case that would lead to tipc_node_free() be
eventually executed and properly clean the node and its crypto resources.
Found by Linux Verification Center (linuxtesting.org).
Fixes: cb8092d70a ("tipc: move bc link creation back to tipc_node_create")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230725214628.25246-1-pchelkin@ispras.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c58c8816a ]
The nla_for_each_nested parsing in function mqprio_parse_nlattr() does
not check the length of the nested attribute. This can lead to an
out-of-attribute read and allow a malformed nlattr (e.g., length 0) to
be viewed as 8 byte integer and passed to priv->max_rate/min_rate.
This patch adds the check based on nla_len() when check the nla_type(),
which ensures that the length of these two attribute must equals
sizeof(u64).
Fixes: 4e8b86c062 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230725024227.426561-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f0bc0b042 ]
Commit eda0047296 ("mm: make the page fault mmap locking killable")
intentionally made it much easier to trigger the "page fault fails
because a fatal signal is pending" situation, by having the mmap locking
fail early in that case.
We have long aborted page faults in other fatal cases when the actual IO
for a page is interrupted by SIGKILL - which is particularly useful for
the traditional case of NFS hanging due to network issues, but local
filesystems could cause it too if you happened to get the SIGKILL while
waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()).
So aborting the page fault wasn't a new condition - but it now triggers
earlier, before we even get to 'handle_mm_fault()'. And as a result the
error doesn't go through our 'fault_signal_pending()' logic, and doesn't
get filtered away there.
Normally you'd never even notice, because if a fatal signal is pending,
the new SIGSEGV we send ends up being ignored anyway.
But it turns out that there is one very noticeable exception: if you
enable 'show_unhandled_signals', the aborted page fault will be logged
in the kernel messages, and you'll get a scary line looking something
like this in your logs:
pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)
which is rather misleading. It's not really a segfault at all, it's
just "the thread was killed before the page fault completed, so we
aborted the page fault".
Fix this by just making it clear that a pending fatal signal means that
any new signal coming in after that is implicitly handled. This will
avoid the misleading logging, since now the signal isn't 'unhandled' any
more.
Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/
Acked-by: Oleg Nesterov <oleg@redhat.com>
Fixes: eda0047296 ("mm: make the page fault mmap locking killable")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ebc1064e4 ]
Bail out with EOPNOTSUPP when adding rule to bound chain via
NFTA_RULE_CHAIN_ID. The following warning splat is shown when
adding a rule to a deleted bound chain:
WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1
RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a771f7b26 ]
On error when building the rule, the immediate expression unbinds the
chain, hence objects can be deactivated by the transaction records.
Otherwise, it is possible to trigger the following warning:
WARNING: CPU: 3 PID: 915 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
CPU: 3 PID: 915 Comm: chain-bind-err- Not tainted 6.1.39 #1
RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
Fixes: 4bedf9eee0 ("netfilter: nf_tables: fix chain binding transaction logic")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f718863aca ]
The lazy gc on insert that should remove timed-out entries fails to release
the other half of the interval, if any.
Can be reproduced with tests/shell/testcases/sets/0044interval_overlap_0
in nftables.git and kmemleak enabled kernel.
Second bug is the use of rbe_prev vs. prev pointer.
If rbe_prev() returns NULL after at least one iteration, rbe_prev points
to element that is not an end interval, hence it should not be removed.
Lastly, check the genmask of the end interval if this is active in the
current generation.
Fixes: c9e6978e27 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55cef78c24 ]
The previous commit 954d1fa1ac ("macvlan: Add netlink attribute for
broadcast cutoff") added one additional attribute named
IFLA_MACVLAN_BC_CUTOFF to allow broadcast cutfoff.
However, it forgot to describe the nla_policy at macvlan_policy
(drivers/net/macvlan.c). Hence, this suppose NLA_S32 (4 bytes) integer
can be faked as empty (0 bytes) by a malicious user, which could leads
to OOB in heap just like CVE-2023-3773.
To fix it, this commit just completes the nla_policy description for
IFLA_MACVLAN_BC_CUTOFF. This enforces the length check and avoids the
potential OOB read.
Fixes: 954d1fa1ac ("macvlan: Add netlink attribute for broadcast cutoff")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230723080205.3715164-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f91164061 ]
Commit c4e34dd99f ("x86: simplify load_unaligned_zeropad()
implementation") changes how exceptions around load_unaligned_zeropad()
handled. The kernel now uses the fault_address in fixup_exception() to
verify the address calculations for the load_unaligned_zeropad().
It works fine for #PF, but breaks on #VE since no fault address is
passed down to fixup_exception().
Propagating ve_info.gla down to fixup_exception() resolves the issue.
See commit 1e7769653b ("x86/tdx: Handle load_unaligned_zeropad()
page-cross to a shared page") for more context.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Michael Kelley <mikelley@microsoft.com>
Fixes: c4e34dd99f ("x86: simplify load_unaligned_zeropad() implementation")
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad084a6d99 ]
Only the HW rfkill state is toggled on laptops with quirks->ec_read_only
(so far only MSI Wind U90/U100). There are, however, a few issues with
the implementation:
1. The initial HW state is always unblocked, regardless of the actual
state on boot, because msi_init_rfkill only sets the SW state,
regardless of ec_read_only.
2. The initial SW state corresponds to the actual state on boot, but it
can't be changed afterwards, because set_device_state returns
-EOPNOTSUPP. It confuses the userspace, making Wi-Fi and/or Bluetooth
unusable if it was blocked on boot, and breaking the airplane mode if
the rfkill was unblocked on boot.
Address the above issues by properly initializing the HW state on
ec_read_only laptops and by allowing the userspace to toggle the SW
state. Don't set the SW state ourselves and let the userspace fully
control it. Toggling the SW state is a no-op, however, it allows the
userspace to properly toggle the airplane mode. The actual SW radio
disablement is handled by the corresponding rtl818x_pci and btusb
drivers that have their own rfkills.
Tested on MSI Wind U100 Plus, BIOS ver 1.0G, EC ver 130.
Fixes: 0816392b97 ("msi-laptop: merge quirk tables to one")
Fixes: 0de6575ad0 ("msi-laptop: Add MSI Wind U90/U100 support")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://lore.kernel.org/r/20230721145423.161057-1-maxtram95@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 284779dbf4 ]
commit a3a57bf07d ("net: stmmac: work
around sporadic tx issue on link-up") worked around a problem with TX
sometimes not working after a link-up by avoiding a redundant write to
MAC_CTRL_REG (aka GMAC_CONFIG), since the IP appeared to have problems
with handling multiple writes to that register in some cases.
That commit however only added the work around to dwmac_lib.c (apart
from the common code in stmmac_main.c), but my systems with version
4.21a of the IP exhibit the same problem, so add the work around to
dwmac4_lib.c too.
Fixes: a3a57bf07d ("net: stmmac: work around sporadic tx issue on link-up")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230721-stmmac-tx-workaround-v1-1-9411cbd5ee07@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e62c99d71 ]
As of today, hash extraction support is enabled for all the silicons.
Because of which we are facing initialization issues when the silicon
does not support hash extraction. During creation of the hardware
parsing table for IPv6 address, we need to consider if hash extraction
is enabled then extract only 32 bit, otherwise 128 bit needs to be
extracted. This patch fixes the issue and configures the hardware parser
based on the availability of the feature.
Fixes: a95ab93550 ("octeontx2-af: Use hashed field in MCAM key")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230721061222.2632521-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa532bee17 ]
When adding a point to point downlink to team device, we neglected to reset
the team's flags, which were still using flags like BROADCAST and
MULTICAST. Consequently, this would initiate ARP/DAD for P2P downlink
interfaces, such as when adding a GRE device to team device. Fix this by
remove multicast/broadcast flags and add p2p and noarp flags.
After removing the none ethernet interface and adding an ethernet interface
to team, we need to reset team interface flags. Unlike bonding interface,
team do not need restore IFF_MASTER, IFF_SLAVE flags.
Reported-by: Liang Li <liali@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2221438
Fixes: 1d76efe157 ("team: add support for non-ethernet devices")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da19a2b967 ]
When adding a point to point downlink to the bond, we neglected to reset
the bond's flags, which were still using flags like BROADCAST and
MULTICAST. Consequently, this would initiate ARP/DAD for P2P downlink
interfaces, such as when adding a GRE device to the bonding.
To address this issue, let's reset the bond's flags for P2P interfaces.
Before fix:
7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default qlen 1000
link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr 167f:18:f188::
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/gre6 2006:70:10::1 brd 2006:70:10::2
inet6 fe80::200:ff:fe00:0/64 scope link
valid_lft forever preferred_lft forever
After fix:
7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond2 state UNKNOWN group default qlen 1000
link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr c29e:557a:e9d9::
8: bond0: <POINTOPOINT,NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/gre6 2006:70:10::1 peer 2006:70:10::2
inet6 fe80::1/64 scope link
valid_lft forever preferred_lft forever
Reported-by: Liang Li <liali@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2221438
Fixes: 872254dd6b ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d11b0df7dd ]
For both IPv4 and IPv6 incoming TCP connections are tracked in a hash
table with a hash over the source & destination addresses and ports.
However, the IPv6 hash is insufficient and can lead to a high rate of
collisions.
The IPv6 hash used an XOR to fit everything into the 96 bits for the
fast jenkins hash, meaning it is possible for an external entity to
ensure the hash collides, thus falling back to a linear search in the
bucket, which is slow.
We take the approach of hash the full length of IPv6 address in
__ipv6_addr_jhash() so that all users can benefit from a more secure
version.
While this may look like it adds overhead, the reality of modern CPUs
means that this is unmeasurable in real world scenarios.
In simulating with llvm-mca, the increase in cycles for the hashing
code was ~16 cycles on Skylake (from a base of ~155), and an extra ~9
on Nehalem (base of ~173).
In commit dd6d2910c5 ("netfilter: conntrack: switch to siphash")
netfilter switched from a jenkins hash to a siphash, but even the faster
hsiphash is a more significant overhead (~20-30%) in some preliminary
testing. So, in this patch, we keep to the more conservative approach to
ensure we don't add much overhead per SYN.
In testing, this results in a consistently even spread across the
connection buckets. In both testing and real-world scenarios, we have
not found any measurable performance impact.
Fixes: 08dcdbf6a7 ("ipv6: use a stronger hash for tcp")
Signed-off-by: Stewart Smith <trawets@amazon.com>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230721222410.17914-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb7a015636 ]
According to the implementation of XDP of FEC driver, the XDP path
shares the transmit queues with the kernel network stack, so it is
possible to lead to a tx timeout event when XDP uses the tx queue
pretty much exclusively. And this event will cause the reset of the
FEC hardware.
To avoid timeout in this case, we use the txq_trans_cond_update()
interface to update txq->trans_start to jiffies so that watchdog
won't generate a transmit timeout warning.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230721083559.2857312-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69172f0bcb ]
currently on 6.4 net/main:
# ip link add dummy1 type dummy
# echo 1 > /proc/sys/net/ipv6/conf/dummy1/use_tempaddr
# ip link set dummy1 up
# ip -6 addr add 2000::1/64 mngtmpaddr dev dummy1
# ip -6 addr show dev dummy1
11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet6 2000::44f3:581c:8ca:3983/64 scope global temporary dynamic
valid_lft 604800sec preferred_lft 86172sec
inet6 2000::1/64 scope global mngtmpaddr
valid_lft forever preferred_lft forever
inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link
valid_lft forever preferred_lft forever
# ip -6 addr del 2000::44f3:581c:8ca:3983/64 dev dummy1
(can wait a few seconds if you want to, the above delete isn't [directly] the problem)
# ip -6 addr show dev dummy1
11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet6 2000::1/64 scope global mngtmpaddr
valid_lft forever preferred_lft forever
inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link
valid_lft forever preferred_lft forever
# ip -6 addr del 2000::1/64 mngtmpaddr dev dummy1
# ip -6 addr show dev dummy1
11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet6 2000::81c9:56b7:f51a:b98f/64 scope global temporary dynamic
valid_lft 604797sec preferred_lft 86169sec
inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link
valid_lft forever preferred_lft forever
This patch prevents this new 'global temporary dynamic' address from being
created by the deletion of the related (same subnet prefix) 'mngtmpaddr'
(which is triggered by there already being no temporary addresses).
Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: 53bd674915 ("ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses")
Reported-by: Xiao Ma <xiaom@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230720160022.1887942-1-maze@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 13c088cf36 ]
The size of array 'priv->ports[]' is INNO_PHY_PORT_NUM.
In the for loop, 'i' is used as the index for array 'priv->ports[]'
with a check (i > INNO_PHY_PORT_NUM) which indicates that
INNO_PHY_PORT_NUM is allowed value for 'i' in the same loop.
This > comparison needs to be changed to >=, otherwise it potentially leads
to an out of bounds write on the next iteration through the loop
Fixes: ba8b0ee81f ("phy: add inno-usb2-phy driver for hi3798cv200 SoC")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20230721090558.3588613-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0b672c4d0 ]
In VXLAN-GPE, there may not be an Ethernet header following the VXLAN
header. But in GRO, the vxlan driver calls eth_gro_receive
unconditionally, which means the following header is incorrectly parsed
as Ethernet.
Introduce GPE specific GRO handling.
For better performance, do not check for GPE during GRO but rather
install a different set of functions at setup time.
Fixes: e1e5314de0 ("vxlan: implement GPE")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17a0a64448 ]
The vxlan_parse_gpe_hdr function extracts the next protocol value from
the GPE header and marks GPE bits as parsed.
In order to be used in the next patch, split the function into protocol
extraction and bit marking. The bit marking is meaningful only in
vxlan_rcv; move it directly there.
Rename the function to vxlan_parse_gpe_proto to reflect what it now
does. Remove unused arguments skb and vxflags. Move the function earlier
in the file to allow it to be called from more places in the next patch.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: b0b672c4d0 ("vxlan: fix GRO with VXLAN-GPE")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94d166c531 ]
VXLAN-GPE does not add an extra inner Ethernet header. Take that into
account when calculating header length.
This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
cached.
In the collect_md mode (which is the only mode that VXLAN-GPE
supports), there's no magic auto-setting of the tunnel interface MTU.
It can't be, since the destination and thus the underlying interface
may be different for each packet.
So, the administrator is responsible for setting the correct tunnel
interface MTU. Apparently, the administrators are capable enough to
calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36).
They set the tunnel interface MTU to 1464. If you run a TCP stream over
such interface, it's then segmented according to the MTU 1464, i.e.
producing 1514 bytes frames. Which is okay, this still fits the lower
MTU.
However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as
the header size and thus incorrectly calculates the frame size to be
1528. This leads to ICMP too big message being generated (locally),
PMTU of 1450 to be cached and the TCP stream to be resegmented.
The fix is to use the correct actual header size, especially for
skb_tunnel_check_pmtu calculation.
Fixes: e1e5314de0 ("vxlan: implement GPE")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 882481b1c5 ]
In dwrr mode, the default bandwidth weight of disabled tc is set to 0.
If the bandwidth weight is 0, the mode will change to sp.
Therefore, disabled tc default bandwidth weight need changed to 1,
and 0 is returned when query the bandwidth weight of disabled tc.
In addition, driver need stop configure bandwidth weight if tc is disabled.
Fixes: 848440544b ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 116d9f732e ]
Currently, the weight saved by the driver is used as the query result,
which may be different from the actual weight in the register.
Therefore, the register value read from the firmware is used
as the query result
Fixes: 0e32038dc8 ("net: hns3: refactor dump tc of debugfs")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b27d0232e8 ]
Current only the first 32 bits of the capability flag bit are considered.
When the matching capability flag bit is greater than 31 bits,
it will get an error bit.This patch use bitmap to solve this issue.
It can handle each capability bit whitout bit width limit.
Fixes: da77aef9cc ("net: hns3: create common cmdq resource allocate/free/query APIs")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7b75bea85 ]
Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
it sometimes does not take effect immediately. And a read of this
register causes the bit not to clear. This will cause mv3310_reset()
to time out, which will fail the config initialization. So add a delay
before the next access.
Fixes: c9cc1c815d ("net: phy: marvell10g: place in powersave mode at probe")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91896c8acc ]
In iavf_adminq_task(), if the function can't acquire the
adapter->crit_lock, it checks if the driver is removing. If so, it simply
exits without re-enabling the interrupt. This is done to ensure that the
task stops processing as soon as possible once the driver is being removed.
However, if the IAVF_FLAG_PF_COMMS_FAILED is set, the function checks this
before attempting to acquire the lock. In this case, the function exits
early and re-enables the interrupt. This will happen even if the driver is
already removing.
Avoid this, by moving the check to after the adapter->crit_lock is
acquired. This way, if the driver is removing, we will not re-enable the
interrupt.
Fixes: fc2e6b3b13 ("iavf: Rework mutexes for better synchronisation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2f054c10b ]
In iavf_adminq_task(), if kzalloc() fails to allocate the event.msg_buf,
the function will exit without releasing the adapter->crit_lock.
This is unlikely, but if it happens, the next access to that mutex will
deadlock.
Fix this by moving the unlock to the end of the function, and adding a new
label to allow jumping to the unlock portion of the function exit flow.
Fixes: fc2e6b3b13 ("iavf: Rework mutexes for better synchronisation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 043b1f185f ]
The debugfs_create_dir() function returns error pointers.
It never returns NULL. Most incorrect error checks were fixed,
but the one in i40e_dbg_init() was forgotten.
Fix the remaining error check.
Fixes: 02e9c29081 ("i40e: debugfs interface")
Signed-off-by: Wang Ming <machel@vivo.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20de9fdaf4 ]
The mtk8195_jpegenc_drvdata object was added outside of an #ifdef causing
a harmless build warning.
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1879:32: error: 'mtk8195_jpegenc_drvdata' defined but not used [-Werror=unused-variable]
1879 | static struct mtk_jpeg_variant mtk8195_jpegenc_drvdata = {
| ^~~~~~~~~~~~~~~~~~~~~~~
A follow-up patch moved it inside of an #ifdef, which caused more
warnings, and a third patch ended up adding even more #ifdefs. These
were all bogus, since the actual problem here is the incorrect use
of of_ptr(). Since the driver (like any other modern platform driver)
only works in combination with CONFIG_OF, there is no point in hiding
the reference, so just remove that along with all the pointless #ifdef
checks in the driver.
This improves build coverage and avoids running into the same problem
again when another part of the driver gets changed that relies on
the #ifdef blocks to be completely matched.
Fixes: 934e8bccac ("mtk-jpegenc: support jpegenc multi-hardware")
Fixes: 4ae47770d5 ("media: mtk-jpegenc: Fix a compilation issue")
Fixes: da4ede4b7f ("media: mtk-jpeg: move data/code inside CONFIG_OF blocks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da4ede4b7f ]
Lots of data and functions here are not needed when CONFIG_OF is not
set, so move them inside #ifdef CONFIG_OF blocks to prevent the warnings.
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1645:29: warning: ‘mtk_jpeg_clocks’ defined but not used [-Wunused-variable]
1645 | static struct clk_bulk_data mtk_jpeg_clocks[] = {
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1640:29: warning: ‘mt8173_jpeg_dec_clocks’ defined but not used [-Wunused-variable]
1640 | static struct clk_bulk_data mt8173_jpeg_dec_clocks[] = {
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1481:20: warning: ‘mtk_jpeg_dec_irq’ defined but not used [-Wunused-function]
1481 | static irqreturn_t mtk_jpeg_dec_irq(int irq, void *priv)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1461:20: warning: ‘mtk_jpeg_enc_irq’ defined but not used [-Wunused-function]
1461 | static irqreturn_t mtk_jpeg_enc_irq(int irq, void *priv)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1180:13: warning: ‘mtk_jpegdec_worker’ defined but not used [-Wunused-function]
1180 | static void mtk_jpegdec_worker(struct work_struct *work)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:986:13: warning: ‘mtk_jpegenc_worker’ defined but not used [-Wunused-function]
986 | static void mtk_jpegenc_worker(struct work_struct *work)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:79:28: warning: ‘mtk_jpeg_dec_formats’ defined but not used [-Wunused-variable]
79 | static struct mtk_jpeg_fmt mtk_jpeg_dec_formats[] = {
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:31:28: warning: ‘mtk_jpeg_enc_formats’ defined but not used [-Wunused-variable]
31 | static struct mtk_jpeg_fmt mtk_jpeg_enc_formats[] = {
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1222:20: warning: ‘mtk_jpeg_enc_done’ defined but not used [-Wunused-function]
1222 | static irqreturn_t mtk_jpeg_enc_done(struct mtk_jpeg_dev *jpeg)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1072:12: warning: ‘mtk_jpegdec_set_hw_param’ defined but not used [-Wunused-function]
1072 | static int mtk_jpegdec_set_hw_param(struct mtk_jpeg_ctx *ctx,
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1060:12: warning: ‘mtk_jpegdec_put_hw’ defined but not used [-Wunused-function]
1060 | static int mtk_jpegdec_put_hw(struct mtk_jpeg_dev *jpeg, int hw_id)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1038:12: warning: ‘mtk_jpegdec_get_hw’ defined but not used [-Wunused-function]
1038 | static int mtk_jpegdec_get_hw(struct mtk_jpeg_ctx *ctx)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:977:12: warning: ‘mtk_jpegenc_put_hw’ defined but not used [-Wunused-function]
977 | static int mtk_jpegenc_put_hw(struct mtk_jpeg_dev *jpeg, int hw_id)
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:963:12: warning: ‘mtk_jpegenc_set_hw_param’ defined but not used [-Wunused-function]
963 | static int mtk_jpegenc_set_hw_param(struct mtk_jpeg_ctx *ctx,
../drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:941:12: warning: ‘mtk_jpegenc_get_hw’ defined but not used [-Wunused-function]
941 | static int mtk_jpegenc_get_hw(struct mtk_jpeg_ctx *ctx)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/linux-media/202305042146.j4ZxuvpM-lkp@intel.com/
Cc: Bin Liu <bin.liu@mediatek.com>
Cc: oushixiong <oushixiong@kylinos.cn>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: linux-media@vger.kernel.org
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Stable-dep-of: 20de9fdaf4 ("media: mtk_jpeg_core: avoid unused-variable warning")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcff0b56f6 ]
The path did not match the one it was submitted into linux-firmware
which prevented generic distribution from having working CODEC.
Fixes: 9f599f351e ("media: amphion: add vpu core driver")
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c2dcfc2cf ]
Address these compiler warnings by initialising the m_best and p_best
values to 0 and 1 respectively (as latter is used as a divisor):
drivers/media/i2c/tc358746.c: In function 'tc358746_find_pll_settings':
>> drivers/media/i2c/tc358746.c:817:13: warning: 'p_best' is used uninitialized
[-Wuninitialized]
817 | u16 p_best, p;
| ^~~~~~
>> drivers/media/i2c/tc358746.c:816:13: warning: 'm_best' is used uninitialized
[-Wuninitialized]
816 | u16 m_best, mul;
| ^~~~~~
The warnings may well be a false positive but it is difficult for a
compiler to find out whether that truly is the case.
Closes: https://lore.kernel.org/oe-kbuild-all/202305301627.fLT3Bkds-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 80a21da360 ("media: tc358746: add Toshiba TC358746 Parallel to CSI-2 bridge driver")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a0eb8f9b9 ]
The driver is not enabling the ref clock, which thus gets disabled by
the clk_disable_unused() initcall. This leads to the dwc3 controller
failing to initialize if probed after clk_disable_unused() is called,
for instance when the driver is built as a module.
To fix this, switch to the clk_bulk API to handle both cfg_ahb and ref
clocks at the proper places.
Note that the cfg_ahb clock is currently not used by any device tree
instantiation of the PHY. Work needs to be done separately to fix this.
Link: https://lore.kernel.org/linux-arm-msm/ZEqvy+khHeTkC2hf@fedora/
Fixes: 51e8114f80 ("phy: qcom-snps: Add SNPS USB PHY driver for QCOM based SOCs")
Signed-off-by: Adrien Thierry <athierry@redhat.com>
Link: https://lore.kernel.org/r/20230629144542.14906-3-athierry@redhat.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45d89a344e ]
In the dwc3 core, both system and runtime suspend end up calling
dwc3_suspend_common(). From there, what happens for the PHYs depends on
the USB mode and whether the controller is entering system or runtime
suspend.
HOST mode:
(1) system suspend on a non-wakeup-capable controller
The [1] if branch is taken. dwc3_core_exit() is called, which ends up
calling phy_power_off() and phy_exit(). Those two functions decrease the
PM runtime count at some point, so they will trigger the PHY runtime
sleep (assuming the count is right).
(2) runtime suspend / system suspend on a wakeup-capable controller
The [1] branch is not taken. dwc3_suspend_common() calls
phy_pm_runtime_put_sync(). Assuming the ref count is right, the PHY
runtime suspend op is called.
DEVICE mode:
dwc3_core_exit() is called on both runtime and system sleep
unless the controller is already runtime suspended.
OTG mode:
(1) system suspend : dwc3_core_exit() is called
(2) runtime suspend : do nothing
In host mode, the code seems to make a distinction between 1) runtime
sleep / system sleep for wakeup-capable controller, and 2) system sleep
for non-wakeup-capable controller, where phy_power_off() and phy_exit()
are only called for the latter. This suggests the PHY is not supposed to
be in a fully powered-off state for runtime sleep and system sleep for
wakeup-capable controller.
Moreover, downstream, cfg_ahb_clk only gets disabled for system suspend.
The clocks are disabled by phy->set_suspend() [2] which is only called
in the system sleep path through dwc3_core_exit() [3].
With that in mind, don't disable the clocks during the femto PHY runtime
suspend callback. The clocks will only be disabled during system suspend
for non-wakeup-capable controllers, through dwc3_core_exit().
[1] https://elixir.bootlin.com/linux/v6.4/source/drivers/usb/dwc3/core.c#L1988
[2] https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/blob/LV.AU.1.2.1.r2-05300-gen3meta.0/drivers/usb/phy/phy-msm-snps-hs.c#L524
[3] https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/blob/LV.AU.1.2.1.r2-05300-gen3meta.0/drivers/usb/dwc3/core.c#L1915
Signed-off-by: Adrien Thierry <athierry@redhat.com>
Link: https://lore.kernel.org/r/20230629144542.14906-2-athierry@redhat.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Stable-dep-of: 8a0eb8f9b9 ("phy: qcom-snps-femto-v2: properly enable ref clock")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d3de7ee19 ]
During allocations, while looking for preallocations(PA) in the per
inode rbtree, we can't do a direct traversal of the tree because
ext4_mb_discard_group_preallocation() can paralelly mark the pa deleted
and that can cause direct traversal to skip some entries. This was
leading to a BUG_ON() being hit [1] when we missed a PA that could satisfy
our request and ultimately tried to create a new PA that would overlap
with the missed one.
To makes sure we handle that case while still keeping the performance of
the rbtree, we make use of the fact that the only pa that could possibly
overlap the original goal start is the one that satisfies the below
conditions:
1. It must have it's logical start immediately to the left of
(ie less than) original logical start.
2. It must not be deleted
To find this pa we use the following traversal method:
1. Descend into the rbtree normally to find the immediate neighboring
PA. Here we keep descending irrespective of if the PA is deleted or if
it overlaps with our request etc. The goal is to find an immediately
adjacent PA.
2. If the found PA is on right of original goal, use rb_prev() to find
the left adjacent PA.
3. Check if this PA is deleted and keep moving left with rb_prev() until
a non deleted PA is found.
4. This is the PA we are looking for. Now we can check if it can satisfy
the original request and proceed accordingly.
This approach also takes care of having deleted PAs in the tree.
(While we are at it, also fix a possible overflow bug in calculating the
end of a PA)
[1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+tKRFaEQHtt8Q@mail.gmail.com/
Cc: stable@kernel.org # 6.4
Fixes: 3872778664 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
Tested-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.1690045963.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 569f196f1e ]
There will be changes coming in future patches which will introduce a new
criteria for block allocation. This removes the useless setting of ac_criteria.
AFAIU, this might be only used to differentiate between whether a preallocated
blocks was allocated or was regular allocator called for allocating blocks.
Hence this also adds the debug prints to identify what type of block allocation
was done in ext4_mb_show_ac().
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1dbae05617519cb6202f1b299c9d1be3e7cda763.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 9d3de7ee19 ("ext4: fix rbtree traversal bug in ext4_mb_use_preallocated")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1eff590489 ]
ext4_mb_use_preallocated will ignore the demand to alloc goal blocks,
although the EXT4_MB_HINT_GOAL_ONLY is requested.
For group pa, ext4_mb_group_or_file will not set EXT4_MB_HINT_GROUP_ALLOC
if EXT4_MB_HINT_GOAL_ONLY is set. So we will not alloc goal blocks from
group pa if EXT4_MB_HINT_GOAL_ONLY is set.
For inode pa, ext4_mb_pa_goal_check is added to check if free extent in
found inode pa meets goal blocks when EXT4_MB_HINT_GOAL_ONLY is set.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Suggested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230603150327.3596033-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 9d3de7ee19 ("ext4: fix rbtree traversal bug in ext4_mb_use_preallocated")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a4bcdbea4 ]
[Why]
Underflow observed when using a display with a large vblank region
and low refresh rate
[How]
Simplify calculation of vblank_nom
Increase value for VBlankNomDefaultUS to 800us
Reviewed-by: Jun Lei <jun.lei@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 2a9482e559 ("drm/amd/display: Prevent vtotal from being set to 0")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 469a62938a ]
[Why]
Flickering and underflow was observed when testing extended
blank on dcn314.
[What]
Vstartup is contrainted by vblank_nom, so adjusting it to include
non-adjusted vtotal in its calculation during freesync video
means that Vstartup is not changed when vtotal changes.
This fixed the flickering + underflow.
dc_extended_blank_supported function was removed
because extended blank is only relevant to when
zstate is supported. The increased vtotal during
freesync can be passed to dml regardless of whether
extended blank is supported or not, so this function is
not needed.
Updates were made recently in dml to the calculation of
min_dst_y_next_start. Dml input for dcn314 will now
always use the newer calculation for min_dst_y_next_start.
Dml input for older dcn versions remains untouched.
The variable optimized_min_dst_y_next_start
is replaced everywhere with min_dst_y_next_start,
and the updated dml allows min_dst_y_next_start to
increase to an optimized value during freesync video,
then return to default when freesync is disengaged.
Also removed registry key for controlling
extended blank feature.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 2a9482e559 ("drm/amd/display: Prevent vtotal from being set to 0")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ba90d760e ]
This feature is meant to unblock PSTATE for certain high end display
configs on dcn315. This is achieved by allocating CRB to detile buffer
based on display requirements to meet pstate latency hiding needs.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 49f26218c3 ("drm/amd/display: fix dcn315 single stream crb allocation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fa8cc0c44 ]
[WHY]
32ms delay was added to resolve issue with a specific sink, however this same
delay also introduces erroneous link training failures with certain sink
devices.
[HOW]
Only apply the 32ms delay for offending devices instead of globally.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 5a096b73c8 ("drm/amd/display: Keep disable aux-i delay as 0")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a52587e0be ]
The RK3399 PCIe endpoint controller cannot generate MSI-X IRQs.
This is documented in the RK3399 technical reference manual (TRM)
section 17.5.9 "Interrupt Support".
MSI-X capability should therefore not be advertised. Remove the
MSI-X capability by editing the capability linked-list. The
previous entry is the MSI capability, therefore get the next
entry from the MSI-X capability entry and set it as next entry
for the MSI capability. This in effect removes MSI-X from the list.
Linked list before : MSI cap -> MSI-X cap -> PCIe Device cap -> ...
Linked list now : MSI cap -> PCIe Device cap -> ...
Link: https://lore.kernel.org/r/20230418074700.1083505-11-rick.wertenbroek@gmail.com
Fixes: cf590b0783 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc73ed0f1b ]
The RK3399 PCI endpoint core has 33 windows for PCIe space, now in the
driver up to 32 fixed size (1M) windows are used and pages are allocated
and mapped accordingly. The driver first used a single window and allocated
space inside which caused translation issues (between CPU space and PCI
space) because a window can only have a single translation at a given
time, which if multiple pages are allocated inside will cause conflicts.
Now each window is a single region of 1M which will always guarantee that
the translation is not in conflict.
Set the translation register addresses for physical function. As documented
in the technical reference manual (TRM) section 17.5.5 "PCIe Address
Translation" and section 17.6.8 "Address Translation Registers Description"
Link: https://lore.kernel.org/r/20230418074700.1083505-9-rick.wertenbroek@gmail.com
Fixes: cf590b0783 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7e3975636 ]
PCIe r6.0.1, sec 7.5.3.7, recommends setting the link control parameters,
then waiting for the Link Training bit to be clear before setting the
Retrain Link bit.
This avoids a race where the LTSSM may not use the updated parameters if it
is already in the midst of link training because of other normal link
activity.
Wait for the Link Training bit to be clear before toggling the Retrain Link
bit to ensure that the LTSSM uses the updated link control parameters.
[bhelgaas: commit log, return 0 (success)/-ETIMEDOUT instead of bool for
both pcie_wait_for_retrain() and the existing pcie_retrain_link()]
Suggested-by: Lukas Wunner <lukas@wunner.de>
Fixes: 7d715a6c1a ("PCI: add PCI Express ASPM support")
Link: https://lore.kernel.org/r/20230502083923.34562-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05f933d5f7 ]
Since commit 235602146e ("i2c-nomadik: turn the platform driver to an amba
driver"), there is no more request_mem_region() call in this driver.
So remove the release_mem_region() call from the remove function which is
likely a left over.
Fixes: 235602146e ("i2c-nomadik: turn the platform driver to an amba driver")
Cc: <stable@vger.kernel.org> # v3.6+
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c7174db4c ]
Replace the pair of functions, devm_clk_get() and
clk_prepare_enable(), with a single function
devm_clk_get_enabled().
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Stable-dep-of: 05f933d5f7 ("i2c: nomadik: Remove a useless call in the remove function")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06e9895782 ]
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding
size determination a bit safer according to the Linux coding style
convention.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Stable-dep-of: 05f933d5f7 ("i2c: nomadik: Remove a useless call in the remove function")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b3b21a854 ]
These issues were detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Stable-dep-of: 05f933d5f7 ("i2c: nomadik: Remove a useless call in the remove function")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a4a0b2a3e ]
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
# --- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
# +++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
# @@ -1,2 +1,4 @@
# QA output created by 255
# +ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
# +There may be more info in syslog - try dmesg | tail
# Silence is golden
# ...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26 ("Btrfs: qgroup implementation and prototypes")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed9ee98ecb ]
Split all the conditionals for the fsverity calls in end_page_read into
a btrfs_verify_page helper to keep the code readable and make additional
refactoring easier.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 2c14f0ffdd ("btrfs: fix fsverify read error handling in end_page_read")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9e26169cf ]
REGCACHE_RBTREE and REGCACHE_MAPLE dynamically allocate memory
for regmap operations. This is incompatible with spinlock based locking
which is used for fast_io operations. Disable locking for the associated
unit tests to avoid lockdep splashes.
Fixes: f033c26de5 ("regmap: Add maple tree based register cache")
Fixes: 2238959b6a ("regmap: Add some basic kunit tests")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230720032848.1306349-1-linux@roeck-us.net
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1945063eb5 ]
This allows to get rid of a call to pwmchip_remove() in the error path. There
is no .remove function for this driver, so this change fixes a resource leak
when a gpio-mvebu device is unbound.
Fixes: 757642f9a5 ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6adc2272aa ]
The check being unconditional may lead to unwanted denials reported by
LSMs when a process has the capability granted by DAC, but denied by an
LSM. In the case of SELinux such denials are a problem, since they can't
be effectively filtered out via the policy and when not silenced, they
produce noise that may hide a true problem or an attack.
Since not having the capability merely means that the created io_uring
context will be accounted against the current user's RLIMIT_MEMLOCK
limit, we can disable auditing of denials for this check by using
ns_capable_noaudit() instead of capable().
Fixes: 2b188cc1bb ("Add io_uring IO interface")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2193317
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/20230718115607.65652-1-omosnace@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2fceb59bb ]
The index field of the struct page corresponding to a guest ASCE should
be 0. When replacing the ASCE in s390_replace_asce(), the index of the
new ASCE should also be set to 0.
Having the wrong index might lead to the wrong addresses being passed
around when notifying pte invalidations, and eventually to validity
intercepts (VM crash) if the prefix gets unmapped and the notifier gets
called with the wrong address.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Fixes: faa2f72cb3 ("KVM: s390: pv: leak the topmost page table when destroy fails")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20230705111937.33472-3-imbrenda@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ff9218157 ]
Simplify the shutdown of non-protected VMs. There is no need to do
complex manipulations of the counter if it was zero.
This also fixes a very rare race which caused pages to be torn down
from the address space with a non-zero counter even on older machines
that don't support the UVC instruction, causing a crash.
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: fb491d5500 ("KVM: s390: pv: asynchronous destroy for reboot")
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20230705111937.33472-2-imbrenda@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b59c9dc4d9 ]
Commit 8ef7b9e176 ("powerpc/pseries/vas: Close windows with DLPAR
core removal") unmaps the window paste address and issues HCALL to
close window in the hypervisor for migration or DLPAR core removal
events. So holds mmap_mutex and then mmap lock before unmap the
paste address. But if the user space issue mmap paste address at
the same time with the migration event, coproc_mmap() is called
after holding the mmap lock which can trigger deadlock when trying
to acquire mmap_mutex in coproc_mmap().
t1: mmap() call to mmap t2: Migration event
window paste address
do_mmap2() migration_store()
ksys_mmap_pgoff() pseries_migrate_partition()
vm_mmap_pgoff() vas_migration_handler()
Acquire mmap lock reconfig_close_windows()
do_mmap() lock mmap_mutex
mmap_region() Acquire mmap lock
call_mmap() //Wait for mmap lock
coproc_mmap() unmap vma
lock mmap_mutex update window status
//wait for mmap_mutex Release mmap lock
mmap vma unlock mmap_mutex
update window status
unlock mmap_mutex
...
Release mmap lock
Fix this deadlock issue by holding mmap lock first before mmap_mutex
in reconfig_close_windows().
Fixes: 8ef7b9e176 ("powerpc/pseries/vas: Close windows with DLPAR core removal")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230716100506.7833-1-haren@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7090426351 ]
We have seen rare IO stalls as follows:
* blk_mq_plug_issue_direct() is entered with an mq_list containing two
requests.
* For the first request, it sets last == false and enters the driver's
queue_rq callback.
* The driver queue_rq callback indirectly calls schedule() which calls
blk_flush_plug(). This may happen if the driver has the
BLK_MQ_F_BLOCKING flag set and is allowed to sleep in ->queue_rq.
* blk_flush_plug() handles the remaining request in the mq_list. mq_list
is now empty.
* The original call to queue_rq resumes (with last == false).
* The loop in blk_mq_plug_issue_direct() terminates because there are no
remaining requests in mq_list.
The IO is now stalled because the last request submitted to the driver
had last == false and there was no subsequent call to commit_rqs().
Fix this by returning early in blk_mq_flush_plug_list() if rq_count is 0
which it will be in the recursive case, rather than checking if the
mq_list is empty. At the same time, adjust one of the callers to skip
the mq_list empty check as it is not necessary.
Fixes: dc5fc361d8 ("block: attempt direct issue of plug list")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230714101106.3635611-1-ross.lagerwall@citrix.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e34c8dd238 ]
Following process,
jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list
P1 wb_workfn
jbd2_log_do_checkpoint
if (buffer_locked(bh)) // false
__block_write_full_page
trylock_buffer(bh)
test_clear_buffer_dirty(bh)
if (!buffer_dirty(bh))
__jbd2_journal_remove_checkpoint(jh)
if (buffer_write_io_error(bh)) // false
>> bh IO error occurs <<
jbd2_cleanup_journal_tail
__jbd2_update_log_tail
jbd2_write_superblock
// The bh won't be replayed in next mount.
, which could corrupt the ext4 image, fetch a reproducer in [Link].
Since writeback process clears buffer dirty after locking buffer head,
we can fix it by try locking buffer and check dirtiness while buffer is
locked, the buffer head can be removed if it is neither dirty nor locked.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Fixes: 470decc613 ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e701156ccc upstream.
SMU13 overrides dynamic PCIe lane width and dynamic speed by when on
certain hosts. commit 38e4ced804 ("drm/amd/pm: conditionally disable
pcie lane switching for some sienna_cichlid SKUs") worked around this
issue by setting up certain SKUs to set up certain limits, but the same
fundamental problem with those hosts affects all SMU11 implmentations
as well, so align the SMU11 and SMU13 driver handling.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f6d9e38c4 upstream.
[Why]
Specific TBT4 dock doesn't send out short HPD to notify source
that IRQ event DOWN_REP_MSG_RDY is set. Which violates the spec
and cause source can't send out streams to mst sinks.
[How]
To cover this misbehavior, add an additional polling method to detect
DOWN_REP_MSG_RDY is set. HPD driven handling method is still kept.
Just hook up our handler to drm mgr->cbs->poll_hpd_irq().
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jerry Zuo <jerry.zuo@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 87279fdf5e upstream.
Fix the following errors & warnings reported by checkpatch:
ERROR: space required before the open brace '{'
ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line
ERROR: space prohibited before that ',' (ctx:WxW)
ERROR: else should follow close brace '}'
ERROR: open brace '{' following function definitions go on the next line
ERROR: code indent should use tabs where possible
WARNING: braces {} are not necessary for single statement blocks
WARNING: void function return statements are not generally useful
WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fcaa174a9c upstream.
In order to prevent request_queue to be freed before cleaning up
blktrace debugfs entries, commit db59133e92 ("scsi: sg: fix blktrace
debugfs entries leakage") use scsi_device_get(), however,
scsi_device_get() will also grab scsi module reference and scsi module
can't be removed.
It's reported that blktests can't unload scsi_debug after block/001:
blktests (master) # ./check block
block/001 (stress device hotplugging) [failed]
+++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
Running block/001
Stressing sd
+modprobe: FATAL: Module scsi_debug is in use.
Fix this problem by grabbing request_queue reference directly, so that
scsi host module can still be unloaded while request_queue will be
pinged by sg device.
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
Fixes: db59133e92 ("scsi: sg: fix blktrace debugfs entries leakage")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df01b7cfce upstream.
`rustc` outputs by default the temporary files (i.e. the ones saved
by `-Csave-temps`, such as `*.rcgu*` files) in the current working
directory when `-o` and `--out-dir` are not given (even if
`--emit=x=path` is given, i.e. it does not use those for temporaries).
Since out-of-tree modules are compiled from the `linux` tree,
`rustc` then tries to create them there, which may not be accessible.
Thus pass `--out-dir` explicitly, even if it is just for the temporary
files.
Similarly, do so for Rust host programs too.
Reported-by: Raphael Nestler <raphael.nestler@gmail.com>
Closes: https://github.com/Rust-for-Linux/linux/issues/1015
Reported-by: Andrea Righi <andrea.righi@canonical.com>
Tested-by: Raphael Nestler <raphael.nestler@gmail.com> # non-hostprogs
Tested-by: Andrea Righi <andrea.righi@canonical.com> # non-hostprogs
Fixes: 295d8398c6 ("kbuild: specify output names separately for each emission type from rustc")
Cc: stable@vger.kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2d6fd9d6f upstream.
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 95b7015433 ]
Commit c13380a555 ("Bluetooth: btusb: Do not require hardcoded
interface numbers") inadvertedly broke bluetooth on Intel Macbook 2014.
The intention was to keep behavior intact when BTUSB_IFNUM_2 is set and
otherwise allow any interface numbers. The problem is that the new logic
condition omits the case where bInterfaceNumber is 0.
Fix BTUSB_IFNUM_2 handling by allowing both interface number 0 and 2
when the flag is set.
Fixes: c13380a555 ("Bluetooth: btusb: Do not require hardcoded interface numbers")
Reported-by: John Holland <johnbholland@icloud.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217651
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Tested-by: John Holland<johnbholland@icloud.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3dcaa192ac ]
Operations that check/update sk_state and access conn should hold
lock_sock, otherwise they can race.
The order of taking locks is hci_dev_lock > lock_sock > sco_conn_lock,
which is how it is in connect/disconnect_cfm -> sco_conn_del ->
sco_chan_del.
Fix locking in sco_connect to take lock_sock around updating sk_state
and conn.
sco_conn_del must not occur during sco_connect, as it frees the
sco_conn. Hold hdev->lock longer to prevent that.
sco_conn_add shall return sco_conn with valid hcon. Make it so also when
reusing an old SCO connection waiting for disconnect timeout (see
__sco_sock_close where conn->hcon is set to NULL).
This should not reintroduce the issue fixed in the earlier
commit 9a8ec9e8eb ("Bluetooth: SCO: Fix possible circular locking
dependency on sco_connect_cfm"), the relevant fix of releasing lock_sock
in sco_sock_connect before acquiring hdev->lock is retained.
These changes mirror similar fixes earlier in ISO sockets.
Fixes: 9a8ec9e8eb ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4066eb04b ]
hci_connect_sco currently returns NULL when there is no link (i.e. when
hci_conn_link() returns NULL).
sco_connect() expects an ERR_PTR in case of any error (see line 266 in
sco.c). Thus, hcon set as NULL passes through to sco_conn_add(), which
tries to get hcon->hdev, resulting in dereferencing a NULL pointer as
reported by syzkaller.
The same issue exists for iso_connect_cis() calling hci_connect_cis().
Thus, make hci_connect_sco() and hci_connect_cis() return ERR_PTR
instead of NULL.
Reported-and-tested-by: syzbot+37acd5d80d00d609d233@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=37acd5d80d00d609d233
Fixes: 06149746e7 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de6dfcefd1 ]
KASAN reports that there's a use-after-free in
hci_remove_adv_monitor(). Trawling through the disassembly, you can
see that the complaint is from the access in bt_dev_dbg() under the
HCI_ADV_MONITOR_EXT_MSFT case. The problem case happens because
msft_remove_monitor() can end up freeing the monitor
structure. Specifically:
hci_remove_adv_monitor() ->
msft_remove_monitor() ->
msft_remove_monitor_sync() ->
msft_le_cancel_monitor_advertisement_cb() ->
hci_free_adv_monitor()
Let's fix the problem by just stashing the relevant data when it's
still valid.
Fixes: 7cf5c2978f ("Bluetooth: hci_sync: Refactor remove Adv Monitor")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d40ae85ee6 ]
sk->sk_state indicates whether iso_pi(sk)->conn is valid. Operations
that check/update sk_state and access conn should hold lock_sock,
otherwise they can race.
The order of taking locks is hci_dev_lock > lock_sock > iso_conn_lock,
which is how it is in connect/disconnect_cfm -> iso_conn_del ->
iso_chan_del.
Fix locking in iso_connect_cis/bis and sendmsg/recvmsg to take lock_sock
around updating sk_state and conn.
iso_conn_del must not occur during iso_connect_cis/bis, as it frees the
iso_conn. Hold hdev->lock longer to prevent that.
This should not reintroduce the issue fixed in commit 241f51931c
("Bluetooth: ISO: Avoid circular locking dependency"), since the we
acquire locks in order. We retain the fix in iso_sock_connect to release
lock_sock before iso_connect_* acquires hdev->lock.
Similarly for commit 6a5ad251b7 ("Bluetooth: ISO: Fix possible
circular locking dependency"). We retain the fix in iso_conn_ready to
not acquire iso_conn_lock before lock_sock.
iso_conn_add shall return iso_conn with valid hcon. Make it so also when
reusing an old CIS connection waiting for disconnect timeout (see
__iso_sock_close where conn->hcon is set to NULL).
Trace with iso_conn_del after iso_chan_add in iso_connect_cis:
===============================================================
iso_sock_create:771: sock 00000000be9b69b7
iso_sock_init:693: sk 000000004dff667e
iso_sock_bind:827: sk 000000004dff667e 70:1a:b8:98:ff:a2 type 1
iso_sock_setsockopt:1289: sk 000000004dff667e
iso_sock_setsockopt:1289: sk 000000004dff667e
iso_sock_setsockopt:1289: sk 000000004dff667e
iso_sock_connect:875: sk 000000004dff667e
iso_connect_cis:353: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da
hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da
hci_conn_add:1005: hci0 dst 28:3d:c2:4a:7e:da
iso_conn_add:140: hcon 000000007b65d182 conn 00000000daf8625e
__iso_chan_add:214: conn 00000000daf8625e
iso_connect_cfm:1700: hcon 000000007b65d182 bdaddr 28:3d:c2:4a:7e:da status 12
iso_conn_del:187: hcon 000000007b65d182 conn 00000000daf8625e, err 16
iso_sock_clear_timer:117: sock 000000004dff667e state 3
<Note: sk_state is BT_BOUND (3), so iso_connect_cis is still
running at this point>
iso_chan_del:153: sk 000000004dff667e, conn 00000000daf8625e, err 16
hci_conn_del:1151: hci0 hcon 000000007b65d182 handle 65535
hci_conn_unlink:1102: hci0: hcon 000000007b65d182
hci_chan_list_flush:2780: hcon 000000007b65d182
iso_sock_getsockopt:1376: sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_getsockopt:1376: sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_shutdown:1434: sock 00000000be9b69b7, sk 000000004dff667e, how 1
__iso_sock_close:632: sk 000000004dff667e state 5 socket 00000000be9b69b7
<Note: sk_state is BT_CONNECT (5), even though iso_chan_del sets
BT_CLOSED (6). Only iso_connect_cis sets it to BT_CONNECT, so it
must be that iso_chan_del occurred between iso_chan_add and end of
iso_connect_cis.>
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 8000000006467067 P4D 8000000006467067 PUD 3f5f067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
RIP: 0010:__iso_sock_close (net/bluetooth/iso.c:664) bluetooth
===============================================================
Trace with iso_conn_del before iso_chan_add in iso_connect_cis:
===============================================================
iso_connect_cis:356: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da
...
iso_conn_add:140: hcon 0000000093bc551f conn 00000000768ae504
hci_dev_put:1487: hci0 orig refcnt 21
hci_event_packet:7607: hci0: event 0x0e
hci_cmd_complete_evt:4231: hci0: opcode 0x2062
hci_cc_le_set_cig_params:3846: hci0: status 0x07
hci_sent_cmd_data:3107: hci0 opcode 0x2062
iso_connect_cfm:1703: hcon 0000000093bc551f bdaddr 28:3d:c2:4a:7e:da status 7
iso_conn_del:187: hcon 0000000093bc551f conn 00000000768ae504, err 12
hci_conn_del:1151: hci0 hcon 0000000093bc551f handle 65535
hci_conn_unlink:1102: hci0: hcon 0000000093bc551f
hci_chan_list_flush:2780: hcon 0000000093bc551f
__iso_chan_add:214: conn 00000000768ae504
<Note: this conn was already freed in iso_conn_del above>
iso_sock_clear_timer:117: sock 0000000098323f95 state 3
general protection fault, probably for non-canonical address 0x30b29c630930aec8: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1920 Comm: bluetoothd Tainted: G E 6.3.0-rc7+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
RIP: 0010:detach_if_pending+0x28/0xd0
Code: 90 90 0f 1f 44 00 00 48 8b 47 08 48 85 c0 0f 84 ad 00 00 00 55 89 d5 53 48 83 3f 00 48 89 fb 74 7d 66 90 48 8b 03 48 8b 53 08 <>
RSP: 0018:ffffb90841a67d08 EFLAGS: 00010007
RAX: 0000000000000000 RBX: ffff9141bd5061b8 RCX: 0000000000000000
RDX: 30b29c630930aec8 RSI: ffff9141fdd21e80 RDI: ffff9141bd5061b8
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffb90841a67b88
R10: 0000000000000003 R11: ffffffff8613f558 R12: ffff9141fdd21e80
R13: 0000000000000000 R14: ffff9141b5976010 R15: ffff914185755338
FS: 00007f45768bd840(0000) GS:ffff9141fdd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000619000424074 CR3: 0000000009f5e005 CR4: 0000000000170ee0
Call Trace:
<TASK>
timer_delete+0x48/0x80
try_to_grab_pending+0xdf/0x170
__cancel_work+0x37/0xb0
iso_connect_cis+0x141/0x400 [bluetooth]
===============================================================
Trace with NULL conn->hcon in state BT_CONNECT:
===============================================================
__iso_sock_close:619: sk 00000000f7c71fc5 state 1 socket 00000000d90c5fe5
...
__iso_sock_close:619: sk 00000000f7c71fc5 state 8 socket 00000000d90c5fe5
iso_chan_del:153: sk 00000000f7c71fc5, conn 0000000022c03a7e, err 104
...
iso_sock_connect:862: sk 00000000129b56c3
iso_connect_cis:348: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a
hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a
hci_dev_hold:1495: hci0 orig refcnt 19
__iso_chan_add:214: conn 0000000022c03a7e
<Note: reusing old conn>
iso_sock_clear_timer:117: sock 00000000129b56c3 state 3
...
iso_sock_ready:1485: sk 00000000129b56c3
...
iso_sock_sendmsg:1077: sock 00000000e5013966, sk 00000000129b56c3
BUG: kernel NULL pointer dereference, address: 00000000000006a8
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1403 Comm: wireplumber Tainted: G E 6.3.0-rc7+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
RIP: 0010:iso_sock_sendmsg+0x63/0x2a0 [bluetooth]
===============================================================
Fixes: 241f51931c ("Bluetooth: ISO: Avoid circular locking dependency")
Fixes: 6a5ad251b7 ("Bluetooth: ISO: Fix possible circular locking dependency")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 195ef75e19 ]
hci_update_accept_list_sync iterates over hdev->pend_le_conns and
hdev->pend_le_reports, and waits for controller events in the loop body,
without holding hdev lock.
Meanwhile, these lists and the items may be modified e.g. by
le_scan_cleanup. This can invalidate the list cursor or any other item
in the list, resulting to invalid behavior (eg use-after-free).
Use RCU for the hci_conn_params action lists. Since the loop bodies in
hci_sync block and we cannot use RCU or hdev->lock for the whole loop,
copy list items first and then iterate on the copy. Only the flags field
is written from elsewhere, so READ_ONCE/WRITE_ONCE should guarantee we
read valid values.
Free params everywhere with hci_conn_params_free so the cleanup is
guaranteed to be done properly.
This fixes the following, which can be triggered e.g. by BlueZ new
mgmt-tester case "Add + Remove Device Nowait - Success", or by changing
hci_le_set_cig_params to always return false, and running iso-tester:
==================================================================
BUG: KASAN: slab-use-after-free in hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
Read of size 8 at addr ffff888001265018 by task kworker/u3:0/32
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl (./arch/x86/include/asm/irqflags.h:134 lib/dump_stack.c:107)
print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
? __virt_addr_valid (./include/linux/mmzone.h:1915 ./include/linux/mmzone.h:2011 arch/x86/mm/physaddr.c:65)
? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
kasan_report (mm/kasan/report.c:538)
? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
? __pfx_hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2780)
? mutex_lock (kernel/locking/mutex.c:282)
? __pfx_mutex_lock (kernel/locking/mutex.c:282)
? __pfx_mutex_unlock (kernel/locking/mutex.c:538)
? __pfx_update_passive_scan_sync (net/bluetooth/hci_sync.c:2861)
hci_cmd_sync_work (net/bluetooth/hci_sync.c:306)
process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399)
worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538)
? __pfx_worker_thread (kernel/workqueue.c:2480)
kthread (kernel/kthread.c:376)
? __pfx_kthread (kernel/kthread.c:331)
ret_from_fork (arch/x86/entry/entry_64.S:314)
</TASK>
Allocated by task 31:
kasan_save_stack (mm/kasan/common.c:46)
kasan_set_track (mm/kasan/common.c:52)
__kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383)
hci_conn_params_add (./include/linux/slab.h:580 ./include/linux/slab.h:720 net/bluetooth/hci_core.c:2277)
hci_connect_le_scan (net/bluetooth/hci_conn.c:1419 net/bluetooth/hci_conn.c:1589)
hci_connect_cis (net/bluetooth/hci_conn.c:2266)
iso_connect_cis (net/bluetooth/iso.c:390)
iso_sock_connect (net/bluetooth/iso.c:899)
__sys_connect (net/socket.c:2003 net/socket.c:2020)
__x64_sys_connect (net/socket.c:2027)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
Freed by task 15:
kasan_save_stack (mm/kasan/common.c:46)
kasan_set_track (mm/kasan/common.c:52)
kasan_save_free_info (mm/kasan/generic.c:523)
__kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244)
__kmem_cache_free (mm/slub.c:1807 mm/slub.c:3787 mm/slub.c:3800)
hci_conn_params_del (net/bluetooth/hci_core.c:2323)
le_scan_cleanup (net/bluetooth/hci_conn.c:202)
process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399)
worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538)
kthread (kernel/kthread.c:376)
ret_from_fork (arch/x86/entry/entry_64.S:314)
==================================================================
Fixes: e8907f7654 ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87b5a5c209 ]
end key should be equal to start unless NFT_SET_EXT_KEY_END is present.
Its possible to add elements that only have a start key
("{ 1.0.0.0 . 2.0.0.0 }") without an internval end.
Insertion treats this via:
if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
end = (const u8 *)nft_set_ext_key_end(ext)->data;
else
end = start;
but removal side always uses nft_set_ext_key_end().
This is wrong and leads to garbage remaining in the set after removal
next lookup/insert attempt will give:
BUG: KASAN: slab-use-after-free in pipapo_get+0x8eb/0xb90
Read of size 1 at addr ffff888100d50586 by task nft-pipapo_uaf_/1399
Call Trace:
kasan_report+0x105/0x140
pipapo_get+0x8eb/0xb90
nft_pipapo_insert+0x1dc/0x1710
nf_tables_newsetelem+0x31f5/0x4e00
..
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: lonial con <kongln9170@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 314c828416 ]
Can be called via nft set element list iteration, which may acquire
rcu and/or bh read lock (depends on set type).
BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by nft/1232:
#0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid
#1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire
Call Trace:
nft_chain_validate
nft_lookup_validate_setelem
nft_pipapo_walk
nft_lookup_validate
nft_chain_validate
nft_immediate_validate
nft_chain_validate
nf_tables_validate
nf_tables_abort
No choice but to move it to nf_tables_validate().
Fixes: 81ea010667 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddbd8be689 ]
On some platforms there is a padding hole in the nft_verdict
structure, between the verdict code and the chain pointer.
On element insertion, if the new element clashes with an existing one and
NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as
the data associated with duplicated element is the same as the existing
one. The data equality check uses memcmp.
For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT
padding area leads to spurious failure even if the verdict data is the
same.
This then makes the insertion fail with 'already exists' error, even
though the new "key : data" matches an existing entry and userspace
told the kernel that it doesn't want to receive an error indication.
Fixes: c016c7e45d ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6631463b6e ]
Now these upper layer protocol handlers can be called from llc_rcv()
as sap->rcv_func(), which is registered by llc_sap_open().
* function which is passed to register_8022_client()
-> no in-kernel user calls register_8022_client().
* snap_rcv()
`- proto->rcvfunc() : registered by register_snap_client()
-> aarp_rcv() and atalk_rcv() drop packets from non-root netns
* stp_pdu_rcv()
`- garp_protos[]->rcv() : registered by stp_proto_register()
-> garp_pdu_rcv() and br_stp_rcv() are netns-aware
So, we can safely remove the netns restriction in llc_rcv().
Fixes: e730c15519 ("[NET]: Make packet reception network namespace safe")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e88761f5f ]
This func misses checking for platform_get_irq()'s call and may passes the
negative error codes to request_irq(), which takes unsigned IRQ #,
causing it to fail with -EINVAL, overriding an original error code.
Fix this by stop calling request_irq() with invalid IRQ #s.
Fixes: 1630d85a83 ("au1200fb: fix hardcoded IRQ")
Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81b3ade5d2 ]
This reverts commit 3f4ca5fafc.
Commit 3f4ca5fafc ("tcp: avoid the lookup process failing to get sk in
ehash table") reversed the order in how a socket is inserted into ehash
to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are
swapped. However, it introduced another lookup failure.
The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU
and does not have SOCK_RCU_FREE, so the socket could be reused even while
it is being referenced on another CPU doing RCU lookup.
Let's say a socket is reused and inserted into the same hash bucket during
lookup. After the blamed commit, a new socket is inserted at the end of
the list. If that happens, we will skip sockets placed after the previous
position of the reused socket, resulting in ehash lookup failure.
As described in Documentation/RCU/rculist_nulls.rst, we should insert a
new socket at the head of the list to avoid such an issue.
This issue, the swap-lookup-failure, and another variant reported in [0]
can all be handled properly by adding a locked ehash lookup suggested by
Eric Dumazet [1].
However, this issue could occur for every packet, thus more likely than
the other two races, so let's revert the change for now.
Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0]
Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1]
Fixes: 3f4ca5fafc ("tcp: avoid the lookup process failing to get sk in ehash table")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit daa751444f ]
key might contain private part of the key, so better use
kfree_sensitive to free it.
Fixes: 38320c70d2 ("[IPSEC]: Use crypto_aead and authenc in ESP")
Signed-off-by: Wang Ming <machel@vivo.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0a8966e2b ]
When using IPv4/TCP, skb->hash comes from sk->sk_txhash except in
TIME_WAIT and SYN_RECV where it's not set in the reply skb from
ip_send_unicast_reply. Those packets will have a mismatched hash with
others from the same flow as their hashes will be 0. IPv6 does not have
the same issue as the hash is set from the socket txhash in those cases.
This commits sets the hash in the reply skb from ip_send_unicast_reply,
which makes the IPv4 code behaving like IPv6.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 5e5265522a ("tcp: annotate data-races around tcp_rsk(req)->txhash")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78adb4bcf9 ]
In normal operation, each populated queue item has
next_to_watch pointing to the last TX desc of the packet,
while each cleaned item has it set to 0. In particular,
next_to_use that points to the next (necessarily clean)
item to use has next_to_watch set to 0.
When the TX queue is used both by an application using
AF_XDP with ZEROCOPY as well as a second non-XDP application
generating high traffic, the queue pointers can get in
an invalid state where next_to_use points to an item
where next_to_watch is NOT set to 0.
However, the implementation assumes at several places
that this is never the case, so if it does hold,
bad things happen. In particular, within the loop inside
of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
Finally, this prevents any further transmission via
this queue and it never gets unblocked or signaled.
Secondly, if the queue is in this garbled state,
the inner loop of igc_clean_tx_ring() will never terminate,
completely hogging a CPU core.
The reason is that igc_xdp_xmit_zc() reads next_to_use
before acquiring the lock, and writing it back
(potentially unmodified) later. If it got modified
before locking, the outdated next_to_use is written
pointing to an item that was already used elsewhere
(and thus next_to_watch got written).
Fixes: 9acf59a752 ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95b6814855 ]
High XDP load triggers the netdev watchdog:
|NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
The reason is the Tx queue transmission start (txq->trans_start) is not updated
in XDP code path. Therefore, add it for all XDP transmission functions.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 78adb4bcf9 ("igc: Prevent garbled TX queue with XDP ZEROCOPY")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a3f25d614b ]
When running an freplace attached bpf program on an arm64 system w were
seeing the following issue:
Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI
After a bit of work to track it down I determined that what appeared to be
happening is that the 'bti c' at the start of the program was somehow being
reached after a 'br' instruction. Further digging pointed me toward the
fact that the function was attached via freplace. This in turn led me to
build_plt which I believe is invoking the long jump which is triggering
this error.
To resolve it we can replace the 'bti c' with 'bti jc' and add a comment
explaining why this has to be modified as such.
Fixes: b2ad54e153 ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5e9ad522c ]
While the check_max_stack_depth function explores call chains emanating
from the main prog, which is typically enough to cover all possible call
chains, it doesn't explore those rooted at async callbacks unless the
async callback will have been directly called, since unlike non-async
callbacks it skips their instruction exploration as they don't
contribute to stack depth.
It could be the case that the async callback leads to a callchain which
exceeds the stack depth, but this is never reachable while only
exploring the entry point from main subprog. Hence, repeat the check for
the main subprog *and* all async callbacks marked by the symbolic
execution pass of the verifier, as execution of the program may begin at
any of them.
Consider functions with following stack depths:
main: 256
async: 256
foo: 256
main:
rX = async
bpf_timer_set_callback(...)
async:
foo()
Here, async is not descended as it does not contribute to stack depth of
main (since it is referenced using bpf_pseudo_func and not
bpf_pseudo_call). However, when async is invoked asynchronously, it will
end up breaching the MAX_BPF_STACK limit by calling foo.
Hence, in addition to main, we also need to explore call chains
beginning at all async callback subprogs in a program.
Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba7b3e7d5f ]
The assignment to idx in check_max_stack_depth happens once we see a
bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
the code performs a few checks and then pushes the frame to the frame
stack, except the case of async callbacks. If the async callback case
causes the loop iteration to be skipped, the idx assignment will be
incorrect on the next iteration of the loop. The value stored in the
frame stack (as the subprogno of the current subprog) will be incorrect.
This leads to incorrect checks and incorrect tail_call_reachable
marking. Save the target subprog in a new variable and only assign to
idx once we are done with the is_async_cb check which may skip pushing
of frame to the frame stack and subsequent stack depth checks and tail
call markings.
Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2033ab9038 ]
Cited commit converted the neighbour code to use the standard RCU
variant instead of the RCU-bh variant, but the VRF code still uses
rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup
code in its IPv4 and IPv6 output paths, resulting in lockdep splats
[1][2]. Can be reproduced using [3].
Fix by switching to rcu_read_lock() / rcu_read_unlock().
[1]
=============================
WARNING: suspicious RCU usage
6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
-----------------------------
include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ping/183:
#0: ffff888105ea1d80 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xc6c/0x33c0
#1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output+0x2e3/0x2030
stack backtrace:
CPU: 0 PID: 183 Comm: ping Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xc1/0xf0
lockdep_rcu_suspicious+0x211/0x3b0
vrf_output+0x1380/0x2030
ip_push_pending_frames+0x125/0x2a0
raw_sendmsg+0x200d/0x33c0
inet_sendmsg+0xa2/0xe0
__sys_sendto+0x2aa/0x420
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[2]
=============================
WARNING: suspicious RCU usage
6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
-----------------------------
include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ping6/182:
#0: ffff888114b63000 (sk_lock-AF_INET6){+.+.}-{0:0}, at: rawv6_sendmsg+0x1602/0x3e50
#1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output6+0xe9/0x1310
stack backtrace:
CPU: 0 PID: 182 Comm: ping6 Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xc1/0xf0
lockdep_rcu_suspicious+0x211/0x3b0
vrf_output6+0xd32/0x1310
ip6_local_out+0xb4/0x1a0
ip6_send_skb+0xbc/0x340
ip6_push_pending_frames+0xe5/0x110
rawv6_sendmsg+0x2e6e/0x3e50
inet_sendmsg+0xa2/0xe0
__sys_sendto+0x2aa/0x420
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[3]
#!/bin/bash
ip link add name vrf-red up numtxqueues 2 type vrf table 10
ip link add name swp1 up master vrf-red type dummy
ip address add 192.0.2.1/24 dev swp1
ip address add 2001:db8:1::1/64 dev swp1
ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1
ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1
ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null
ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null
Fixes: 09eed1192c ("neighbour: switch to standard rcu, instead of rcu_bh")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230715153605.4068066-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c34743daca ]
The reset task is currently scheduled from the watchdog or adminq tasks.
First, all direct calls to schedule the reset task are replaced with the
iavf_schedule_reset(), which is modified to accept the flag showing the
type of reset.
To prevent the reset task from starting once iavf_remove() starts, we need
to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now
easily added to iavf_schedule_reset().
Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task.
It is redundant since all callers who set the flag immediately schedules
the reset task.
Fixes: 3ccd54ef44 ("iavf: Fix init state closure on remove")
Fixes: 14756b2ae2 ("iavf: Fix __IAVF_RESETTING state usage")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1639a1731 ]
A driver's lock (crit_lock) is used to serialize all the driver's tasks.
Lockdep, however, shows a circular dependency between rtnl and
crit_lock. This happens when an ndo that already holds the rtnl requests
the driver to reset, since the reset task (in some paths) tries to grab
rtnl to either change real number of queues of update netdev features.
[566.241851] ======================================================
[566.241893] WARNING: possible circular locking dependency detected
[566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE
[566.241984] ------------------------------------------------------
[566.242025] repro.sh/2604 is trying to acquire lock:
[566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
[566.242167]
but task is already holding lock:
[566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
[566.242300]
which lock already depends on the new lock.
[566.242353]
the existing dependency chain (in reverse order) is:
[566.242401]
-> #1 (rtnl_mutex){+.+.}-{3:3}:
[566.242451] __mutex_lock+0xc1/0xbb0
[566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf]
[566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf]
[566.242627] process_one_work+0x2b3/0x560
[566.242663] worker_thread+0x4f/0x3a0
[566.242696] kthread+0xf2/0x120
[566.242730] ret_from_fork+0x29/0x50
[566.242763]
-> #0 (&adapter->crit_lock){+.+.}-{3:3}:
[566.242815] __lock_acquire+0x15ff/0x22b0
[566.242869] lock_acquire+0xd2/0x2c0
[566.242901] __mutex_lock+0xc1/0xbb0
[566.242934] iavf_close+0x3c/0x240 [iavf]
[566.242997] __dev_close_many+0xac/0x120
[566.243036] dev_close_many+0x8b/0x140
[566.243071] unregister_netdevice_many_notify+0x165/0x7c0
[566.243116] unregister_netdevice_queue+0xd3/0x110
[566.243157] iavf_remove+0x6c1/0x730 [iavf]
[566.243217] pci_device_remove+0x33/0xa0
[566.243257] device_release_driver_internal+0x1bc/0x240
[566.243299] pci_stop_bus_device+0x6c/0x90
[566.243338] pci_stop_and_remove_bus_device+0xe/0x20
[566.243380] pci_iov_remove_virtfn+0xd1/0x130
[566.243417] sriov_disable+0x34/0xe0
[566.243448] ice_free_vfs+0x2da/0x330 [ice]
[566.244383] ice_sriov_configure+0x88/0xad0 [ice]
[566.245353] sriov_numvfs_store+0xde/0x1d0
[566.246156] kernfs_fop_write_iter+0x15e/0x210
[566.246921] vfs_write+0x288/0x530
[566.247671] ksys_write+0x74/0xf0
[566.248408] do_syscall_64+0x58/0x80
[566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[566.249886]
other info that might help us debug this:
[566.252014] Possible unsafe locking scenario:
[566.253432] CPU0 CPU1
[566.254118] ---- ----
[566.254800] lock(rtnl_mutex);
[566.255514] lock(&adapter->crit_lock);
[566.256233] lock(rtnl_mutex);
[566.256897] lock(&adapter->crit_lock);
[566.257388]
*** DEADLOCK ***
The deadlock can be triggered by a script that is continuously resetting
the VF adapter while doing other operations requiring RTNL, e.g:
while :; do
ip link set $VF up
ethtool --set-channels $VF combined 2
ip link set $VF down
ip link set $VF up
ethtool --set-channels $VF combined 4
ip link set $VF down
done
Any operation that triggers a reset can substitute "ethtool --set-channles"
As a fix, add a new task "finish_config" that do all the work which
needs rtnl lock. With the exception of iavf_remove(), all work that
require rtnl should be called from this task.
As for iavf_remove(), at the point where we need to call
unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
finish_config work cannot restart after that since the task is guarded
by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
Fixes: 5ac49f3c27 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2ed2403f1 ]
There was a fail when trying to add the interface to bonding
right after changing the MTU on the interface. It was caused
by bonding interface unable to open the interface due to
interface being in __RESETTING state because of MTU change.
Add new reset_waitqueue to indicate that reset has finished.
Add waiting for reset to finish in callbacks which trigger hw reset:
iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
We use a 5000ms timeout period because on Hyper-V based systems,
this operation takes around 3000-4000ms. In normal circumstances,
it doesn't take more than 500ms to complete.
Add a function iavf_wait_for_reset() to reuse waiting for reset code and
use it also in iavf_set_channels(), which already waits for reset.
We don't use error handling in iavf_set_channels() as this could
cause the device to be in incorrect state if the reset was scheduled
but hit timeout or the waitng function was interrupted by a signal.
Fixes: 4e5e6b5d9d ("iavf: Fix return of set the new channel count")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a77ed5c5b7 ]
If the system tries to close the netdev while iavf_reset_task() is
running, __LINK_STATE_START will be cleared and netif_running() will
return false in iavf_reinit_interrupt_scheme(). This will result in
iavf_free_traffic_irqs() not being called and a leak as follows:
[7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
[7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
is shown when pci_disable_msix() is later called. Fix by using the
internal adapter state. The traffic IRQs will always exist if
state == __IAVF_RUNNING.
Fixes: 5b36e8d04b ("i40evf: Enable VF to request an alternate queue allocation")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 162d626f30 ]
Referenced commit missed that for chip versions 42 and 43 ASPM
remained disabled in the respective rtl_hw_start_...() routines.
This resulted in problems as described in the referenced bug
ticket. Therefore re-instantiate the previous logic.
Fixes: 5fc3f6c90c ("r8169: consolidate disabling ASPM before EPHY access")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bdf79d686 ]
The KSZ8795 driver code was modified to use on KSZ8863/73, which has
different register definitions. Some of the new KSZ8795 register
information are wrong compared to previous code.
KSZ8795 also behaves differently in that the STATIC_MAC_TABLE_USE_FID
and STATIC_MAC_TABLE_FID bits are off by 1 when doing MAC table reading
than writing. To compensate that a special code was added to shift the
register value by 1 before applying those bits. This is wrong when the
code is running on KSZ8863, so this special code is only executed when
KSZ8795 is detected.
Fixes: 4b20a07e10 ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26a2219492 ]
If cls_bpf_offload errors out, we must also undo tcf_bind_filter that
was done before the error.
Fix that by calling tcf_unbind_filter in errout_parms.
Fixes: eadb41489f ("net: cls_bpf: add support for marking filters as hardware-only")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8d3d78c19 ]
In the case of an update, when TCA_U32_LINK is set, u32_set_parms will
decrement the refcount of the ht_down (struct tc_u_hnode) pointer
present in the older u32 filter which we are replacing. However, if
u32_replace_hw_knode errors out, the update command fails and that
ht_down pointer continues decremented. To fix that, when
u32_replace_hw_knode fails, check if ht_down's refcount was decremented
and undo the decrement.
Fixes: d34e3e1813 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3d0e04894 ]
In case an error occurred after mall_set_parms executed successfully, we
must undo the tcf_bind_filter call it issues.
Fix that by calling tcf_unbind_filter in err_replace_hw_filter label.
Fixes: ec2507d2a3 ("net/sched: cls_matchall: Fix error path")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 469e2f28c2 ]
This doesn't check how many bytes the simple_write_to_buffer() writes to
the buffer. The only thing that we know is that the first byte is
initialized and the last byte of the buffer is set to NUL. However
the middle bytes could be uninitialized.
There is no need to use simple_write_to_buffer(). This code does not
support partial writes but instead passes "pos = 0" as the starting
offset regardless of what the user passed as "*ppos". Just use the
copy_from_user() function and initialize the whole buffer.
Fixes: 671e0b9005 ("ASoC: SOF: Clone the trace code to ipc3-dtrace as fw_tracing implementation")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/74148292-ce4d-4e01-a1a7-921e6767da14@moroto.mountain
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24a3298ac9 ]
Since commit 6624e780a5 ("ice: split ice_vsi_setup into smaller
functions") ice_vsi_release does things twice. There is unregister
netdev which is unregistered in ice_deinit_eth also.
It also unregisters the devlink_port twice which is also unregistered
in ice_deinit_eth(). This double deregistration is hidden because
devl_port_unregister ignores the return value of xa_erase.
[ 68.642167] Call Trace:
[ 68.650385] ice_devlink_destroy_pf_port+0xe/0x20 [ice]
[ 68.655656] ice_vsi_release+0x445/0x690 [ice]
[ 68.660147] ice_deinit+0x99/0x280 [ice]
[ 68.664117] ice_remove+0x1b6/0x5c0 [ice]
[ 171.103841] Call Trace:
[ 171.109607] ice_devlink_destroy_pf_port+0xf/0x20 [ice]
[ 171.114841] ice_remove+0x158/0x270 [ice]
[ 171.118854] pci_device_remove+0x3b/0xc0
[ 171.122779] device_release_driver_internal+0xc7/0x170
[ 171.127912] driver_detach+0x54/0x8c
[ 171.131491] bus_remove_driver+0x77/0xd1
[ 171.135406] pci_unregister_driver+0x2d/0xb0
[ 171.139670] ice_module_exit+0xc/0x55f [ice]
Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69cba9d3c1 ]
When the number of responses with status of STATUS_IO_TIMEOUT
exceeds a specified threshold (NUM_STATUS_IO_TIMEOUT), we reconnect
the connection. But we do not return the mid, or the credits
returned for the mid, or reduce the number of in-flight requests.
This bug could result in the server->in_flight count to go bad,
and also cause a leak in the mids.
This change moves the check to a few lines below where the
response is decrypted, even of the response is read from the
transform header. This way, the code for returning the mids
can be reused.
Also, the cifs_reconnect was reconnecting just the transport
connection before. In case of multi-channel, this may not be
what we want to do after several timeouts. Changed that to
reconnect the session and the tree too.
Also renamed NUM_STATUS_IO_TIMEOUT to a more appropriate name
MAX_STATUS_IO_TIMEOUT.
Fixes: 8e670f77c4 ("Handle STATUS_IO_TIMEOUT gracefully")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c20ecf7bb6 ]
The ida_alloc_range() function returns negative error codes on error.
On success it returns values in the min to max range (inclusive). It
never returns more then INT_MAX even if "max" is higher. It never
returns values in the 0 to (min - 1) range.
The bug is that "min" is an unsigned int so negative error codes will
be promoted to high positive values errors treated as success.
Fixes: 1a14bf0fc7 ("iommu/sva: Use GFP_KERNEL for pasid allocation")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/6b32095d-7491-4ebb-a850-12e96209eaaf@kili.mountain
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d6d537dc5 ]
Move the call to of_get_ethdev_address to mtk_add_mac which is part of
the probe function and can hence itself return -EPROBE_DEFER should
of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely
get rid of the mtk_init function.
The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced
in situations in which the NVMEM provider holding the MAC address has
not yet be loaded at the time mtk_eth_soc is initially probed. In this
case probing of mtk_eth_soc should be deferred instead of falling back
to use a random MAC address, so once the NVMEM provider becomes
available probing can be repeated.
Fixes: 656e705243 ("net-next: mediatek: add support for MT7623 ethernet")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56a16035bb ]
When we create an L2 loop on a bridge in netns, we will see packets storm
even if STP is enabled.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
# ip link set veth1 master br0 up
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
# sleep 30
# ip -s link show br0
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
956553768 12861249 0 0 0 12861249 <-. Keep
TX: bytes packets errors dropped carrier collsns | increasing
1027834 11951 0 0 0 0 <-' rapidly
This is because llc_rcv() drops all packets in non-root netns and BPDU
is dropped.
Let's add extack warning when enabling STP in netns.
# unshare -n
# ip link add br0 type bridge
# ip link set br0 type bridge stp_state 1
Warning: bridge: STP does not work in non-root netns.
Note this commit will be reverted later when we namespacify the whole LLC
infra.
Fixes: e730c15519 ("[NET]: Make packet reception network namespace safe")
Suggested-by: Harry Coin <hcoin@quietfountain.com>
Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b685f1a589 ]
CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the
field will be strictly contained within one word. However, this is not
guaranteed to be the case and it is possible for ALE field entries to span
across up to two words at the most.
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes: db82173f23 ("netdev: driver: ethernet: add cpsw address lookup engine support")
Signed-off-by: Tanmay Patil <t-patil@ti.com>
[s-vadapalli@ti.com: rephrased commit message and added Fixes tag]
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95ce158b6c ]
I get sporadic timeouts from the driver when using the
MV88E6352. Reading the status again after the loop fixes the
problem: the operation is successful but goes undetected.
Some added prints show things like this:
[ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
[ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
ATU op 4000, fid 0001
(...)
[ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
[ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
for PHY command 1860 to complete
The reason is probably not the commands: I think those are
mostly fine with the 50+50ms timeout, but the problem
appears when OpenWrt brings up several interfaces in
parallel on a system with 7 populated ports: if one of
them take more than 50 ms and waits one or more of the
others can get stuck on the mutex for the switch and then
this can easily multiply.
As we sleep and wait, the function loop needs a final
check after exiting the loop if we were successful.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Fixes: 35da1dfd94 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230712223405.861899-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf99f6be2d ]
Use new cifs_smb_ses_inc_refcount() helper to get an active reference
of @ses and @ses->dfs_root_ses (if set). This will prevent
@ses->dfs_root_ses of being put in the next call to cifs_put_smb_ses()
and thus potentially causing an use-after-free bug.
Fixes: 8e3554150d ("cifs: fix sharing of DFS connections")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8cc32a9bbf ]
Commit 6eb4bd92c1 ("kallsyms: strip LTO suffixes from static functions")
stripped all function/variable suffixes started with '.' regardless
of whether those suffixes are generated at LTO mode or not. In fact,
as far as I know, in LTO mode, when a static function/variable is
promoted to the global scope, '.llvm.<...>' suffix is added.
The existing mechanism breaks live patch for a LTO kernel even if
no <symbol>.llvm.<...> symbols are involved. For example, for the following
kernel symbols:
$ grep bpf_verifier_vlog /proc/kallsyms
ffffffff81549f60 t bpf_verifier_vlog
ffffffff8268b430 d bpf_verifier_vlog._entry
ffffffff8282a958 d bpf_verifier_vlog._entry_ptr
ffffffff82e12a1f d bpf_verifier_vlog.__already_done
'bpf_verifier_vlog' is a static function. '_entry', '_entry_ptr' and
'__already_done' are static variables used inside 'bpf_verifier_vlog',
so llvm promotes them to file-level static with prefix 'bpf_verifier_vlog.'.
Note that the func-level to file-level static function promotion also
happens without LTO.
Given a symbol name 'bpf_verifier_vlog', with LTO kernel, current mechanism will
return 4 symbols to live patch subsystem which current live patching
subsystem cannot handle it. With non-LTO kernel, only one symbol
is returned.
In [1], we have a lengthy discussion, the suggestion is to separate two
cases:
(1). new symbols with suffix which are generated regardless of whether
LTO is enabled or not, and
(2). new symbols with suffix generated only when LTO is enabled.
The cleanup_symbol_name() should only remove suffixes for case (2).
Case (1) should not be changed so it can work uniformly with or without LTO.
This patch removed LTO-only suffix '.llvm.<...>' so live patching and
tracing should work the same way for non-LTO kernel.
The cleanup_symbol_name() in scripts/kallsyms.c is also changed to have the same
filtering pattern so both kernel and kallsyms tool have the same
expectation on the order of symbols.
[1] https://lore.kernel.org/live-patching/20230615170048.2382735-1-song@kernel.org/T/#u
Fixes: 6eb4bd92c1 ("kallsyms: strip LTO suffixes from static functions")
Reported-by: Song Liu <song@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628181926.4102448-1-yhs@fb.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e7de35eb7 ]
The mirror_num_ret is allowed to be NULL, although it has to be set when
smap is set. Unfortunately that is not a well enough specifiable
invariant for static type checkers, so add a NULL check to make sure they
are fine.
Fixes: 03793cbbc8 ("btrfs: add fast path for single device io in __btrfs_map_block")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1feece2780 ]
-L only specifies the search path for libraries directly provided in the
link line with -l. Because -lopencsd isn't specified, it's only linked
because it's a dependency of -lopencsd_c_api. Dependencies like this are
resolved using the default system search paths or -rpath-link=... rather
than -L. This means that compilation only works if OpenCSD is installed
to the system rather than provided with the CSLIBS (-L) option.
This could be fixed by adding -Wl,-rpath-link=$(CSLIBS) but that is less
conventional than just adding -lopencsd to the link line so that it uses
-L. -lopencsd seems to have been removed in commit ed17b19149
("perf tools: Drop requirement for libstdc++.so for libopencsd check")
because it was thought that there was a chance compilation would work
even if it didn't exist, but I think that only applies to libstdc++ so
there is no harm to add it back. libopencsd.so and libopencsd_c_api.so
would always exist together.
Testing
=======
The following scenarios now all work:
* Cross build with OpenCSD installed
* Cross build using CSLIBS=...
* Native build with OpenCSD installed
* Native build using CSLIBS=...
* Static cross build with OpenCSD installed
* Static cross build with CSLIBS=...
Committer testing:
⬢[acme@toolbox perf-tools]$ alias m
alias m='make -k BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools -C tools/perf install-bin && git status && perf test python ; perf record -o /dev/null sleep 0.01 ; perf stat --null sleep 0.01'
⬢[acme@toolbox perf-tools]$ ldd ~/bin/perf | grep csd
libopencsd_c_api.so.1 => /lib64/libopencsd_c_api.so.1 (0x00007fd49c44e000)
libopencsd.so.1 => /lib64/libopencsd.so.1 (0x00007fd49bd56000)
⬢[acme@toolbox perf-tools]$ cat /etc/redhat-release
Fedora release 36 (Thirty Six)
⬢[acme@toolbox perf-tools]$
Fixes: ed17b19149 ("perf tools: Drop requirement for libstdc++.so for libopencsd check")
Reported-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Uwe Kleine-König <uwe@kleine-koenig.org>
Cc: coresight@lists.linaro.org
Closes: https://lore.kernel.org/linux-arm-kernel/56905d7a-a91e-883a-b707-9d5f686ba5f1@arm.com/
Link: https://lore.kernel.org/all/36cc4dc6-bf4b-1093-1c0a-876e368af183@kleine-koenig.org/
Link: https://lore.kernel.org/r/20230707154546.456720-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45fcc058a7 ]
Remove unnecessary release_mem_region from the error path to prevent
mem region from being released twice, which could avoid resource leak
or other unexpected issues.
Fixes: b083c22d51 ("video: fbdev: imxfb: Convert request_mem_region + ioremap to devm_ioremap_resource")
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e47382fbc ]
Warn about invalid var->left_margin or var->right_margin. Their values
are read from the device tree.
We store var->left_margin-3 and var->right_margin-1 in register
fields. These fields should be >= 0.
Fixes: 7e8549bcee ("imxfb: Fix margin settings")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5158814cbb ]
The command word is defined as following:
/* Command */
#define SPI_CMD_COMMAND_SHIFT 0
#define SPI_CMD_DEVICE_ID_SHIFT 4
#define SPI_CMD_PREPEND_BYTE_CNT_SHIFT 8
#define SPI_CMD_ONE_BYTE_SHIFT 11
#define SPI_CMD_ONE_WIRE_SHIFT 12
If the prepend byte count field starts at bit 8, and the next defined
bit is SPI_CMD_ONE_BYTE at bit 11, it can be at most 3 bits wide, and
thus the max value is 7, not 15.
Fixes: b17de07606 ("spi/bcm63xx: work around inability to keep CS up")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://lore.kernel.org/r/20230629071453.62024-1-jonas.gorski@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bfc374a145 ]
Currently, sd1 and sd0 have unique subnode names 'sd1_mux' and 'sd0_mux'.
If we change these to non-unique subnode names such as 'mux' this can
lead to the below conflict as the RZ/G2L pin control driver considers
only the names of the subnodes.
pinctrl-rzg2l 11030000.pinctrl: pin P47_0 already requested by 11c00000.mmc; cannot claim for 11c10000.mmc
pinctrl-rzg2l 11030000.pinctrl: pin-376 (11c10000.mmc) status -22
pinctrl-rzg2l 11030000.pinctrl: could not request pin 376 (P47_0) from group mux on device pinctrl-rzg2l
renesas_sdhi_internal_dmac 11c10000.mmc: Error applying setting, reverse things back
Fix this by constructing unique names from the node names of both the
pin control configuration node and its child node, where appropriate.
Based on the work done by Geert for the RZ/V2M pinctrl driver.
Fixes: c4c4637eb5 ("pinctrl: renesas: Add RZ/G2L pin and gpio controller driver")
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230704111858.215278-1-biju.das.jz@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f46a0b47cc ]
The eMMC and SDHI pin control configuration nodes in DT have subnodes
with the same names ("data" and "ctrl"). As the RZ/V2M pin control
driver considers only the names of the subnodes, this leads to
conflicts:
pinctrl-rzv2m b6250000.pinctrl: pin P8_2 already requested by 85000000.mmc; cannot claim for 85020000.mmc
pinctrl-rzv2m b6250000.pinctrl: pin-130 (85020000.mmc) status -22
renesas_sdhi_internal_dmac 85020000.mmc: Error applying setting, reverse things back
Fix this by constructing unique names from the node names of both the
pin control configuration node and its child node, where appropriate.
Reported by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Fixes: 92a9b82525 ("pinctrl: renesas: Add RZ/V2M pin and gpio controller driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Link: https://lore.kernel.org/r/607bd6ab4905b0b1b119a06ef953fa1184505777.1688396717.git.geert+renesas@glider.be
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aff037078e ]
Destroying psi trigger in cgroup_file_release causes UAF issues when
a cgroup is removed from under a polling process. This is happening
because cgroup removal causes a call to cgroup_file_release while the
actual file is still alive. Destroying the trigger at this point would
also destroy its waitqueue head and if there is still a polling process
on that file accessing the waitqueue, it will step on the freed pointer:
do_select
vfs_poll
do_rmdir
cgroup_rmdir
kernfs_drain_open_files
cgroup_file_release
cgroup_pressure_release
psi_trigger_destroy
wake_up_pollfree(&t->event_wait)
// vfs_poll is unblocked
synchronize_rcu
kfree(t)
poll_freewait -> UAF access to the trigger's waitqueue head
Patch [1] fixed this issue for epoll() case using wake_up_pollfree(),
however the same issue exists for synchronous poll() case.
The root cause of this issue is that the lifecycles of the psi trigger's
waitqueue and of the file associated with the trigger are different. Fix
this by using kernfs_generic_poll function when polling on cgroup-specific
psi triggers. It internally uses kernfs_open_node->poll waitqueue head
with its lifecycle tied to the file's lifecycle. This also renders the
fix in [1] obsolete, so revert it.
[1] commit c2dbe32d5d ("sched/psi: Fix use-after-free in ep_remove_wait_queue()")
Fixes: 0e94682b73 ("psi: introduce psi monitor")
Closes: https://lore.kernel.org/all/20230613062306.101831-1-lujialin4@huawei.com/
Reported-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230630005612.1014540-1-surenb@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 719a937b70 ]
Extend commit 50f9a76ef1 ("iov_iter: Mark
copy_compat_iovec_from_user() noinline") to also cover
copy_iovec_from_user(). Different compiler versions cause the same
problem on different functions.
lib/iov_iter.o: warning: objtool: .altinstr_replacement+0x1f: redundant UACCESS disable
lib/iov_iter.o: warning: objtool: iovec_from_user+0x84: call to copy_iovec_from_user.part.0() with UACCESS enabled
lib/iov_iter.o: warning: objtool: __import_iovec+0x143: call to copy_iovec_from_user.part.0() with UACCESS enabled
Fixes: 50f9a76ef1 ("iov_iter: Mark copy_compat_iovec_from_user() noinline")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lkml.kernel.org/r/20230616124354.GD4253@hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cf3d5567f ]
Now, strncpy() in hns3_dbg_fill_content() use src-length as copy-length,
it may result in dest-buf overflow.
This patch is to fix intel compile warning for csky-linux-gcc (GCC) 12.1.0
compiler.
The warning reports as below:
hclge_debugfs.c:92:25: warning: 'strncpy' specified bound depends on
the length of the source argument [-Wstringop-truncation]
strncpy(pos, items[i].name, strlen(items[i].name));
hclge_debugfs.c:90:25: warning: 'strncpy' output truncated before
terminating nul copying as many bytes from a string as its length
[-Wstringop-truncation]
strncpy(pos, result[i], strlen(result[i]));
strncpy() use src-length as copy-length, it may result in
dest-buf overflow.
So,this patch add some values check to avoid this issue.
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/lkml/202207170606.7WtHs9yS-lkp@intel.com/T/
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 004d25060c ]
In a setup where a Thunderbolt hub connects to Ethernet and a display
through USB Type-C, users may experience a hung task timeout when they
remove the cable between the PC and the Thunderbolt hub.
This is because the igb_down function is called multiple times when
the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
triggers the first call, and the igb_remove triggers the second call.
The second call to igb_down will block at napi_synchronize.
Here's the call trace:
__schedule+0x3b0/0xddb
? __mod_timer+0x164/0x5d3
schedule+0x44/0xa8
schedule_timeout+0xb2/0x2a4
? run_local_timers+0x4e/0x4e
msleep+0x31/0x38
igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
__igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
__dev_close_many+0x95/0xec
dev_close_many+0x6e/0x103
unregister_netdevice_many+0x105/0x5b1
unregister_netdevice_queue+0xc2/0x10d
unregister_netdev+0x1c/0x23
igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
pci_device_remove+0x3f/0x9c
device_release_driver_internal+0xfe/0x1b4
pci_stop_bus_device+0x5b/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_and_remove_bus_device+0x12/0x19
pciehp_unconfigure_device+0x76/0xe9
pciehp_disable_slot+0x6e/0x131
pciehp_handle_presence_or_link_change+0x7a/0x3f7
pciehp_ist+0xbe/0x194
irq_thread_fn+0x22/0x4d
? irq_thread+0x1fd/0x1fd
irq_thread+0x17b/0x1fd
? irq_forced_thread_fn+0x5f/0x5f
kthread+0x142/0x153
? __irq_get_irqchip_state+0x46/0x46
? kthread_associate_blkcg+0x71/0x71
ret_from_fork+0x1f/0x30
In this case, igb_io_error_detected detaches the network interface
and requests a PCIE slot reset, however, the PCIE reset callback is
not being invoked and thus the Ethernet connection breaks down.
As the PCIE error in this case is a non-fatal one, requesting a
slot reset can be avoided.
This patch fixes the task hung issue and preserves Ethernet
connection by ignoring non-fatal PCIE errors.
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620174732.4145155-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a528ab1da ]
Roee reported various hard-to-debug crashes with pings in
EHT aggregation scenarios. Enabling KASAN showed that we
access the BAID allocation out of bounds, and looking at
the code a bit shows that since the reorder buffer entry
(struct iwl_mvm_reorder_buf_entry) is 128 bytes if debug
such as lockdep is enabled, then staring from an agg size
512 we overflow the size calculation, and allocate a much
smaller structure than we should, causing slab corruption
once we initialize this.
Fix this by simply using u32 instead of u16.
Reported-by: Roee Goldfiner <roee.h.goldfiner@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230620125813.f428c856030d.I2c2bb808e945adb71bc15f5b2bac2d8957ea90eb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71e7552c90 ]
-Wstringop-overflow is legitimately warning us about extra_size
pontentially being zero at some point, hence potenially ending
up _allocating_ zero bytes of memory for extra pointer and then
trying to access such object in a call to copy_from_user().
Fix this by adding a sanity check to ensure we never end up
trying to allocate zero bytes of data for extra pointer, before
continue executing the rest of the code in the function.
Address the following -Wstringop-overflow warning seen when built
m68k architecture with allyesconfig configuration:
from net/wireless/wext-core.c:11:
In function '_copy_from_user',
inlined from 'copy_from_user' at include/linux/uaccess.h:183:7,
inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:825:7:
arch/m68k/include/asm/string.h:48:25: warning: '__builtin_memset' writing 1 or more bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
48 | #define memset(d, c, n) __builtin_memset(d, c, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/uaccess.h:153:17: note: in expansion of macro 'memset'
153 | memset(to + (n - res), 0, res);
| ^~~~~~
In function 'kmalloc',
inlined from 'kzalloc' at include/linux/slab.h:694:9,
inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:819:10:
include/linux/slab.h:577:16: note: at offset 1 into destination object of size 0 allocated by '__kmalloc'
577 | return __kmalloc(size, flags);
| ^~~~~~~~~~~~~~~~~~~~~~
This help with the ongoing efforts to globally enable
-Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/315
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZItSlzvIpjdjNfd8@work
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6aafa1c2d3 ]
Memory allocated for firmware pdev, vdev and beacon statistics
are not released during rmmod.
Fix it by calling ath11k_fw_stats_free() function before hardware
unregister.
While at it, avoid calling ath11k_fw_stats_free() while processing
the firmware stats received in the WMI event because the local list
is getting spliced and reinitialised and hence there are no elements
in the list after splicing.
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1
Signed-off-by: P Praneesh <quic_ppranees@quicinc.com>
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230606091128.14202-1-quic_adisi@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 054b5580a3 ]
Currently 'ar' reference is not added in skb_cb.
Though this is generally not used during transmit completion
callbacks, on interface removal the remaining idr cleanup callback
uses the ar pointer from skb_cb from management txmgmt_idr. Hence fill them
during transmit call for proper usage to avoid NULL pointer dereference.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
Signed-off-by: Balamurugan S <quic_bselvara@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230518071046.14337-1-quic_bselvara@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0760d5d0e9 ]
The Intel Mount Evans SoC's Integrated Management Complex uses the SPI
controller for access to a NOR SPI FLASH. However, the SoC doesn't
provide a mechanism to override the native chip select signal.
This driver doesn't use DMA for memory operations when a chip select
override is not provided due to the native chip select timing behavior.
As a result no DMA configuration is done for the controller and this
configuration is not tested.
The controller also has an errata where a full TX FIFO can result in
data corruption. The suggested workaround is to never completely fill
the FIFO. The TX FIFO has a size of 32 so the fifo_len is set to 31.
Signed-off-by: Abe Kohandel <abe.kohandel@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230606145402.474866-2-abe.kohandel@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88ca89202f ]
Sometimes board-2.bin does not have the regdb data which matched the
parameters such as vendor, device, subsystem-vendor, subsystem-device
and etc. Add default regdb data with 'bus=%s' into board-2.bin for
WCN6855, then ath11k use 'bus=pci' to search regdb data in board-2.bin
for WCN6855.
kernel: [ 122.515808] ath11k_pci 0000:03:00.0: boot using board name 'bus=pci,vendor=17cb,device=1103,subsystem-vendor=17cb,subsystem-device=3374,qmi-chip-id=2,qmi-board-id=262'
kernel: [ 122.517240] ath11k_pci 0000:03:00.0: boot firmware request ath11k/WCN6855/hw2.0/board-2.bin size 6179564
kernel: [ 122.517280] ath11k_pci 0000:03:00.0: failed to fetch regdb data for bus=pci,vendor=17cb,device=1103,subsystem-vendor=17cb,subsystem-device=3374,qmi-chip-id=2,qmi-board-id=262 from ath11k/WCN6855/hw2.0/board-2.bin
kernel: [ 122.517464] ath11k_pci 0000:03:00.0: boot using board name 'bus=pci'
kernel: [ 122.518901] ath11k_pci 0000:03:00.0: boot firmware request ath11k/WCN6855/hw2.0/board-2.bin size 6179564
kernel: [ 122.518915] ath11k_pci 0000:03:00.0: board name
kernel: [ 122.518917] ath11k_pci 0000:03:00.0: 00000000: 62 75 73 3d 70 63 69 bus=pci
kernel: [ 122.518918] ath11k_pci 0000:03:00.0: boot found match regdb data for name 'bus=pci'
kernel: [ 122.518920] ath11k_pci 0000:03:00.0: boot found regdb data for 'bus=pci'
kernel: [ 122.518921] ath11k_pci 0000:03:00.0: fetched regdb
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230517133959.8224-1-quic_wgong@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f4b98147b ]
Devlink health is involved in error recovery. Machines in bad
state tend to be fairly unreliable, and occasionally get stuck
in error loops. Even with a reasonable grace period devlink health
may get a thousand reports in an hour.
In case of reporting on an unregistered devlink instance
the subsequent reports don't add much value. Switch to
WARN_ON_ONCE() to avoid flooding dmesg and fleet monitoring
dashboards.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230531015523.48961-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6c2f594ed ]
syzbot reported a warning in [1] with the following stacktrace:
WARNING: CPU: 0 PID: 5005 at kernel/bpf/btf.c:1988 btf_type_id_size+0x2d9/0x9d0 kernel/bpf/btf.c:1988
...
RIP: 0010:btf_type_id_size+0x2d9/0x9d0 kernel/bpf/btf.c:1988
...
Call Trace:
<TASK>
map_check_btf kernel/bpf/syscall.c:1024 [inline]
map_create+0x1157/0x1860 kernel/bpf/syscall.c:1198
__sys_bpf+0x127f/0x5420 kernel/bpf/syscall.c:5040
__do_sys_bpf kernel/bpf/syscall.c:5162 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5160 [inline]
__x64_sys_bpf+0x79/0xc0 kernel/bpf/syscall.c:5160
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
With the following btf
[1] DECL_TAG 'a' type_id=4 component_idx=-1
[2] PTR '(anon)' type_id=0
[3] TYPE_TAG 'a' type_id=2
[4] VAR 'a' type_id=3, linkage=static
and when the bpf_attr.btf_key_type_id = 1 (DECL_TAG),
the following WARN_ON_ONCE in btf_type_id_size() is triggered:
if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
!btf_type_is_var(size_type)))
return NULL;
Note that 'return NULL' is the correct behavior as we don't want
a DECL_TAG type to be used as a btf_{key,value}_type_id even
for the case like 'DECL_TAG -> STRUCT'. So there
is no correctness issue here, we just want to silence warning.
To silence the warning, I added DECL_TAG as one of kinds in
btf_type_nosize() which will cause btf_type_id_size() returning
NULL earlier without the warning.
[1] https://lore.kernel.org/bpf/000000000000e0df8d05fc75ba86@google.com/
Reported-by: syzbot+958967f249155967d42a@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230530205029.264910-1-yhs@fb.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e967229ead ]
rtw_sdio_rx_isr() is responsible for receiving data from the wifi chip
and is called from the SDIO interrupt handler when the interrupt status
register (HISR) has the RX_REQUEST bit set. After the first batch of
data has been processed by the driver the wifi chip may have more data
ready to be read, which is managed by a loop in rtw_sdio_rx_isr().
It turns out that there are cases where the RX buffer length (from the
REG_SDIO_RX0_REQ_LEN register) does not match the data we receive. The
following two cases were observed with a RTL8723DS card:
- RX length is smaller than the total packet length including overhead
and actual data bytes (whose length is part of the buffer we read from
the wifi chip and is stored in rtw_rx_pkt_stat.pkt_len). This can
result in errors like:
skbuff: skb_over_panic: text:ffff8000011924ac len:3341 put:3341
(one case observed was: RX buffer length = 1536 bytes but
rtw_rx_pkt_stat.pkt_len = 1546 bytes, this is not valid as it means
we need to read beyond the end of the buffer)
- RX length looks valid but rtw_rx_pkt_stat.pkt_len is zero
Check if the RX_REQUEST is set in the HISR register for each iteration
inside rtw_sdio_rx_isr(). This mimics what the RTL8723DS vendor driver
does and makes the driver only read more data if the RX_REQUEST bit is
set (which seems to be a way for the card's hardware or firmware to
tell the host that data is ready to be processed).
For RTW_WCPU_11AC chips this check is not needed. The RTL8822BS vendor
driver for example states that this check is unnecessary (but still uses
it) and the RTL8822CS drops this check entirely.
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230522202425.1827005-2-martin.blumenstingl@googlemail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9378096e8a ]
This is a preparatory commit to replace `lock_sock_fast` with
`lock_sock`,and facilitate BPF programs executed from the TCP sockets
iterator to be able to destroy TCP sockets using the bpf_sock_destroy
kfunc (implemented in follow-up commits).
Previously, BPF TCP iterator was acquiring the sock lock with BH
disabled. This led to scenarios where the sockets hash table bucket lock
can be acquired with BH enabled in some path versus disabled in other.
In such situation, kernel issued a warning since it thinks that in the
BH enabled path the same bucket lock *might* be acquired again in the
softirq context (BH disabled), which will lead to a potential dead lock.
Since bpf_sock_destroy also happens in a process context, the potential
deadlock warning is likely a false alarm.
Here is a snippet of annotated stack trace that motivated this change:
```
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&h->lhash2[i].lock);
local_bh_disable();
lock(&h->lhash2[i].lock);
kernel imagined possible scenario:
local_bh_disable(); /* Possible softirq */
lock(&h->lhash2[i].lock);
*** Potential Deadlock ***
process context:
lock_acquire+0xcd/0x330
_raw_spin_lock+0x33/0x40
------> Acquire (bucket) lhash2.lock with BH enabled
__inet_hash+0x4b/0x210
inet_csk_listen_start+0xe6/0x100
inet_listen+0x95/0x1d0
__sys_listen+0x69/0xb0
__x64_sys_listen+0x14/0x20
do_syscall_64+0x3c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
bpf_sock_destroy run from iterator:
lock_acquire+0xcd/0x330
_raw_spin_lock+0x33/0x40
------> Acquire (bucket) lhash2.lock with BH disabled
inet_unhash+0x9a/0x110
tcp_set_state+0x6a/0x210
tcp_abort+0x10d/0x200
bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9
bpf_iter_run_prog+0x1ff/0x340
------> lock_sock_fast that acquires sock lock with BH disabled
bpf_iter_tcp_seq_show+0xca/0x190
bpf_seq_read+0x177/0x450
```
Also, Yonghong reported a deadlock for non-listening TCP sockets that
this change resolves. Previously, `lock_sock_fast` held the sock spin
lock with BH which was again being acquired in `tcp_abort`:
```
watchdog: BUG: soft lockup - CPU#0 stuck for 86s! [test_progs:2331]
RIP: 0010:queued_spin_lock_slowpath+0xd8/0x500
Call Trace:
<TASK>
_raw_spin_lock+0x84/0x90
tcp_abort+0x13c/0x1f0
bpf_prog_88539c5453a9dd47_iter_tcp6_client+0x82/0x89
bpf_iter_run_prog+0x1aa/0x2c0
? preempt_count_sub+0x1c/0xd0
? from_kuid_munged+0x1c8/0x210
bpf_iter_tcp_seq_show+0x14e/0x1b0
bpf_seq_read+0x36c/0x6a0
bpf_iter_tcp_seq_show
lock_sock_fast
__lock_sock_fast
spin_lock_bh(&sk->sk_lock.slock);
/* * Fast path return with bottom halves disabled and * sock::sk_lock.slock held.* */
...
tcp_abort
local_bh_disable();
spin_lock(&((sk)->sk_lock.slock)); // from bh_lock_sock(sk)
```
With the switch to `lock_sock`, it calls `spin_unlock_bh` before returning:
```
lock_sock
lock_sock_nested
spin_lock_bh(&sk->sk_lock.slock);
:
spin_unlock_bh(&sk->sk_lock.slock);
```
Acked-by: Yonghong Song <yhs@meta.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-2-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cff36398bd ]
It's trivial for user to trigger "verifier log line truncated" warning,
as verifier has a fixed-sized buffer of 1024 bytes (as of now), and there are at
least two pieces of user-provided information that can be output through
this buffer, and both can be arbitrarily sized by user:
- BTF names;
- BTF.ext source code lines strings.
Verifier log buffer should be properly sized for typical verifier state
output. But it's sort-of expected that this buffer won't be long enough
in some circumstances. So let's drop the check. In any case code will
work correctly, at worst truncating a part of a single line output.
Reported-by: syzbot+8b2a08dfbd25fd933d75@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230516180409.3549088-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee9fd0ac30 ]
KCSAN reported a data-race when accessing node->ref.
Although node->ref does not have to be accurate,
take this chance to use a more common READ_ONCE() and WRITE_ONCE()
pattern instead of data_race().
There is an existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref().
This patch also adds bpf_lru_node_clear_ref() to do the
WRITE_ONCE(node->ref, 0) also.
==================================================================
BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem
write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1:
__bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline]
__bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline]
__bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240
bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline]
bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499
prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline]
__htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0:
bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline]
__htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x01 -> 0x00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd29ea1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
==================================================================
Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fedf99200a ]
Only print the warning message if you are writing to
"/proc/sys/kernel/unprivileged_bpf_disabled".
The kernel may print an annoying warning when you read
"/proc/sys/kernel/unprivileged_bpf_disabled" saying
WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible
via Spectre v2 BHB attacks!
However, this message is only meaningful when the feature is
disabled or enabled.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230502181418.308479-1-kuifeng@meta.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dd37d6dd3 ]
We've run into the case that the balancer tries to balance a migration
disabled task and trigger the warning in set_task_cpu() like below:
------------[ cut here ]------------
WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240
Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip>
CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G O 6.1.0-rc4+ #1
Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021
pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : set_task_cpu+0x188/0x240
lr : load_balance+0x5d0/0xc60
sp : ffff80000803bc70
x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040
x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001
x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78
x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000
x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000
x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530
x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e
x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a
x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001
Call trace:
set_task_cpu+0x188/0x240
load_balance+0x5d0/0xc60
rebalance_domains+0x26c/0x380
_nohz_idle_balance.isra.0+0x1e0/0x370
run_rebalance_domains+0x6c/0x80
__do_softirq+0x128/0x3d8
____do_softirq+0x18/0x24
call_on_irq_stack+0x2c/0x38
do_softirq_own_stack+0x24/0x3c
__irq_exit_rcu+0xcc/0xf4
irq_exit_rcu+0x18/0x24
el1_interrupt+0x4c/0xe4
el1h_64_irq_handler+0x18/0x2c
el1h_64_irq+0x74/0x78
arch_cpu_idle+0x18/0x4c
default_idle_call+0x58/0x194
do_idle+0x244/0x2b0
cpu_startup_entry+0x30/0x3c
secondary_start_kernel+0x14c/0x190
__secondary_switched+0xb0/0xb4
---[ end trace 0000000000000000 ]---
Further investigation shows that the warning is superfluous, the migration
disabled task is just going to be migrated to its current running CPU.
This is because that on load balance if the dst_cpu is not allowed by the
task, we'll re-select a new_dst_cpu as a candidate. If no task can be
balanced to dst_cpu we'll try to balance the task to the new_dst_cpu
instead. In this case when the migration disabled task is not on CPU it
only allows to run on its current CPU, load balance will select its
current CPU as new_dst_cpu and later triggers the warning above.
The new_dst_cpu is chosen from the env->dst_grpmask. Currently it
contains CPUs in sched_group_span() and if we have overlapped groups it's
possible to run into this case. This patch makes env->dst_grpmask of
group_balance_mask() which exclude any CPUs from the busiest group and
solve the issue. For balancing in a domain with no overlapped groups
the behaviour keeps same as before.
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88fc7eb54e ]
The all-zero pattern is one of the more probable out-of-bound writes so
add a special case to not accidentally accept it.
Also it enables the reliable detection of stack protector initialization
during testing.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9146eb2549 ]
The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated
only on the instance corresponding to the current CPU, but can be read
more widely. Unmarked accesses are OK from the corresponding CPU, but
only if interrupts are disabled, given that interrupt handlers can and
do modify this field.
Unfortunately, although the load from rcu_preempt_deferred_qs() is always
carried out from the corresponding CPU, interrupts are not necessarily
disabled. This commit therefore upgrades this load to READ_ONCE.
Similarly, the diagnostic access from synchronize_rcu_expedited_wait()
might run with interrupts disabled and from some other CPU. This commit
therefore marks this load with data_race().
Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as
is because interrupts are disabled and this load is always from the
corresponding CPU. This commit adds a comment giving the rationale for
this access being safe.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23d28cc044 ]
The Dell Studio 1569 predates Windows 8, so it defaults to using
acpi_video# for backlight control, but this is non functional on
this model.
Add a DMI quirk to use the native intel_backlight interface which
does work properly.
Reported-by: raycekarneal <raycekarneal@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab9b400809 ]
Both create_mapping_noalloc() and update_mapping_prot() sanity-check
their 'virt' parameter, but the check itself doesn't make much sense.
The condition used today appears to be a historical accident.
The sanity-check condition:
if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
[ ... warning here ... ]
return;
}
... can only be true for the KASAN shadow region or the module region,
and there's no reason to exclude these specifically for creating and
updateing mappings.
When arm64 support was first upstreamed in commit:
c1cc155261 ("arm64: MMU initialisation")
... the condition was:
if (virt < VMALLOC_START) {
[ ... warning here ... ]
return;
}
At the time, VMALLOC_START was the lowest kernel address, and this was
checking whether 'virt' would be translated via TTBR1.
Subsequently in commit:
14c127c957 ("arm64: mm: Flip kernel VA space")
... the condition was changed to:
if ((virt >= VA_START) && (virt < VMALLOC_START)) {
[ ... warning here ... ]
return;
}
This appear to have been a thinko. The commit moved the linear map to
the bottom of the kernel address space, with VMALLOC_START being at the
halfway point. The old condition would warn for changes to the linear
map below this, and at the time VA_START was the end of the linear map.
Subsequently we cleaned up the naming of VA_START in commit:
77ad4ce693 ("arm64: memory: rename VA_START to PAGE_END")
... keeping the erroneous condition as:
if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
[ ... warning here ... ]
return;
}
Correct the condition to check against the start of the TTBR1 address
space, which is currently PAGE_OFFSET. This simplifies the logic, and
more clearly matches the "outside kernel range" message in the warning.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230615102628.1052103-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9c4a912b7 ]
commit 9946e39fe8 ("ACPI: resource: skip IRQ override on
AMD Zen platforms") attempted to overhaul the override logic so it
didn't apply on X86 AMD Zen systems. This was intentional so that
systems would prefer DSDT values instead of default MADT value for
IRQ 1 on Ryzen 6000 systems which typically uses ActiveLow for IRQ1.
This turned out to be a bad assumption because several vendors
add Interrupt Source Override but don't fix the DSDT. A pile of
quirks was collecting that proved this wasn't sustaintable.
Furthermore some vendors have used ActiveHigh for IRQ1.
To solve this problem revert the following commits:
* commit 17bb7046e7 ("ACPI: resource: Do IRQ override on all TongFang
GMxRGxx")
* commit f3cb9b7408 ("ACPI: resource: do IRQ override on Lenovo 14ALC7")
* commit bfcdf58380 ("ACPI: resource: do IRQ override on LENOVO IdeaPad")
* commit 7592b79ba4 ("ACPI: resource: do IRQ override on XMG Core 15")
* commit 9946e39fe8 ("ACPI: resource: skip IRQ override on AMD Zen
platforms")
Reported-by: evilsnoo@proton.me
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394
Reported-by: ruinairas1992@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217406
Reported-by: nmschulte@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217336
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Werner Sembach <wse@tuxedocomputers.com>
Tested-by: Chuanhong Guo <gch981213@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd5d93df86 ]
Linux defaults to picking the non-working ACPI video backlight interface
on the Lenovo ThinkPad X131e (3371 AMD version).
Add a DMI quirk to pick the working native radeon_bl0 interface instead.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48436f2e98 ]
Linux defaults to picking the non-working ACPI video backlight interface
on the Apple iMac11,3 .
Add a DMI quirk to pick the working native radeon_bl0 interface instead.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f91280f358 ]
The Lenovo Yoga Book yb1-x90f/l 2-in-1 which ships with Android as
Factory OS has (another) bug in its DSDT where the UART resource for
the BTH0 ACPI device contains "\\_SB.PCIO.URT1" as path to the UART.
Note that is with a letter 'O' instead of the number '0' which is wrong.
This causes Linux to instantiate a standard /dev/ttyS? device for
the UART instead of a /sys/bus/serial device, which in turn causes
bluetooth to not work.
Similar DSDT bugs have been encountered before and to work around those
the acpi_quirk_skip_serdev_enumeration() helper exists.
Previous devices had the broken resource pointing to the first UART, while
the BT HCI was on the second UART, which ACPI_QUIRK_UART1_TTY_UART2_SKIP
deals with. Add a new ACPI_QUIRK_UART1_SKIP quirk for skipping enumeration
of UART1 instead for the Yoga Book case and add this quirk to the
existing DMI quirk table entry for the yb1-x90f/l .
This leaves the UART1 controller unbound allowing the x86-android-tablets
module to manually instantiate a serdev for it fixing bluetooth.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fd5556608 ]
The LID0 device on the Nextbook Ares 8A tablet always reports lid
closed causing userspace to suspend the device as soon as booting
is complete.
Add a DMI quirk to disable the broken lid functionality.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69d6b37695 ]
The Nextbook Ares 8A is a x86 ACPI tablet which ships with Android x86
as factory OS. Its DSDT contains a bunch of I2C devices which are not
actually there (the Android x86 kernel fork ignores I2C devices described
in the DSDT).
On this specific model this just not cause resource conflicts, one of
the probe() calls for the non existing i2c_clients actually ends up
toggling a GPIO or executing a _PS3 after a failed probe which turns
the tablet off.
Add a ACPI_QUIRK_SKIP_I2C_CLIENTS for the Nextbook Ares 8 to the
acpi_quirk_skip_dmi_ids table to avoid the bogus i2c_clients and
to fix the tablet turning off during boot because of this.
Also add the "10EC5651" HID for the RealTek ALC5651 codec used
in this tablet to the list of HIDs for which not to skipi2c_client
instantiation, since the Intel SST sound driver relies on
the codec being instantiated through ACPI.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12d0a24afd ]
Current check for atomic context is not sufficient as
z_erofs_decompressqueue_endio can be called under rcu lock
from blk_mq_flush_plug_list(). See the stacktrace [1]
In such case we should hand off the decompression work for async
processing rather than trying to do sync decompression in current
context. Patch fixes the detection by checking for
rcu_read_lock_any_held() and while at it use more appropriate
!in_task() check than in_atomic().
Background: Historically erofs would always schedule a kworker for
decompression which would incur the scheduling cost regardless of
the context. But z_erofs_decompressqueue_endio() may not always
be in atomic context and we could actually benefit from doing the
decompression in z_erofs_decompressqueue_endio() if we are in
thread context, for example when running with dm-verity.
This optimization was later added in patch [2] which has shown
improvement in performance benchmarks.
==============================================
[1] Problem stacktrace
[name:core&]BUG: sleeping function called from invalid context at kernel/locking/mutex.c:291
[name:core&]in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1615, name: CpuMonitorServi
[name:core&]preempt_count: 0, expected: 0
[name:core&]RCU nest depth: 1, expected: 0
CPU: 7 PID: 1615 Comm: CpuMonitorServi Tainted: G S W OE 6.1.25-android14-5-maybe-dirty-mainline #1
Hardware name: MT6897 (DT)
Call trace:
dump_backtrace+0x108/0x15c
show_stack+0x20/0x30
dump_stack_lvl+0x6c/0x8c
dump_stack+0x20/0x48
__might_resched+0x1fc/0x308
__might_sleep+0x50/0x88
mutex_lock+0x2c/0x110
z_erofs_decompress_queue+0x11c/0xc10
z_erofs_decompress_kickoff+0x110/0x1a4
z_erofs_decompressqueue_endio+0x154/0x180
bio_endio+0x1b0/0x1d8
__dm_io_complete+0x22c/0x280
clone_endio+0xe4/0x280
bio_endio+0x1b0/0x1d8
blk_update_request+0x138/0x3a4
blk_mq_plug_issue_direct+0xd4/0x19c
blk_mq_flush_plug_list+0x2b0/0x354
__blk_flush_plug+0x110/0x160
blk_finish_plug+0x30/0x4c
read_pages+0x2fc/0x370
page_cache_ra_unbounded+0xa4/0x23c
page_cache_ra_order+0x290/0x320
do_sync_mmap_readahead+0x108/0x2c0
filemap_fault+0x19c/0x52c
__do_fault+0xc4/0x114
handle_mm_fault+0x5b4/0x1168
do_page_fault+0x338/0x4b4
do_translation_fault+0x40/0x60
do_mem_abort+0x60/0xc8
el0_da+0x4c/0xe0
el0t_64_sync_handler+0xd4/0xfc
el0t_64_sync+0x1a0/0x1a4
[2] Link: https://lore.kernel.org/all/20210317035448.13921-1-huangjianan@oppo.com/
Reported-by: Will Shiu <Will.Shiu@mediatek.com>
Suggested-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20230621220848.3379029-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eced687e22 ]
At update_ref_for_cow() we are calling btrfs_handle_fs_error() if we find
that the extent buffer has an unexpected ref count of zero, however we can
simply use btrfs_abort_transaction(), which achieves the same purposes: to
turn the fs to error state, abort the current transaction and turn the fs
to RO mode as well. Besides that, btrfs_abort_transaction() also prints a
stack trace which makes it more useful.
Also, as this is a very unexpected situation, indicating a serious
corruption/inconsistency, tag the if branch as 'unlikely', set the error
code to -EUCLEAN instead of -EROFS, and log an explicit message.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e92499e3b ]
__extent_writepage currenly sets PageError whenever any error happens,
and the also checks for PageError to decide if to call error handling.
This leads to very unclear responsibility for cleaning up on errors.
In the VM and generic writeback helpers the basic idea is that once
I/O is fired off all error handling responsibility is delegated to the
end I/O handler. But if that end I/O handler sets the PageError bit,
and the submitter checks it, the bit could in some cases leak into the
submission context for fast enough I/O.
Fix this by simply not checking PageError and just using the local
ret variable to check for submission errors. This also fundamentally
solves the long problem documented in a comment in __extent_writepage
by never leaking the error bit into the submission context.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efcfcbc6a3 ]
The implementation of XXHASH is now CPU only but still fast enough to be
considered for the synchronous checksumming, like non-generic crc32c.
A userspace benchmark comparing it to various implementations (patched
hash-speedtest from btrfs-progs):
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 73384294, cycles/i 73
NULL-MEMCPY: cycles: 228033868, cycles/i 228, 61664.320 MiB/s
CRC32C-ref: cycles: 24758559416, cycles/i 24758, 567.950 MiB/s
CRC32C-NI: cycles: 1194350470, cycles/i 1194, 11773.433 MiB/s
CRC32C-ADLERSW: cycles: 6150186216, cycles/i 6150, 2286.372 MiB/s
CRC32C-ADLERHW: cycles: 626979180, cycles/i 626, 22427.453 MiB/s
CRC32C-PCL: cycles: 466746732, cycles/i 466, 30126.699 MiB/s
XXHASH: cycles: 860656400, cycles/i 860, 16338.188 MiB/s
Comparing purely software implementation (ref), current outdated
accelerated using crc32q instruction (NI), optimized implementations by
M. Adler (https://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software/17646775#17646775)
and the best one that was taken from kernel using the PCLMULQDQ
instruction (PCL).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ce8849dd1 ]
posix_timer_add() tries to allocate a posix timer ID by starting from the
cached ID which was stored by the last successful allocation.
This is done in a loop searching the ID space for a free slot one by
one. The loop has to terminate when the search wrapped around to the
starting point.
But that's racy vs. establishing the starting point. That is read out
lockless, which leads to the following problem:
CPU0 CPU1
posix_timer_add()
start = sig->posix_timer_id;
lock(hash_lock);
... posix_timer_add()
if (++sig->posix_timer_id < 0)
start = sig->posix_timer_id;
sig->posix_timer_id = 0;
So CPU1 can observe a negative start value, i.e. -1, and the loop break
never happens because the condition can never be true:
if (sig->posix_timer_id == start)
break;
While this is unlikely to ever turn into an endless loop as the ID space is
huge (INT_MAX), the racy read of the start value caught the attention of
KCSAN and Dmitry unearthed that incorrectness.
Rewrite it so that all id operations are under the hash lock.
Reported-by: syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db59133e92 ]
sg_ioctl() support to enable blktrace, which will create debugfs entries
"/sys/kernel/debug/block/sgx/", however, there is no guarantee that user
will remove these entries through ioctl, and deleting sg device doesn't
cleanup these blktrace entries.
This problem can be fixed by cleanup blktrace while releasing
request_queue, however, it's not a good idea to do this special handling
in common layer just for sg device.
Fix this problem by shutdown bltkrace in sg_device_destroy(), where the
device is deleted and all the users close the device, also grab a
scsi_device reference from sg_add_device() to prevent scsi_device to be
freed before sg_device_destroy();
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230610022003.2557284-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 010444623e ]
Currently, there is no limit for raid1/raid10 plugged bio. While flushing
writes, raid1 has cond_resched() while raid10 doesn't, and too many
writes can cause soft lockup.
Follow up soft lockup can be triggered easily with writeback test for
raid10 with ramdisks:
watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
Call Trace:
<TASK>
call_rcu+0x16/0x20
put_object+0x41/0x80
__delete_object+0x50/0x90
delete_object_full+0x2b/0x40
kmemleak_free+0x46/0xa0
slab_free_freelist_hook.constprop.0+0xed/0x1a0
kmem_cache_free+0xfd/0x300
mempool_free_slab+0x1f/0x30
mempool_free+0x3a/0x100
bio_free+0x59/0x80
bio_put+0xcf/0x2c0
free_r10bio+0xbf/0xf0
raid_end_bio_io+0x78/0xb0
one_write_done+0x8a/0xa0
raid10_end_write_request+0x1b4/0x430
bio_endio+0x175/0x320
brd_submit_bio+0x3b9/0x9b7 [brd]
__submit_bio+0x69/0xe0
submit_bio_noacct_nocheck+0x1e6/0x5a0
submit_bio_noacct+0x38c/0x7e0
flush_pending_writes+0xf0/0x240
raid10d+0xac/0x1ed0
Fix the problem by adding cond_resched() to raid10 like what raid1 did.
Note that unlimited plugged bio still need to be optimized, for example,
in the case of lots of dirty pages writeback, this will take lots of
memory and io will spend a long time in plug, hence io latency is bad.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230529131106.2123367-2-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95e2b352c0 ]
This patch adds a check for read-only mounted filesystem
in txBegin before starting a transaction potentially saving
from NULL pointer deref.
Signed-off-by: Immad Mir <mirimmad17@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b191b9b55 ]
Zero-length arrays are deprecated, and we are replacing them with flexible
array members instead. So, replace zero-length array with flexible-array
member in struct memmap.
Address the following warning found after building (with GCC-13) mips64
with decstation_64_defconfig:
In function 'rex_setup_memory_region',
inlined from 'prom_meminit' at arch/mips/dec/prom/memory.c:91:3:
arch/mips/dec/prom/memory.c:72:31: error: array subscript i is outside array bounds of 'unsigned char[0]' [-Werror=array-bounds=]
72 | if (bm->bitmap[i] == 0xff)
| ~~~~~~~~~~^~~
In file included from arch/mips/dec/prom/memory.c:16:
./arch/mips/include/asm/dec/prom.h: In function 'prom_meminit':
./arch/mips/include/asm/dec/prom.h:73:23: note: while referencing 'bitmap'
73 | unsigned char bitmap[0];
This helps with the ongoing efforts to globally enable -Warray-bounds.
This results in no differences in binary output.
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/323
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f723edb8a5 ]
Porting overlayfs to the new amount api I started experiencing random
crashes that couldn't be explained easily. So after much debugging and
reasoning it became clear that struct ovl_entry requires the point to
struct vfsmount to be the first member and of type struct vfsmount.
During the port I added a new member at the beginning of struct
ovl_entry which broke all over the place in the form of random crashes
and cache corruptions. While there's a comment in ovl_free_fs() to the
effect of "Hack! Reuse ofs->layers as a vfsmount array before freeing
it" there's no such comment on struct ovl_entry which makes this easy to
trip over.
Add a comment and two static asserts for both the offset and the type of
pointer in struct ovl_entry.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0db117359e ]
HP Elite Presenter Mouse HID Record Descriptor shows
two mouses (Repord ID 0x1 and 0x2), one keypad (Report ID 0x5),
two Consumer Controls (Report IDs 0x6 and 0x3).
Previous to this commit it registers one mouse, one keypad
and one Consumer Control, and it was usable only as a
digitl laser pointer (one of the two mouses). This patch defines
the 464a USB device ID and enables the HID_QUIRK_MULTI_INPUT
quirk for it, allowing to use the device both as a mouse
and a digital laser pointer.
Signed-off-by: Marco Morandini <marco.morandini@polimi.it>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a4e336379 ]
When add_dquot_ref() fails (usually due to IO error or ENOMEM), we want
to disable quotas we are trying to enable. However dquot_disable() call
was passed just the flags we are enabling so in case flags ==
DQUOT_USAGE_ENABLED dquot_disable() call will just fail with EINVAL
instead of properly disabling quotas. Fix the problem by always passing
DQUOT_LIMITS_ENABLED | DQUOT_USAGE_ENABLED to dquot_disable() in this
case.
Reported-and-tested-by: Ye Bin <yebin10@huawei.com>
Reported-by: syzbot+e633c79ceaecbf479854@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230605140731.2427629-2-yebin10@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f828b681d0 ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: hackyzh002 <hackyzh002@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 6909cf5c41 upstream.
When run on a file system where the inline_data feature has been
enabled, xfstests generic/269, generic/270, and generic/476 cause ext4
to emit error messages indicating that inline directory entries are
corrupted. This occurs because the inline offset used to locate
inline directory entries in the inode body is not updated when an
xattr in that shared region is deleted and the region is shifted in
memory to recover the space it occupied. If the deleted xattr precedes
the system.data attribute, which points to the inline directory entries,
that attribute will be moved further up in the region. The inline
offset continues to point to whatever is located in system.data's former
location, with unfortunate effects when used to access directory entries
or (presumably) inline data in the inode body.
Cc: stable@kernel.org
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20230522181520.1570360-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b321c31c9b upstream.
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq. This occurs because when the vcpu thread is
resumed on its way to block, vcpu_load() will make the vPE resident
again. Once the vcpu actually blocks, we don't request a doorbell
anymore, and the vcpu won't be woken up on interrupt delivery.
Fix it by tracking that we're entering WFI, and key the doorbell
request on that flag. This allows us not to make the vPE resident
when going through a preempt/schedule cycle, meaning we don't lose
any state.
Cc: stable@vger.kernel.org
Fixes: 8e01d9a396 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Suggested-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 970dee09b2 upstream.
Since 0bf50497f0 ("KVM: Drop kvm_count_lock and instead protect
kvm_usage_count with kvm_lock"), hotplugging back a CPU whilst
a guest is running results in a number of ugly splats as most
of this code expects to run with preemption disabled, which isn't
the case anymore.
While the context is preemptable, it isn't migratable, which should
be enough. But we have plenty of preemptible() checks all over
the place, and our per-CPU accessors also disable preemption.
Since this affects released versions, let's do the easy fix first,
disabling preemption in kvm_arch_hardware_enable(). We can always
revisit this with a more invasive fix in the future.
Fixes: 0bf50497f0 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
Tested-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/aeab7562-2d39-e78e-93b1-4711f8cc3fa5@arm.com
Cc: stable@vger.kernel.org # v6.3, v6.4
Link: https://lore.kernel.org/r/20230703163548.1498943-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df6556adf2 upstream.
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable@vger.kernel.org # v5.15
Fixes: 056aad67f8 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe769e6c1f upstream.
It recently appeared that, when running VHE, there is a notable
difference between using CNTKCTL_EL1 and CNTHCTL_EL2, despite what
the architecture documents:
- When accessed from EL2, bits [19:18] and [16:10] of CNTKCTL_EL1 have
the same assignment as CNTHCTL_EL2
- When accessed from EL1, bits [19:18] and [16:10] are RES0
It is all OK, until you factor in NV, where the EL2 guest runs at EL1.
In this configuration, CNTKCTL_EL11 doesn't trap, nor ends up in
the VNCR page. This means that any write from the guest affecting
CNTHCTL_EL2 using CNTKCTL_EL1 ends up losing some state. Not good.
The fix it obvious: don't use CNTKCTL_EL1 if you want to change bits
that are not part of the EL1 definition of CNTKCTL_EL1, and use
CNTHCTL_EL2 instead. This doesn't change anything for a bare-metal OS,
and fixes it when running under NV. The NV hypervisor will itself
have to work harder to merge the two accessors.
Note that there is a pending update to the architecture to address
this issue by making the affected bits UNKNOWN when CNTKCTL_EL1 is
used from EL2 with VHE enabled.
Fixes: c605ee2450 ("KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org # v6.4
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20230627140557.544885-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f49256897 upstream.
Make sure that the soundwire device used for register accesses has been
enumerated and initialised before trying to read the codec variant
during component probe.
This specifically avoids interpreting (a masked and shifted) -EBUSY
errno as the variant:
wcd938x_codec audio-codec: ASoC: error at soc_component_read_no_lock on audio-codec for register: [0x000034b0] -16
in case the soundwire device has not yet been initialised, which in turn
prevents some headphone controls from being registered.
Fixes: 8d78602aa8 ("ASoC: codecs: wcd938x: add basic driver")
Cc: stable@vger.kernel.org # 5.14
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reported-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Link: https://lore.kernel.org/r/20230701094723.29379-1-johan+linaro@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85a61b1ce4 upstream.
Make sure to resume the codec and soundwire device before trying to read
the codec variant and configure the device during component probe.
This specifically avoids interpreting (a masked and shifted) -EBUSY
errno as the variant:
wcd938x_codec audio-codec: ASoC: error at soc_component_read_no_lock on audio-codec for register: [0x000034b0] -16
when the soundwire device happens to be suspended, which in turn
prevents some headphone controls from being registered.
Fixes: 8d78602aa8 ("ASoC: codecs: wcd938x: add basic driver")
Cc: stable@vger.kernel.org # 5.14
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reported-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20230630120318.6571-1-johan+linaro@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49bd7b0814 upstream.
Byte mask for channel-1 of stream-1 is not getting enabled and this
causes failures during AMX use cases. This happens because the byte
map value 0 matches the byte map array and put() callback returns
without enabling the corresponding bits in the byte mask.
AMX supports 4 input streams and each stream can take a maximum of
16 channels. Each byte in the output frame is uniquely mapped to a
byte in one of these 4 inputs. This mapping is done with the help of
byte map array via user space control setting. The byte map array
size in the driver is 16 and each array element is of size 4 bytes.
This corresponds to 64 byte map values.
Each byte in the byte map array can have any value between 0 to 255
to enable the corresponding bits in the byte mask. The value 256 is
used as a way to disable the byte map. However the byte map array
element cannot store this value. The put() callback disables the byte
mask for 256 value and byte map value is reset to 0 for this case.
This causes problems during subsequent runs since put() callback,
for value of 0, just returns without enabling the byte mask. In short,
the problem is coming because 0 and 256 control values are stored as
0 in the byte map array.
Right now fix the put() callback by actually looking at the byte mask
array state to identify if any change is needed and update the fields
accordingly. The get() callback needs an update as well to return the
correct control value that user has set before. Note that when user
sets 256, the value is stored as 0 and byte mask is disabled. So byte
mask state is used to either return 256 or the value from byte map
array.
Given above, this looks bit complicated and all this happens because
the byte map array is tightly packed and cannot actually store the 256
value. Right now the priority is to fix the existing failure and a TODO
item is put to improve this logic.
Fixes: 8db78ace1b ("ASoC: tegra: Fix kcontrol put callback in AMX")
Cc: stable@vger.kernel.org
Signed-off-by: Sheetal <sheetal@nvidia.com>
Reviewed-by: Mohan Kumar D <mkumard@nvidia.com>
Reviewed-by: Sameer Pujar <spujar@nvidia.com>
Link: https://lore.kernel.org/r/1688015537-31682-2-git-send-email-spujar@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5475829ad upstream.
The MBHC resources must be released on component probe failure and
removal so can not be tied to the lifetime of the component device.
This is specifically needed to allow probe deferrals of the sound card
which otherwise fails when reprobing the codec component:
snd-sc8280xp sound: ASoC: failed to instantiate card -517
genirq: Flags mismatch irq 299. 00002001 (mbhc sw intr) vs. 00002001 (mbhc sw intr)
wcd938x_codec audio-codec: Failed to request mbhc interrupts -16
wcd938x_codec audio-codec: mbhc initialization failed
wcd938x_codec audio-codec: ASoC: error at snd_soc_component_probe on audio-codec: -16
snd-sc8280xp sound: ASoC: failed to instantiate card -16
Fixes: 0e5c9e7ff8 ("ASoC: codecs: wcd: add multi button Headset detection support")
Cc: stable@vger.kernel.org # 5.14
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230705123018.30903-7-johan+linaro@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e51df4f81b upstream.
In commit 2cb1e0259f ("ASoC: cs42l51: re-hook of_match_table
pointer"), 9 years ago, some random guy fixed the cs42l51 after it was
split into a core part and an I2C part to properly match based on a
Device Tree compatible string.
However, the fix in this commit is wrong: the MODULE_DEVICE_TABLE(of,
....) is in the core part of the driver, not the I2C part. Therefore,
automatic module loading based on module.alias, based on matching with
the DT compatible string, loads the core part of the driver, but not
the I2C part. And threfore, the i2c_driver is not registered, and the
codec is not known to the system, nor matched with a DT node with the
corresponding compatible string.
In order to fix that, we move the MODULE_DEVICE_TABLE(of, ...) into
the I2C part of the driver. The cs42l51_of_match[] array is also moved
as well, as it is not possible to have this definition in one file,
and the MODULE_DEVICE_TABLE(of, ...) invocation in another file, due
to how MODULE_DEVICE_TABLE works.
Thanks to this commit, the I2C part of the driver now properly
autoloads, and thanks to its dependency on the core part, the core
part gets autoloaded as well, resulting in a functional sound card
without having to manually load kernel modules.
Fixes: 2cb1e0259f ("ASoC: cs42l51: re-hook of_match_table pointer")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Link: https://lore.kernel.org/r/20230713112112.778576-1-thomas.petazzoni@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70a6404ff6 upstream.
Following prints are observed while testing audio on Jetson AGX Orin which
has onboard RT5640 audio codec:
BUG: sleeping function called from invalid context at kernel/workqueue.c:3027
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/0
preempt_count: 10001, expected: 0
RCU nest depth: 0, expected: 0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159 __handle_irq_event_percpu+0x1e0/0x270
---[ end trace ad1c64905aac14a6 ]-
The IRQ handler rt5640_irq() runs in interrupt context and can sleep
during cancel_delayed_work_sync().
Fix this by running IRQ handler, rt5640_irq(), in thread context.
Hence replace request_irq() calls with devm_request_threaded_irq().
Fixes: 051dade346 ("ASoC: rt5640: Fix the wrong state of JD1 and JD2")
Cc: stable@vger.kernel.org
Cc: Oder Chiou <oder_chiou@realtek.com>
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Link: https://lore.kernel.org/r/1688015537-31682-4-git-send-email-spujar@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6dfe70be0b upstream.
Byte mask for channel-1 of stream-1 is not getting enabled and this
causes failures during ADX use cases. This happens because the byte
map value 0 matches the byte map array and put() callback returns
without enabling the corresponding bits in the byte mask.
ADX supports 4 output streams and each stream can have a maximum of
16 channels. Each byte in the input frame is uniquely mapped to a
byte in one of these 4 outputs. This mapping is done with the help of
byte map array via user space control setting. The byte map array
size in the driver is 16 and each array element is of size 4 bytes.
This corresponds to 64 byte map values.
Each byte in the byte map array can have any value between 0 to 255
to enable the corresponding bits in the byte mask. The value 256 is
used as a way to disable the byte map. However the byte map array
element cannot store this value. The put() callback disables the byte
mask for 256 value and byte map value is reset to 0 for this case.
This causes problems during subsequent runs since put() callback,
for value of 0, just returns without enabling the byte mask. In short,
the problem is coming because 0 and 256 control values are stored as
0 in the byte map array.
Right now fix the put() callback by actually looking at the byte mask
array state to identify if any change is needed and update the fields
accordingly. The get() callback needs an update as well to return the
correct control value that user has set before. Note that when user
set 256, the value is stored as 0 and byte mask is disabled. So byte
mask state is used to either return 256 or the value from byte map
array.
Given above, this looks bit complicated and all this happens because
the byte map array is tightly packed and cannot actually store the 256
value. Right now the priority is to fix the existing failure and a TODO
item is put to improve this logic.
Fixes: 3c97881b8c ("ASoC: tegra: Fix kcontrol put callback in ADX")
Cc: stable@vger.kernel.org
Signed-off-by: Sheetal <sheetal@nvidia.com>
Reviewed-by: Mohan Kumar D <mkumard@nvidia.com>
Reviewed-by: Sameer Pujar <spujar@nvidia.com>
Link: https://lore.kernel.org/r/1688015537-31682-3-git-send-email-spujar@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86867aca73 upstream.
This reverts commit ff87d619ac.
Andreas reports that on an i.MX8MP-based system where MCLK needs to be
used as an input, the MCLK pin is actually an output, despite not having
the 'fsl,sai-mclk-direction-output' property present in the devicetree.
This is caused by commit ff87d619ac ("ASoC: fsl_sai: Enable
MCTL_MCLK_EN bit for master mode") that sets FSL_SAI_MCTL_MCLK_EN
unconditionally for imx8mm/8mn/8mp/93, causing the MCLK to always
be configured as output.
FSL_SAI_MCTL_MCLK_EN corresponds to the MOE (MCLK Output Enable) bit
of register MCR and the drivers sets it when the
'fsl,sai-mclk-direction-output' devicetree property is present.
Revert the commit to allow SAI to use MCLK as input as well.
Cc: stable@vger.kernel.org
Fixes: ff87d619ac ("ASoC: fsl_sai: Enable MCTL_MCLK_EN bit for master mode")
Reported-by: Andreas Henriksson <andreas@fatal.se>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Link: https://lore.kernel.org/r/20230706221827.1938990-1-festevam@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ca67aba8d upstream.
Up until now, amdgpu was silently degrading to vsync when
user-space requested an async flip but the hardware didn't support
it.
The hardware doesn't support immediate flips when the update changes
the FB pitch, the DCC state, the rotation, enables or disables CRTCs
or planes, etc. This is reflected in the dm_crtc_state.update_type
field: UPDATE_TYPE_FAST means that immediate flip is supported.
Silently degrading async flips to vsync is not the expected behavior
from a uAPI point-of-view. Xorg expects async flips to fail if
unsupported, to be able to fall back to a blit. i915 already behaves
this way.
This patch aligns amdgpu with uAPI expectations and returns a failure
when an async flip is not possible.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2329cc7a10 upstream.
When a new mode is set to modeset->mode, the previous mode should be freed.
This fixes the following kmemleak report:
drm_mode_duplicate+0x45/0x220 [drm]
drm_client_modeset_probe+0x944/0xf50 [drm]
__drm_fb_helper_initial_config_and_unlock+0xb4/0x2c0 [drm_kms_helper]
drm_fbdev_client_hotplug+0x2bc/0x4d0 [drm_kms_helper]
drm_client_register+0x169/0x240 [drm]
ast_pci_probe+0x142/0x190 [ast]
local_pci_probe+0xdc/0x180
work_for_cpu_fn+0x4e/0xa0
process_one_work+0x8b7/0x1540
worker_thread+0x70a/0xed0
kthread+0x29f/0x340
ret_from_fork+0x1f/0x30
cc: <stable@vger.kernel.org>
Reported-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711092203.68157-3-jfalempe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b42ae87a7b upstream.
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.
do {
xrandr --output Virtual --rotate inverted
xrandr --output Virtual --rotate right
xrandr --output Virtual --rotate left
xrandr --output Virtual --rotate normal
} while (1);
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable
It's caused by a stuck in lock dependency in such scenario on different
CPUs.
CPU1 CPU2
drm_crtc_vblank_off hrtimer_interrupt
grab event_lock (irq disabled) __hrtimer_run_queues
grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate
amdgpu_vkms_disable_vblank drm_handle_vblank
hrtimer_cancel grab dev->event_lock
So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.
So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.
v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well
v3: drop warn printing (Christian)
v4: drop superfluous check of blank->enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)
Fixes: 84ec374bd5 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable@vger.kernel.org
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51b56382ed upstream.
Copy the bounds checking from encode_message() to decode_message().
This patch addresses the following concerns. Ensure that there is
enough space for at least one header so that we don't have a negative
size later.
if (msg_hdr_len < sizeof(*trans_hdr))
Ensure that we have enough space to read the next header from the
msg->data.
if (msg_len > msg_hdr_len - sizeof(*trans_hdr))
return -EINVAL;
Check that the trans_hdr->len is not below the minimum size:
if (hdr_len < sizeof(*trans_hdr))
This minimum check ensures that we don't corrupt memory in
decode_passthrough() when we do.
memcpy(out_trans->data, in_trans->data, len - sizeof(in_trans->hdr));
And finally, use size_add() to prevent an integer overflow:
if (size_add(msg_len, hdr_len) > msg_hdr_len)
Fixes: 129776ac2e ("accel/qaic: Add control path")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Cc: stable@vger.kernel.org # 6.4.x
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZK0Q5nbLyDO7kJa+@moroto
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea33cb6fc2 upstream.
There are several issues in this code. The check at the start of the
loop:
if (user_len >= user_msg->len) {
This check does not ensure that we have enough space for the trans_hdr
(8 bytes). Instead the check needs to be:
if (user_len > user_msg->len - sizeof(*trans_hdr)) {
That subtraction is done as an unsigned long we want to avoid
negatives. Add a lower bound to the start of the function.
if (user_msg->len < sizeof(*trans_hdr))
There is a second integer underflow which can happen if
trans_hdr->len is zero inside the encode_passthrough() function.
memcpy(out_trans->data, in_trans->data, in_trans->hdr.len - sizeof(in_trans->hdr));
Instead of adding a check to encode_passthrough() it's better to check
in this central place. Add that check:
if (trans_hdr->len < sizeof(trans_hdr)
The final concern is that the "user_len + trans_hdr->len" might have an
integer overflow bug. Use size_add() to prevent that.
- if (user_len + trans_hdr->len > user_msg->len) {
+ if (size_add(user_len, trans_hdr->len) > user_msg->len) {
Fixes: 129776ac2e ("accel/qaic: Add control path")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Cc: stable@vger.kernel.org # 6.4.x
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9a0cb0c1-a974-4f10-bc8d-94437983639a@moroto.mountain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5886e4d5ec upstream.
If the gs_usb device driver is unloaded (or unbound) before the
interface is shut down, the USB stack first calls the struct
usb_driver::disconnect and then the struct net_device_ops::ndo_stop
callback.
In gs_usb_disconnect() all pending bulk URBs are killed, i.e. no more
RX'ed CAN frames are send from the USB device to the host. Later in
gs_can_close() a reset control message is send to each CAN channel to
remove the controller from the CAN bus. In this race window the USB
device can still receive CAN frames from the bus and internally queue
them to be send to the host.
At least in the current version of the candlelight firmware, the queue
of received CAN frames is not emptied during the reset command. After
loading (or binding) the gs_usb driver, new URBs are submitted during
the struct net_device_ops::ndo_open callback and the candlelight
firmware starts sending its already queued CAN frames to the host.
However, this scenario was not considered when implementing the
hardware timestamp function. The cycle counter/time counter
infrastructure is set up (gs_usb_timestamp_init()) after the USBs are
submitted, resulting in a NULL pointer dereference if
timecounter_cyc2time() (via the call chain:
gs_usb_receive_bulk_callback() -> gs_usb_set_timestamp() ->
gs_usb_skb_set_timestamp()) is called too early.
Move the gs_usb_timestamp_init() function before the URBs are
submitted to fix this problem.
For a comprehensive solution, we need to consider gs_usb devices with
more than 1 channel. The cycle counter/time counter infrastructure is
setup per channel, but the RX URBs are per device. Once gs_can_open()
of _a_ channel has been called, and URBs have been submitted, the
gs_usb_receive_bulk_callback() can be called for _all_ available
channels, even for channels that are not running, yet. As cycle
counter/time counter has not set up, this will again lead to a NULL
pointer dereference.
Convert the cycle counter/time counter from a "per channel" to a "per
device" functionality. Also set it up, before submitting any URBs to
the device.
Further in gs_usb_receive_bulk_callback(), don't process any URBs for
not started CAN channels, only resubmit the URB.
Fixes: 45dfa45f52 ("can: gs_usb: add RX and TX hardware timestamp support")
Closes: https://github.com/candle-usb/candleLight_fw/issues/137#issuecomment-1623532076
Cc: stable@vger.kernel.org
Cc: John Whittington <git@jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-2-9017cefcd9d5@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2603be9e81 upstream.
The gs_usb driver handles USB devices with more than 1 CAN channel.
The RX path for all channels share the same bulk endpoint (the
transmitted bulk data encodes the channel number). These per-device
resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately
in case of a failure or the URBs are anchored. All anchored URBs are
finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a
URB or a buffer fails. However, if usb_submit_urb() fails, the driver
continues with the URBs submitted so far, even if no URBs were
successfully submitted.
Treat every error as fatal and free all allocated resources
immediately.
Switch to goto-style error handling, to prepare the driver for more
per-device resource allocation.
Cc: stable@vger.kernel.org
Cc: John Whittington <git@jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-9017cefcd9d5@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9efa1a5407 upstream.
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f00 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross@ifm.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5-1-06600f34c684@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4d5be94a8 upstream.
When we reconfigure the SVE vector length we discard the backing storage
for the SVE vectors and then reallocate on next SVE use, leaving the SME
specific state alone. This means that we do not enable SME traps if they
were already disabled. That means that userspace code can enter streaming
mode without trapping, putting the task in a state where if we try to save
the state of the task we will fault.
Since the ABI does not specify that changing the SVE vector length disturbs
SME state, and since SVE code may not be aware of SME code in the process,
we shouldn't simply discard any ZA state. Instead immediately reallocate
the storage for SVE, and disable SME if we change the SVE vector length
while there is no SME state active.
Disabling SME traps on SVE vector length changes would make the overall
code more complex since we would have a state where we have valid SME state
stored but might get a SME trap.
Fixes: 9e4ab6c891 ("arm64/sme: Implement vector length configuration prctl()s")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230720-arm64-fix-sve-sme-vl-change-v2-1-8eea06b82d57@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c9d2eb5e9 upstream.
The SMBus I2C buses have limits on the size of transfers they can do but
do not factor in the register length meaning we may try to do a transfer
longer than our length limit, the core will not take care of this.
Future changes will factor this out into the core but there are a number
of users that assume current behaviour so let's just do something
conservative here.
This does not take account padding bits but practically speaking these
are very rarely if ever used on I2C buses given that they generally run
slowly enough to mean there's no issue.
Cc: stable@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Link: https://lore.kernel.org/r/20230712-regmap-max-transfer-v1-2-80e2aed22e83@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4cfca532dd upstream.
The length information for available buffer space for CCA
replies is covered with two fields in the T6 header prepended
on each CCA reply: fromcardlen1 and fromcardlen2. The sum of
these both values must not exceed the AP bus limit for this
card (24KB for CEX8, 12KB CEX7 and older) minus the always
present headers.
The current code adjusted the fromcardlen2 value in case
of exceeding the AP bus limit when there was a non-zero
value given from userspace. Some tests now showed that this
was the wrong assumption. Instead the userspace value given for
this field should always be trusted and if the sum of the
two fields exceeds the AP bus limit for this card the first
field fromcardlen1 should be adjusted instead.
So now the calculation is done with this new insight in mind.
Also some additional checks for overflow have been introduced
and some comments to provide some documentation for future
maintainers of this complicated calculation code.
Furthermore the 128 bytes of fix overhead which is used
in the current code is not correct. Investigations showed
that for a reply always the same two header structs are
prepended before a possible payload. So this is also fixed
with this patch.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc64734825 upstream.
When problems were noticed with the register address not being taken
into account when limiting raw transfers with I2C devices we fixed this
in the core. Unfortunately it has subsequently been realised that a lot
of buses were relying on the prior behaviour, partly due to unclear
documentation not making it obvious what was intended in the core. This
is all more involved to fix than is sensible for a fix commit so let's
just drop the original fixes, a separate commit will fix the originally
observed problem in an I2C specific way
Fixes: 3981514180 ("regmap: Account for register length when chunking")
Fixes: c8e796895e ("regmap: spi-avmm: Fix regmap_bus max_raw_write")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230712-regmap-max-transfer-v1-1-80e2aed22e83@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b19c98f237 upstream.
Syzbot reported a panic that looks like this:
assertion failed: fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE_PAUSED, in fs/btrfs/ioctl.c:465
------------[ cut here ]------------
kernel BUG at fs/btrfs/messages.c:259!
RIP: 0010:btrfs_assertfail+0x2c/0x30 fs/btrfs/messages.c:259
Call Trace:
<TASK>
btrfs_exclop_balance fs/btrfs/ioctl.c:465 [inline]
btrfs_ioctl_balance fs/btrfs/ioctl.c:3564 [inline]
btrfs_ioctl+0x531e/0x5b30 fs/btrfs/ioctl.c:4632
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The reproducer is running a balance and a cancel or pause in parallel.
The way balance finishes is a bit wonky, if we were paused we need to
save the balance_ctl in the fs_info, but clear it otherwise and cleanup.
However we rely on the return values being specific errors, or having a
cancel request or no pause request. If balance completes and returns 0,
but we have a pause or cancel request we won't do the appropriate
cleanup, and then the next time we try to start a balance we'll trip
this ASSERT.
The error handling is just wrong here, we always want to clean up,
unless we got -ECANCELLED and we set the appropriate pause flag in the
exclusive op. With this patch the reproducer ran for an hour without
tripping, previously it would trip in less than a few minutes.
Reported-by: syzbot+c0f3acf145cb465426d5@syzkaller.appspotmail.com
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1a07c2b4e upstream.
At exclude_super_stripes(), if we happen to find a block group that has
super blocks mapped to it and we are on a zoned filesystem, we error out
as this is not supposed to happen, indicating either a bug or maybe some
memory corruption for example. However we are exiting the function without
freeing the memory allocated for the logical address of the super blocks.
Fix this by freeing the logical address.
Fixes: 12659251ca ("btrfs: implement log-structured superblock for ZONED mode")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b777d279ff upstream.
At btrfs_orphan_cleanup(), if we were able to find the inode, we do an
iput() on the inode, then if btrfs_drop_verity_items() succeeds and then
either btrfs_start_transaction() or btrfs_del_orphan_item() fail, we do
another iput() in the respective error paths, resulting in an extra iput()
on the inode.
Fix this by setting inode to NULL after the first iput(), as iput()
ignores a NULL inode pointer argument.
Fixes: a13bb2c038 ("btrfs: add missing iputs on orphan cleanup failure")
CC: stable@vger.kernel.org # 6.4
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 17b17fcd6d upstream.
While trying to get the subpage blocksize tests running, I hit the
following panic on generic/476
assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229
kernel BUG at fs/btrfs/subpage.c:229!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
CPU: 1 PID: 1453 Comm: fsstress Not tainted 6.4.0-rc7+ #12
Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20230301gitf80f052277c8-26.fc38 03/01/2023
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : btrfs_subpage_assert+0xbc/0xf0
lr : btrfs_subpage_assert+0xbc/0xf0
Call trace:
btrfs_subpage_assert+0xbc/0xf0
btrfs_subpage_clear_checked+0x38/0xc0
btrfs_page_clear_checked+0x48/0x98
btrfs_truncate_block+0x5d0/0x6a8
btrfs_cont_expand+0x5c/0x528
btrfs_write_check.isra.0+0xf8/0x150
btrfs_buffered_write+0xb4/0x760
btrfs_do_write_iter+0x2f8/0x4b0
btrfs_file_write_iter+0x1c/0x30
do_iter_readv_writev+0xc8/0x158
do_iter_write+0x9c/0x210
vfs_iter_write+0x24/0x40
iter_file_splice_write+0x224/0x390
direct_splice_actor+0x38/0x68
splice_direct_to_actor+0x12c/0x260
do_splice_direct+0x90/0xe8
generic_copy_file_range+0x50/0x90
vfs_copy_file_range+0x29c/0x470
__arm64_sys_copy_file_range+0xcc/0x498
invoke_syscall.constprop.0+0x80/0xd8
do_el0_svc+0x6c/0x168
el0_svc+0x50/0x1b0
el0t_64_sync_handler+0x114/0x120
el0t_64_sync+0x194/0x198
This happens because during btrfs_cont_expand we'll get a page, set it
as mapped, and if it's not Uptodate we'll read it. However between the
read and re-locking the page we could have called release_folio() on the
page, but left the page in the file mapping. release_folio() can clear
the page private, and thus further down we blow up when we go to modify
the subpage bits.
Fix this by putting the set_page_extent_mapped() after the read. This
is safe because read_folio() will call set_page_extent_mapped() before
it does the read, and then if we clear page private but leave it on the
mapping we're completely safe re-setting set_page_extent_mapped(). With
this patch I can now run generic/476 without panicing.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 486c737f7f upstream.
[REGRESSION]
Commit 75b4703329 ("btrfs: raid56: migrate recovery and scrub recovery
path to use error_bitmap") changed the behavior of scrub_rbio().
Initially if we have no error reading the raid bio, we will assign
@need_check to true, then finish_parity_scrub() would later verify the
content of P/Q stripes before writeback.
But after that commit we never verify the content of P/Q stripes and
just writeback them.
This can lead to unrepaired P/Q stripes during scrub, or already
corrupted P/Q copied to the dev-replace target.
[FIX]
The situation is more complex than the regression, in fact the initial
behavior is not 100% correct either.
If we have the following rare case, it can still lead to the same
problem using the old behavior:
0 16K 32K 48K 64K
Data 1: |IIIIIII| |
Data 2: | |
Parity: | |CCCCCCC| |
Where "I" means IO error, "C" means corruption.
In the above case, we're scrubbing the parity stripe, then read out all
the contents of Data 1, Data 2, Parity stripes.
But found IO error in Data 1, which leads to rebuild using Data 2 and
Parity and got the correct data.
In that case, we would not verify if the Parity is correct for range
[16K, 32K).
So here we have to always verify the content of Parity no matter if we
did recovery or not.
This patch would remove the @need_check parameter of
finish_parity_scrub() completely, and would always do the P/Q
verification before writeback.
Fixes: 75b4703329 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap")
CC: stable@vger.kernel.org # 6.2+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3066ff9347 upstream.
This is just a safety precaution to avoid checking flags on memory that was
initialized on the user space side. libfuse zeroes struct fuse_init_out
outarg, but this is not guranteed to be done in all implementations.
Better is to act on flags and to only apply flags2 when FUSE_INIT_EXT is
set.
There is a risk with this change, though - it might break existing user
space libraries, which are already using flags2 without setting
FUSE_INIT_EXT.
The corresponding libfuse patch is here
https://github.com/libfuse/libfuse/pull/662
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Fixes: 53db28933e ("fuse: extend init flags")
Cc: <stable@vger.kernel.org> # v5.17
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cadfbd5a1 upstream.
Add an init flag idicating whether the FUSE_EXPIRE_ONLY flag of
FUSE_NOTIFY_INVAL_ENTRY is effective.
This is needed for backports of this feature, otherwise the server could
just check the protocol version.
Fixes: 4f8d37020e ("fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY")
Cc: <stable@vger.kernel.org> # v6.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbaee87f2e upstream.
At btrfs_orphan_cleanup(), if we can't find an inode (btrfs_iget() returns
an -ENOENT error pointer), we proceed with 'ret' set to -ENOENT and the
inode pointer set to ERR_PTR(-ENOENT). Later when we proceed to the body
of the following if statement:
if (ret == -ENOENT || inode->i_nlink) {
(...)
trans = btrfs_start_transaction(root, 1);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
iput(inode);
goto out;
}
(...)
ret = btrfs_del_orphan_item(trans, root,
found_key.objectid);
btrfs_end_transaction(trans);
if (ret) {
iput(inode);
goto out;
}
continue;
}
If we get an error from btrfs_start_transaction() or from the call to
btrfs_del_orphan_item() we end calling iput() against an inode pointer
that has a value of ERR_PTR(-ENOENT), resulting in a crash with the
following trace:
[876.667] BUG: kernel NULL pointer dereference, address: 0000000000000096
[876.667] #PF: supervisor read access in kernel mode
[876.667] #PF: error_code(0x0000) - not-present page
[876.667] PGD 0 P4D 0
[876.668] Oops: 0000 [#1] PREEMPT SMP PTI
[876.668] CPU: 0 PID: 2356187 Comm: mount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[876.668] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[876.668] RIP: 0010:iput+0xa/0x20
[876.668] Code: ff ff ff 66 (...)
[876.669] RSP: 0018:ffffafa9c0c9f9d0 EFLAGS: 00010282
[876.669] RAX: ffffffffffffffe4 RBX: 000000000009453b RCX: 0000000000000000
[876.669] RDX: 0000000000000001 RSI: ffffafa9c0c9f930 RDI: fffffffffffffffe
[876.669] RBP: ffff95c612f3b800 R08: 0000000000000001 R09: ffffffffffffffe4
[876.670] R10: 00018f2a71010000 R11: 000000000ead96e3 R12: ffff95cb7d6909a0
[876.670] R13: fffffffffffffffe R14: ffff95c60f477000 R15: 00000000ffffffe4
[876.670] FS: 00007f5fbe30a840(0000) GS:ffff95ccdfa00000(0000) knlGS:0000000000000000
[876.670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[876.671] CR2: 0000000000000096 CR3: 000000055e9f6004 CR4: 0000000000370ef0
[876.671] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[876.671] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[876.672] Call Trace:
[876.744] <TASK>
[876.744] ? __die_body+0x1b/0x60
[876.744] ? page_fault_oops+0x15d/0x450
[876.745] ? __kmem_cache_alloc_node+0x47/0x410
[876.745] ? do_user_addr_fault+0x65/0x8a0
[876.745] ? exc_page_fault+0x74/0x170
[876.746] ? asm_exc_page_fault+0x22/0x30
[876.746] ? iput+0xa/0x20
[876.746] btrfs_orphan_cleanup+0x221/0x330 [btrfs]
[876.746] btrfs_lookup_dentry+0x58f/0x5f0 [btrfs]
[876.747] btrfs_lookup+0xe/0x30 [btrfs]
[876.747] __lookup_slow+0x82/0x130
[876.785] walk_component+0xe5/0x160
[876.786] path_lookupat.isra.0+0x6e/0x150
[876.786] filename_lookup+0xcf/0x1a0
[876.786] ? mod_objcg_state+0xd2/0x360
[876.786] ? obj_cgroup_charge+0xf5/0x110
[876.787] ? should_failslab+0xa/0x20
[876.787] ? kmem_cache_alloc+0x47/0x450
[876.787] vfs_path_lookup+0x51/0x90
[876.788] mount_subtree+0x8d/0x130
[876.788] btrfs_mount+0x149/0x410 [btrfs]
[876.788] ? __kmem_cache_alloc_node+0x47/0x410
[876.788] ? vfs_parse_fs_param+0xc0/0x110
[876.789] legacy_get_tree+0x24/0x50
[876.834] vfs_get_tree+0x22/0xd0
[876.852] path_mount+0x2d8/0x9c0
[876.852] do_mount+0x79/0x90
[876.852] __x64_sys_mount+0x8e/0xd0
[876.853] do_syscall_64+0x38/0x90
[876.899] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[876.958] RIP: 0033:0x7f5fbe50b76a
[876.959] Code: 48 8b 0d a9 (...)
[876.959] RSP: 002b:00007fff01925798 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[876.959] RAX: ffffffffffffffda RBX: 00007f5fbe694264 RCX: 00007f5fbe50b76a
[876.960] RDX: 0000561bde6c8720 RSI: 0000561bde6bdec0 RDI: 0000561bde6c31a0
[876.960] RBP: 0000561bde6bdc70 R08: 0000000000000000 R09: 0000000000000001
[876.960] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[876.960] R13: 0000561bde6c31a0 R14: 0000561bde6c8720 R15: 0000561bde6bdc70
[876.960] </TASK>
So fix this by setting 'inode' to NULL whenever we get an error from
btrfs_iget(), and to make the code simpler, stop testing for 'ret' being
-ENOENT to check if we have an inode - instead test for 'inode' being NULL
or not. Having a NULL 'inode' prevents any iput() call from crashing, as
iput() ignores NULL inode pointers. Also, stop testing for a NULL return
value from btrfs_iget() with PTR_ERR_OR_ZERO(), because btrfs_iget() never
returns NULL - in case an inode is not found, it returns ERR_PTR(-ENOENT),
and in case of memory allocation failure, it returns ERR_PTR(-ENOMEM).
We also don't need the extra iput() calls on the error branches for the
btrfs_start_transaction() and btrfs_del_orphan_item() calls, as we have
already called iput() before, so remove them.
Fixes: a13bb2c038 ("btrfs: add missing iputs on orphan cleanup failure")
CC: stable@vger.kernel.org # 6.4
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c66e1c68c1 upstream.
After switching from dwarf_decl_file() to die_get_decl_file(), it is not
possible to add probes for certain functions:
$ perf probe -x /usr/lib/systemd/systemd-logind match_unit_removed
A function DIE doesn't have decl_line. Maybe broken DWARF?
A function DIE doesn't have decl_line. Maybe broken DWARF?
Probe point 'match_unit_removed' not found.
Error: Failed to add events.
The problem is that die_get_decl_file() uses the wrong CU to search for
the file. elfutils commit e1db5cdc9f has some good explanation for this:
dwarf_decl_file uses dwarf_attr_integrate to get the DW_AT_decl_file
attribute. This means the attribute might come from a different DIE
in a different CU. If so, we need to use the CU associated with the
attribute, not the original DIE, to resolve the file name.
This patch uses the same source of information as elfutils: use attribute
DW_AT_decl_file and use this CU to search for the file.
Fixes: dc9a5d2ccd ("perf probe: Fix to get declared file name from clang DWARF5")
Signed-off-by: Georg Müller <georgmueller@gmx.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: regressions@lists.linux.dev
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230628084551.1860532-6-georgmueller@gmx.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d55901522f upstream.
When making a DNS query inside the kernel using dns_query(), the request
code can in rare cases end up creating a duplicate index key in the
assoc_array of the destination keyring. It is eventually found by
a BUG_ON() check in the assoc_array implementation and results in
a crash.
Example report:
[2158499.700025] kernel BUG at ../lib/assoc_array.c:652!
[2158499.700039] invalid opcode: 0000 [#1] SMP PTI
[2158499.700065] CPU: 3 PID: 31985 Comm: kworker/3:1 Kdump: loaded Not tainted 5.3.18-150300.59.90-default #1 SLE15-SP3
[2158499.700096] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[2158499.700351] Workqueue: cifsiod cifs_resolve_server [cifs]
[2158499.700380] RIP: 0010:assoc_array_insert+0x85f/0xa40
[2158499.700401] Code: ff 74 2b 48 8b 3b 49 8b 45 18 4c 89 e6 48 83 e7 fe e8 95 ec 74 00 3b 45 88 7d db 85 c0 79 d4 0f 0b 0f 0b 0f 0b e8 41 f2 be ff <0f> 0b 0f 0b 81 7d 88 ff ff ff 7f 4c 89 eb 4c 8b ad 58 ff ff ff 0f
[2158499.700448] RSP: 0018:ffffc0bd6187faf0 EFLAGS: 00010282
[2158499.700470] RAX: ffff9f1ea7da2fe8 RBX: ffff9f1ea7da2fc1 RCX: 0000000000000005
[2158499.700492] RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000
[2158499.700515] RBP: ffffc0bd6187fbb0 R08: ffff9f185faf1100 R09: 0000000000000000
[2158499.700538] R10: ffff9f1ea7da2cc0 R11: 000000005ed8cec8 R12: ffffc0bd6187fc28
[2158499.700561] R13: ffff9f15feb8d000 R14: ffff9f1ea7da2fc0 R15: ffff9f168dc0d740
[2158499.700585] FS: 0000000000000000(0000) GS:ffff9f185fac0000(0000) knlGS:0000000000000000
[2158499.700610] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2158499.700630] CR2: 00007fdd94fca238 CR3: 0000000809d8c006 CR4: 00000000003706e0
[2158499.700702] Call Trace:
[2158499.700741] ? key_alloc+0x447/0x4b0
[2158499.700768] ? __key_link_begin+0x43/0xa0
[2158499.700790] __key_link_begin+0x43/0xa0
[2158499.700814] request_key_and_link+0x2c7/0x730
[2158499.700847] ? dns_resolver_read+0x20/0x20 [dns_resolver]
[2158499.700873] ? key_default_cmp+0x20/0x20
[2158499.700898] request_key_tag+0x43/0xa0
[2158499.700926] dns_query+0x114/0x2ca [dns_resolver]
[2158499.701127] dns_resolve_server_name_to_ip+0x194/0x310 [cifs]
[2158499.701164] ? scnprintf+0x49/0x90
[2158499.701190] ? __switch_to_asm+0x40/0x70
[2158499.701211] ? __switch_to_asm+0x34/0x70
[2158499.701405] reconn_set_ipaddr_from_hostname+0x81/0x2a0 [cifs]
[2158499.701603] cifs_resolve_server+0x4b/0xd0 [cifs]
[2158499.701632] process_one_work+0x1f8/0x3e0
[2158499.701658] worker_thread+0x2d/0x3f0
[2158499.701682] ? process_one_work+0x3e0/0x3e0
[2158499.701703] kthread+0x10d/0x130
[2158499.701723] ? kthread_park+0xb0/0xb0
[2158499.701746] ret_from_fork+0x1f/0x40
The situation occurs as follows:
* Some kernel facility invokes dns_query() to resolve a hostname, for
example, "abcdef". The function registers its global DNS resolver
cache as current->cred.thread_keyring and passes the query to
request_key_net() -> request_key_tag() -> request_key_and_link().
* Function request_key_and_link() creates a keyring_search_context
object. Its match_data.cmp method gets set via a call to
type->match_preparse() (resolves to dns_resolver_match_preparse()) to
dns_resolver_cmp().
* Function request_key_and_link() continues and invokes
search_process_keyrings_rcu() which returns that a given key was not
found. The control is then passed to request_key_and_link() ->
construct_alloc_key().
* Concurrently to that, a second task similarly makes a DNS query for
"abcdef." and its result gets inserted into the DNS resolver cache.
* Back on the first task, function construct_alloc_key() first runs
__key_link_begin() to determine an assoc_array_edit operation to
insert a new key. Index keys in the array are compared exactly as-is,
using keyring_compare_object(). The operation finds that "abcdef" is
not yet present in the destination keyring.
* Function construct_alloc_key() continues and checks if a given key is
already present on some keyring by again calling
search_process_keyrings_rcu(). This search is done using
dns_resolver_cmp() and "abcdef" gets matched with now present key
"abcdef.".
* The found key is linked on the destination keyring by calling
__key_link() and using the previously calculated assoc_array_edit
operation. This inserts the "abcdef." key in the array but creates
a duplicity because the same index key is already present.
Fix the problem by postponing __key_link_begin() in
construct_alloc_key() until an actual key which should be linked into
the destination keyring is determined.
[jarkko@kernel.org: added a fixes tag and cc to stable]
Cc: stable@vger.kernel.org # v5.3+
Fixes: df593ee23e ("keys: Hoist locking out of __key_link_begin()")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Joey Lee <jlee@suse.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32832a407a upstream.
The io_uring testcase is broken on IA-64 since commit d808459b2e
("io_uring: Adjust mapping wrt architecture aliasing requirements").
The reason is, that this commit introduced an own architecture
independend get_unmapped_area() search algorithm which finds on IA-64 a
memory region which is outside of the regular memory region used for
shared userspace mappings and which can't be used on that platform
due to aliasing.
To avoid similar problems on IA-64 and other platforms in the future,
it's better to switch back to the architecture-provided
get_unmapped_area() function and adjust the needed input parameters
before the call. Beside fixing the issue, the function now becomes
easier to understand and maintain.
This patch has been successfully tested with the io_uring testcase on
physical x86-64, ppc64le, IA-64 and PA-RISC machines. On PA-RISC the LTP
mmmap testcases did not report any regressions.
Cc: stable@vger.kernel.org # 6.4
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Fixes: d808459b2e ("io_uring: Adjust mapping wrt architecture aliasing requirements")
Link: https://lore.kernel.org/r/20230721152432.196382-2-deller@gmx.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 522b1d6921 upstream.
Add a fix for the Zen2 VZEROUPPER data corruption bug where under
certain circumstances executing VZEROUPPER can cause register
corruption or leak data.
The optimal fix is through microcode but in the case the proper
microcode revision has not been applied, enable a fallback fix using
a chicken bit.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 790071347a upstream.
Change ndo_set_mac_address to dev_set_mac_address because
dev_set_mac_address provides a way to notify network layer about MAC
change. In other case, services may not aware about MAC change and keep
using old one which set from network adapter driver.
As example, DHCP client from systemd do not update MAC address without
notification from net subsystem which leads to the problem with acquiring
the right address from DHCP server.
Fixes: cb10c7c0df ("net/ncsi: Add NCSI Broadcom OEM command")
Cc: stable@vger.kernel.org # v6.0+ 2f38e84 net/ncsi: make one oem_gma function for all mfr id
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e076c73e4 upstream.
This requires a bit of background. Properly done a modeset driver's
unload/remove sequence should be
drm_dev_unplug();
drm_atomic_helper_shutdown();
drm_dev_put();
The trouble is that the drm_dev_unplugged() checks are by design racy,
they do not synchronize against all outstanding ioctl. This is because
those ioctl could block forever (both for modeset and for driver
specific ioctls), leading to deadlocks in hotunplug. Instead the code
sections that touch the hardware need to be annotated with
drm_dev_enter/exit, to avoid accessing hardware resources after the
unload/remove has finished.
To avoid use-after-free issues all the involved userspace visible
objects are supposed to hold a reference on the underlying drm_device,
like drm_file does.
The issue now is that we missed one, the atomic modeset ioctl can be run
in a nonblocking fashion, and in that case it cannot rely on the implied
drm_device reference provided by the ioctl calling context. This can
result in a use-after-free if an nonblocking atomic commit is carefully
raced against a driver unload.
Fix this by unconditionally grabbing a drm_device reference for any
drm_atomic_state structures. Strictly speaking this isn't required for
blocking commits and TEST_ONLY calls, but it's the simpler approach.
Thanks to shanzhulig for the initial idea of grabbing an unconditional
reference, I just added comments, a condensed commit message and fixed a
minor potential issue in where exactly we drop the final reference.
Reported-by: shanzhulig <shanzhulig@gmail.com>
Suggested-by: shanzhulig <shanzhulig@gmail.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b843adde8d upstream.
System crash, where driver is accessing scsi layer's
memory (scsi_cmnd->device->host) to search for a well known internal
pointer (vha). The scsi_cmnd was released back to upper layer which
could be freed, but the driver is still accessing it.
7 [ffffa8e8d2c3f8d0] page_fault at ffffffff86c010fe
[exception RIP: __qla2x00_eh_wait_for_pending_commands+240]
RIP: ffffffffc0642350 RSP: ffffa8e8d2c3f988 RFLAGS: 00010286
RAX: 0000000000000165 RBX: 0000000000000002 RCX: 00000000000036d8
RDX: 0000000000000000 RSI: ffff9c5c56535188 RDI: 0000000000000286
RBP: ffff9c5bf7aa4a58 R8: ffff9c589aecdb70 R9: 00000000000003d1
R10: 0000000000000001 R11: 0000000000380000 R12: ffff9c5c5392bc78
R13: ffff9c57044ff5c0 R14: ffff9c56b5a3aa00 R15: 00000000000006db
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
8 [ffffa8e8d2c3f9c8] qla2x00_eh_wait_for_pending_commands at ffffffffc0646dd5 [qla2xxx]
9 [ffffa8e8d2c3fa00] __qla2x00_async_tm_cmd at ffffffffc0658094 [qla2xxx]
Remove access of freed memory. Currently the driver was checking to see if
scsi_done was called by seeing if the sp->type has changed. Instead,
check to see if the command has left the oustanding_cmds[] array as
sign of scsi_done was called.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc0cba0c7b upstream.
System crash due to use after free.
Current code allows terminate_rport_io to exit before making
sure all IOs has returned. For FCP-2 device, IO's can hang
on in HW because driver has not tear down the session in FW at
first sign of cable pull. When dev_loss_tmo timer pops,
terminate_rport_io is called and upper layer is about to
free various resources. Terminate_rport_io trigger qla to do
the final cleanup, but the cleanup might not be fast enough where it
leave qla still holding on to the same resource.
Wait for IO's to return to upper layer before resources are freed.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0a3022f30 upstream.
When users register an event the name of the event and it's argument are
checked to ensure they match if the event already exists. Normally all
arguments are in the form of "type name", except for when the type
starts with "struct ". In those cases, the size of the struct is passed
in addition to the name, IE: "struct my_struct a 20" for an argument
that is of type "struct my_struct" with a field name of "a" and has the
size of 20 bytes.
The current code does not honor the above case properly when comparing
a match. This causes the event register to fail even when the same
string was used for events that contain a struct argument within them.
The example above "struct my_struct a 20" generates a match string of
"struct my_struct a" omitting the size field.
Add the struct size of the existing field when generating a comparison
string for a struct field to ensure proper match checking.
Link: https://lkml.kernel.org/r/20230629235049.581-2-beaub@linux.microsoft.com
Cc: stable@vger.kernel.org
Fixes: e6f89a1498 ("tracing/user_events: Ensure user provided strings are safely formatted")
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 797311bce5 upstream.
Fix to record 0-length data to data_loc in fetch_store_string*() if it fails
to get the string data.
Currently those expect that the data_loc is updated by store_trace_args() if
it returns the error code. However, that does not work correctly if the
argument is an array of strings. In that case, store_trace_args() only clears
the first entry of the array (which may have no error) and leaves other
entries. So it should be cleared by fetch_store_string*() itself.
Also, 'dyndata' and 'maxlen' in store_trace_args() should be updated
only if it is used (ret > 0 and argument is a dynamic data.)
Link: https://lore.kernel.org/all/168908496683.123124.4761206188794205601.stgit@devnote2/
Fixes: 40b53b7718 ("tracing: probeevent: Add array type support")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1f047bd7c upstream.
pSMB->hdr.Protocol is an array of size 4 bytes, hence when the compiler
analyzes this line of code
parm_data = ((char *) &pSMB->hdr.Protocol) + offset;
it legitimately complains about the fact that offset points outside the
bounds of the array. Notice that the compiler gives priority to the object
as an array, rather than merely the address of one more byte in a structure
to wich offset should be added (which seems to be the actual intention of
the original implementation).
Fix this by explicitly instructing the compiler to treat the code as a
sequence of bytes in struct smb_com_transaction2_spi_req, and not as an
array accessed through pointer notation.
Notice that ((char *)pSMB) + sizeof(pSMB->hdr.smb_buf_length) points to
the same address as ((char *) &pSMB->hdr.Protocol), therefore this results
in no differences in binary output.
Fixes the following -Wstringop-overflow warnings when built s390
architecture with defconfig (GCC 13):
CC [M] fs/smb/client/cifssmb.o
In function 'cifs_init_ace',
inlined from 'posix_acl_to_cifs' at fs/smb/client/cifssmb.c:3046:3,
inlined from 'cifs_do_set_acl' at fs/smb/client/cifssmb.c:3191:15:
fs/smb/client/cifssmb.c:2987:31: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
2987 | cifs_ace->cifs_e_perm = local_ace->e_perm;
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
In file included from fs/smb/client/cifssmb.c:27:
fs/smb/client/cifspdu.h: In function 'cifs_do_set_acl':
fs/smb/client/cifspdu.h:384:14: note: at offset [7, 11] into destination object 'Protocol' of size 4
384 | __u8 Protocol[4];
| ^~~~~~~~
In function 'cifs_init_ace',
inlined from 'posix_acl_to_cifs' at fs/smb/client/cifssmb.c:3046:3,
inlined from 'cifs_do_set_acl' at fs/smb/client/cifssmb.c:3191:15:
fs/smb/client/cifssmb.c:2988:30: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
2988 | cifs_ace->cifs_e_tag = local_ace->e_tag;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
fs/smb/client/cifspdu.h: In function 'cifs_do_set_acl':
fs/smb/client/cifspdu.h:384:14: note: at offset [6, 10] into destination object 'Protocol' of size 4
384 | __u8 Protocol[4];
| ^~~~~~~~
This helps with the ongoing efforts to globally enable
-Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/310
Fixes: dc1af4c4b4 ("cifs: implement set acl method")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61d9658050 upstream.
When using pm_nl_ctl to validate userspace path-manager's behaviours, it
was failing on 32-bit architectures ~half of the time.
pm_nl_ctl was not reporting any error but the command was not doing what
it was expected to do. As a result, the expected linked event was not
triggered after and the test failed.
This is due to the fact the token given in argument to the application
was parsed as an integer with atoi(): in a 32-bit arch, if the number
was bigger than INT_MAX, 2147483647 was used instead.
This can simply be fixed by using strtoul() instead of atoi().
The errors have been seen "by chance" when manually looking at the
results from LKFT.
Fixes: 9a0b36509d ("selftests: mptcp: support MPTCP_PM_CMD_ANNOUNCE")
Cc: stable@vger.kernel.org
Fixes: ecd2a77d67 ("selftests: mptcp: support MPTCP_PM_CMD_REMOVE")
Fixes: cf8d0a6dfd ("selftests: mptcp: support MPTCP_PM_CMD_SUBFLOW_CREATE")
Fixes: 57cc361b8d ("selftests: mptcp: support MPTCP_PM_CMD_SUBFLOW_DESTROY")
Fixes: ca188a25d4 ("selftests: mptcp: userspace PM support for MP_PRIO signals")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c8880fcaa upstream.
MPTCP selftests are using TCP SYN Cookies for quite a while now, since
v5.9.
Some CIs don't have this config option enabled and this is causing
issues in the tests:
# ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 167ms) sysctl: cannot stat /proc/sys/net/ipv4/tcp_syncookies: No such file or directory
# [ OK ]./mptcp_connect.sh: line 554: [: -eq: unary operator expected
There is no impact in the results but the test is not doing what it is
supposed to do.
Fixes: fed61c4b58 ("selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ac4c28eb7 upstream.
When an error was detected when checking the marks, a message was
correctly printed mentioning the error but followed by another one
saying everything was OK and the selftest was not marked as failed as
expected.
Now the 'ret' variable is directly set to 1 in order to make sure the
exit is done with an error, similar to what is done in other functions.
While at it, the error is correctly propagated to the caller.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: dc65fe82fb ("selftests: mptcp: add packet mark test case")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 221e455045 upstream.
In case of "external" errors when preparing the environment for the
TProxy tests, the subtests were marked as skipped.
This is fine but it means these errors are ignored. On MPTCP Public CI,
we do want to catch such issues and mark the selftest as failed if there
are such issues. We can then use mptcp_lib_fail_if_expected_feature()
helper that has been recently added to fail if needed.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 5fb62e9cd3 ("selftests: mptcp: add tproxy test case")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fffa15bfe upstream.
While tacking care of the mptcp-level listener I unintentionally
moved the subflow level unhash after the subflow listener backlog
cleanup.
That could cause some nasty race and makes the code harder to read.
Address the issue restoring the proper order of operations.
Fixes: 57fc0f1cea ("mptcp: ensure listener is unhashed before updating the sk status")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0226436acf upstream.
Since the blamed commit, closing the first subflow resets the first
subflow socket state to SS_UNCONNECTED.
The current mptcp listen implementation relies only on such
state to prevent touching not-fully-disconnected sockets.
Incoming mptcp fastclose (or paired endpoint removal) unconditionally
closes the first subflow.
All the above allows an incoming fastclose followed by a listen() call
to successfully race with a blocking recvmsg(), potentially causing the
latter to hit a divide by zero bug in cleanup_rbuf/__tcp_select_window().
Address the issue explicitly checking the msk socket state in
mptcp_listen(). An alternative solution would be moving the first
subflow socket state update into mptcp_disconnect(), but in the long
term the first subflow socket should be removed: better avoid relaying
on it for internal consistency check.
Fixes: b29fcfb54c ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/414
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02b0095e2f upstream.
Fix an issue in function 'tracing_err_log_open'.
The function doesn't call 'seq_open' if the file is opened only with
write permissions, which results in 'file->private_data' being left as null.
If we then use 'lseek' on that opened file, 'seq_lseek' dereferences
'file->private_data' in 'mutex_lock(&m->lock)', resulting in a kernel panic.
Writing to this node requires root privileges, therefore this bug
has very little security impact.
Tracefs node: /sys/kernel/tracing/error_log
Example Kernel panic:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
Call trace:
mutex_lock+0x30/0x110
seq_lseek+0x34/0xb8
__arm64_sys_lseek+0x6c/0xb8
invoke_syscall+0x58/0x13c
el0_svc_common+0xc4/0x10c
do_el0_svc+0x24/0x98
el0_svc+0x24/0x88
el0t_64_sync_handler+0x84/0xe4
el0t_64_sync+0x1b4/0x1b8
Code: d503201f aa0803e0 aa1f03e1 aa0103e9 (c8e97d02)
---[ end trace 561d1b49c12cf8a5 ]---
Kernel panic - not syncing: Oops: Fatal exception
Link: https://lore.kernel.org/linux-trace-kernel/20230703155237eucms1p4dfb6a19caa14c79eb6c823d127b39024@eucms1p4
Link: https://lore.kernel.org/linux-trace-kernel/20230704102706eucms1p30d7ecdcc287f46ad67679fc8491b2e0f@eucms1p3
Cc: stable@vger.kernel.org
Fixes: 8a062902be ("tracing: Add tracing error log")
Signed-off-by: Mateusz Stachyra <m.stachyra@samsung.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 195b9cb5b2 upstream.
Ensure running fprobe_exit_handler() has finished before
calling rethook_free() in the unregister_fprobe() so that caller can free
the fprobe right after unregister_fprobe().
unregister_fprobe() ensured that all running fprobe_entry/exit_handler()
have finished by calling unregister_ftrace_function() which synchronizes
RCU. But commit 5f81018753 ("fprobe: Release rethook after the ftrace_ops
is unregistered") changed to call rethook_free() after
unregister_ftrace_function(). So call rethook_stop() to make rethook
disabled before unregister_ftrace_function() and ensure it again.
Here is the possible code flow that can call the exit handler after
unregister_fprobe().
------
CPU1 CPU2
call unregister_fprobe(fp)
...
__fprobe_handler()
rethook_hook() on probed function
unregister_ftrace_function()
return from probed function
rethook hooks
find rh->handler == fprobe_exit_handler
call fprobe_exit_handler()
rethook_free():
set rh->handler = NULL;
return from unreigster_fprobe;
call fp->exit_handler() <- (*)
------
(*) At this point, the exit handler is called after returning from
unregister_fprobe().
This fixes it as following;
------
CPU1 CPU2
call unregister_fprobe()
...
rethook_stop():
set rh->handler = NULL;
__fprobe_handler()
rethook_hook() on probed function
unregister_ftrace_function()
return from probed function
rethook hooks
find rh->handler == NULL
return from rethook
rethook_free()
return from unreigster_fprobe;
------
Link: https://lore.kernel.org/all/168873859949.156157.13039240432299335849.stgit@devnote2/
Fixes: 5f81018753 ("fprobe: Release rethook after the ftrace_ops is unregistered")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f81018753 upstream.
While running bpf selftests it's possible to get following fault:
general protection fault, probably for non-canonical address \
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
...
Call Trace:
<TASK>
fprobe_handler+0xc1/0x270
? __pfx_bpf_testmod_init+0x10/0x10
? __pfx_bpf_testmod_init+0x10/0x10
? bpf_fentry_test1+0x5/0x10
? bpf_fentry_test1+0x5/0x10
? bpf_testmod_init+0x22/0x80
? do_one_initcall+0x63/0x2e0
? rcu_is_watching+0xd/0x40
? kmalloc_trace+0xaf/0xc0
? do_init_module+0x60/0x250
? __do_sys_finit_module+0xac/0x120
? do_syscall_64+0x37/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
</TASK>
In unregister_fprobe function we can't release fp->rethook while it's
possible there are some of its users still running on another cpu.
Moving rethook_free call after fp->ops is unregistered with
unregister_ftrace_function call.
Link: https://lore.kernel.org/all/20230615115236.3476617-1-jolsa@kernel.org/
Fixes: 5b0ab78998 ("fprobe: Add exit_handler support")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b9352f3f8 upstream.
I don't see a reason why we should treat the case lo < hi differently
and return 0 as period and duty_cycle. The current logic was added with
c375bcbaab ("pwm: meson: Read the full hardware state in
meson_pwm_get_state()"), Martin as original author doesn't remember why
it was implemented this way back then.
So let's handle it as normal use case and also remove the optimization
for lo == 0. I think the improved readability is worth it.
Fixes: c375bcbaab ("pwm: meson: Read the full hardware state in meson_pwm_get_state()")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27c68c216e upstream.
On SPR, the load latency event needs an auxiliary event in the same
group to work properly. There's a check in intel_pmu_hw_config()
for this to iterate sibling events and find a mem-loads-aux event.
The for_each_sibling_event() has a lockdep assert to make sure if it
disabled hardirq or hold leader->ctx->mutex. This works well if the
given event has a separate leader event since perf_try_init_event()
grabs the leader->ctx->mutex to protect the sibling list. But it can
cause a problem when the event itself is a leader since the event is
not initialized yet and there's no ctx for the event.
Actually I got a lockdep warning when I run the below command on SPR,
but I guess it could be a NULL pointer dereference.
$ perf record -d -e cpu/mem-loads/uP true
The code path to the warning is:
sys_perf_event_open()
perf_event_alloc()
perf_init_event()
perf_try_init_event()
x86_pmu_event_init()
hsw_hw_config()
intel_pmu_hw_config()
for_each_sibling_event()
lockdep_assert_event_ctx()
We don't need for_each_sibling_event() when it's a standalone event.
Let's return the error code directly.
Fixes: f3c0eba287 ("perf: Add a few assertions")
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230704181516.3293665-1-namhyung@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc8d591654 upstream.
split_if_spec expects a NULL-pointer as an end marker for the argument
list, but tuntap_probe never supplied that terminating NULL. As a result
incorrectly formatted interface specification string may cause a crash
because of the random memory access. Fix that by adding NULL terminator
to the split_if_spec argument list.
Cc: stable@vger.kernel.org
Fixes: 7282bee787 ("[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 8")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26efd79c46 upstream.
As comments in ftrace_process_locs(), there may be NULL pointers in
mcount_loc section:
> Some architecture linkers will pad between
> the different mcount_loc sections of different
> object files to satisfy alignments.
> Skip any NULL pointers.
After commit 20e5227e9f ("ftrace: allow NULL pointers in mcount_loc"),
NULL pointers will be accounted when allocating ftrace pages but skipped
before adding into ftrace pages, this may result in some pages not being
used. Then after commit 706c81f87f ("ftrace: Remove extra helper
functions"), warning may occur at:
WARN_ON(pg->next);
To fix it, only warn for case that no pointers skipped but pages not used
up, then free those unused pages after releasing ftrace_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20230712060452.3175675-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes: 706c81f87f ("ftrace: Remove extra helper functions")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e42907f3a upstream.
Soft lockup occurs when reading file 'trace_pipe':
watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [cat:4488]
[...]
RIP: 0010:ring_buffer_empty_cpu+0xed/0x170
RSP: 0018:ffff88810dd6fc48 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffffff93d1aaeb
RDX: ffff88810a280040 RSI: 0000000000000008 RDI: ffff88811164b218
RBP: ffff88811164b218 R08: 0000000000000000 R09: ffff88815156600f
R10: ffffed102a2acc01 R11: 0000000000000001 R12: 0000000051651901
R13: 0000000000000000 R14: ffff888115e49500 R15: 0000000000000000
[...]
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8d853c2000 CR3: 000000010dcd8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__find_next_entry+0x1a8/0x4b0
? peek_next_entry+0x250/0x250
? down_write+0xa5/0x120
? down_write_killable+0x130/0x130
trace_find_next_entry_inc+0x3b/0x1d0
tracing_read_pipe+0x423/0xae0
? tracing_splice_read_pipe+0xcb0/0xcb0
vfs_read+0x16b/0x490
ksys_read+0x105/0x210
? __ia32_sys_pwrite64+0x200/0x200
? switch_fpu_return+0x108/0x220
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x61/0xc6
Through the vmcore, I found it's because in tracing_read_pipe(),
ring_buffer_empty_cpu() found some buffer is not empty but then it
cannot read anything due to "rb_num_of_entries() == 0" always true,
Then it infinitely loop the procedure due to user buffer not been
filled, see following code path:
tracing_read_pipe() {
... ...
waitagain:
tracing_wait_pipe() // 1. find non-empty buffer here
trace_find_next_entry_inc() // 2. loop here try to find an entry
__find_next_entry()
ring_buffer_empty_cpu(); // 3. find non-empty buffer
peek_next_entry() // 4. but peek always return NULL
ring_buffer_peek()
rb_buffer_peek()
rb_get_reader_page()
// 5. because rb_num_of_entries() == 0 always true here
// then return NULL
// 6. user buffer not been filled so goto 'waitgain'
// and eventually leads to an deadloop in kernel!!!
}
By some analyzing, I found that when resetting ringbuffer, the 'entries'
of its pages are not all cleared (see rb_reset_cpu()). Then when reducing
the ringbuffer, and if some reduced pages exist dirty 'entries' data, they
will be added into 'cpu_buffer->overrun' (see rb_remove_pages()), which
cause wrong 'overrun' count and eventually cause the deadloop issue.
To fix it, we need to clear every pages in rb_reset_cpu().
Link: https://lore.kernel.org/linux-trace-kernel/20230708225144.3785600-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes: a5fb833172 ("ring-buffer: Fix uninitialized read_stamp")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e9cb763e9 upstream.
The ENA adapters on our instances occasionally reset. Once recently
logged a UBSAN failure to console in the process:
UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117
Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
Workqueue: ena ena_fw_reset_device [ena]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
? __const_udelay+0x43/0x50
ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena]
wait_for_reset_state+0x54/0xa0 [ena]
ena_com_dev_reset+0xc8/0x110 [ena]
ena_down+0x3fe/0x480 [ena]
ena_destroy_device+0xeb/0xf0 [ena]
ena_fw_reset_device+0x30/0x50 [ena]
process_one_work+0x22b/0x3d0
worker_thread+0x4d/0x3f0
? process_one_work+0x3d0/0x3d0
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Apparently, the reset delays are getting so large they can trigger a
UBSAN panic.
Looking at the code, the current timeout is capped at 5000us. Using a
base value of 100us, the current code will overflow after (1<<29). Even
at values before 32, this function wraps around, perhaps
unintentionally.
Cap the value of the exponent used for this backoff at (1<<16) which is
larger than currently necessary, but large enough to support bigger
values in the future.
Cc: stable@vger.kernel.org
Fixes: 4bb7f4cf60 ("net: ena: reduce driver load time")
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230711013621.GE1926@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 963b54df82 upstream.
When allocating the 2D array for handling IRQ type registers in
regmap_add_irq_chip_fwnode(), the intent is to allocate a matrix
with num_config_bases rows and num_config_regs columns.
This is currently handled by allocating a buffer to hold a pointer for
each row (i.e. num_config_bases). After that, the logic attempts to
allocate the memory required to hold the register configuration for
each row. However, instead of doing this allocation for each row
(i.e. num_config_bases allocations), the logic erroneously does this
allocation num_config_regs number of times.
This scenario can lead to out-of-bounds accesses when num_config_regs
is greater than num_config_bases. Fix this by updating the terminating
condition of the loop that allocates the memory for holding the register
configuration to allocate memory only for each row in the matrix.
Amit Pundir reported a crash that was occurring on his db845c device
due to memory corruption (see "Closes" tag for Amit's report). The KASAN
report below helped narrow it down to this issue:
[ 14.033877][ T1] ==================================================================
[ 14.042507][ T1] BUG: KASAN: invalid-access in regmap_add_irq_chip_fwnode+0x594/0x1364
[ 14.050796][ T1] Write of size 8 at addr 06ffff8081021850 by task init/1
[ 14.242004][ T1] The buggy address belongs to the object at ffffff8081021850
[ 14.242004][ T1] which belongs to the cache kmalloc-8 of size 8
[ 14.255669][ T1] The buggy address is located 0 bytes inside of
[ 14.255669][ T1] 8-byte region [ffffff8081021850, ffffff8081021858)
Fixes: faa87ce919 ("regmap-irq: Introduce config registers for irq types")
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Closes: https://lore.kernel.org/all/CAMi1Hd04mu6JojT3y6wyN2YeVkPR5R3qnkKJ8iR8if_YByCn4w@mail.gmail.com/
Tested-by: John Stultz <jstultz@google.com>
Tested-by: Amit Pundir <amit.pundir@linaro.org> # tested on Dragonboard 845c
Cc: stable@vger.kernel.org # v6.0+
Cc: Aidan MacDonald <aidanmacdonald.0x0@gmail.com>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Isaac J. Manjarres" <isaacmanjarres@google.com>
Link: https://lore.kernel.org/r/20230711193059.2480971-1-isaacmanjarres@google.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66843b14fb upstream.
Since commit 096b52fd2b ("perf: RISC-V: throttle perf events") the
perf_sample_event_took() function was added to report time spent in
overflow interrupts. If the interrupt takes too long, the perf framework
will lower the sysctl_perf_event_sample_rate and max_samples_per_tick.
When hwc->interrupts is larger than max_samples_per_tick, the
hwc->interrupts will be set to MAX_INTERRUPTS, and events will be
throttled within the __perf_event_account_interrupt() function.
However, the RISC-V PMU driver doesn't call riscv_pmu_stop() to update the
PERF_HES_STOPPED flag after perf_event_overflow() in pmu_sbi_ovf_handler()
function to avoid throttling. When the perf framework unthrottled the event
in the timer interrupt handler, it triggers riscv_pmu_start() function
and causes a WARN_ON_ONCE() warning, as shown below:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
Modules linked in:
CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
Hardware name: SiFive (DT)
epc : riscv_pmu_start+0x7c/0x8e
ra : riscv_pmu_start+0x28/0x8e
epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
t5 : 0000000700000000 t6 : ffffaf8005269870
status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
[<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
[<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
[<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
[<ffffffff800b5640>] update_process_times+0x9a/0xba
[<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
[<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
[<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
[<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
[<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
[<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
[<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
After referring other PMU drivers like Arm, Loongarch, Csky, and Mips,
they don't call *_pmu_stop() to update with PERF_HES_STOPPED flag
after perf_event_overflow() function nor do they add PERF_HES_STOPPED
flag checking in *_pmu_start() which don't cause this warning.
Thus, it's recommended to remove this unnecessary check in
riscv_pmu_start() function to prevent this warning.
Signed-off-by: Eric Lin <eric.lin@sifive.com>
Link: https://lore.kernel.org/r/20230710154328.19574-1-eric.lin@sifive.com
Fixes: 096b52fd2b ("perf: RISC-V: throttle perf events")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8564c31587 upstream.
The ftrace-direct-too sample traces the handle_mm_fault function whose
signature changed since the introduction of the sample. Since:
commit bce617edec ("mm: do page fault accounting in handle_mm_fault")
handle_mm_fault now has 4 arguments. Therefore, the sample trampoline
should save 4 argument registers.
s390 saves all argument registers already so it does not need a change
but x86_64 needs an extra push and pop.
This also evolves the signature of the tracing function to make it
mirror the signature of the traced function.
Link: https://lkml.kernel.org/r/20230427140700.625241-2-revest@chromium.org
Cc: stable@vger.kernel.org
Fixes: bce617edec ("mm: do page fault accounting in handle_mm_fault")
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac522fc6c3 upstream.
While duplicate IDs are still very harmful, including the potential to easily
see changing devices in /dev/disk/by-id, it turn out they are extremely
common for cheap end user NVMe devices.
Relax our check for them for so that it doesn't reject the probe on
single-ported PCIe devices, but prints a big warning instead. In doubt
we'd still like to see quirk entries to disable the potential for
changing supposed stable device identifier links, but this will at least
allow users how have two (or more) of these devices to use them without
having to manually add a new PCI ID entry with the quirk through sysfs or
by patching the kernel.
Fixes: 2079f41ec6 ("nvme: check that EUI/GUID/UUID are globally unique")
Cc: stable@vger.kernel.org # 6.0+
Co-developed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6018b585e8 upstream.
Hist triggers can have referenced variables without having direct
variables fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it would be possible to remove
hist trigger with variables being refenced. This will result in a bug
that is easily reproducable like so
$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
[ 100.263533] ==================================================================
[ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[ 100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
[ 100.266320]
[ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 100.268561] Call Trace:
[ 100.268902] <TASK>
[ 100.269189] dump_stack_lvl+0x4c/0x70
[ 100.269680] print_report+0xc5/0x600
[ 100.270165] ? resolve_var_refs+0xc7/0x180
[ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0
[ 100.271389] ? resolve_var_refs+0xc7/0x180
[ 100.271913] kasan_report+0xbd/0x100
[ 100.272380] ? resolve_var_refs+0xc7/0x180
[ 100.272920] __asan_load8+0x71/0xa0
[ 100.273377] resolve_var_refs+0xc7/0x180
[ 100.273888] event_hist_trigger+0x749/0x860
[ 100.274505] ? kasan_save_stack+0x2a/0x50
[ 100.275024] ? kasan_set_track+0x29/0x40
[ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10
[ 100.276138] ? ksys_write+0xd1/0x170
[ 100.276607] ? do_syscall_64+0x3c/0x90
[ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.277771] ? destroy_hist_data+0x446/0x470
[ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860
[ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10
[ 100.279627] ? __kasan_check_write+0x18/0x20
[ 100.280177] ? mutex_unlock+0x85/0xd0
[ 100.280660] ? __pfx_mutex_unlock+0x10/0x10
[ 100.281200] ? kfree+0x7b/0x120
[ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0
[ 100.282197] ? event_trigger_write+0xac/0x100
[ 100.282764] ? __kasan_slab_free+0x16/0x20
[ 100.283293] ? __kmem_cache_free+0x153/0x2f0
[ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250
[ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[ 100.285221] ? event_trigger_write+0xbc/0x100
[ 100.285781] ? __kasan_check_read+0x15/0x20
[ 100.286321] ? __bitmap_weight+0x66/0xa0
[ 100.286833] ? _find_next_bit+0x46/0xe0
[ 100.287334] ? task_mm_cid_work+0x37f/0x450
[ 100.287872] event_triggers_call+0x84/0x150
[ 100.288408] trace_event_buffer_commit+0x339/0x430
[ 100.289073] ? ring_buffer_event_data+0x3f/0x60
[ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0
[ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0
[ 100.298653] syscall_enter_from_user_mode+0x32/0x40
[ 100.301808] do_syscall_64+0x1a/0x90
[ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.307775] RIP: 0033:0x7f686c75c1cb
[ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[ 100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
[ 100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
[ 100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
[ 100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
[ 100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[ 100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
[ 100.338381] </TASK>
We hit the bug because when second hist trigger has was created
has_hist_vars() returned false because hist trigger did not have
variables. As a result of that save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on when we attempted to
remove the first histogram find_any_var_ref() failed to detect it is
being used because it did not find the second trigger in hist_vars list.
With this change we wait until trigger actions are created so we can take
into consideration if hist trigger has variable references. Also, now we
check the return value of save_hist_vars() and fail trigger creation if
save_hist_vars() fails.
Link: https://lore.kernel.org/linux-trace-kernel/20230712223021.636335-1-mkhalfella@purestorage.com
Cc: stable@vger.kernel.org
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a82d62f708 upstream.
This reverts commit eb26dfe8aa.
Commit eb26dfe8aa ("8250: add support for ASIX devices with a FIFO
bug") merged on Jul 13, 2012 adds a quirk for PCI_VENDOR_ID_ASIX
(0x9710). But that ID is the same as PCI_VENDOR_ID_NETMOS defined in
1f8b061050c7 ("[PATCH] Netmos parallel/serial/combo support") merged
on Mar 28, 2005. In pci_serial_quirks array, the NetMos entry always
takes precedence over the ASIX entry even since it was initially
merged, code in that commit is always unreachable.
In my tests, adding the FIFO workaround to pci_netmos_init() makes no
difference, and the vendor driver also does not have such workaround.
Given that the code was never used for over a decade, it's safe to
revert it.
Also, the real PCI_VENDOR_ID_ASIX should be 0x125b, which is used on
their newer AX99100 PCIe serial controllers released on 2016. The FIFO
workaround should not be intended for these newer controllers, and it
was never implemented in vendor driver.
Fixes: eb26dfe8aa ("8250: add support for ASIX devices with a FIFO bug")
Cc: stable <stable@kernel.org>
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230619155743.827859-1-jiaqing.zhao@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2a2ab039b upstream.
When dev_pm_opp_of_find_icc_paths() in _allocate_opp_table() returns
-EPROBE_DEFER, the opp_table is freed again, to wait until all the
interconnect paths are available.
However, if the OPP table is using required-opps then it may already
have been added to the global lazy_opp_tables list. The error path
does not remove the opp_table from the list again.
This can cause crashes later when the provider of the required-opps
is added, since we will iterate over OPP tables that have already been
freed. E.g.:
Unable to handle kernel NULL pointer dereference when read
CPU: 0 PID: 7 Comm: kworker/0:0 Not tainted 6.4.0-rc3
PC is at _of_add_opp_table_v2 (include/linux/of.h:949
drivers/opp/of.c:98 drivers/opp/of.c:344 drivers/opp/of.c:404
drivers/opp/of.c:1032) -> lazy_link_required_opp_table()
Fix this by calling _of_clear_opp_table() to remove the opp_table from
the list and clear other allocated resources. While at it, also add the
missing mutex_destroy() calls in the error path.
Cc: stable@vger.kernel.org
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Fixes: 7eba0c7641 ("opp: Allow lazy-linking of required-opps")
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6df696cd9b upstream.
AmpereOne has an erratum in its implementation of FEAT_HAFDBS that
required disabling the feature on the design. This was done by reporting
the feature as not implemented in the ID register, although the
corresponding control bits were not actually RES0. This does not align
well with the requirements of the architecture, which mandates these
bits be RES0 if HAFDBS isn't implemented.
The kernel's use of stage-1 is unaffected, as the HA and HD bits are
only set if HAFDBS is detected in the ID register. KVM, on the other
hand, relies on the RES0 behavior at stage-2 to use the same value for
VTCR_EL2 on any cpu in the system. Mitigate the non-RES0 behavior by
leaving VTCR_EL2.HA clear on affected systems.
Cc: stable@vger.kernel.org
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Acked-by: D Scott Phillips <scott@os.amperecomputing.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609220104.1836988-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 257e6172ab upstream.
If a client sends out a cap update dropping caps with the prior 'seq'
just before an incoming cap revoke request, then the client may drop
the revoke because it believes it's already released the requested
capabilities.
This causes the MDS to wait indefinitely for the client to respond
to the revoke. It's therefore always a good idea to ack the cap
revoke request with the bumped up 'seq'.
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61782
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a282a2f105 upstream.
ceph_frame_desc::fd_lens is an int array. decode_preamble() thus
effectively casts u32 -> int but the checks for segment lengths are
written as if on unsigned values. While reading in HELLO or one of the
AUTH frames (before authentication is completed), arithmetic in
head_onwire_len() can get duped by negative ctrl_len and produce
head_len which is less than CEPH_PREAMBLE_LEN but still positive.
This would lead to a buffer overrun in prepare_read_control() as the
preamble gets copied to the newly allocated buffer of size head_len.
Cc: stable@vger.kernel.org
Fixes: cd1a677cad ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Reported-by: Thelford Williams <thelford@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4903fde804 upstream.
It is possible to hang pty devices in this case, the reader was
blocking at epoll on master side, the writer was sleeping at
wait_woken inside n_tty_write on slave side, and the write buffer
on tty_port was full, we found that the reader and writer would
never be woken again and blocked forever.
The problem was caused by a race between reader and kworker:
n_tty_read(reader): n_tty_receive_buf_common(kworker):
copy_from_read_buf()|
|room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
|room <= 0
n_tty_kick_worker() |
|ldata->no_room = true
After writing to slave device, writer wakes up kworker to flush
data on tty_port to reader, and the kworker finds that reader
has no room to store data so room <= 0 is met. At this moment,
reader consumes all the data on reader buffer and calls
n_tty_kick_worker to check ldata->no_room which is false and
reader quits reading. Then kworker sets ldata->no_room=true
and quits too.
If write buffer is not full, writer will wake kworker to flush data
again after following writes, but if write buffer is full and writer
goes to sleep, kworker will never be woken again and tty device is
blocked.
This problem can be solved with a check for read buffer size inside
n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
is true, a call to n_tty_kick_worker is necessary to keep flushing
data to reader.
Cc: <stable@vger.kernel.org>
Fixes: 42458f41d0 ("n_tty: Ensure reader restarts worker for next reader")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Hui Li <caelli@tencent.com>
Message-ID: <1680749090-14106-1-git-send-email-caelli@tencent.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 639949a703 upstream.
Since commit 79d0224f6b ("tty: serial: imx: Handle RS485 DE signal
active high") RS485 reception no longer works after a transmission.
The following scenario shows the problem:
1) Open a port in RS485 mode
2) Receive data from remote (OK)
3) Transmit data to remote (OK)
4) Receive data from remote (Nothing received)
In RS485 mode, imx_uart_start_tx() calls imx_uart_stop_rx() and, when the
transmission is complete, imx_uart_stop_tx() calls imx_uart_start_rx().
Since the above commit imx_uart_stop_rx() now sets the loopback bit but
imx_uart_start_rx() does not clear it causing the hardware to remain in
loopback mode and not receive external data.
Fix this by moving the existing loopback disable code to a helper function
and calling it from imx_uart_start_rx() too.
Fixes: 79d0224f6b ("tty: serial: imx: Handle RS485 DE signal active high")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230616104838.2729694-1-martin.fuzzey@flowbird.group
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea2c3c0855 upstream.
Per VM BOs must be marked as moved or otherwise their ranges are not
updated on use which might be necessary when the replace operation
splits mappings.
This fixes random GPU hangs when replacing sparse mappings from the
userspace, while OP_MAP/OP_UNMAP works fine because always valid BOs
are correctly handled there.
Cc: stable@vger.kernel.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72f1de49ff upstream.
[Why]
The sequence for collecting down_reply from source perspective should
be:
Request_n->repeat (get partial reply of Request_n->clear message ready
flag to ack DPRX that the message is received) till all partial
replies for Request_n are received->new Request_n+1.
Now there is chance that drm_dp_mst_hpd_irq() will fire new down
request in the tx queue when the down reply is incomplete. Source is
restricted to generate interveleaved message transactions so we should
avoid it.
Also, while assembling partial reply packets, reading out DPCD DOWN_REP
Sideband MSG buffer + clearing DOWN_REP_MSG_RDY flag should be
wrapped up as a complete operation for reading out a reply packet.
Kicking off a new request before clearing DOWN_REP_MSG_RDY flag might
be risky. e.g. If the reply of the new request has overwritten the
DPRX DOWN_REP Sideband MSG buffer before source writing one to clear
DOWN_REP_MSG_RDY flag, source then unintentionally flushes the reply
for the new request. Should handle the up request in the same way.
[How]
Separete drm_dp_mst_hpd_irq() into 2 steps. After acking the MST IRQ
event, driver calls drm_dp_mst_hpd_irq_send_new_request() and might
trigger drm_dp_mst_kick_tx() only when there is no on going message
transaction.
Changes since v1:
* Reworked on review comments received
-> Adjust the fix to let driver explicitly kick off new down request
when mst irq event is handled and acked
-> Adjust the commit message
Changes since v2:
* Adjust the commit message
* Adjust the naming of the divided 2 functions and add a new input
parameter "ack".
* Adjust code flow as per review comments.
Changes since v3:
* Update the function description of drm_dp_mst_hpd_irq_handle_event
Changes since v4:
* Change ack of drm_dp_mst_hpd_irq_handle_event() to be an array align
the size of esi[]
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bdba9d4a3 upstream.
If we disable vblank when entering self-refresh, vblank APIs (like
DRM_IOCTL_WAIT_VBLANK) no longer work. But user space is not aware when
we enter self-refresh, so this appears to be an API violation -- that
DRM_IOCTL_WAIT_VBLANK fails with EINVAL whenever the display is idle and
enters self-refresh.
The downstream driver used by many of these systems never used to
disable vblank for PSR, and in fact, even upstream, we didn't do that
until radically redesigning the state machine in commit 6c836d965b
("drm/rockchip: Use the helpers for PSR").
Thus, it seems like a reasonable API fix to simply restore that
behavior, and leave vblank enabled.
Note that this appears to potentially unbalance the
drm_crtc_vblank_{off,on}() calls in some cases, but:
(a) drm_crtc_vblank_on() documents this as OK and
(b) if I do the naive balancing, I find state machine issues such that
we're not in sync properly; so it's easier to take advantage of (a).
This issue was exposed by IGT's kms_vblank tests, and reported by
KernelCI. The bug has been around a while (longer than KernelCI
noticed), but was only exposed once self-refresh was bugfixed more
recently, and so KernelCI could properly test it. Some other notes in:
https://lore.kernel.org/dri-devel/Y6OCg9BPnJvimQLT@google.com/
Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin
== Backporting notes: ==
Marking as 'Fixes' commit 6c836d965b ("drm/rockchip: Use the helpers
for PSR"), but it probably depends on commit bed030a49f
("drm/rockchip: Don't fully disable vop on self refresh") as well.
We also need the previous patch ("drm/atomic: Allow vblank-enabled +
self-refresh "disable""), of course.
v3:
* no update
v2:
* skip unnecessary lock/unlock
Fixes: 6c836d965b ("drm/rockchip: Use the helpers for PSR")
Cc: <stable@vger.kernel.org>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Link: https://lore.kernel.org/dri-devel/Y5itf0+yNIQa6fU4@sirena.org.uk/
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230109171809.v3.2.Ic07cba4ab9a7bd3618a9e4258b8f92ea7d10ae5a@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d0e3cac35 upstream.
The self-refresh helper framework overloads "disable" to sometimes mean
"go into self-refresh mode," and this mode activates automatically
(e.g., after some period of unchanging display output). In such cases,
the display pipe is still considered "on", and user-space is not aware
that we went into self-refresh mode. Thus, users may expect that
vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
properly.
However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
vblank enabled.
Add a different expectation: that CRTCs *should* leave vblank enabled
when going into self-refresh.
This patch is preparation for another patch -- "drm/rockchip: vop: Leave
vblank enabled in self-refresh" -- which resolves conflicts between the
above self-refresh behavior and the API tests in IGT's kms_vblank test
module.
== Some alternatives discussed: ==
It's likely that on many display controllers, vblank interrupts will
turn off when the CRTC is disabled, and so in some cases, self-refresh
may not support vblank. To support such cases, we might consider
additions to the generic helpers such that we fire vblank events based
on a timer.
However, there is currently only one driver using the common
self-refresh helpers (i.e., rockchip), and at least as of commit
bed030a49f ("drm/rockchip: Don't fully disable vop on self refresh"),
the CRTC hardware is powered enough to continue to generate vblank
interrupts.
So we chose the simpler option of leaving vblank interrupts enabled. We
can reevaluate this decision and perhaps augment the helpers if/when we
gain a second driver that has different requirements.
v3:
* include discussion summary
v2:
* add 'ret != 0' warning case for self-refresh
* describe failing test case and relation to drm/rockchip patch better
Cc: <stable@vger.kernel.org> # dependency for "drm/rockchip: vop: Leave
# vblank enabled in self-refresh"
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230109171809.v3.1.I3904f697863649eb1be540ecca147a66e42bfad7@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97f975823f upstream.
Smatch detected a double free path because lpfc_nlp_not_used() releases an
ndlp object before reaching lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc().
Remove the outdated lpfc_nlp_not_used() routine. In
lpfc_mbx_cmpl_ns_reg_login(), replace the call with lpfc_nlp_put(). In
lpfc_cmpl_els_logo_acc(), replace the call with lpfc_unreg_rpi() and keep
the lpfc_nlp_put() at the end of the routine. If ndlp's rpi was
registered, then lpfc_unreg_rpi()'s completion routine performs the final
ndlp clean up after lpfc_nlp_put() is called from lpfc_cmpl_els_logo_acc().
Otherwise if ndlp has no rpi registered, the lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc() is the final ndlp clean up.
Fixes: 4430f7fd09 ("scsi: lpfc: Rework locations of ndlp reference taking")
Cc: <stable@vger.kernel.org> # v5.11+
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f68bb23cad upstream.
This patch sets the process_dlm_messages_pending boolean to false when
there was no message to process. It is a case which should not happen
but if we are prepared to recover from this situation by setting pending
boolean to false.
Cc: stable@vger.kernel.org
Fixes: dbb751ffab ("fs: dlm: parallelize lowcomms socket handling")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a931477bf upstream.
This patch clears the DLM_IFL_CB_PENDING_BIT flag which will be set when
there is callback work queued when there was no callback to dequeue. It
is a buggy case and should never happen, that's why there is a
WARN_ON(). However if the case happens we are prepared to somehow
recover from it.
Cc: stable@vger.kernel.org
Fixes: 61bed0baa4 ("fs: dlm: use a non-static queue for callbacks")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57e2c2f2d9 upstream.
When a waiting plock request (F_SETLKW) is sent to userspace
for processing (dlm_controld), the result is returned at a
later time. That result could be incorrectly matched to a
different waiting request in cases where the owner field is
the same (e.g. different threads in a process.) This is fixed
by comparing all the properties in the request and reply.
The results for non-waiting plock requests are now matched
based on list order because the results are returned in the
same order they were sent.
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f2b1cb89c upstream.
While a non-waiting posix lock request (F_SETLK) is waiting for
user space processing (in dlm_controld), wait for that processing
to complete with an unkillable wait_event(). This makes F_SETLK
behave the same way for F_RDLCK, F_WRLCK and F_UNLCK. F_SETLKW
continues to use wait_event_killable().
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59e45c758c upstream.
If a posix lock request is waiting for a result from user space
(dlm_controld), do not let it be interrupted unless the process
is killed. This reverts commit a6b1533e9a ("dlm: make posix locks
interruptible"). The problem with the interruptible change is
that all locks were cleared on any signal interrupt. If a signal
was received that did not terminate the process, the process
could continue running after all its dlm posix locks had been
cleared. A future patch will add cancelation to allow proper
interruption.
Cc: stable@vger.kernel.org
Fixes: a6b1533e9a ("dlm: make posix locks interruptible")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c847f4e203 upstream.
Immediately clean up a posix lock request if it is interrupted
while waiting for a result from user space (dlm_controld.) This
largely reverts the recent commit b92a4e3f86 ("fs: dlm: change
posix lock sigint handling"). That previous commit attempted
to defer lock cleanup to the point in time when a result from
user space arrived. The deferred approach was not reliable
because some dlm plock ops may not receive replies.
Cc: stable@vger.kernel.org
Fixes: b92a4e3f86 ("fs: dlm: change posix lock sigint handling")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92655fbda5 upstream.
The GETLK pid values have all been negated since commit 9d5b86ac13
("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks").
Revert this for local pids, and leave in place negative pids for remote
owners.
Cc: stable@vger.kernel.org
Fixes: 9d5b86ac13 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e836007089 upstream.
We've found that using raid0 with the 'original' layout and discard
enabled with different disk sizes (such that at least two zones are
created) can result in data corruption. This is due to the fact that
the discard handling in 'raid0_handle_discard()' assumes the 'alternate'
layout. We've seen this corruption using ext4 but other filesystems are
likely susceptible as well.
More specifically, while multiple zones are necessary to create the
corruption, the corruption may not occur with multiple zones if they
layout in such a way the layout matches what the 'alternate' layout
would have produced. Thus, not all raid0 devices with the 'original'
layout, different size disks and discard enabled will encounter this
corruption.
The 3.14 kernel inadvertently changed the raid0 disk layout for different
size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the
same raid0 array could corrupt data. This lead to the creation of the
'original' layout (to match the pre-3.14 layout) and the 'alternate' layout
(to match the post 3.14 layout) in the 5.4 kernel time frame and an option
to tell the kernel which layout to use (since it couldn't be autodetected).
However, when the 'original' layout was added back to 5.4 discard support
for the 'original' layout was not added leading this issue.
I've been able to reliably reproduce the corruption with the following
test case:
1. create raid0 array with different size disks using original layout
2. mkfs
3. mount -o discard
4. create lots of files
5. remove 1/2 the files
6. fstrim -a (or just the mount point for the raid0 array)
7. umount
8. fsck -fn /dev/md0 (spews all sorts of corruptions)
Let's fix this by adding proper discard support to the 'original' layout.
The fix 'maps' the 'original' layout disks to the order in which they are
read/written such that we can compare the disks in the same way that the
current 'alternate' layout does. A 'disk_shift' field is added to
'struct strip_zone'. This could be computed on the fly in
raid0_handle_discard() but by adding this field, we save some computation
in the discard path.
Note we could also potentially fix this by re-ordering the disks in the
zones that follow the first one, and then always read/writing them using
the 'alternate' layout. However, that is seen as a more substantial change,
and we are attempting the least invasive fix at this time to remedy the
corruption.
I've verified the change using the reproducer mentioned above. Typically,
the corruption is seen after less than 3 iterations, while the patch has
run 500+ iterations.
Cc: NeilBrown <neilb@suse.de>
Cc: Song Liu <song@kernel.org>
Fixes: c84a1372df ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230623180523.1901230-1-jbaron@akamai.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb620ae73b upstream.
The irq_raised completion used to detect the end of a test case is
initialized when the test device is probed, but never reinitialized again
before a test case. As a result, the irq_raised completion synchronization
is effective only for the first ioctl test case executed. Any subsequent
call to wait_for_completion() by another ioctl() call will immediately
return, potentially too early, leading to false positive failures.
Fix this by reinitializing the irq_raised completion before starting a new
ioctl() test command.
Link: https://lore.kernel.org/r/20230415023542.77601-16-dlemoal@kernel.org
Fixes: 2c156ac71c ("misc: Add host side PCI driver for PCI test function device")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dd3c7c4c8 upstream.
The RK3399 PCIe controller should wait until the PHY PLLs are locked.
Add poll and timeout to wait for PHY PLLs to be locked. If they cannot
be locked generate error message and jump to error handler. Accessing
registers in the PHY clock domain when PLLs are not locked causes hang
The PHY PLLs status is checked through a side channel register.
This is documented in the TRM section 17.5.8.1 "PCIe Initialization
Sequence".
Link: https://lore.kernel.org/r/20230418074700.1083505-5-rick.wertenbroek@gmail.com
Fixes: cf590b0783 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 933f31a2fe upstream.
pci_epf_test_data_transfer() and pci_epf_test_dma_callback() are not
handling DMA transfer completion correctly, leading to completion
notifications to the RC side that are too early. This problem can be
detected when the RC side is running an IOMMU with messages such as:
pci-endpoint-test 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xfff00000 flags=0x0000]
When running the pcitest.sh tests: the address used for a previous
test transfer generates the above error while the next test transfer is
running.
Fix this by testing the DMA transfer status in pci_epf_test_dma_callback()
and notifying the completion only when the transfer status is DMA_COMPLETE
or DMA_ERROR. Furthermore, in pci_epf_test_data_transfer(), be paranoid and
check again the transfer status and always call dmaengine_terminate_sync()
before returning.
Link: https://lore.kernel.org/r/20230415023542.77601-5-dlemoal@kernel.org
Fixes: 8353813c88 ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e54223275b upstream.
When contiguous windows are coalesced by pci_register_host_bridge(), the
second resource is expanded to include the first, and the first is
invalidated and consequently not added to the bus. However, it remains in
the resource hierarchy. For example, these windows:
fec00000-fec7ffff : PCI Bus 0000:00
fec80000-fecbffff : PCI Bus 0000:00
are coalesced into this, where the first resource remains in the tree with
start/end zeroed out:
00000000-00000000 : PCI Bus 0000:00
fec00000-fecbffff : PCI Bus 0000:00
In some cases (e.g. the Xen scratch region), this causes future calls to
allocate_resource() to choose an inappropriate location which the caller
cannot handle.
Fix by releasing the zeroed-out resource and removing it from the resource
hierarchy.
[bhelgaas: commit log]
Fixes: 7c3855c423 ("PCI: Coalesce host bridge contiguous apertures")
Link: https://lore.kernel.org/r/20230525153248.712779-1-ross.lagerwall@citrix.com
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af40322e90 upstream.
All kind of administrative requests should not been retried. Some card
firmware detects this and assumes a replay attack. This patch checks
on failure if the low level functions indicate a retry (EAGAIN) and
checks for the ADMIN flag set on the request message. If this both
are true, the response code for this message is changed to EIO to make
sure the zcrypt API layer does not attempt to retry the request. As of
now the ADMIN flag is set for a request message when
- for EP11 the field 'flags' of the EP11 CPRB struct has the leftmost
bit set.
- for CCA when the CPRB minor version is 'T3', 'T5', 'T6' or 'T7'.
Please note that the do-not-retry only applies to a request
which has been sent to the card (= has been successfully enqueued) but
the reply indicates some kind of failure and by default it would be
replied. It is totally fine to retry a request if a previous attempt
to enqueue the msg into the firmware queue had some kind of failure
and thus the card has never seen this request.
Reported-by: Frank Uhlig <Frank.Uhlig1@ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d50eb4725 upstream.
It was reported that dm-integrity runs out of vmalloc space on 32-bit
architectures. On x86, there is only 128MiB vmalloc space and dm-integrity
consumes it quickly because it has a 64MiB journal and 8MiB recalculate
buffer.
Fix this by reducing the size of the journal to 4MiB and the size of
the recalculate buffer to 1MiB, so that multiple dm-integrity devices
can be created and activated on 32-bit architectures.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d744ae7477 upstream.
Fix the timeout that is used for the initialisation and for the self
test. wait_for_completion_timeout expects a timeout in jiffies, but
RNGC_TIMEOUT is in milliseconds. Call msecs_to_jiffies to do the
conversion.
Cc: stable@vger.kernel.org
Fixes: 1d5449445b ("hwrng: mx-rngc - add a driver for Freescale RNGC")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11509910c5 upstream.
In jfs_dmap.c at line 381, BLKTODMAP is used to get a logical block
number inside dbFree(). db_l2nbperpage, which is the log2 number of
blocks per page, is passed as an argument to BLKTODMAP which uses it
for shifting.
Syzbot reported a shift out-of-bounds crash because db_l2nbperpage is
too big. This happens because the large value is set without any
validation in dbMount() at line 181.
Thus, make sure that db_l2nbperpage is correct while mounting.
Max number of blocks per page = Page size / Min block size
=> log2(Max num_block per page) = log2(Page size / Min block size)
= log2(Page size) - log2(Min block size)
=> Max db_l2nbperpage = L2PSIZE - L2MINBLOCKSIZE
Reported-and-tested-by: syzbot+d2cd27dcf8e04b232eb2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=2a70a453331db32ed491f5cbb07e81bf2d225715
Cc: stable@vger.kernel.org
Suggested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fcced95b6b upstream.
PAGE_ALIGN(x) macro gives the next highest value which is multiple of
pagesize. But if x is already page aligned then it simply returns x.
So, if x passed is 0 in dax_zero_range() function, that means the
length gets passed as 0 to ->iomap_begin().
In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on
here in ext2_get_blocks().
BUG_ON(maxblocks == 0);
Instead we should be calling dax_truncate_page() here which takes
care of it. i.e. it only calls dax_zero_range if the offset is not
page/block aligned.
This can be easily triggered with following on fsdax mounted pmem
device.
dd if=/dev/zero of=file count=1 bs=512
truncate -s 0 file
[79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff)
[93.793207] ------------[ cut here ]------------
[93.795102] kernel BUG at fs/ext2/inode.c:637!
[93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 #139
[93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610
<...>
[93.835298] Call Trace:
[93.836253] <TASK>
[93.837103] ? lock_acquire+0xf8/0x110
[93.838479] ? d_lookup+0x69/0xd0
[93.839779] ext2_iomap_begin+0xa7/0x1c0
[93.841154] iomap_iter+0xc7/0x150
[93.842425] dax_zero_range+0x6e/0xa0
[93.843813] ext2_setsize+0x176/0x1b0
[93.845164] ext2_setattr+0x151/0x200
[93.846467] notify_change+0x341/0x4e0
[93.847805] ? lock_acquire+0xf8/0x110
[93.849143] ? do_truncate+0x74/0xe0
[93.850452] ? do_truncate+0x84/0xe0
[93.851739] do_truncate+0x84/0xe0
[93.852974] do_sys_ftruncate+0x2b4/0x2f0
[93.854404] do_syscall_64+0x3f/0x90
[93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc
CC: stable@vger.kernel.org
Fixes: 2aa3048e03 ("iomap: switch iomap_zero_range to use iomap_iter")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcb8898913 upstream.
Commit ebeb20a9cd ("soc: qcom: mdt_loader: Always invoke PAS
mem_setup") dropped the relocate check and made pas_mem_setup run
unconditionally. The code was later moved with commit f4e526ff7e
("soc: qcom: mdt_loader: Extract PAS operations") to
qcom_mdt_pas_init() effectively losing track of what was actually
done.
The assumption that PAS mem_setup can be done anytime was effectively
wrong, with no good reason and this caused regression on some SoC
that use remoteproc to bringup ath11k. One example is IPQ8074 SoC that
effectively broke resulting in remoteproc silently die and ath11k not
working.
On this SoC FW relocate is not enabled and PAS mem_setup was correctly
skipped in previous kernel version resulting in correct bringup and
function of remoteproc and ath11k.
To fix the regression, reintroduce the relocate check in
qcom_mdt_pas_init() and correctly skip PAS mem_setup where relocate is
not enabled.
Fixes: ebeb20a9cd ("soc: qcom: mdt_loader: Always invoke PAS mem_setup")
Tested-by: Robert Marko <robimarko@gmail.com>
Co-developed-by: Robert Marko <robimarko@gmail.com>
Signed-off-by: Robert Marko <robimarko@gmail.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230526115511.3328-1-ansuelsmth@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c26bd4384 upstream.
If mas_store_gfp() in the gather loop failed, the 'error' variable that
ultimately gets returned was not being set. In many cases, its original
value of -ENOMEM was still in place, and that was fine. But if VMAs had
been split at the start or end of the range, then 'error' could be zero.
Change to the 'error = foo(); if (error) goto …' idiom to fix the bug.
Also clean up a later case which avoided the same bug by *explicitly*
setting error = -ENOMEM right before calling the function that might
return -ENOMEM.
In a final cosmetic change, move the 'Point of no return' comment to
*after* the goto. That's been in the wrong place since the preallocation
was removed, and this new error path was added.
Fixes: 606c812eb1 ("mm/mmap: Fix error path in do_vmi_align_munmap()")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6b6d6dcc7 upstream.
This patch reverts commit 2c3fa6ae4d ("dlm: check required context
while close"). The function dlm_midcomms_close(), which will call later
dlm_lowcomms_close(), is called when the cluster manager tells the node
got fenced which means on midcomms/lowcomms layer to disconnect the node
from the cluster communication. The node can rejoin the cluster later.
This patch was ensuring no new message were able to be triggered when we
are in the close() function context. This was done by checking if the
lockspace has been stopped. However there is a missing check that we
only need to check specific lockspaces where the fenced node is member
of. This is currently complicated because there is no way to easily
check if a node is part of a specific lockspace without stopping the
recovery. For now we just revert this commit as it is just a check to
finding possible leaks of stopping lockspaces before close() is called.
Cc: stable@vger.kernel.org
Fixes: 2c3fa6ae4d ("dlm: check required context while close")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de25d6e961 upstream.
In our fault injection test, we create an ext4 file, migrate it to
non-extent based file, then punch a hole and finally trigger a WARN_ON
in the ext4_da_update_reserve_space():
EXT4-fs warning (device sda): ext4_da_update_reserve_space:369:
ino 14, used 11 with only 10 reserved data blocks
When writing back a non-extent based file, if we enable delalloc, the
number of reserved blocks will be subtracted from the number of blocks
mapped by ext4_ind_map_blocks(), and the extent status tree will be
updated. We update the extent status tree by first removing the old
extent_status and then inserting the new extent_status. If the block range
we remove happens to be in an extent, then we need to allocate another
extent_status with ext4_es_alloc_extent().
use old to remove to add new
|----------|------------|------------|
old extent_status
The problem is that the allocation of a new extent_status failed due to a
fault injection, and __es_shrink() did not get free memory, resulting in
a return of -ENOMEM. Then do_writepages() retries after receiving -ENOMEM,
we map to the same extent again, and the number of reserved blocks is again
subtracted from the number of blocks in that extent. Since the blocks in
the same extent are subtracted twice, we end up triggering WARN_ON at
ext4_da_update_reserve_space() because used > ei->i_reserved_data_blocks.
For non-extent based file, we update the number of reserved blocks after
ext4_ind_map_blocks() is executed, which causes a problem that when we call
ext4_ind_map_blocks() to create a block, it doesn't always create a block,
but we always reduce the number of reserved blocks. So we move the logic
for updating reserved blocks to ext4_ind_map_blocks() to ensure that the
number of reserved blocks is updated only after we do succeed in allocating
some new blocks.
Fixes: 5f634d064c ("ext4: Fix quota accounting error with fallocate")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ef6c32a91 upstream.
This was noticed by a user who noticied that the mtime of a file
backing a loopback device was getting bumped when the loopback device
is mounted read/only. Note: This doesn't show up when doing a
loopback mount of a file directly, via "mount -o ro /tmp/foo.img
/mnt", since the loop device is set read-only when mount automatically
creates loop device. However, this is noticeable for a LUKS loop
device like this:
% cryptsetup luksOpen /tmp/foo.img test
% mount -o ro /dev/loop0 /mnt ; umount /mnt
or, if LUKS is not in use, if the user manually creates the loop
device like this:
% losetup /dev/loop0 /tmp/foo.img
% mount -o ro /dev/loop0 /mnt ; umount /mnt
The modified mtime causes rsync to do a rolling checksum scan of the
file on the local and remote side, incrementally increasing the time
to rsync the not-modified-but-touched image file.
Fixes: eee00237fa ("ext4: commit super block if fs record error when journal record without error")
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/ZIauBR7YiV3rVAHL@glitch
Reported-by: Sean Greenslade <sean@seangreenslade.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26fb529024 upstream.
Following process makes ext4 load stale buffer heads from last failed
mounting in a new mounting operation:
mount_bdev
ext4_fill_super
| ext4_load_and_init_journal
| ext4_load_journal
| jbd2_journal_load
| load_superblock
| journal_get_superblock
| set_buffer_verified(bh) // buffer head is verified
| jbd2_journal_recover // failed caused by EIO
| goto failed_mount3a // skip 'sb->s_root' initialization
deactivate_locked_super
kill_block_super
generic_shutdown_super
if (sb->s_root)
// false, skip ext4_put_super->invalidate_bdev->
// invalidate_mapping_pages->mapping_evict_folio->
// filemap_release_folio->try_to_free_buffers, which
// cannot drop buffer head.
blkdev_put
blkdev_put_whole
if (atomic_dec_and_test(&bdev->bd_openers))
// false, systemd-udev happens to open the device. Then
// blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
// truncate_inode_folio->truncate_cleanup_folio->
// folio_invalidate->block_invalidate_folio->
// filemap_release_folio->try_to_free_buffers will be skipped,
// dropping buffer head is missed again.
Second mount:
ext4_fill_super
ext4_load_and_init_journal
ext4_load_journal
ext4_get_journal
jbd2_journal_init_inode
journal_init_common
bh = getblk_unmovable
bh = __find_get_block // Found stale bh in last failed mounting
journal->j_sb_buffer = bh
jbd2_journal_load
load_superblock
journal_get_superblock
if (buffer_verified(bh))
// true, skip journal->j_format_version = 2, value is 0
jbd2_journal_recover
do_one_pass
next_log_block += count_tags(journal, bh)
// According to journal_tag_bytes(), 'tag_bytes' calculating is
// affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3()
// returns false because 'j->j_format_version >= 2' is not true,
// then we get wrong next_log_block. The do_one_pass may exit
// early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'.
The filesystem is corrupted here, journal is partially replayed, and
new journal sequence number actually is already used by last mounting.
The invalidate_bdev() can drop all buffer heads even racing with bare
reading block device(eg. systemd-udev), so we can fix it by invalidating
bdev in error handling path in __ext4_fill_super().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
Fixes: 25ed6e8a54 ("jbd2: enable journal clients to enable v2 checksumming")
Cc: stable@vger.kernel.org # v3.5
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-2-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 531b3d1195 upstream.
After commit 0e96ea5c3e ("MIPS: Loongson64: Clean up use of
cc-ifversion") we get a build error when make modules_install:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
The reason is when make modules_install, 'call cc-option' doesn't work
in $(KBUILD_CFLAGS) of 'CHECKFLAGS'. Then there is no -mno-loongson-mmi
applied and -march=loongson3a enable MMI instructions.
To be detail, the error message comes from the CHECKFLAGS invocation of
$(CC) but it has no impact on the final result of make modules_install,
it is purely a cosmetic issue. The error occurs because cc-option is
defined in scripts/Makefile.compiler, which is not included in Makefile
when running 'make modules_install', as install targets are not supposed
to require the compiler; see commit 805b2e1d42 ("kbuild: include
Makefile.compiler only when compiler is needed"). As a result, the call
to check for '-mno-loongson-mmi' just never happens.
Fix this by partially reverting to the old logic, use 'call cc-option'
to conditionally apply -march=loongson3a and -march=mips64r2.
By the way, Loongson-2E/2F is also broken in commit 13ceb48bc1
("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls") so fix it
together.
Fixes: 13ceb48bc1 ("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls")
Fixes: 0e96ea5c3e ("MIPS: Loongson64: Clean up use of cc-ifversion")
Cc: stable@vger.kernel.org
Cc: Feiyang Chen <chenfeiyang@loongson.cn>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65fee014dc upstream.
Commit 7db5e9e9e5 ("MIPS: loongson64: fix FTLB configuration")
move decode_configs() from the beginning of cpu_probe_loongson() to the
end in order to fix FTLB configuration. However, it breaks the CPUCFG
decoding because decode_configs() use "c->options = xxxx" rather than
"c->options |= xxxx", all information get from CPUCFG by decode_cpucfg()
is lost.
This causes error when creating a KVM guest on Loongson-3A4000:
Exception Code: 4 not handled @ PC: 0000000087ad5981, inst: 0xcb7a1898 BadVaddr: 0x0 Status: 0x0
Fix this by moving the c->cputype setting to the beginning and moving
decode_configs() after that.
Fixes: 7db5e9e9e5 ("MIPS: loongson64: fix FTLB configuration")
Cc: stable@vger.kernel.org
Cc: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5487a7b606 upstream.
Some CPU feature macros were using current_cpu_type to mark feature
availability.
However current_cpu_type will use smp_processor_id, which is prohibited
under preemptable context.
Since those features are all uniform on all CPUs in a SMP system, use
boot_cpu_type instead of current_cpu_type to fix preemptable kernel.
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af22d6a869 upstream.
Currently, it is possible for us to access memory that we shouldn't.
Since, we acquire (possibly dangling) pointers to dirty rectangles
before doing a bounds check to make sure we can actually accommodate the
number of dirty rectangles userspace has requested to fill. This issue
is especially evident if a compositor requests both MPO and damage clips
at the same time, in which case I have observed a soft-hang. So, to
avoid this issue, perform the bounds check before filling a single dirty
rectangle and WARN() about it, if it is ever attempted in
fill_dc_dirty_rect().
Cc: stable@vger.kernel.org # 6.1+
Fixes: 30ebe41582 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5bcedc5931 upstream.
Nageswara reported that /proc/self/status was showing "vulnerable" for
the Speculation_Store_Bypass feature on Power10, eg:
$ grep Speculation_Store_Bypass: /proc/self/status
Speculation_Store_Bypass: vulnerable
But at the same time the sysfs files, and lscpu, were showing "Not
affected".
This turns out to simply be a bug in the reporting of the
Speculation_Store_Bypass, aka. PR_SPEC_STORE_BYPASS, case.
When SEC_FTR_STF_BARRIER was added, so that firmware could communicate
the vulnerability was not present, the code in ssb_prctl_get() was not
updated to check the new flag.
So add the check for SEC_FTR_STF_BARRIER being disabled. Rather than
adding the new check to the existing if block and expanding the comment
to cover both cases, rewrite the three cases to be separate so they can
be commented separately for clarity.
Fixes: 84ed26fd00 ("powerpc/security: Add a security feature for STF barrier")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230517074945.53188-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b4e32df3e upstream.
A process can spawn a PD on DSP with some attributes that can be
associated with the PD during spawn and run. The invocation
corresponding to the create request with attributes has total
4 buffers at the DSP side implementation. If this number is not
correct, the invocation is expected to fail on DSP. Added change
to use correct number of buffer count for creating fastrpc scalar.
Fixes: d73f71c7c6 ("misc: fastrpc: Add support for create remote init process")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Message-ID: <1686743685-21715-1-git-send-email-quic_ekangupt@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb6e04a173 upstream.
gcc-13 warns about function definitions for builtin interfaces that have a
different prototype, e.g.:
In file included from kasan_test.c:31:
kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
574 | void __asan_register_globals(struct kasan_global *globals, size_t size);
kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
577 | void __asan_alloca_poison(unsigned long addr, size_t size);
kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
580 | void __asan_load1(unsigned long addr);
kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
581 | void __asan_store1(unsigned long addr);
kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch]
643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size);
The two problems are:
- Addresses are passes as 'unsigned long' in the kernel, but gcc-13
expects a 'void *'.
- sizes meant to use a signed ssize_t rather than size_t.
Change all the prototypes to match these. Using 'void *' consistently for
addresses gets rid of a couple of type casts, so push that down to the
leaf functions where possible.
This now passes all randconfig builds on arm, arm64 and x86, but I have
not tested it on the other architectures that support kasan, since they
tend to fail randconfig builds in other ways. This might fail if any of
the 32-bit architectures expect a 'long' instead of 'int' for the size
argument.
The __asan_allocas_unpoison() function prototype is somewhat weird, since
it uses a pointer for 'stack_top' and an size_t for 'stack_bottom'. This
looks like it is meant to be 'addr' and 'size' like the others, but the
implementation clearly treats them as 'top' and 'bottom'.
Link: https://lkml.kernel.org/r/20230509145735.9263-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc0649395d upstream.
Fix an issue where the kernel would stall during netboot, showing the
"sched: RT throttling activated" message. This stall was triggered by
the behavior of the mii_interrupt bit (Bit 7 - DP83TD510E_STS_MII_INT)
in the DP83TD510E's PHY_STS Register (Address = 0x10). The DP83TD510E
datasheet (2020) states that the bit clears on write, however, in
practice, the bit clears on read.
This discrepancy had significant implications on the driver's interrupt
handling. The PHY_STS Register was used by handle_interrupt() to check
for pending interrupts and by read_status() to get the current link
status. The call to read_status() was unintentionally clearing the
mii_interrupt status bit without deasserting the IRQ pin, causing
handle_interrupt() to miss other pending interrupts. This issue was most
apparent during netboot.
The fix refrains from using the PHY_STS Register for interrupt handling.
Instead, we now solely rely on the INTERRUPT_REG_1 Register (Address =
0x12) and INTERRUPT_REG_2 Register (Address = 0x13) for this purpose.
These registers directly influence the IRQ pin state and are latched
high until read.
Note: The INTERRUPT_REG_2 Register (Address = 0x13) exists and can also
be used for interrupt handling, specifically for "Aneg page received
interrupt" and "Polarity change interrupt". However, these features are
currently not supported by this driver.
Fixes: 165cd04fe2 ("net: phy: dp83td510: Add support for the DP83TD510 Ethernet PHY")
Cc: <stable@vger.kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230621043848.3806124-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 481c2d1462 upstream.
After activation of interrupts for TPM TIS drivers 0-day reports an
interrupt storm on an Inspur NF5180M6 server.
Fix this by detecting the storm and falling back to polling:
Count the number of unhandled interrupts within a 10 ms time interval. In
case that more than 1000 were unhandled deactivate interrupts entirely,
deregister the handler and use polling instead.
Also print a note to point to the tpm_tis_dmi_table.
Since the interrupt deregistration function devm_free_irq() waits for all
interrupt handlers to finish, only trigger a worker in the interrupt
handler and do the unregistration in the worker to avoid a deadlock.
Note: the storm detection logic equals the implementation in
note_interrupt() which uses timestamps and counters stored in struct
irq_desc. Since this structure is private to the generic interrupt core
the TPM TIS core uses its own timestamps and counters. Furthermore the TPM
interrupt handler always returns IRQ_HANDLED to prevent the generic
interrupt core from processing the interrupt storm.
Cc: stable@vger.kernel.org # v6.4+
Fixes: e644b2f498 ("tpm, tpm_tis: Enable interrupt test")
Reported-by: kernel test robot <yujie.liu@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202305041325.ae8b0c43-yujie.liu@intel.com/
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f4032d615f upstream.
/dev/vtpmx is made visible before 'workqueue' is initialized, which can
lead to a memory corruption in the worst case scenario.
Address this by initializing 'workqueue' as the very first step of the
driver initialization.
Cc: stable@vger.kernel.org
Fixes: 6f99612e25 ("tpm: Proxy driver for supporting multiple emulated TPMs")
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@tuni.fi>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1c1b98962 upstream.
For Pluton TPM devices, it was assumed that there was no ACPI memory
regions. This is not true for ASUS ROG Ally. ACPI advertises
0xfd500000-0xfd5fffff.
Since remapping is already done in `crb_map_pluton`, remapping again
in `crb_map_io` causes EBUSY error:
[ 3.510453] tpm_crb MSFT0101:00: can't request region for resource [mem 0xfd500000-0xfd5fffff]
[ 3.510463] tpm_crb: probe of MSFT0101:00 failed with error -16
Cc: stable@vger.kernel.org # v6.3+
Fixes: 4d27328827 ("tpm_crb: Add support for CRB devices based on Pluton")
Signed-off-by: Valentin David <valentin.david@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65f6c7c91c upstream.
commit 4e5a04be88 ("pinctrl: amd: disable and mask interrupts on probe")
was well intentioned to mask a firmware issue on a surface laptop, but it
has a few problems:
1. It had a bug in the loop handling for iteration 63 that lead to other
problems with GPIO0 handling.
2. It disables interrupts that are used internally by the SOC but masked
by default.
3. It masked a real firmware problem in some chromebooks that should have
been caught during development but wasn't.
There has been a lot of other development around s2idle; particularly
around handling of the spurious wakeups. If there is still a problem on
the original reported surface laptop it should be avoided by adding a quirk
to gpiolib-acpi for that system instead.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230421120625.3366-5-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 968ab92616 upstream.
commit 4e5a04be88 ("pinctrl: amd: disable and mask interrupts on probe")
had a mistake in loop iteration 63 that it would clear offset 0xFC instead
of 0x100. Offset 0xFC is actually `WAKE_INT_MASTER_REG`. This was
clearing bits 13 and 15 from the register which significantly changed the
expected handling for some platforms for GPIO0.
commit b26cd9325b ("pinctrl: amd: Disable and mask interrupts on resume")
actually fixed this bug, but lead to regressions on Lenovo Z13 and some
other systems. This is because there was no handling in the driver for bit
15 debounce behavior.
Quoting a public BKDG:
```
EnWinBlueBtn. Read-write. Reset: 0. 0=GPIO0 detect debounced power button;
Power button override is 4 seconds. 1=GPIO0 detect debounced power button
in S3/S5/S0i3, and detect "pressed less than 2 seconds" and "pressed 2~10
seconds" in S0; Power button override is 10 seconds
```
Cross referencing the same master register in Windows it's obvious that
Windows doesn't use debounce values in this configuration. So align the
Linux driver to do this as well. This fixes wake on lid when
WAKE_INT_MASTER_REG is properly programmed.
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217315
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230421120625.3366-2-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ae071fc21 upstream.
Josh Triplett reports that initramfs-tools needs modules.builtin and
modules.builtin.modinfo to create a working initramfs for a non-modular
kernel.
If this is a general tooling issue not limited to Debian, I think it
makes sense to change modules_install.
This commit changes the targets as follows when CONFIG_MODULES=n.
In-tree builds:
make modules -> no-op
make modules_install -> install modules.builtin(.modinfo)
External module builds:
make modules -> show error message like before
make modules_install -> show error message like before
Link: https://lore.kernel.org/lkml/36a4014c73a52af27d930d3ca31d362b60f4461c.1686356364.git.josh@joshtriplett.org/
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Tested-by: Josh Triplett <josh@joshtriplett.org>
Stable-dep-of: 4243afdb93 ("kbuild: builddeb: always make modules_install, to install modules.builtin*")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 458c15dfbc upstream.
syzbot reports a bug as below:
general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942
Call Trace:
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691
__raw_write_lock include/linux/rwlock_api_smp.h:209 [inline]
_raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300
__drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100
f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116
f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664
f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838
vfs_fallocate+0x54b/0x6b0 fs/open.c:324
ksys_fallocate fs/open.c:347 [inline]
__do_sys_fallocate fs/open.c:355 [inline]
__se_sys_fallocate fs/open.c:353 [inline]
__x64_sys_fallocate+0xbd/0x100 fs/open.c:353
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is race condition as below:
- since it tries to remount rw filesystem, so that do_remount won't
call sb_prepare_remount_readonly to block fallocate, there may be race
condition in between remount and fallocate.
- in f2fs_remount(), default_options() will reset mount option to default
one, and then update it based on result of parse_options(), so there is
a hole which race condition can happen.
Thread A Thread B
- f2fs_fill_super
- parse_options
- clear_opt(READ_EXTENT_CACHE)
- f2fs_remount
- default_options
- set_opt(READ_EXTENT_CACHE)
- f2fs_fallocate
- f2fs_insert_range
- f2fs_drop_extent_tree
- __drop_extent_tree
- __may_extent_tree
- test_opt(READ_EXTENT_CACHE) return true
- write_lock(&et->lock) access NULL pointer
- parse_options
- clear_opt(READ_EXTENT_CACHE)
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49024ec879 upstream.
Handle trailing and leading separators when parsing UNC and prefix
paths in smb3_parse_devname(). Then, store the sanitised paths in
smb3_fs_context::source.
This fixes the following cases
$ mount //srv/share// /mnt/1 -o ...
$ cat /mnt/1/d0/f0
cat: /mnt/1/d0/f0: Invalid argument
The -EINVAL was returned because the client sent SMB2_CREATE "\\d0\f0"
rather than SMB2_CREATE "\d0\f0".
$ mount //srv//share /mnt/1 -o ...
mount: Invalid argument
The -EINVAL was returned correctly although the client only realised
it after sending a couple of bad requests rather than bailing out
earlier when parsing mount options.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f2a0afa98 upstream.
Some servers may return error codes from REQ_GET_DFS_REFERRAL requests
that are unexpected by the client, so to make it easier, assume
non-DFS mounts when the client can't get the initial DFS referral of
@ctx->UNC in dfs_mount_share().
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b8f6446b68 ]
DMA direction should be taken in dma_unmap_page() for unmapping integrity
data.
Fix this DMA direction, and reported in Guangwu's test.
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Fixes: 4aedb70543 ("nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e337087c3 ]
Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.
Consider the following code in net/sched/sch_qfq.c:
static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
unsigned int len = qdisc_pkt_len(skb), gso_segs;
// ...
if (unlikely(cl->agg->lmax < len)) {
pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
cl->agg->lmax, len, cl->common.classid);
err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
if (err) {
cl->qstats.drops++;
return qdisc_drop(skb, sch, to_free);
}
// ...
}
Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.
This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.
A user may choose virtually any size using such a table.
As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------
We can create the issue with the following commands:
tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead 999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2
This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.
Fixes: 462dbc9101 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <nnamrec@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 158810b261 ]
25369891fc deletes a check for the case where no 'lmax' is
specified which 3037933448 previously fixed as 'lmax'
could be set to the device's MTU without any bound checking
for QFQ_LMAX_MIN and QFQ_LMAX_MAX. Therefore, reintroduce the check.
Fixes: 25369891fc ("net/sched: sch_qfq: refactor parsing of netlink parameters")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f72207a5c0 ]
The simple_write_to_buffer() function is designed to handle partial
writes. It returns negatives on error, otherwise it returns the number
of bytes that were able to be copied. This code doesn't check the
return properly. We only know that the first byte is written, the rest
of the buffer might be uninitialized.
There is no need to use the simple_write_to_buffer() function.
Partial writes are prohibited by the "if (*ppos != 0)" check at the
start of the function. Just use memdup_user() and copy the whole
buffer.
Fixes: d3cbb907ae ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/7c1f950b-3a7d-4252-82a6-876e53078ef7@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3f87278bc ]
The kernel does not currently validate that both the minimum and maximum
ports of a port range are specified. This can lead user space to think
that a filter matching on a port range was successfully added, when in
fact it was not. For example, with a patched (buggy) iproute2 that only
sends the minimum port, the following commands do not return an error:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
# tc filter show dev swp1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 2 ref 1 bind 1
Fix by returning an error unless both ports are specified:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
Error: Both min and max source ports must be specified.
We have an error talking to the kernel
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
Error: Both min and max destination ports must be specified.
We have an error talking to the kernel
Fixes: 5c72299fba ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e06c57d66 ]
Currently, verifier does not reject XDP programs that pass NULL pointer to
hints functions. At the same time, this case is not handled in any driver
implementation (including veth). For example, changing
bpf_xdp_metadata_rx_timestamp(ctx, ×tamp);
to
bpf_xdp_metadata_rx_timestamp(ctx, NULL);
in xdp_metadata test successfully crashes the system.
Add KF_TRUSTED_ARGS flag to hints kfunc definitions, so driver code
does not have to worry about getting invalid pointers.
Fixes: 3d76a4d3d4 ("bpf: XDP metadata RX kfuncs")
Reported-by: Stanislav Fomichev <sdf@google.com>
Closes: https://lore.kernel.org/bpf/ZKWo0BbpLfkZHbyE@google.com/
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230711105930.29170-1-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8191213a58 ]
z_erofs_do_read_page() may loop infinitely due to the inappropriate
truncation in the below statement. Since the offset is 64 bits and min_t()
truncates the result to 32 bits. The solution is to replace unsigned int
with a 64-bit type, such as erofs_off_t.
cur = end - min_t(unsigned int, offset + end - map->m_la, end);
- For example:
- offset = 0x400160000
- end = 0x370
- map->m_la = 0x160370
- offset + end - map->m_la = 0x400000000
- offset + end - map->m_la = 0x00000000 (truncated as unsigned int)
- Expected result:
- cur = 0
- Actual result:
- cur = 0x370
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230710093410.44071-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56b3c6ba53 ]
When the XDP feature is enabled and with heavy XDP frames to be
transmitted, there is a considerable probability that available
tx BDs are insufficient. This will lead to some XDP frames to be
discarded and the "NOT enough BD for SG!" error log will appear
in the console (as shown below).
[ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
In the case of heavy XDP traffic, sometimes the speed of recycling
tx BDs may be slower than the speed of sending XDP frames. There
may be several specific reasons, such as the interrupt is not
responsed in time, the efficiency of the NAPI callback function is
too low due to all the queues (tx queues and rx queues) share the
same NAPI, and so on.
After trying various methods, I think that increase the size of tx
BD ring is simple and effective. Maybe the best resolution is that
allocate NAPI for each queue to improve the efficiency of the NAPI
callback, but this change is a bit big and I didn't try this method.
Perheps this method will be implemented in a future patch.
This patch also updates the tx_wake_threshold of tx ring which is
related to the size of tx ring in the previous logic. Otherwise,
the tx_wake_threshold will be too high (403 BDs), which is more
likely to impact the slow path in the case of heavy XDP traffic,
because XDP path and slow path share the tx BD rings. According
to Jakub's suggestion, the tx_wake_threshold is at least equal to
tx_stop_threshold + 2 * MAX_SKB_FRAGS, if a queue of hundreds of
entries is overflowing, we should be able to apply a hysteresis
of a few tens of entries.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20f7973990 ]
Once the XDP frames have been successfully transmitted through the
ndo_xdp_xmit() interface, it's the driver responsibility to free
the frames so that the page_pool can recycle the pages and reuse
them. However, this action is not implemented in the fec driver.
This leads to a user-visible problem that the console will print
the following warning log.
[ 157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec
[ 217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec
[ 278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec
[ 338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec
[ 399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec
Therefore, to solve this issue, we free XDP frames via xdp_return_frame()
while cleaning the tx BD ring.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ae9c66b04 ]
This patch is a cleanup for fec driver. The fec_enet_reset_skb()
is used to free skb buffers for tx queues and is only invoked in
fec_restart(). However, fec_enet_bd_init() also resets skb buffers
and is invoked in fec_restart() too. So fec_enet_reset_skb() is
redundant and useless.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 20f7973990 ("net: fec: recycle pages for transmitted XDP frames")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c56fb2aab2 ]
In order to generate the prologue and epilogue, the BPF JIT needs to
know which registers that are clobbered. Therefore, the during
pre-final passes, the prologue is generated after the body of the
program body-prologue-epilogue. Then, in the final pass, a proper
prologue-body-epilogue JITted image is generated.
This scheme has worked most of the time. However, for some large
programs with many jumps, e.g. the test_kmod.sh BPF selftest with
hardening enabled (blinding constants), this has shown to be
incorrect. For the final pass, when the proper prologue-body-epilogue
is generated, the image has not converged. This will lead to that the
final image will have incorrect jump offsets. The following is an
excerpt from an incorrect image:
| ...
| 3b8: 00c50663 beq a0,a2,3c4 <.text+0x3c4>
| 3bc: 0020e317 auipc t1,0x20e
| 3c0: 49630067 jalr zero,1174(t1) # 20e852 <.text+0x20e852>
| ...
| 20e84c: 8796 c.mv a5,t0
| 20e84e: 6422 c.ldsp s0,8(sp) # Epilogue start
| 20e850: 6141 c.addi16sp sp,16
| 20e852: 853e c.mv a0,a5 # Incorrect jump target
| 20e854: 8082 c.jr ra
The image has shrunk, and the epilogue offset is incorrect in the
final pass.
Correct the problem by always generating proper prologue-body-epilogue
outputs, which means that the first pass will only generate the body
to track what registers that are touched.
Fixes: 2353ecc6f9 ("bpf, riscv: add BPF JIT for RV64G")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230710074131.19596-1-bjorn@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dceaafd668 ]
With commit 27267655c5 ("openrisc: Support floating point user api") I
added an entry to the struct sigcontext which caused an unwanted change
to the userspace ABI.
To fix this we use the previously unused oldmask field space for the
floating point fpcsr state. We do this with a union to restore the ABI
back to the pre kernel v6.4 ABI and keep API compatibility.
This does mean if there is some code somewhere that is setting oldmask
in an OpenRISC specific userspace sighandler it would end up setting the
floating point register status, but I think it's unlikely as oldmask was
never functional before.
Fixes: 27267655c5 ("openrisc: Support floating point user api")
Reported-by: Szabolcs Nagy <nsz@port70.net>
Closes: https://lore.kernel.org/openrisc/20230626213840.GA1236108@port70.net/
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0bcc62858d ]
The insertion of an empty frame was introduced with
commit db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
in order to ensure that the current cycle has at least one packet if
there is some packet to be scheduled for the next cycle.
However, the current implementation does not properly check if
a packet is already scheduled for the current cycle. Currently,
an empty packet is always inserted if and only if
txtime >= end_of_cycle && txtime > last_tx_cycle
but since last_tx_cycle is always either the end of the current
cycle (end_of_cycle) or the end of a previous cycle, the
second part (txtime > last_tx_cycle) is always true unless
txtime == last_tx_cycle.
What actually needs to be checked here is if the last_tx_cycle
was already written within the current cycle, so an empty frame
should only be inserted if and only if
txtime >= end_of_cycle && end_of_cycle > last_tx_cycle.
This patch does not only avoid an unnecessary insertion, but it
can actually be harmful to insert an empty packet if packets
are already scheduled in the current cycle, because it can lead
to a situation where the empty packet is actually processed
as the first packet in the upcoming cycle shifting the packet
with the first_flag even one cycle into the future, finally leading
to a TX hang.
The TX hang can be reproduced on a i225 with:
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <32>
TDT <3c>
next_to_use <3c>
next_to_clean <32>
buffer_info[next_to_clean]
time_stamp <ffff26a8>
next_to_watch <00000000632a1828>
jiffies <ffff27f8>
desc.status <1048000>
Fixes: db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1bca9ac0b ]
It is possible (verified on a running system) that frames are processed
by igc_tx_launchtime with a txtime before the start of the cycle
(baset_est).
However, the result of txtime - baset_est is written into a u32,
leading to a wrap around to a positive number. The following
launchtime > 0 check will only branch to executing launchtime = 0
if launchtime is already 0.
Fix it by using a s32 before checking launchtime > 0.
Fixes: db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b86f10ab6 ]
The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END
prevent the packet transmission over slot and cycle boundaries.
This is important for taprio offload where the slots and
cycles correspond to the slots and cycles configured for the
network.
However, the Qbv offload feature of the i225 is also used for
enabling TX launchtime / ETF offload. In that case, however,
the cycle has no meaning for the network and is only used
internally to adapt the base time register after a second has
passed.
Enabling strict mode in this case would unnecessarily prevent
the transmission of certain packets (i.e. at the boundary of a
second) and thus interferes with the ETF qdisc that promises
transmission at a certain point in time.
Similar to ETF, this also applies to CBS offload that also should
not be influenced by strict mode unless taprio offload would be
enabled at the same time.
This fully reverts
commit d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
but its commit message only describes what was already implemented
before that commit. The difference to a plain revert of that commit
is that it now copes with the base_time = 0 case that was fixed with
commit e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
In particular, enabling strict mode leads to TX hang situations
under high traffic if taprio is applied WITHOUT taprio offload
but WITH ETF offload, e.g. as in
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <d0>
TDT <f0>
next_to_use <f0>
next_to_clean <d0>
buffer_info[next_to_clean]
time_stamp <ffff661f>
next_to_watch <00000000245a4efb>
jiffies <ffff6e48>
desc.status <1048000>
Fixes: d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 82ff5f29b7 ]
Only set adapter->taprio_offload_enable after validating the arguments.
Otherwise, it stays set even if the offload was not enabled.
Since the subsequent code does not get executed in case of invalid
arguments, it will not be read at first.
However, by activating and then deactivating another offload
(e.g. ETF/TX launchtime offload), taprio_offload_enable is read
and erroneously keeps the offload feature of the NIC enabled.
This can be reproduced as follows:
# TAPRIO offload (flags == 0x2) and negative base-time leading to expected -ERANGE
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time -1000 \
sched-entry S 01 300000 \
flags 0x2
# IGC_TQAVCTRL is 0x0 as expected (iomem=relaxed for reading register)
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
# Activate ETF offload
sudo tc qdisc replace dev enp1s0 parent root handle 6666 mqprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
hw 0
sudo tc qdisc add dev enp1s0 parent 6666:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload
# IGC_TQAVCTRL is 0x9 as expected
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
# Deactivate ETF offload again
sudo tc qdisc delete dev enp1s0 parent 6666:1
# IGC_TQAVCTRL should now be 0x0 again, but is observed as 0x9
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8046063df8 ]
In the current implementation the flags adapter->qbv_enable
and IGC_FLAG_TSN_QBV_ENABLED have a similar name, but do not
have the same meaning. The first one is used only to indicate
taprio offload (i.e. when igc_save_qbv_schedule was called),
while the second one corresponds to the Qbv mode of the hardware.
However, the second one is also used to support the TX launchtime
feature, i.e. ETF qdisc offload. This leads to situations where
adapter->qbv_enable is false, but the flag IGC_FLAG_TSN_QBV_ENABLED
is set. This is prone to confusion.
The rename should reduce this confusion. Since it is a pure
rename, it has no impact on functionality.
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d800bc500 ]
Inspired from struct flow_cls_offload :: cmd, in order for taprio to be
able to report statistics (which is future work), it seems that we need
to drill one step further with the ndo_setup_tc(TC_SETUP_QDISC_TAPRIO)
multiplexing, and pass the command as part of the common portion of the
muxed structure.
Since we already have an "enable" variable in tc_taprio_qopt_offload,
refactor all drivers to check for "cmd" instead of "enable", and reject
every other command except "replace" and "destroy" - to be future proof.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> # for lan966x
Acked-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 8046063df8 ("igc: Rename qbv_enable to taprio_offload_enable")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 028e6e204a ]
The while-loop may break on one of the two conditions, either ID string
is empty or GUID matches. The second one, may never be reached if the
parsed string is not correct GUID. In such a case the loop will never
advance to check the next ID.
Break possible infinite loop by factoring out guid_parse_and_compare()
helper which may be moved to the generic header for everyone later on
and preventing from similar mistake in the future.
Interestingly that firstly it appeared when WMI was turned into a bus
driver, but later when duplicated GUIDs were checked, the while-loop
has been replaced by for-loop and hence no mistake made again.
Fixes: a48e23385f ("platform/x86: wmi: add context pointer field to struct wmi_device_id")
Fixes: 844af950da ("platform/x86: wmi: Turn WMI into a bus driver")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230621151155.78279-1-andriy.shevchenko@linux.intel.com
Tested-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87355b7c3d ]
Add check for the return value of skb_copy in order to avoid NULL pointer
dereference.
Fixes: 2cd5485663 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f151364b1 ]
A previous patch addressed the fortified memcpy warning for most
builds, but I still see this one with gcc-9:
In file included from include/linux/string.h:254,
from drivers/hid/hid-hyperv.c:8:
In function 'fortify_memcpy_chk',
inlined from 'mousevsc_on_receive' at drivers/hid/hid-hyperv.c:272:3:
include/linux/fortify-string.h:583:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
583 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
My guess is that the WARN_ON() itself is what confuses gcc, so it no
longer sees that there is a correct range check. Rework the code in a
way that helps readability and avoids the warning.
Fixes: 542f25a944 ("HID: hyperv: Replace one-element array with flexible-array member")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230705140242.844167-1-arnd@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06a0716949 ]
Now in addrconf_mod_rs_timer(), reference idev depends on whether
rs_timer is not pending. Then modify rs_timer timeout.
There is a time gap in [1], during which if the pending rs_timer
becomes not pending. It will miss to hold idev, but the rs_timer
is activated. Thus rs_timer callback function addrconf_rs_timer()
will be executed and put idev later without holding idev. A refcount
underflow issue for idev can be caused by this.
if (!timer_pending(&idev->rs_timer))
in6_dev_hold(idev);
<--------------[1]
mod_timer(&idev->rs_timer, jiffies + when);
To fix the issue, hold idev if mod_timer() return 0.
Fixes: b7b1bfce0b ("ipv6: split duplicate address detection and router solicitation timer")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2790143f09 ]
As the devm_kcalloc may return NULL pointer,
it should be better to add check for the return
value, as same as the others.
Fixes: 7f46c8b3a5 ("NTB: ntb_tool: Add full multi-port NTB API support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8623ccbfc5 ]
If device_register() returns error, the name allocated by
dev_set_name() need be freed. As comment of device_register()
says, it should use put_device() to give up the reference in
the error path. So fix this by calling put_device(), then the
name can be freed in kobject_cleanup(), and client_dev is freed
in ntb_transport_client_release().
Fixes: fce8a7bb5b ("PCI-Express Non-Transparent Bridge Support")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c3c796aca ]
A problem about ntb_hw_intel create debugfs failed is triggered with the
following log given:
[ 273.112733] Intel(R) PCI-E Non-Transparent Bridge Driver 2.0
[ 273.115342] debugfs: Directory 'ntb_hw_intel' with parent '/' already present!
The reason is that intel_ntb_pci_driver_init() returns
pci_register_driver() directly without checking its return value, if
pci_register_driver() failed, it returns without destroy the newly created
debugfs, resulting the debugfs of ntb_hw_intel can never be created later.
intel_ntb_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes: e26a5843f7 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98af0a33c1 ]
A problem about ntb_hw_amd create debugfs failed is triggered with the
following log given:
[ 618.431232] AMD(R) PCI-E Non-Transparent Bridge Driver 1.0
[ 618.433284] debugfs: Directory 'ntb_hw_amd' with parent '/' already present!
The reason is that amd_ntb_pci_driver_init() returns pci_register_driver()
directly without checking its return value, if pci_register_driver()
failed, it returns without destroy the newly created debugfs, resulting
the debugfs of ntb_hw_amd can never be created later.
amd_ntb_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes: a1b3695820 ("NTB: Add support for AMD PCI-Express Non-Transparent Bridge")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c012968259 ]
A problem about ntb_hw_idt create debugfs failed is triggered with the
following log given:
[ 1236.637636] IDT PCI-E Non-Transparent Bridge Driver 2.0
[ 1236.639292] debugfs: Directory 'ntb_hw_idt' with parent '/' already present!
The reason is that idt_pci_driver_init() returns pci_register_driver()
directly without checking its return value, if pci_register_driver()
failed, it returns without destroy the newly created debugfs, resulting
the debugfs of ntb_hw_idt can never be created later.
idt_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes: bf2a952d31 ("NTB: Add IDT 89HPESxNTx PCIe-switches support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 266deeea34 ]
When ism_unregister_client() is called but the client still has DMBs
registered it returns -EBUSY and prints an error. This only happens
after the client has already been unregistered however. This is
unexpected as the unregister claims to have failed. Furthermore as this
implies a client bug a WARN() is more appropriate. Thus move the
deregistration after the check and use WARN().
Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76631ffa2f ]
Previously the clients_lock was protecting the clients array against
concurrent addition/removal of clients but was also accessed from IRQ
context. This meant that it had to be a spinlock and that the add() and
remove() callbacks in which clients need to do allocation and take
mutexes can't be called under the clients_lock. To work around this these
callbacks were moved to workqueues. This not only introduced significant
complexity but is also subtly broken in at least one way.
In ism_dev_init() and ism_dev_exit() clients[i]->tgt_ism is used to
communicate the added/removed ISM device to the work function. While
write access to client[i]->tgt_ism is protected by the clients_lock and
the code waits that there is no pending add/remove work before and after
setting clients[i]->tgt_ism this is not enough. The problem is that the
wait happens based on per ISM device counters. Thus a concurrent
ism_dev_init()/ism_dev_exit() for a different ISM device may overwrite
a clients[i]->tgt_ism between unlocking the clients_lock and the
subsequent wait for the work to finnish.
Thankfully with the clients_lock no longer held in IRQ context it can be
turned into a mutex which can be held during the calls to add()/remove()
completely removing the need for the workqueues and the associated
broken housekeeping including the per ISM device counters and the
clients[i]->tgt_ism.
Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b5c13b591 ]
The clients array references all registered clients and is protected by
the clients_lock. Besides its use as general list of clients the clients
array is accessed in ism_handle_irq() to forward ISM device events to
clients.
While the clients_lock is taken in the IRQ handler when calling
handle_event() it is however incorrectly not held during the
client->handle_irq() call and for the preceding clients[] access leaving
it unprotected against concurrent client (un-)registration.
Furthermore the accesses to ism->sba_client_arr[] in ism_register_dmb()
and ism_unregister_dmb() are not protected by any lock. This is
especially problematic as the client ID from the ism->sba_client_arr[]
is not checked against NO_CLIENT and neither is the client pointer
checked.
Instead of expanding the use of the clients_lock further add a separate
array in struct ism_dev which references clients subscribed to the
device's events and IRQs. This array is protected by ism->lock which is
already taken in ism_handle_irq() and can be taken outside the IRQ
handler when adding/removing subscribers or the accessing
ism->sba_client_arr[]. This also means that the clients_lock is no
longer taken in IRQ context.
Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7731194fd ]
Turning IRQs off is done by accessing Ethernet controller registers.
That can't be done until device's clock is enabled. It results in a SoC
hang otherwise.
This bug remained unnoticed for years as most bootloaders keep all
Ethernet interfaces turned on. It seems to only affect a niche SoC
family BCM47189. It has two Ethernet controllers but CFE bootloader uses
only the first one.
Fixes: 34322615cb ("net: bgmac: Mask interrupts during probe")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8139dccd46 ]
The tracepoint has existed for 12 years, but it only covered udp
over the legacy IPv4 protocol. Having it enabled for udp6 removes
the unnecessary difference in error visibility.
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Fixes: 296f7ea75b ("udp: add tracepoints for queueing skb to rcvbuf")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abfb2a58a5 ]
Remove unnecessary early code development check and the WARN_ON
that it uses. The irq alloc and free paths have long been
cleaned up and this check shouldn't have stuck around so long.
Fixes: 77ceb68e29 ("ionic: Add notifyq support")
Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7709fbd492 ]
Moved PTP pointer validation before its use to avoid smatch warning.
Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.
Fixes: 2ef4e45d99 ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af42088bda ]
In legacy silicon, promiscuous mode is only modified
through CGX mbox messages. In CN10KB silicon, it is modified
from CGX mbox and NIX. This breaks legacy application
behaviour. Fix this by removing call from NIX.
Fixes: d6c9784baf ("octeontx2-af: Invoke exact match functions if supported")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0503efeadb ]
Current duplex mode was unset in the driver, resulting in the default
parameter being set to 0, which corresponds to half duplex. It might
mislead users to have incorrect expectation about the driver's
transmission capabilities.
Set the default duplex configuration to full, as the driver runs in
full duplex mode at this point.
Fixes: 7e074d5a76 ("gve: Enable Link Speed Reporting in the driver.")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0323bce598 ]
In the event of a failure in tcf_change_indev(), fw_set_parms() will
immediately return an error after incrementing or decrementing
reference counter in tcf_bind_filter(). If attacker can control
reference counter to zero and make reference freed, leading to
use after free.
In order to prevent this, move the point of possible failure above the
point where the TC_FW_CLASSID is handled.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: M A Ramdhan <ramdhan@starlabs.sg>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6efb4ae38 ]
This switch implements Hold/Release in a strange way, with no control
from the user as required by IEEE 802.1Q-2018 through Set-And-Hold-MAC
and Set-And-Release-MAC, but rather, it emits HOLD requests implicitly
based on the schedule.
Namely, when the gate of a preemptible TC is about to close (actually
QSYS::PREEMPTION_CFG.HOLD_ADVANCE octet times in advance of this event),
the QSYS seems to emit a HOLD request pulse towards the MAC which
preempts the currently transmitted packet, and further packets are held
back in the queue system.
This allows large frames to be squeezed through small time slots,
because HOLD requests initiated by the gate events result in the frame
being segmented in multiple fragments, the bit time of which is equal to
the size of the time slot.
It has been reported that the vsc9959_tas_guard_bands_update() logic
breaks this, because it doesn't take preemptible TCs into account, and
enables oversized frame dropping when the time slot doesn't allow a full
MTU to be sent, but it does allow 2*minFragSize to be sent (128B).
Packets larger than 128B are dropped instead of being sent in multiple
fragments.
Confusingly, the manual says:
| For guard band, SDU calculation of a traffic class of a port, if
| preemption is enabled (through 'QSYS::PREEMPTION_CFG.P_QUEUES') then
| QSYS::PREEMPTION_CFG.HOLD_ADVANCE is used, otherwise
| QSYS::QMAXSDU_CFG_*.QMAXSDU_* is used.
but this only refers to the static guard band durations, and the
QMAXSDU_CFG_* registers have dual purpose - the other being oversized
frame dropping, which takes place irrespective of whether frames are
preemptible or express.
So, to fix the problem, we need to call vsc9959_tas_guard_bands_update()
from ocelot_port_update_active_preemptible_tcs(), and modify the guard
band logic to consider a different (lower) oversize limit for
preemptible traffic classes.
Fixes: 403ffc2c34 ("net: mscc: ocelot: add support for preemptible traffic classes")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Message-ID: <20230705104422.49025-4-vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c60819149b ]
In a future change we will need to make
ocelot_port_update_active_preemptible_tcs() call
vsc9959_tas_guard_bands_update(), but that is currently not possible,
since the ocelot switch lib does not have access to functions private to
the DSA wrapper.
Move the pointer to vsc9959_tas_guard_bands_update() from felix->info
(which is private to the DSA driver) to ocelot->ops (which is also
visible to the ocelot switch lib).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Message-ID: <20230705104422.49025-3-vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: c6efb4ae38 ("net: mscc: ocelot: fix oversize frame dropping for preemptible TCs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5415ccd50a ]
The check_max_stack_depth pass happens after the verifier's symbolic
execution, and attempts to walk the call graph of the BPF program,
ensuring that the stack usage stays within bounds for all possible call
chains. There are two cases to consider: bpf_pseudo_func and
bpf_pseudo_call. In the former case, the callback pointer is loaded into
a register, and is assumed that it is passed to some helper later which
calls it (however there is no way to be sure), but the check remains
conservative and accounts the stack usage anyway. For this particular
case, asynchronous callbacks are skipped as they execute asynchronously
when their corresponding event fires.
The case of bpf_pseudo_call is simpler and we know that the call is
definitely made, hence the stack depth of the subprog is accounted for.
However, the current check still skips an asynchronous callback even if
a bpf_pseudo_call was made for it. This is erroneous, as it will miss
accounting for the stack usage of the asynchronous callback, which can
be used to breach the maximum stack depth limit.
Fix this by only skipping asynchronous callbacks when the instruction is
not a pseudo call to the subprog.
Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230705144730.235802-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2fb48d88e7 ]
When a device-mapper device is passing through the inline encryption
support of an underlying device, calls to blk_crypto_evict_key() take
the blk_crypto_profile::lock of the device-mapper device, then take the
blk_crypto_profile::lock of the underlying device (nested). This isn't
a real deadlock, but it causes a lockdep report because there is only
one lock class for all instances of this lock.
Lockdep subclasses don't really work here because the hierarchy of block
devices is dynamic and could have more than 2 levels.
Instead, register a dynamic lock class for each blk_crypto_profile, and
associate that with the lock.
This avoids false-positive lockdep reports like the following:
============================================
WARNING: possible recursive locking detected
6.4.0-rc5 #2 Not tainted
--------------------------------------------
fscryptctl/1421 is trying to acquire lock:
ffffff80829ca418 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0x44/0x1c0
but task is already holding lock:
ffffff8086b68ca8 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0xc8/0x1c0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&profile->lock);
lock(&profile->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
Fixes: 1b26283970 ("block: Keyslot Manager for Inline Encryption")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230610061139.212085-1-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84a192e461 ]
I225/6 hardware can be programmed to start PPS output once
the time in Target Time registers is reached. The time
programmed in these registers should always be into future.
Only then PPS output is triggered when SYSTIM register
reaches the programmed value. There are two modes in i225/6
hardware to program PPS, pulse and clock mode.
There were issues reported where PPS is not generated when
start time is in past.
Example 1, "echo 0 0 0 2 0 > /sys/class/ptp/ptp0/period"
In the current implementation, a value of '0' is programmed
into Target time registers and PPS output is in pulse mode.
Eventually an interrupt which is triggered upon SYSTIM
register reaching Target time is not fired. Thus no PPS
output is generated.
Example 2, "echo 0 0 0 1 0 > /sys/class/ptp/ptp0/period"
Above case, a value of '0' is programmed into Target time
registers and PPS output is in clock mode. Here, HW tries to
catch-up the current time by incrementing Target Time
register. This catch-up time seem to vary according to
programmed PPS period time as per the HW design. In my
experiments, the delay ranged between few tens of seconds to
few minutes. The PPS output is only generated after the
Target time register reaches current time.
In my experiments, I also observed PPS stopped working with
below test and could not recover until module is removed and
loaded again.
1) echo 0 <future time> 0 1 0 > /sys/class/ptp/ptp1/period
2) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period
3) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period
After this PPS did not work even if i re-program with proper
values. I could only get this back working by reloading the
driver.
This patch takes care of calculating and programming
appropriate future time value into Target Time registers.
Fixes: 5e91c72e56 ("igc: Fix PPS delta between two synchronized end-points")
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25102893e4 ]
IEEE 802.1Q does not have clear definitions of what constitutes an
SDU (Service Data Unit), but IEEE Std 802.3 clause 3.1.2 does define
the MAC service primitives and clause 3.2.7 does define the MAC Client
Data for Q-tagged frames.
It shows that the mac_service_data_unit (MSDU) does NOT contain the
preamble, destination and source address, or FCS. The MSDU does contain
the length/type field, MAC client data, VLAN tag and any padding
data (prior to the FCS).
Thus, the maximum 802.3 frame size that is allowed to be transmitted
should be QueueMaxSDU (MSDU) + 16 (6 byte SA + 6 byte DA + 4 byte FCS).
Fixes: 92a0dcb842 ("igc: offload queue max SDU from tc-taprio")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7abd955a58 ]
Currently mlx5e releases pages directly to the page_pool for XDP_TX and
does page fragment counting for XDP_REDIRECT. RX pages from the
page_pool are leaking on XDP_REDIRECT because the xdp core will release
only one fragment out of MLX5E_PAGECNT_BIAS_MAX and subsequently the page
is marked as "skip release" which avoids the driver release.
A fix would be to take an extra fragment for XDP_REDIRECT and not set the
"skip release" bit so that the release on the driver side can handle the
remaining bias fragments. But this would be a shortsighted solution.
Instead, this patch converges the two XDP paths (XDP_TX and XDP_REDIRECT) to
always do fragment tracking. The "skip release" bit is no longer
necessary for XDP.
Fixes: 6f57428460 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6496357aa5 ]
On vport enable, where fw's hca caps are queried, the driver queries
hca_caps_2 without checking if fw truly supports them, causing a false
failure of vfs vport load and blocking SRIOV enablement on old devices
such as CX4 where hca_caps_2 support is missing.
Thus, add a check for the said caps support before accessing them.
Fixes: e5b9642a33 ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7a485115a ]
Non-clear CT action causes a flow rule split, while CT clear action
doesn't and is just a header-rewrite to the current flow rule.
But ct offload is done in post_parse and is per ct action instance,
so ct clear offload is parsed multiple times, while its deleted once.
Fix this by post_parsing the ct action only once per flow attribute
(which is per flow rule) by using a offloaded ct_attr flag.
Fixes: 08fe94ec5f ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 631079e08a ]
Prior to this patch only one "mlx5" thermal zone could have been
registered regardless of the number of individual mlx5 devices in the
system.
To fix this setup a unique name per device to register its own thermal
zone.
In order to not register a thermal zone for a virtual device (VF/SF) add
a check for PF device type.
The new name is a concatenation between "mlx5_" and "<PCI_DEV_BDF>", which
will also help associating a thermal zone with its PCI device.
$ lspci | grep ConnectX
00:04.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
00:05.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
$ cat /sys/devices/virtual/thermal/thermal_zone0/type
mlx5_0000:00:04.0
$ cat /sys/devices/virtual/thermal/thermal_zone1/type
mlx5_0000:00:05.0
Fixes: c1fef618d6 ("net/mlx5: Implement thermal zone")
CC: Sandipan Patra <spatra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e2d196579 ]
Regular (non-XSK) RQs get flushed on XSK setup and re-activated on XSK
close. If the same regular RQ is closed (a config change for example)
soon after the XSK close, a double release occurs because the missing
wqes get released a second time.
Fixes: 3f93f82988 ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d543b649ff ]
When kvzalloc_node or kvzalloc failed in mlx5e_ptp_open, the memory
pointed by "c" or "cparams" is not freed, which can lead to a memory
leak. Fix by freeing the array in the error path.
Fixes: 145e5637d9 ("net/mlx5e: Add TX PTP port object support")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3250affdc6 ]
The memory pointed to by the fs->any pointer is not freed in the error
path of mlx5e_fs_tt_redirect_any_create, which can lead to a memory leak.
Fix by freeing the memory in the error path, thereby making the error path
identical to mlx5e_fs_tt_redirect_any_destroy().
Fixes: 0f575c20bf ("net/mlx5e: Introduce Flow Steering ANY API")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 884abe45a9 ]
In function accel_fs_tcp_create_groups(), when the ft->g memory is
successfully allocated but the 'in' memory fails to be allocated, the
memory pointed to by ft->g is released once. And in function
accel_fs_tcp_create_table, mlx5e_destroy_flow_table is called to release
the memory pointed to by ft->g again. This will cause double free problem.
Fixes: c062d52ac2 ("net/mlx5e: Receive flow steering framework for accelerated TCP flows")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 175c241288 ]
If a user schedules a Gate Control List (GCL) to close one of
the QBV gates while also transmitting a packet to that closed gate,
TX Hang will be happen. HW would not drop any packet when the gate
is closed and keep queuing up in HW TX FIFO until the gate is re-opened.
This patch implements the solution to drop the packet for the closed
gate.
This patch will also reset the adapter to perform SW initialization
for each 1st Gate Control List (GCL) to avoid hang.
This is due to the HW design, where changing to TSN transmit mode
requires SW initialization. Intel Discrete I225/6 transmit mode
cannot be changed when in dynamic mode according to Software User
Manual Section 7.5.2.1. Subsequent Gate Control List (GCL) operations
will proceed without a reset, as they already are in TSN Mode.
Step to reproduce:
DUT:
1) Configure GCL List with certain gate close.
BASE=$(date +%s%N)
tc qdisc replace dev $IFACE parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE \
sched-entry S 0x8 500000 \
sched-entry S 0x4 500000 \
flags 0x2
2) Transmit the packet to closed gate. You may use udp_tai
application to transmit UDP packet to any of the closed gate.
./udp_tai -i <interface> -P 100000 -p 90 -c 1 -t <0/1> -u 30004
Fixes: ec50a9d437 ("igc: Add support for taprio offloading")
Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Tested-by: Chwee Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8416814fff ]
This implements XDP hints kfunc for RX-hash (xmo_rx_hash).
The HW rss hash type is handled via mapping table.
This igc driver (default config) does L3 hashing for UDP packets
(excludes UDP src/dest ports in hash calc). Meaning RSS hash type is
L3 based. Tested that the igc_rss_type_num for UDP is either
IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.
This patch also updates AF_XDP zero-copy function igc_clean_rx_irq_zc()
to use the xdp_buff wrapper struct igc_xdp_buff.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182465285.616355.2701740913376314790.stgit@firesoul
Stable-dep-of: 175c241288 ("igc: Fix TX Hang issue when QBV Gate is closed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73b7123de0 ]
Driver specific metadata data for XDP-hints kfuncs are propagated via tail
extending the struct xdp_buff with a locally scoped driver struct.
Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
xdp_buff_xsk struct contains a CB area (24 bytes) that can be used for
extending the locally scoped driver into. The XSK_CHECK_PRIV_TYPE define
catch size violations build time.
The changes needed for AF_XDP zero-copy in igc_clean_rx_irq_zc()
is done in next patch, because the member rx_desc isn't available
at this point.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182464779.616355.3761989884165609387.stgit@firesoul
Stable-dep-of: 175c241288 ("igc: Fix TX Hang issue when QBV Gate is closed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cca28ceac7 ]
Remove unnecessary delay during the TX ring configuration.
This will cause delay, especially during link down and
link up activity.
Furthermore, old SKUs like as I225 will call the reset_adapter
to reset the controller during TSN mode Gate Control List (GCL)
setting. This will add more time to the configuration of the
real-time use case.
It doesn't mentioned about this delay in the Software User Manual.
It might have been ported from legacy code I210 in the past.
Fixes: 13b5b7fd6a ("igc: Add support for Tx/Rx rings")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed89b74d2d ]
Add condition to increase the qbv counter during taprio qbv
configuration only.
There might be a case when TC already been setup then user configure
the ETF/CBS qdisc and this counter will increase if no condition above.
Fixes: ae4fe46983 ("igc: Add qbv_config_change_errors counter")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 479cdfe388 ]
Configuring tx_maxrate via sysfs interface
/sys/class/net/eth0/queues/tx-1/tx_maxrate was not working when
TCs are configured because always main VSI was being used. Fix by
using correct VSI in ice_set_tx_maxrate when TCs are configured.
Fixes: 1ddef455f4 ("ice: Add NDO callback to set the maximum per-queue bitrate")
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f16da6ee6 ]
Remove incorrect check in ice_validate_mqprio_opt() that limits
filter configuration when sum of max_rates of all TCs exceeds
the link speed. The max rate of each TC is unrelated to value
used by other TCs and is valid as long as it is less than link
speed.
Fixes: fbc7b27af0 ("ice: enable ndo_setup_tc support for mqprio_qdisc")
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eaf9e7192e ]
Originally this used jhash2() over tuple and folded the zone id,
the pernet hash value, destination port and l4 protocol number into the
32bit seed value.
When the switch to siphash was done, I used an on-stack temporary
buffer to build a suitable key to be hashed via siphash().
But this showed up as performance regression, so I got rid of
the temporary copy and collected to-be-hashed data in 4 u64 variables.
This makes it easy to build tuples that produce the same hash, which isn't
desirable even though chain lengths are limited.
Switch back to plain siphash, but just like with jhash2(), take advantage
of the fact that most of to-be-hashed data is already in a suitable order.
Use an empty struct as annotation in 'struct nf_conntrack_tuple' to mark
last member that can be used as hash input.
The only remaining data that isn't present in the tuple structure are the
zone identifier and the pernet hash: fold those into the key.
Fixes: d2c806abcf ("netfilter: conntrack: use siphash_4u64")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1689f25924 ]
Overflow use refcount checks are not complete.
Add helper function to deal with object reference counter tracking.
Report -EMFILE in case UINT_MAX is reached.
nft_use_dec() splats in case that reference counter underflows,
which should not ever happen.
Add nft_use_inc_restore() and nft_use_dec_restore() which are used
to restore reference counter from error and abort paths.
Use u32 in nft_flowtable and nft_object since helper functions cannot
work on bitfields.
Remove the few early incomplete checks now that the helper functions
are in place and used to check for refcount overflow.
Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ac0406335 ]
Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.
For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.
If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.
Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aabd12609f ]
The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.
While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.
Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7aa83fbd71 ]
Memory for the "struct device" for any given device isn't supposed to
be released until the device's release() is called. This is important
because someone might be holding a kobject reference to the "struct
device" and might try to access one of its members even after any
other cleanup/uninitialization has happened.
Code analysis of ti-sn65dsi86 shows that this isn't quite right. When
the code was written, it was believed that we could rely on the fact
that the child devices would all be freed before the parent devices
and thus we didn't need to worry about a release() function. While I
still believe that the parent's "struct device" is guaranteed to
outlive the child's "struct device" (because the child holds a kobject
reference to the parent), the parent's "devm" allocated memory is a
different story. That appears to be freed much earlier.
Let's make this better for ti-sn65dsi86 by allocating each auxiliary
with kzalloc and then free that memory in the release().
Fixes: bf73537f41 ("drm/bridge: ti-sn65dsi86: Break GPIO and MIPI-to-eDP bridge into sub-drivers")
Suggested-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230613065812.v2.1.I24b838a5b4151fb32bccd6f36397998ea2df9fbb@changeid
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98703e4e06 ]
Commit 5d844091f2 ("drm/scdc-helper: Pimp SCDC debugs") changed the scdc
interface to pick up an i2c adapter from a connector instead. However, in
the case of dw-hdmi, the wrong connector was being used to pass i2c adapter
information, since dw-hdmi's embedded connector structure is only populated
when the bridge attachment callback explicitly asks for it.
drm-meson is handling connector creation, so this won't happen, leading to
a NULL pointer dereference.
Fix it by having scdc functions access dw-hdmi's current connector pointer
instead, which is assigned during the bridge enablement stage.
Fixes: 5d844091f2 ("drm/scdc-helper: Pimp SCDC debugs")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reported-by: Lukas F. Hartmann <lukas@mntre.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
[narmstrong: moved Fixes tag before first S-o-b and added Reported-by tag]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230601123153.196867-1-adrian.larumbe@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8a796565ce upstream.
I observed poor performance of io_uring compared to synchronous IO. That
turns out to be caused by deeper CPU idle states entered with io_uring,
due to io_uring using plain schedule(), whereas synchronous IO uses
io_schedule().
The losses due to this are substantial. On my cascade lake workstation,
t/io_uring from the fio repository e.g. yields regressions between 20%
and 40% with the following command:
./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0
This is repeatable with different filesystems, using raw block devices
and using different block devices.
Use io_schedule_prepare() / io_schedule_finish() in
io_cqring_wait_schedule() to address the difference.
After that using io_uring is on par or surpassing synchronous IO (using
registered files etc makes it reliably win, but arguably is a less fair
comparison).
There are other calls to schedule() in io_uring/, but none immediately
jump out to be similarly situated, so I did not touch them. Similarly,
it's possible that mutex_lock_io() should be used, but it's not clear if
there are cases where that matters.
Cc: stable@vger.kernel.org # 5.10+
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: io-uring@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andres Freund <andres@anarazel.de>
Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@anarazel.de
[axboe: minor style fixup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f58d0a9b4c upstream.
Packets bound for peers can queue up prior to the device private key
being set. For example, if persistent keepalive is set, a packet is
queued up to be sent as soon as the device comes up. However, if the
private key hasn't been set yet, the handshake message never sends, and
no timer is armed to retry, since that would be pointless.
But, if a user later sets a private key, the expectation is that those
queued packets, such as a persistent keepalive, are actually sent. So
adjust the configuration logic to account for this edge case, and add a
test case to make sure this works.
Maxim noticed this with a wg-quick(8) config to the tune of:
[Interface]
PostUp = wg set %i private-key somefile
[Peer]
PublicKey = ...
Endpoint = ...
PersistentKeepalive = 25
Here, the private key gets set after the device comes up using a PostUp
script, triggering the bug.
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Reported-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Tested-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Link: https://lore.kernel.org/wireguard/87fs7xtqrv.fsf@gmail.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7387943fa3 upstream.
Using `% nr_cpumask_bits` is slow and complicated, and not totally
robust toward dynamic changes to CPU topologies. Rather than storing the
next CPU in the round-robin, just store the last one, and also return
that value. This simplifies the loop drastically into a much more common
pattern.
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Manuel Leiner <manuel.leiner@gmx.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6eef7a2b93 upstream.
If nf_conntrack_init_start() fails (for example due to a
register_nf_conntrack_bpf() failure), the nf_conntrack_helper_fini()
clean-up path frees the nf_ct_helper_hash map.
When built with NF_CONNTRACK=y, further netfilter modules (e.g:
netfilter_conntrack_ftp) can still be loaded and call
nf_conntrack_helpers_register(), independently of whether nf_conntrack
initialized correctly. This accesses the nf_ct_helper_hash dangling
pointer and causes a uaf, possibly leading to random memory corruption.
This patch guards nf_conntrack_helper_register() from accessing a freed
or uninitialized nf_ct_helper_hash pointer and fixes possible
uses-after-free when loading a conntrack module.
Cc: stable@vger.kernel.org
Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2dd05f107 upstream.
Let helper ovl_i_path_real() return the realinode to prepare for
checking non-null realinode in RCU walking path.
[msz] Use d_inode_rcu() since we are depending on the consitency
between dentry and inode being non-NULL in an RCU setting.
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Fixes: ffa5723c6d ("ovl: store lower path in ovl_inode")
Cc: <stable@vger.kernel.org> # v5.19
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit feb843a469 upstream.
When preprocessing arch/*/kernel/vmlinux.lds.S, the target triple is
not passed to $(CPP) because we add it only to KBUILD_{C,A}FLAGS.
As a result, the linker script is preprocessed with predefined macros
for the build host instead of the target.
Assuming you use an x86 build machine, compare the following:
$ clang -dM -E -x c /dev/null
$ clang -dM -E -x c /dev/null -target aarch64-linux-gnu
There is no actual problem presumably because our linker scripts do not
rely on such predefined macros, but it is better to define correct ones.
Move $(CLANG_FLAGS) to KBUILD_CPPFLAGS, so that all *.c, *.S, *.lds.S
will be processed with the proper target triple.
[Note]
After the patch submission, we got an actual problem that needs this
commit. (CBL issue 1859)
Link: https://github.com/ClangBuiltLinux/linux/issues/1859
Reported-by: Tom Rini <trini@konsulko.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43fc0a9990 upstream.
After commit feb843a469 ("kbuild: add $(CLANG_FLAGS) to
KBUILD_CPPFLAGS"), there is an error while building certain PowerPC
assembly files with clang:
arch/powerpc/lib/copypage_power7.S: Assembler messages:
arch/powerpc/lib/copypage_power7.S:34: Error: junk at end of line: `0b01000'
arch/powerpc/lib/copypage_power7.S:35: Error: junk at end of line: `0b01010'
arch/powerpc/lib/copypage_power7.S:37: Error: junk at end of line: `0b01000'
arch/powerpc/lib/copypage_power7.S:38: Error: junk at end of line: `0b01010'
arch/powerpc/lib/copypage_power7.S:40: Error: junk at end of line: `0b01010'
clang: error: assembler command failed with exit code 1 (use -v to see invocation)
as-option only uses KBUILD_AFLAGS, so after removing CLANG_FLAGS from
KBUILD_AFLAGS, there is no more '--target=' or '--prefix=' flags. As a
result of those missing flags, the host target
will be tested during as-option calls and likely fail, meaning necessary
flags may not get added when building assembly files, resulting in
errors like seen above.
Add KBUILD_CPPFLAGS to as-option invocations to clear up the errors.
This should have been done in commit d5c8d6e0fa ("kbuild: Update
assembler calls to use proper flags and language target"), which
switched from using the assembler target to the assembler-with-cpp
target, so flags that affect preprocessing are passed along in all
relevant tests. as-option now mirrors cc-option.
Fixes: feb843a469 ("kbuild: add $(CLANG_FLAGS) to KBUILD_CPPFLAGS")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/CA+G9fYs=koW9WardsTtora+nMgLR3raHz-LSLr58tgX4T5Mxag@mail.gmail.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cff6e7f50b upstream.
A future change will move CLANG_FLAGS from KBUILD_{A,C}FLAGS to
KBUILD_CPPFLAGS so that '--target' is available while preprocessing.
When that occurs, the following errors appear multiple times when
building ARCH=powerpc powernv_defconfig:
ld.lld: error: vmlinux.a(arch/powerpc/kernel/head_64.o):(.text+0x12d4): relocation R_PPC64_ADDR16_HI out of range: -4611686018409717520 is not in [-2147483648, 2147483647]; references '__start___soft_mask_table'
ld.lld: error: vmlinux.a(arch/powerpc/kernel/head_64.o):(.text+0x12e8): relocation R_PPC64_ADDR16_HI out of range: -4611686018409717392 is not in [-2147483648, 2147483647]; references '__stop___soft_mask_table'
Diffing the .o.cmd files reveals that -DHAVE_AS_ATHIGH=1 is not present
anymore, because as-instr only uses KBUILD_AFLAGS, which will no longer
contain '--target'.
Mirror Kconfig's as-instr and add CLANG_FLAGS explicitly to the
invocation to ensure the target information is always present.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7e5eb53bf upstream.
A future change will move CLANG_FLAGS from KBUILD_{A,C}FLAGS to
KBUILD_CPPFLAGS so that '--target' is available while preprocessing.
When that occurs, the following error appears when building the compat
PowerPC vDSO:
clang: error: unsupported option '-mbig-endian' for target 'x86_64-pc-linux-gnu'
make[3]: *** [.../arch/powerpc/kernel/vdso/Makefile:76: arch/powerpc/kernel/vdso/vdso32.so.dbg] Error 1
Explicitly add CLANG_FLAGS to ldflags-y, so that '--target' will always
be present.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08f6554ff9 upstream.
A future change will move CLANG_FLAGS from KBUILD_{A,C}FLAGS to
KBUILD_CPPFLAGS so that '--target' is available while preprocessing.
When that occurs, the following error appears when building ARCH=mips
with clang (tip of tree error shown):
clang: error: unsupported option '-mabi=' for target 'x86_64-pc-linux-gnu'
Add KBUILD_CPPFLAGS in the CHECKFLAGS invocation to keep everything
working after the move.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66d8fc0539 upstream.
The @source inode must be valid. It is even checked via IS_SWAPFILE()
above making it pretty clear. So no need to check it when we unlock.
What doesn't need to exist is the @target inode. The lock_two_inodes()
helper currently swaps the @inode1 and @inode2 arguments if @inode1 is
NULL to have consistent lock class usage. However, we know that at least
for vfs_rename() that @inode1 is @source and thus is never NULL as per
above. We also know that @source is a different inode than @target as
that is checked right at the beginning of vfs_rename(). So we know that
@source is valid and locked and that @target is locked. So drop the
check whether @source is non-NULL.
Fixes: 28eceeda13 ("fs: Lock moved directories")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202307030026.9sE2pk2x-lkp@intel.com
Message-Id: <20230703-vfs-rename-source-v1-1-37eebb29b65b@kernel.org>
[brauner: use commit message from patch I sent concurrently]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f050e56de8 upstream.
The driver's probe() first registers regulators in a loop and then in a
second loop passes them as irq data to the interrupt handlers. However
the function to get the regulator for given name
tps65219_get_rdev_by_name() was a no-op due to argument passed by value,
not pointer, thus the second loop assigned always same value - from
previous loop. The interrupts, when fired, where executed with wrong
data. Compiler also noticed it:
drivers/regulator/tps65219-regulator.c: In function ‘tps65219_get_rdev_by_name’:
drivers/regulator/tps65219-regulator.c:292:60: error: parameter ‘dev’ set but not used [-Werror=unused-but-set-parameter]
Fixes: c12ac5fc3e ("regulator: drivers: Add TI TPS65219 PMIC regulators support")
Cc: <stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com
Link: https://lore.kernel.org/r/20230507144656.192800-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40b0a74938 upstream.
At __btrfs_cow_block(), instead of doing a BUG_ON() in case we fail to
record a tree mod log root insertion operation, do a transaction abort
instead. There's really no need for the BUG_ON(), we can properly
release all resources in this context and turn the filesystem to RO mode
and in an error state instead.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ede600e497 upstream.
At split_node(), if we fail to log the tree mod log copy operation, we
return without unlocking the split extent buffer we just allocated and
without decrementing the reference we own on it. Fix this by unlocking
it and decrementing the ref count before returning.
Fixes: 5de865eebb ("Btrfs: fix tree mod logging")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d09c51521f upstream.
When COWing an extent buffer that is not the root node, we need to log in
the tree mod log that we replaced a pointer in the parent node, otherwise
a tree mod log user doing a search on the b+tree can return incorrect
results (that miss something). We are doing the call to
btrfs_tree_mod_log_insert_key() but we totally ignore its return value.
So fix this by adding the missing error handling, resulting in a
transaction abort and freeing the COWed extent buffer.
Fixes: f230475e62 ("Btrfs: put all block modifications into the tree mod log")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e27180994 upstream.
The reclaim process can temporarily fail. For example, if the space is
getting tight, it fails to make the block group read-only. If there are no
further writes on that block group, the block group will never get back to
the reclaim list, and the BG never gets reclaimed. In a certain workload,
we can leave many such block groups never reclaimed.
So, let's get it back to the list and give it a chance to be reclaimed.
Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a1b0e729d upstream.
The block group tree was not present among the lockdep classes. We could
get potentially lockdep warnings but so far none has been seen, also
because block-group-tree is a relatively new feature.
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93463ff7b5 upstream.
When a filesystem is read-only, we cannot reclaim a block group as it
cannot rewrite the data. Just bail out in that case.
Note that it can drop block groups in this case. As we did
sb_start_write(), read-only filesystem means we got a fatal error and
forced read-only. There is no chance to reclaim them again.
Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ed01616ba upstream.
The reclaiming process only starts after the filesystem volumes are
allocated to a certain level (75% by default). Thus, the list of
reclaiming target block groups can build up so huge at the time the
reclaim process kicks in. On a test run, there were over 1000 BGs in the
reclaim list.
As the reclaim involves rewriting the data, it takes really long time to
reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
won't proceed because the reclaim side is holding
fs_info->reclaim_bgs_lock. As a result, we will have a large number of
unused BGs kept in the unused list. On my test run, I got 1057 unused BGs.
Since deleting a block group is relatively easy and fast work, we can call
btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
unused BGs.
Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95c8e349d8 upstream.
The way that tree mod log tracks the ultimate length of the eb, the
variable 'n', eventually turns up the correct value, but at intermediate
steps during the rewind, n can be inaccurate as a representation of the
end of the eb. For example, it doesn't get updated on move rewinds, and
it does get updated for add/remove in the middle of the eb.
To detect cases with invalid moves, introduce a separate variable called
max_slot which tries to track the maximum valid slot in the rewind eb.
We can then warn if we do a move whose src range goes beyond the max
valid slot.
There is a commented caveat that it is possible to have this value be an
overestimate due to the challenge of properly handling 'add' operations
in the middle of the eb, but in practice it doesn't cause enough of a
problem to throw out the max idea in favor of tracking every valid slot.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cead5422a upstream.
There is a fairly unlikely race condition in tree mod log rewind that
can result in a kernel panic which has the following trace:
[530.569] BTRFS critical (device sda3): unable to find logical 0 length 4096
[530.585] BTRFS critical (device sda3): unable to find logical 0 length 4096
[530.602] BUG: kernel NULL pointer dereference, address: 0000000000000002
[530.618] #PF: supervisor read access in kernel mode
[530.629] #PF: error_code(0x0000) - not-present page
[530.641] PGD 0 P4D 0
[530.647] Oops: 0000 [#1] SMP
[530.654] CPU: 30 PID: 398973 Comm: below Kdump: loaded Tainted: G S O K 5.12.0-0_fbk13_clang_7455_gb24de3bdb045 #1
[530.680] Hardware name: Quanta Mono Lake-M.2 SATA 1HY9U9Z001G/Mono Lake-M.2 SATA, BIOS F20_3A15 08/16/2017
[530.703] RIP: 0010:__btrfs_map_block+0xaa/0xd00
[530.755] RSP: 0018:ffffc9002c2f7600 EFLAGS: 00010246
[530.767] RAX: ffffffffffffffea RBX: ffff888292e41000 RCX: f2702d8b8be15100
[530.784] RDX: ffff88885fda6fb8 RSI: ffff88885fd973c8 RDI: ffff88885fd973c8
[530.800] RBP: ffff888292e410d0 R08: ffffffff82fd7fd0 R09: 00000000fffeffff
[530.816] R10: ffffffff82e57fd0 R11: ffffffff82e57d70 R12: 0000000000000000
[530.832] R13: 0000000000001000 R14: 0000000000001000 R15: ffffc9002c2f76f0
[530.848] FS: 00007f38d64af000(0000) GS:ffff88885fd80000(0000) knlGS:0000000000000000
[530.866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[530.880] CR2: 0000000000000002 CR3: 00000002b6770004 CR4: 00000000003706e0
[530.896] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[530.912] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[530.928] Call Trace:
[530.934] ? btrfs_printk+0x13b/0x18c
[530.943] ? btrfs_bio_counter_inc_blocked+0x3d/0x130
[530.955] btrfs_map_bio+0x75/0x330
[530.963] ? kmem_cache_alloc+0x12a/0x2d0
[530.973] ? btrfs_submit_metadata_bio+0x63/0x100
[530.984] btrfs_submit_metadata_bio+0xa4/0x100
[530.995] submit_extent_page+0x30f/0x360
[531.004] read_extent_buffer_pages+0x49e/0x6d0
[531.015] ? submit_extent_page+0x360/0x360
[531.025] btree_read_extent_buffer_pages+0x5f/0x150
[531.037] read_tree_block+0x37/0x60
[531.046] read_block_for_search+0x18b/0x410
[531.056] btrfs_search_old_slot+0x198/0x2f0
[531.066] resolve_indirect_ref+0xfe/0x6f0
[531.076] ? ulist_alloc+0x31/0x60
[531.084] ? kmem_cache_alloc_trace+0x12e/0x2b0
[531.095] find_parent_nodes+0x720/0x1830
[531.105] ? ulist_alloc+0x10/0x60
[531.113] iterate_extent_inodes+0xea/0x370
[531.123] ? btrfs_previous_extent_item+0x8f/0x110
[531.134] ? btrfs_search_path_in_tree+0x240/0x240
[531.146] iterate_inodes_from_logical+0x98/0xd0
[531.157] ? btrfs_search_path_in_tree+0x240/0x240
[531.168] btrfs_ioctl_logical_to_ino+0xd9/0x180
[531.179] btrfs_ioctl+0xe2/0x2eb0
This occurs when logical inode resolution takes a tree mod log sequence
number, and then while backref walking hits a rewind on a busy node
which has the following sequence of tree mod log operations (numbers
filled in from a specific example, but they are somewhat arbitrary)
REMOVE_WHILE_FREEING slot 532
REMOVE_WHILE_FREEING slot 531
REMOVE_WHILE_FREEING slot 530
...
REMOVE_WHILE_FREEING slot 0
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
ADD slot 455
ADD slot 454
ADD slot 453
...
ADD slot 0
MOVE src slot 0 -> dst slot 456 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
When this sequence gets applied via btrfs_tree_mod_log_rewind, it
allocates a fresh rewind eb, and first inserts the correct key info for
the 533 elements, then overwrites the first 456 of them, then decrements
the count by 456 via the add ops, then rewinds the move by doing a
memmove from 456:988->0:532. We have never written anything past 532, so
that memmove writes garbage into the 0:532 range. In practice, this
results in a lot of fully 0 keys. The rewind then puts valid keys into
slots 0:455 with the last removes, but 456:532 are still invalid.
When search_old_slot uses this eb, if it uses one of those invalid
slots, it can then read the extent buffer and issue a bio for offset 0
which ultimately panics looking up extent mappings.
This bad tree mod log sequence gets generated when the node balancing
code happens to do a balance_node_right followed by a push_node_left
while logging in the tree mod log. Illustrated for ebs L and R (left and
right):
L R
start:
[XXX|YYY|...] [ZZZ|...|...]
balance_node_right:
[XXX|YYY|...] [...|ZZZ|...] move Z to make room for Y
[XXX|...|...] [YYY|ZZZ|...] copy Y from L to R
push_node_left:
[XXX|YYY|...] [...|ZZZ|...] copy Y from R to L
[XXX|YYY|...] [ZZZ|...|...] move Z into emptied space (NOT LOGGED!)
This is because balance_node_right logs a move, but push_node_left
explicitly doesn't. That is because logging the move would remove the
overwritten src < dst range in the right eb, which was already logged
when we called btrfs_tree_mod_log_eb_copy. The correct sequence would
include a move from 456:988 to 0:532 after remove 0:455 and before
removing 0:532. Reversing that sequence would entail creating keys for
0:532, then moving those keys out to 456:988, then creating more keys
for 0:455.
i.e.,
REMOVE_WHILE_FREEING slot 532
REMOVE_WHILE_FREEING slot 531
REMOVE_WHILE_FREEING slot 530
...
REMOVE_WHILE_FREEING slot 0
MOVE src slot 456 -> dst slot 0 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
ADD slot 455
ADD slot 454
ADD slot 453
...
ADD slot 0
MOVE src slot 0 -> dst slot 456 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
Fix this to log the move but avoid the double remove by putting all the
logging logic in btrfs_tree_mod_log_eb_copy which has enough information
to detect these cases and properly log moves, removes, and adds. Leave
btrfs_tree_mod_log_insert_move to handle insert_ptr and delete_ptr's
tree mod logging.
(Un)fortunately, this is quite difficult to reproduce, and I was only
able to reproduce it by adding sleeps in btrfs_search_old_slot that
would encourage more log rewinding during ino_to_logical ioctls. I was
able to hit the warning in the previous patch in the series without the
fix quite quickly, but not after this patch.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f18cc97845 upstream.
dirty_metadata_bytes is decremented in both places that clear the dirty
bit in a buffer, but only incremented in btrfs_mark_buffer_dirty, which
means that a buffer that is redirtied using btrfs_redirty_list_add won't
be added to dirty_metadata_bytes, but it will be subtracted when written
out, leading an inconsistency in the counter.
Move the dirty_metadata_bytes from btrfs_mark_buffer_dirty into
set_extent_buffer_dirty to also account for the redirty case, and remove
the now unused set_extent_buffer_dirty return value.
Fixes: d3575156f6 ("btrfs: zoned: redirty released extent buffers")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 160fe8f6fd upstream.
Callers of `btrfs_reduce_alloc_profile` expect it to return exactly
one allocation profile flag, and failing to do so may ultimately
result in a WARN_ON and remount-ro when allocating new blocks, like
the below transaction abort on 6.1.
`btrfs_reduce_alloc_profile` has two ways of determining the profile,
first it checks if a conversion balance is currently running and
uses the profile we're converting to. If no balance is currently
running, it returns the max-redundancy profile which at least one
block in the selected block group has.
This works by simply checking each known allocation profile bit in
redundancy order. However, `btrfs_reduce_alloc_profile` has not been
updated as new flags have been added - first with the `DUP` profile
and later with the RAID1C34 profiles.
Because of the way it checks, if we have blocks with different
profiles and at least one is known, that profile will be selected.
However, if none are known we may return a flag set with multiple
allocation profiles set.
This is currently only possible when a balance from one of the three
unhandled profiles to another of the unhandled profiles is canceled
after allocating at least one block using the new profile.
In that case, a transaction abort like the below will occur and the
filesystem will need to be mounted with -o skip_balance to get it
mounted rw again (but the balance cannot be resumed without a
similar abort).
[770.648] ------------[ cut here ]------------
[770.648] BTRFS: Transaction aborted (error -22)
[770.648] WARNING: CPU: 43 PID: 1159593 at fs/btrfs/extent-tree.c:4122 find_free_extent+0x1d94/0x1e00 [btrfs]
[770.648] CPU: 43 PID: 1159593 Comm: btrfs Tainted: G W 6.1.0-0.deb11.7-powerpc64le #1 Debian 6.1.20-2~bpo11+1a~test
[770.648] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
[770.648] NIP: c00800000f6784fc LR: c00800000f6784f8 CTR: c000000000d746c0
[770.648] REGS: c000200089afe9a0 TRAP: 0700 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test)
[770.648] MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28848282 XER: 20040000
[770.648] CFAR: c000000000135110 IRQMASK: 0
GPR00: c00800000f6784f8 c000200089afec40 c00800000f7ea800 0000000000000026
GPR04: 00000001004820c2 c000200089afea00 c000200089afe9f8 0000000000000027
GPR08: c000200ffbfe7f98 c000000002127f90 ffffffffffffffd8 0000000026d6a6e8
GPR12: 0000000028848282 c000200fff7f3800 5deadbeef0000122 c00000002269d000
GPR16: c0002008c7797c40 c000200089afef17 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000001 c000200008bc5a98 0000000000000001
GPR24: 0000000000000000 c0000003c73088d0 c000200089afef17 c000000016d3a800
GPR28: c0000003c7308800 c00000002269d000 ffffffffffffffea 0000000000000001
[770.648] NIP [c00800000f6784fc] find_free_extent+0x1d94/0x1e00 [btrfs]
[770.648] LR [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs]
[770.648] Call Trace:
[770.648] [c000200089afec40] [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs] (unreliable)
[770.648] [c000200089afed30] [c00800000f681398] btrfs_reserve_extent+0x1a0/0x2f0 [btrfs]
[770.648] [c000200089afeea0] [c00800000f681bf0] btrfs_alloc_tree_block+0x108/0x670 [btrfs]
[770.648] [c000200089afeff0] [c00800000f66bd68] __btrfs_cow_block+0x170/0x850 [btrfs]
[770.648] [c000200089aff100] [c00800000f66c58c] btrfs_cow_block+0x144/0x288 [btrfs]
[770.648] [c000200089aff1b0] [c00800000f67113c] btrfs_search_slot+0x6b4/0xcb0 [btrfs]
[770.648] [c000200089aff2a0] [c00800000f679f60] lookup_inline_extent_backref+0x128/0x7c0 [btrfs]
[770.648] [c000200089aff3b0] [c00800000f67b338] lookup_extent_backref+0x70/0x190 [btrfs]
[770.648] [c000200089aff470] [c00800000f67b54c] __btrfs_free_extent+0xf4/0x1490 [btrfs]
[770.648] [c000200089aff5a0] [c00800000f67d770] __btrfs_run_delayed_refs+0x328/0x1530 [btrfs]
[770.648] [c000200089aff740] [c00800000f67ea2c] btrfs_run_delayed_refs+0xb4/0x3e0 [btrfs]
[770.648] [c000200089aff800] [c00800000f699aa4] btrfs_commit_transaction+0x8c/0x12b0 [btrfs]
[770.648] [c000200089aff8f0] [c00800000f6dc628] reset_balance_state+0x1c0/0x290 [btrfs]
[770.648] [c000200089aff9a0] [c00800000f6e2f7c] btrfs_balance+0x1164/0x1500 [btrfs]
[770.648] [c000200089affb40] [c00800000f6f8e4c] btrfs_ioctl+0x2b54/0x3100 [btrfs]
[770.648] [c000200089affc80] [c00000000053be14] sys_ioctl+0x794/0x1310
[770.648] [c000200089affd70] [c00000000002af98] system_call_exception+0x138/0x250
[770.648] [c000200089affe10] [c00000000000c654] system_call_common+0xf4/0x258
[770.648] --- interrupt: c00 at 0x7fff94126800
[770.648] NIP: 00007fff94126800 LR: 0000000107e0b594 CTR: 0000000000000000
[770.648] REGS: c000200089affe80 TRAP: 0c00 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test)
[770.648] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24002848 XER: 00000000
[770.648] IRQMASK: 0
GPR00: 0000000000000036 00007fffc9439da0 00007fff94217100 0000000000000003
GPR04: 00000000c4009420 00007fffc9439ee8 0000000000000000 0000000000000000
GPR08: 00000000803c7416 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff9467d120 0000000107e64c9c 0000000107e64d0a
GPR16: 0000000107e64d06 0000000107e64cf1 0000000107e64cc4 0000000107e64c73
GPR20: 0000000107e64c31 0000000107e64bf1 0000000107e64be7 0000000000000000
GPR24: 0000000000000000 00007fffc9439ee0 0000000000000003 0000000000000001
GPR28: 00007fffc943f713 0000000000000000 00007fffc9439ee8 0000000000000000
[770.648] NIP [00007fff94126800] 0x7fff94126800
[770.648] LR [0000000107e0b594] 0x107e0b594
[770.648] --- interrupt: c00
[770.648] Instruction dump:
[770.648] 3b00ffe4 e8898828 481175f5 60000000 4bfff4fc 3be00000 4bfff570 3d220000
[770.648] 7fc4f378 e8698830 4811cd95 e8410018 <0fe00000> f9c10060 f9e10068 fa010070
[770.648] ---[ end trace 0000000000000000 ]---
[770.648] BTRFS: error (device dm-2: state A) in find_free_extent_update_loop:4122: errno=-22 unknown
[770.648] BTRFS info (device dm-2: state EA): forced readonly
[770.648] BTRFS: error (device dm-2: state EA) in __btrfs_free_extent:3070: errno=-22 unknown
[770.648] BTRFS error (device dm-2: state EA): failed to run delayed ref for logical 17838685708288 num_bytes 24576 type 184 action 2 ref_mod 1: -22
[770.648] BTRFS: error (device dm-2: state EA) in btrfs_run_delayed_refs:2144: errno=-22 unknown
[770.648] BTRFS: error (device dm-2: state EA) in reset_balance_state:3599: errno=-22 unknown
Fixes: 47e6f7423b ("btrfs: add support for 3-copy replication (raid1c3)")
Fixes: 8d6fac0087 ("btrfs: add support for 4-copy replication (raid1c4)")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Matt Corallo <blnxfsl@bluematt.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7fbfd44c0 upstream.
power_supply_is_system_supplied() checks whether any power
supplies are present that aren't batteries to decide whether
the system is running on DC or AC. Downstream drivers use
this to make performance decisions.
Navi dGPUs include an UCSI function that has been exported
since commit 17631e8ca2 ("i2c: designware: Add driver
support for AMD NAVI GPU").
This UCSI function registers a power supply since commit
992a60ed0d ("usb: typec: ucsi: register with power_supply class")
but this is not a system power supply.
As the power supply for a dGPU is only for powering devices connected
to dGPU, create a device property to indicate that the UCSI endpoint
is only for the scope of `POWER_SUPPLY_SCOPE_DEVICE`.
Link: https://lore.kernel.org/lkml/20230516182541.5836-2-mario.limonciello@amd.com/
Reviewed-by: Evan Quan <evan.quan@amd.com>
Tested-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28eceeda13 upstream.
When a directory is moved to a different directory, some filesystems
(udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to
update their pointer to the parent and this must not race with other
operations on the directory. Lock the directories when they are moved.
Although not all filesystems need this locking, we perform it in
vfs_rename() because getting the lock ordering right is really difficult
and we don't want to expose these locking details to filesystems.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230601105830.13168-5-jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f23ce75718 upstream.
Currently the locking order of inode locks for directories that are not
in ancestor relationship is not defined because all operations that
needed to lock two directories like this were serialized by
sb->s_vfs_rename_mutex. However some filesystems need to lock two
subdirectories for RENAME_EXCHANGE operations and for this we need the
locking order established even for two tree-unrelated directories.
Provide a helper function lock_two_inodes() that establishes lock
ordering for any two inodes and use it in lock_two_directories().
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230601105830.13168-4-jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1168f09541 upstream.
Use kcalloc() for allocation/flush of 128 pointers table to
reduce stack usage.
Function now returns -ENOMEM or 0 on success.
stackusage
Before:
./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 1208
dynamic,bounded
After:
./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 192
dynamic,bounded
Also update definition when CONFIG_JFFS2_FS_XATTR is not enabled
Tested with an MTD mount point and some user set/getfattr.
Many current target on OpenWRT also suffer from a compilation warning
(that become an error with CONFIG_WERROR) with the following output:
fs/jffs2/xattr.c: In function 'jffs2_build_xattr_subsystem':
fs/jffs2/xattr.c:887:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
887 | }
| ^
Using dynamic allocation fix this compilation warning.
Fixes: c9f700f840 ("[JFFS2][XATTR] using 'delete marker' for xdatum/xref deletion")
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Ron Economos <re@w6rz.net>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Message-Id: <20230506045612.16616-1-ansuelsmth@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d8ae8c417 upstream.
We've aligned setgid behavior over multiple kernel releases. The details
can be found in commit cf619f8919 ("Merge tag 'fs.ovl.setgid.v6.2' of
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") and
commit 426b4ca2d6 ("Merge tag 'fs.setgid.v6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux").
Consistent setgid stripping behavior is now encapsulated in the
setattr_should_drop_sgid() helper which is used by all filesystems that
strip setgid bits outside of vfs proper. Usually ATTR_KILL_SGID is
raised in e.g., chown_common() and is subject to the
setattr_should_drop_sgid() check to determine whether the setgid bit can
be retained. Since nfsd is raising ATTR_KILL_SGID unconditionally it
will cause notify_change() to strip it even if the caller had the
necessary privileges to retain it. Ensure that nfsd only raises
ATR_KILL_SGID if the caller lacks the necessary privileges to retain the
setgid bit.
Without this patch the setgid stripping tests in LTP will fail:
> As you can see, the problem is S_ISGID (0002000) was dropped on a
> non-group-executable file while chown was invoked by super-user, while
[...]
> fchown02.c:66: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
[...]
> chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
With this patch all tests pass.
Reported-by: Sherry Yang <sherry.yang@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e910c8e3aa upstream.
Commit df8fc4e934 ("kbuild: Enable -fstrict-flex-arrays=3") introduced a warning
for the autofs_dev_ioctl structure:
In function 'check_name',
inlined from 'validate_dev_ioctl' at fs/autofs/dev-ioctl.c:131:9,
inlined from '_autofs_dev_ioctl' at fs/autofs/dev-ioctl.c:624:8:
fs/autofs/dev-ioctl.c:33:14: error: 'strchr' reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
33 | if (!strchr(name, '/'))
| ^~~~~~~~~~~~~~~~~
In file included from include/linux/auto_dev-ioctl.h:10,
from fs/autofs/autofs_i.h:10,
from fs/autofs/dev-ioctl.c:14:
include/uapi/linux/auto_dev-ioctl.h: In function '_autofs_dev_ioctl':
include/uapi/linux/auto_dev-ioctl.h:112:14: note: source object 'path' of size 0
112 | char path[0];
| ^~~~
This is easily fixed by changing the gnu 0-length array into a c99
flexible array. Since this is a uapi structure, we have to be careful
about possible regressions but this one should be fine as they are
equivalent here. While it would break building with ancient gcc versions
that predate c99, it helps building with --std=c99 and -Wpedantic builds
in user space, as well as non-gnu compilers. This means we probably
also want it fixed in stable kernels.
Cc: stable@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230523081944.581710-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9df6a4870d upstream.
When integrity_inode_get() is querying and inserting the cache, there
is a conditional race in the concurrent environment.
The race condition is the result of not properly implementing
"double-checked locking". In this case, it first checks to see if the
iint cache record exists before taking the lock, but doesn't check
again after taking the integrity_iint_lock.
Fixes: bf2276d10c ("ima: allocating iint improvements")
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 943211c874 upstream.
NULL the dangling pipe reference while clearing watch_queue.
If not done, a reference to a freed pipe remains in the watch_queue,
as this function is called before freeing a pipe in free_pipe_info()
(see line 834 of fs/pipe.c).
The sole use of wqueue->defunct is for checking if the watch queue has
been cleared, but wqueue->pipe is also NULLed while clearing.
Thus, wqueue->defunct is superfluous, as wqueue->pipe can be checked
for NULL. Hence, the former can be removed.
Tested with keyutils testsuite.
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Acked-by: David Howells <dhowells@redhat.com>
Message-Id: <20230605143616.640517-1-code@siddh.me>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0854489fc upstream.
We get a kernel crash about "list_add corruption. next->prev should be
prev (ffff9c801bc01210), but was ffff9c77b688237c.
(next=ffffae586d8afe68)."
crash> struct list_head 0xffff9c801bc01210
struct list_head {
next = 0xffffae586d8afe68,
prev = 0xffffae586d8afe68
}
crash> struct list_head 0xffff9c77b688237c
struct list_head {
next = 0x0,
prev = 0x0
}
crash> struct list_head 0xffffae586d8afe68
struct list_head struct: invalid kernel virtual address: ffffae586d8afe68 type: "gdb_readmem_callback"
Cannot access memory at address 0xffffae586d8afe68
[230469.019492] Call Trace:
[230469.032041] prepare_to_wait+0x8a/0xb0
[230469.044363] ? bch_btree_keys_free+0x6c/0xc0 [escache]
[230469.056533] mca_cannibalize_lock+0x72/0x90 [escache]
[230469.068788] mca_alloc+0x2ae/0x450 [escache]
[230469.080790] bch_btree_node_get+0x136/0x2d0 [escache]
[230469.092681] bch_btree_check_thread+0x1e1/0x260 [escache]
[230469.104382] ? finish_wait+0x80/0x80
[230469.115884] ? bch_btree_check_recurse+0x1a0/0x1a0 [escache]
[230469.127259] kthread+0x112/0x130
[230469.138448] ? kthread_flush_work_fn+0x10/0x10
[230469.149477] ret_from_fork+0x35/0x40
bch_btree_check_thread() and bch_dirty_init_thread() may call
mca_cannibalize() to cannibalize other cached btree nodes. Only one thread
can do it at a time, so the op of other threads will be added to the
btree_cache_wait list.
We must call finish_wait() to remove op from btree_cache_wait before free
it's memory address. Otherwise, the list will be damaged. Also should call
bch_cannibalize_unlock() to release the btree_cache_alloc_lock and wake_up
other waiters.
Fixes: 8e7102273f ("bcache: make bch_btree_check() to be multithreaded")
Fixes: b144e45fc5 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Cc: stable@vger.kernel.org
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20230615121223.22502-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 525c469e5d upstream.
For some cases as below, we may encounter the unpreditable chip stats
in driver probe()
* The system reboot flow do not work properly, such as kernel oops while
rebooting, and then the driver do not go back to default status at
this moment.
* Similar to the flow above. If the device was enabled in BIOS or UEFI,
the system may switch to Linux without driver fully shutdown.
To avoid the problem, force push the device back to default in probe()
* mt7921e_mcu_fw_pmctrl() : return control privilege to chip side.
* mt7921_wfsys_reset() : cleanup chip config before resource init.
Error log
[59007.600714] mt7921e 0000:02:00.0: ASIC revision: 79220010
[59010.889773] mt7921e 0000:02:00.0: Message 00000010 (seq 1) timeout
[59010.889786] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59014.217839] mt7921e 0000:02:00.0: Message 00000010 (seq 2) timeout
[59014.217852] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59017.545880] mt7921e 0000:02:00.0: Message 00000010 (seq 3) timeout
[59017.545893] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59020.874086] mt7921e 0000:02:00.0: Message 00000010 (seq 4) timeout
[59020.874099] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59024.202019] mt7921e 0000:02:00.0: Message 00000010 (seq 5) timeout
[59024.202033] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59027.530082] mt7921e 0000:02:00.0: Message 00000010 (seq 6) timeout
[59027.530096] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59030.857888] mt7921e 0000:02:00.0: Message 00000010 (seq 7) timeout
[59030.857904] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59034.185946] mt7921e 0000:02:00.0: Message 00000010 (seq 8) timeout
[59034.185961] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59037.514249] mt7921e 0000:02:00.0: Message 00000010 (seq 9) timeout
[59037.514262] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.842362] mt7921e 0000:02:00.0: Message 00000010 (seq 10) timeout
[59040.842375] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.923845] mt7921e 0000:02:00.0: hardware init failed
Cc: stable@vger.kernel.org
Fixes: 5c14a5f944 ("mt76: mt7921: introduce mt7921e support")
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Juan Martinez <juan.martinez@amd.com>
Co-developed-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Message-ID: <39fcb7cee08d4ab940d38d82f21897483212483f.1688569385.git.deren.wu@mediatek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20dbd07ef0 upstream.
Bayhub SD host has hardware limitation:
1.The upper 32bit address is inhibited to be written at SD Host Register
[03E][13]=0 (32bits addressing) mode, is admitted to be written only at
SD Host Register [03E][13]=1 (64bits addressing) mode.
2.Because of above item#1, need to configure SD Host Register [03E][13] to
1(64bits addressing mode) before set 64bit ADMA system address's higher
32bits SD Host Register [05F~05C] if 64 bits addressing mode is used.
The hardware limitation is reasonable for below reasons:
1.Normal flow should set DMA working mode first, then do
DMA-transfer-related configuration, such as system address.
2.The hardware limitation may avoid the software to configure wrong higher
32bit address at 32bits addressing mode although it is redundant.
The change that set 32bits/64bits addressing mode before set ADMA address,
has no side-effect to other host IPs for below reason:
The setting order is reasonable and standard: DMA Mode setting first and
then DMA address setting. It meets all DMA setting sequence.
Signed-off-by: Chevron Li <chevron.li@bayhubtech.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230523111114.18124-1-chevron_li@126.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbfbddcddc upstream.
It seems that Micron MTFC4GACAJCN-1M despite advertising TRIM support does
not work when the core is trying to use REQ_OP_WRITE_ZEROES.
We are seeing the following errors in OpenWrt under 6.1 on Qnap Qhora 301W
that we did not previously have and tracked it down to REQ_OP_WRITE_ZEROES:
[ 18.085950] I/O error, dev loop0, sector 596 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Disabling TRIM makes the error go away, so lets add a quirk for this eMMC
to disable TRIM.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230530213259.1776512-1-robimarko@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1738a1f81 upstream.
It seems that Kingston EMMC04G-M627 despite advertising TRIM support does
not work when the core is trying to use REQ_OP_WRITE_ZEROES.
We are seeing I/O errors in OpenWrt under 6.1 on Zyxel NBG7815 that we did
not previously have and tracked it down to REQ_OP_WRITE_ZEROES.
Trying to use fstrim seems to also throw errors like:
[93010.835112] I/O error, dev loop0, sector 16902 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2
Disabling TRIM makes the error go away, so lets add a quirk for this eMMC
to disable TRIM.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230619193621.437358-1-robimarko@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4826c59453 upstream.
WHen the ring exits, cleanup is done and the final cancelation and
waiting on completions is done by io_ring_exit_work. That function is
invoked by kworker, which doesn't take any signals. Because of that, it
doesn't really matter if we wait for completions in TASK_INTERRUPTIBLE
or TASK_UNINTERRUPTIBLE state. However, it does matter to the hung task
detection checker!
Normally we expect cancelations and completions to happen rather
quickly. Some test cases, however, will exit the ring and park the
owning task stopped (eg via SIGSTOP). If the owning task needs to run
task_work to complete requests, then io_ring_exit_work won't make any
progress until the task is runnable again. Hence io_ring_exit_work can
trigger the hung task detection, which is particularly problematic if
panic-on-hung-task is enabled.
As the ring exit doesn't take signals to begin with, have it wait
interruptibly rather than uninterruptibly. io_uring has a separate
stuck-exit warning that triggers independently anyway, so we're not
really missing anything by making this switch.
Cc: stable@vger.kernel.org # 5.10+
Link: https://lore.kernel.org/r/b0e4aaef-7088-56ce-244c-976edeac0e66@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f679616565 upstream.
In an ACPI-based dual-bridge system, IRQ of each bridge's
PCH PIC sent to CPU is always a zero-based number, which
means that the IRQ on PCH PIC of each bridge is mapped into
vector range from 0 to 63 of upstream irqchip(e.g. EIOINTC).
EIOINTC N: [0 ... 63 | 64 ... 255]
-------- ----------
^ ^
| |
PCH PIC N |
PCH MSI N
For example, the IRQ vector number of sata controller on
PCH PIC of each bridge is 16, which is sent to upstream
irqchip of EIOINTC when an interrupt occurs, which will set
bit 16 of EIOINTC. Since hwirq of 16 on EIOINTC has been
mapped to a irq_desc for sata controller during hierarchy
irq allocation, the related mapped IRQ will be found through
irq_resolve_mapping() in the IRQ domain of EIOINTC.
So, the IRQ number set in HT vector register should be fixed
to be a zero-based number.
Cc: stable@vger.kernel.org
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Co-developed-by: liuyun <liuyun@loongson.cn>
Signed-off-by: liuyun <liuyun@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614115936.5950-2-lvjianmin@loongson.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 783422e704 upstream.
In DeviceTree path, when ht_vec_base is not zero, the hwirq of PCH PIC
will be assigned incorrectly. Because when pch_pic_domain_translate()
adds the ht_vec_base to hwirq, the hwirq does not have the ht_vec_base
subtracted when calling irq_domain_set_info().
The ht_vec_base is designed for the parent irq chip/domain of the PCH PIC.
It seems not proper to deal this in callbacks of the PCH PIC domain and
let's put this back like the initial commit ef8c01eb64 ("irqchip: Add
Loongson PCH PIC controller").
Fixes: bcdd75c596 ("irqchip/loongson-pch-pic: Add ACPI init support")
Cc: stable@vger.kernel.org
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Liu Peibao <liupeibao@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614115936.5950-3-lvjianmin@loongson.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed9ab7346e upstream.
Commit f5f9d4a314 ("nfsd: move reply cache initialization into nfsd
startup") moved the initialization of the reply cache into nfsd startup,
but didn't account for the stats counters, which can be accessed before
nfsd is ever started. The result can be a NULL pointer dereference when
someone accesses /proc/fs/nfsd/reply_cache_stats while nfsd is still
shut down.
This is a regression and a user-triggerable oops in the right situation:
- non-x86_64 arch
- /proc/fs/nfsd is mounted in the namespace
- nfsd is not started in the namespace
- unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats"
Although this is easy to trigger on some arches (like aarch64), on
x86_64, calling this_cpu_ptr(NULL) evidently returns a pointer to the
fixed_percpu_data. That struct looks just enough like a newly
initialized percpu var to allow nfsd_reply_cache_stats_show to access
it without Oopsing.
Move the initialization of the per-net+per-cpu reply-cache counters
back into nfsd_init_net, while leaving the rest of the reply cache
allocations to be done at nfsd startup time.
Kudos to Eirik who did most of the legwork to track this down.
Cc: stable@vger.kernel.org # v6.3+
Fixes: f5f9d4a314 ("nfsd: move reply cache initialization into nfsd startup")
Reported-and-tested-by: Eirik Fuller <efuller@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2215429
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d7471b4e0 upstream.
For the INT_POLARITY register of Loongson-2K series IRQ
controller, '0' indicates high level or rising edge triggered,
'1' indicates low level or falling edge triggered, and we
can find out the information from the Loongson 2K1000LA User
Manual v1.0, Table 9-2, Section 9.3 (中断寄存器描述 / Description
of the Interrupt Registers).
For Loongson-3 CPU series, setting INT_POLARITY register is not
supported and writting it has no effect.
So trigger polarity setting shouled be fixed for Loongson-2K CPU
series.
Fixes: 17343d0b40 ("irqchip/loongson-liointc: Support to set IRQ type for ACPI path")
Cc: stable@vger.kernel.org
Reviewed-by: Huacai Chen <chenhuacai@kernel.org>
Co-developed-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614115936.5950-4-lvjianmin@loongson.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 616cb2f4b1 upstream.
Currently when restoring the TPIDR2 signal context we set the new value
from the signal frame in the thread data structure but not the register,
following the pattern for the rest of the data we are restoring. This does
not work in the case of TPIDR2, the register always has the value for the
current task. This means that either we return to userspace and ignore the
new value or we context switch and save the register value on top of the
newly restored value.
Load the value from the signal context into the register instead.
Fixes: 39e5449928 ("arm64/signal: Include TPIDR2 in the signal context")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Link: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v2-1-c8e8fcc10302@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8344a3d44b ]
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.
Dave added:
: XFS is the only filesystem this would affect, right? AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
:
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.
Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
Fixes: 793917d997 ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb6e45c9a0 ]
In xiic_process, it is possible that error events such as arbitration
lost or TX error can be raised in conjunction with other interrupt flags
such as TX FIFO empty or bus not busy. Error events result in the
controller being reset and the error returned to the calling request,
but the function could potentially try to keep handling the other
events, such as by writing more messages into the TX FIFO. Since the
transaction has already failed, this is not helpful and will just cause
issues.
This problem has been present ever since:
commit 7f9906bd7f ("i2c: xiic: Service all interrupts in isr")
which allowed non-error events to be handled after errors, but became
more obvious after:
commit 743e227a89 ("i2c: xiic: Defer xiic_wakeup() and
__xiic_start_xfer() in xiic_process()")
which reworked the code to add a WARN_ON which triggers if both the
xfer_more and wakeup_req flags were set, since this combination is
not supposed to happen, but was occurring in this scenario.
Skip further interrupt handling after error flags are detected to avoid
this problem.
Fixes: 7f9906bd7f ("i2c: xiic: Service all interrupts in isr")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f442d42c0 ]
The transition table size was not being set by compat mappings
resulting in the profile verification code not being run. Unfortunately
the checks were also buggy not being correctly updated from the old
accept perms, to the new layout.
Also indicate to userspace that the kernel has the permstable verification
fixes.
BugLink: http://bugs.launchpad.net/bugs/2017903
Fixes: 670f31774a ("apparmor: verify permission table indexes")
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Jon Tourville <jontourville@me.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0bac2002b3 ]
If the extended permission table is present we should not be attempting
to do a compat_permission remap as the compat_permissions are not
stored in the dfa accept states.
Fixes: fd1b2b95a2 ("apparmor: add the ability for policy to specify a permission table")
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Jon Tourville <jontourville@me.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6600e9f692 ]
Add check for failure to allocate the permission table.
Fixes: caa9f579ca ("apparmor: isolate policy backwards compatibility to its own file")
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 000518bc5a ]
rhashtable_insert_fast() could return err value when memory allocation is
failed. but unpack_profile() do not check values and this always returns
success value. This patch just adds error check code.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e025be0f26 ("apparmor: support querying extended trusted helper extra data")
Signed-off-by: Danila Chernetsov <listdansp@mail.ru>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e82e475848 ]
Various SoCs of the SH3, SH4 and SH4A family, which use this driver,
feature a differing number of DMA channels, which can be distributed
between up to two DMAC modules. The existing implementation fails to
correctly accommodate for all those variations, resulting in wrong
channel offset calculations and leading to kernel panics.
Rewrite dma_base_addr() in order to properly calculate channel offsets
in a DMAC module. Fix dmaor_read_reg() and dmaor_write_reg(), so that
the correct DMAC module base is selected for the DMAOR register.
Fixes: 7f47c7189b ("sh: dma: More legacy cpu dma chainsawing.")
Signed-off-by: Artur Rojek <contact@artur-rojek.eu>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20230527164452.64797-2-contact@artur-rojek.eu
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80de809bd3 ]
Change boolean parameter of function "qeth_l3_vipa_store" inside the
"qeth_l3_dev_vipa_del4_store" function from "true" to "false" because
"true" is used for adding a virtual ip address and "false" for deleting.
Fixes: 2390166a6b ("s390/qeth: clean up L3 sysfs code")
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03275585ca ]
When an AFS FS.StoreData RPC call is made, amongst other things it is
given the resultant file size to be. On the server, this is processed
by truncating the file to new size and then writing the data.
Now, kafs has a lock (vnode->io_lock) that serves to serialise
operations against a specific vnode (ie. inode), but the parameters for
the op are set before the lock is taken. This allows two writebacks
(say sync and kswapd) to race - and if writes are ongoing the writeback
for a later write could occur before the writeback for an earlier one if
the latter gets interrupted.
Note that afs_writepages() cannot take i_mutex and only takes a shared
lock on vnode->validate_lock.
Also note that the server does the truncation and the write inside a
lock, so there's no problem at that end.
Fix this by moving the calculation for the proposed new i_size inside
the vnode->io_lock. Also reset the iterator (which we might have read
from) and update the mtime setting there.
Fixes: bd80d8a80e ("afs: Use ITER_XARRAY for writing")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/3526895.1687960024@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14bb236b29 ]
MAC block on CN10K (RPM) supports hardware timestamp configuration. The
previous patch which added timestamp configuration support has a bug.
Though the netdev driver requests to disable timestamp configuration,
the driver is always enabling it.
This patch fixes the same.
Fixes: d148920868 ("octeontx2-af: cn10k: RPM hardware timestamp configuration")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a372d66af4 ]
incl_srcpt has the limitation, mentioned in commit b4638af888 ("net:
dsa: sja1105: always enable the INCL_SRCPT option"), that frames with a
MAC DA of 01:80:c2:xx:yy:zz will be received as 01:80:c2:00:00:zz unless
PTP RX timestamping is enabled.
The incl_srcpt option was initially unconditionally enabled, then that
changed with commit 42824463d3 ("net: dsa: sja1105: Limit use of
incl_srcpt to bridge+vlan mode"), then again with b4638af888 ("net:
dsa: sja1105: always enable the INCL_SRCPT option"). Bottom line is that
it now needs to be always enabled, otherwise the driver does not have a
reliable source of information regarding source_port and switch_id for
link-local traffic (tag_8021q VLANs may be imprecise since now they
identify an entire bridging domain when ports are not standalone).
If we accept that PTP RX timestamping (and therefore, meta frame
generation) is always enabled in hardware, then that limitation could be
avoided and packets with any MAC DA can be properly received, because
meta frames do contain the original bytes from the MAC DA of their
associated link-local packet.
This change enables meta frame generation unconditionally, which also
has the nice side effects of simplifying the switch control path
(a switch reset is no longer required on hwtstamping settings change)
and the tagger data path (it no longer needs to be informed whether to
expect meta frames or not - it always does).
Fixes: 227d07a07e ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1dcf6efd5f ]
The SJA1105 manual says that at offset 4 into the meta frame payload we
have "MAC destination byte 2" and at offset 5 we have "MAC destination
byte 1". These are counted from the LSB, so byte 1 is h_dest[ETH_HLEN-2]
aka h_dest[4] and byte 2 is h_dest[ETH_HLEN-3] aka h_dest[3].
The sja1105_meta_unpack() function decodes these the other way around,
so a frame with MAC DA 01:80:c2:11:22:33 is received by the network
stack as having 01:80:c2:22:11:33.
Fixes: e53e18a6fe ("net: dsa: sja1105: Receive and decode meta frames")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84bef5b603 ]
PPTP uses pppox sockets (struct pppox_sock). These sockets don't embed
an inet_sock structure, so it's invalid to call inet_sk() on them.
Therefore, the ip_route_output_ports() call in pptp_connect() has two
problems:
* The tos variable is set with RT_CONN_FLAGS(sk), which calls
inet_sk() on the pppox socket.
* ip_route_output_ports() tries to retrieve routing flags using
inet_sk_flowi_flags(), which is also going to call inet_sk() on the
pppox socket.
While PPTP doesn't use inet sockets, it's actually really layered on
top of IP and therefore needs a proper way to do fib lookups. So let's
define pptp_route_output() to get a struct rtable from a pptp socket.
Let's also replace the ip_route_output_ports() call of pptp_xmit() for
consistency.
In practice, this means that:
* pptp_connect() sets ->flowi4_tos and ->flowi4_flags to zero instead
of using bits of unrelated struct pppox_sock fields.
* pptp_xmit() now respects ->sk_mark and ->sk_uid.
* pptp_xmit() now calls the security_sk_classify_flow() security
hook, thus allowing to set ->flowic_secid.
* pptp_xmit() now passes the pppox socket to xfrm_lookup_route().
Found by code inspection.
Fixes: 00959ade36 ("PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30c45b5361 ]
The attribute TCA_PEDIT_PARMS_EX is not be included in pedit_policy and
one malicious user could fake a TCA_PEDIT_PARMS_EX whose length is
smaller than the intended sizeof(struct tc_pedit). Hence, the
dereference in tcf_pedit_init() could access dirty heap data.
static int tcf_pedit_init(...)
{
// ...
pattr = tb[TCA_PEDIT_PARMS]; // TCA_PEDIT_PARMS is included
if (!pattr)
pattr = tb[TCA_PEDIT_PARMS_EX]; // but this is not
// ...
parm = nla_data(pattr);
index = parm->index; // parm is able to be smaller than 4 bytes
// and this dereference gets dirty skb_buff
// data created in netlink_sendmsg
}
This commit adds TCA_PEDIT_PARMS_EX length in pedit_policy which avoid
the above case, just like the TCA_PEDIT_PARMS.
Fixes: 71d0ed7079 ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230703110842.590282-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7306acec9 ]
Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. A
privileged process might create the socket and pass it to a non-privileged
process for later use. However, that process will be able to bind the socket
to any network interface. Even though it will not be able to receive any
traffic without modification of the BPF map, the situation is not ideal.
Sockets already have a mechanism that can be used to restrict what interface
they can be attached to. That is SO_BINDTODEVICE.
To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW.
Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer workflow
when non-privileged process is using AF_XDP.
The intended workflow is following:
1. First process creates a bare socket with socket(AF_XDP, ...).
2. First process loads the XSK program to the interface.
3. First process adds the socket fd to a BPF map.
4. First process ties socket fd to a particular interface using
SO_BINDTODEVICE.
5. First process sends socket fd to a second process.
6. Second process allocates UMEM.
7. Second process binds socket to the interface with bind(...).
8. Second process sends/receives the traffic.
All the steps above are possible today if the first process is privileged
and the second one has sufficient RLIMIT_MEMLOCK and no capabilities.
However, the second process will be able to bind the socket to any interface
it wants on step 7 and send traffic from it. With the proposed change, the
second process will be able to bind the socket only to a specific interface
chosen by the first process at step 4.
Fixes: 965a990984 ("xsk: add support for bind for Rx")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/bpf/20230703175329.3259672-1-i.maximets@ovn.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 998127cdb4 ]
request sockets are lockless, __tcp_oow_rate_limited() could be called
on the same object from different cpus. This is harmless.
Add READ_ONCE()/WRITE_ONCE() annotations to avoid a KCSAN report.
Fixes: 4ce7e93cb3 ("tcp: rate limit ACK sent by SYN_RECV request sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a398b9ea0c ]
There was a regression introduced by the blamed commit, where pinging to
a VLAN-unaware bridge would fail with the repeated message "Couldn't
decode source port" coming from the tagging protocol driver.
When receiving packets with a bridge_vid as determined by
dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode:
- source_port = 0 (which isn't really valid, more like "don't know")
- switch_id = 0 (which isn't really valid, more like "don't know")
- vbid = value in range 1-7
Since the blamed patch has reversed the order of the checks, we are now
going to believe that source_port != -1 and switch_id != -1, so they're
valid, but they aren't.
The minimal solution to the problem is to only populate source_port and
switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero,
i.e. the source port information is trustworthy.
Fixes: c1ae02d876 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ca3c005d0 ]
According to the synchronization rules for .ndo_get_stats() as seen in
Documentation/networking/netdevices.rst, acquiring a plain spin_lock()
should not be illegal, but the bridge driver implementation makes it so.
After running these commands, I am being faced with the following
lockdep splat:
$ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up
$ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set macsec0 master br0 && ip link set macsec0 up
========================================================
WARNING: possible irq lock inversion dependency detected
6.4.0-04295-g31b577b4bd4a #603 Not tainted
--------------------------------------------------------
swapper/1/0 just changed the state of lock:
ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&ocelot->stats_lock){+.+.}-{3:3}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:
&br->lock --> &br->hash_lock --> &ocelot->stats_lock
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ocelot->stats_lock);
local_irq_disable();
lock(&br->lock);
lock(&br->hash_lock);
<Interrupt>
lock(&br->lock);
*** DEADLOCK ***
(details about the 3 locks skipped)
swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this
only matters to the extent that its .ndo_get_stats64() method calls
spin_lock(&ocelot->stats_lock).
Documentation/locking/lockdep-design.rst says:
| A lock is irq-safe means it was ever used in an irq context, while a lock
| is irq-unsafe means it was ever acquired with irq enabled.
(...)
| Furthermore, the following usage based lock dependencies are not allowed
| between any two lock-classes::
|
| <hardirq-safe> -> <hardirq-unsafe>
| <softirq-safe> -> <softirq-unsafe>
Lockdep marks br->hash_lock as softirq-safe, because it is sometimes
taken in softirq context (for example br_fdb_update() which runs in
NET_RX softirq), and when it's not in softirq context it blocks softirqs
by using spin_lock_bh().
Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never
blocks softirqs from running, and it is never taken from softirq
context. So it can always be interrupted by softirqs.
There is a call path through which a function that holds br->hash_lock:
fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock:
ocelot_port_get_stats64(). This can be seen below:
ocelot_port_get_stats64+0x3c/0x1e0
felix_get_stats64+0x20/0x38
dsa_slave_get_stats64+0x3c/0x60
dev_get_stats+0x74/0x2c8
rtnl_fill_stats+0x4c/0x150
rtnl_fill_ifinfo+0x5cc/0x7b8
rtmsg_ifinfo_build_skb+0xe4/0x150
rtmsg_ifinfo+0x5c/0xb0
__dev_notify_flags+0x58/0x200
__dev_set_promiscuity+0xa0/0x1f8
dev_set_promiscuity+0x30/0x70
macsec_dev_change_rx_flags+0x68/0x88
__dev_set_promiscuity+0x1a8/0x1f8
__dev_set_rx_mode+0x74/0xa8
dev_uc_add+0x74/0xa0
fdb_add_hw_addr+0x68/0xd8
fdb_add_local+0xc4/0x110
br_fdb_add_local+0x54/0x88
br_add_if+0x338/0x4a0
br_add_slave+0x20/0x38
do_setlink+0x3a4/0xcb8
rtnl_newlink+0x758/0x9d0
rtnetlink_rcv_msg+0x2f0/0x550
netlink_rcv_skb+0x128/0x148
rtnetlink_rcv+0x24/0x38
the plain English explanation for it is:
The macsec0 bridge port is created without p->flags & BR_PROMISC,
because it is what br_manage_promisc() decides for a VLAN filtering
bridge with a single auto port.
As part of the br_add_if() procedure, br_fdb_add_local() is called for
the MAC address of the device, and this results in a call to
dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken.
Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up
calling __dev_set_promiscuity() for macsec0, which is propagated by its
implementation, macsec_dev_change_rx_flags(), to the lower device: swp0.
This triggers the call path:
dev_set_promiscuity(swp0)
-> rtmsg_ifinfo()
-> dev_get_stats()
-> ocelot_port_get_stats64()
with a calling context that lockdep doesn't like (br->hash_lock held).
Normally we don't see this, because even though many drivers that can be
bridge ports don't support IFF_UNICAST_FLT, we need a driver that
(a) doesn't support IFF_UNICAST_FLT, *and*
(b) it forwards the IFF_PROMISC flag to another driver, and
(c) *that* driver implements ndo_get_stats64() using a softirq-unsafe
spinlock.
Condition (b) is necessary because the first __dev_set_rx_mode() calls
__dev_set_promiscuity() with "bool notify=false", and thus, the
rtmsg_ifinfo() code path won't be entered.
The same criteria also hold true for DSA switches which don't report
IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its
ndo_get_stats64() method, the same lockdep splat can be seen.
I think the deadlock possibility is real, even though I didn't reproduce
it, and I'm thinking of the following situation to support that claim:
fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally
disabled and br->hash_lock held, and may end up attempting to acquire
ocelot->stats_lock.
In parallel, ocelot->stats_lock is currently held by a thread B (say,
ocelot_check_stats_work()), which is interrupted while holding it by a
softirq which attempts to lock br->hash_lock.
Thread B cannot make progress because br->hash_lock is held by A. Whereas
thread A cannot make progress because ocelot->stats_lock is held by B.
When taking the issue at face value, the bridge can avoid that problem
by simply making the ports promiscuous from a code path with a saner
calling context (br->hash_lock not held). A bridge port without
IFF_UNICAST_FLT is going to become promiscuous as soon as we call
dev_uc_add() on it (which we do unconditionally), so why not be
preemptive and make it promiscuous right from the beginning, so as to
not be taken by surprise.
With this, we've broken the links between code that holds br->hash_lock
or br->lock and code that calls into the ndo_change_rx_flags() or
ndo_get_stats64() ops of the bridge port.
Fixes: 2796d0c648 ("bridge: Automatically manage port promiscuous mode.")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abaa02fc94 ]
Freescale PCIe controllers on their PCIe Root Ports do not have any
mappable PCI BAR allocate from PCIe MEM.
Information about 1MB window on BAR0 of PCIe Root Port was misleading
because Freescale PCIe controllers have at BAR0 position different register
PEXCSRBAR, and kernel correctly skipts BAR0 for these Freescale PCIe Root
Ports.
So update comment about P2020 PCIe Root Port and decrease PCIe MEM size
required for PCIe controller (pci2 node) on which is on-board xHCI
controller.
lspci confirms that on P2020 PCIe Root Port is no PCI BAR and /proc/iomem
sees that only c0000000-c000ffff and c0010000-c0011fff ranges are used.
Fixes: 54c15ec3b7 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230505172818.18416-1-pali@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c675ddffb ]
Here is a BUG report from syzbot:
BUG: KASAN: slab-out-of-bounds in ntfs_list_ea fs/ntfs3/xattr.c:191 [inline]
BUG: KASAN: slab-out-of-bounds in ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710
Read of size 1 at addr ffff888021acaf3d by task syz-executor128/3632
Call Trace:
ntfs_list_ea fs/ntfs3/xattr.c:191 [inline]
ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710
vfs_listxattr fs/xattr.c:457 [inline]
listxattr+0x293/0x2d0 fs/xattr.c:804
Fix the logic of ea_all iteration. When the ea->name_len is 0,
return immediately, or Add2Ptr() would visit invalid memory
in the next loop.
Fixes: be71b5cba2 ("fs/ntfs3: Add attrib operations")
Reported-by: syzbot+9fcea5ef6dc4dc72d334@syzkaller.appspotmail.com
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
[almaz.alexandrovich@paragon-software.com: lines of the patch have changed]
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e3e94c2f5 ]
AF driver configures MAC features like internal loopback and PFC upon
receiving the request from PF and its VF netdev. But these
features are not getting reset in FLR. This patch fixes the issue by
resetting the same.
Fixes: 23999b30ae ("octeontx2-af: Enable or disable CGX internal loopback")
Fixes: 1121f6b02e ("octeontx2-af: Priority flow control configuration support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79ebb53772 ]
with the addition of new MAC blocks like CN10K RPM and CN10KB
RPM_USX, LMACs are noncontiguous and CGX blocks are also
noncontiguous. But during RVU driver initialization, the driver
is assuming they are contiguous and trying to access
cgx or lmac with their id which is resulting in kernel panic.
This patch fixes the issue by adding proper checks.
[ 23.219150] pc : cgx_lmac_read+0x38/0x70
[ 23.219154] lr : rvu_program_channels+0x3f0/0x498
[ 23.223852] sp : ffff000100d6fc80
[ 23.227158] x29: ffff000100d6fc80 x28: ffff00010009f880 x27:
000000000000005a
[ 23.234288] x26: ffff000102586768 x25: 0000000000002500 x24:
fffffffffff0f000
Fixes: 91c6945ea1 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e7bc57b97 ]
Firmware configures NIX block mapping for all MAC blocks.
The current implementation reads the configuration and
creates the mapping between RVU PF and NIX blocks. But
this configuration is only valid for silicons that support
multiple blocks. For all other silicons, all MAC blocks
map to NIX0.
This patch corrects the mapping by adding a check for the same.
Fixes: c5a73b632b ("octeontx2-af: Map NIX block from CGX connection")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c5a331cac ]
The current design is that, for asynchronous events like link_up and
link_down firmware raises the interrupt to kernel. The previous patch
which added RPM_USX driver has a bug where it uses old csr addresses
for configuring interrupts. Which is resulting in losing interrupts
from source firmware.
This patch fixes the issue by correcting csr addresses.
Fixes: b9d0fedc62 ("octeontx2-af: cn10kb: Add RPM_USX MAC support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0135c482fa ]
If truncate_node() fails in truncate_dnode(), it missed to call
f2fs_put_page(), fix it.
Fixes: 7735730d39 ("f2fs: fix to propagate error from __get_meta_page()")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b712f18c4 ]
Sec proxy/message manager data buffer is 60 bytes with the last of the
registers indicating transmission completion. This however poses a bit
of a challenge.
The backing memory for sec_proxy / message manager is regular memory,
and all sec proxy does is to trigger a burst of all 60 bytes of data
over to the target thread backing ring accelerator. It doesn't do a
memory scrub when it moves data out in the burst. When we transmit
multiple messages, remnants of previous message is also transmitted
which results in some random data being set in TISCI fields of
messages that have been expanded forward.
The entire concept of backward compatibility hinges on the fact that
the unused message fields remain 0x0 allowing for 0x0 value to be
specially considered when backward compatibility of message extension
is done.
So, instead of just writing the completion register, we continue
to fill the message buffer up with 0x0 (note: for partial message
involving completion, we already do this).
This allows us to scale and introduce ABI changes back also work with
other boot stages that may have left data in the internal memory.
While at this, be consistent and explicit with the data_reg pointer
increment.
Fixes: aace66b170 ("mailbox: Introduce TI message manager driver")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c1f23ad34 ]
If neither a "hif_mspi" nor "mspi" resource is present, the driver will
just early exit in probe but still return success. Apart from not doing
anything meaningful, this would then also lead to a null pointer access
on removal, as platform_get_drvdata() would return NULL, which it would
then try to dereference when trying to unregister the spi master.
Fix this by unconditionally calling devm_ioremap_resource(), as it can
handle a NULL res and will then return a viable ERR_PTR() if we get one.
The "return 0;" was previously a "goto qspi_resource_err;" where then
ret was returned, but since ret was still initialized to 0 at this place
this was a valid conversion in 63c5395bb7 ("spi: bcm-qspi: Fix
use-after-free on unbind"). The issue was not introduced by this commit,
only made more obvious.
Fixes: fa236a7ef2 ("spi: bcm-qspi: Add Broadcom MSPI driver")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Kamal Dasu <kamal.dasu@broadcom.com>
Link: https://lore.kernel.org/r/20230629134306.95823-1-jonas.gorski@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48538ccb82 ]
All ibmvnic resets, make a call to netdev_tx_reset_queue() when
re-opening the device. netdev_tx_reset_queue() resets the num_queued
and num_completed byte counters. These stats are used in Byte Queue
Limit (BQL) algorithms. The difference between these two stats tracks
the number of bytes currently sitting on the physical NIC. ibmvnic
increases the number of queued bytes though calls to
netdev_tx_sent_queue() in the drivers xmit function. When, VIOS reports
that it is done transmitting bytes, the ibmvnic device increases the
number of completed bytes through calls to netdev_tx_completed_queue().
It is important to note that the driver batches its transmit calls and
num_queued is increased every time that an skb is added to the next
batch, not necessarily when the batch is sent to VIOS for transmission.
Unlike other reset types, a NON FATAL reset will not flush the sub crq
tx buffers. Therefore, it is possible for the batched skb array to be
partially full. So if there is call to netdev_tx_reset_queue() when
re-opening the device, the value of num_queued (0) would not account
for the skb's that are currently batched. Eventually, when the batch
is sent to VIOS, the call to netdev_tx_completed_queue() would increase
num_completed to a value greater than the num_queued. This causes a
BUG_ON crash:
ibmvnic 30000002: Firmware reports error, cause: adapter problem.
Starting recovery...
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:27!
Oops: Exception in kernel mode, sig: 5
[....]
NIP dql_completed+0x28/0x1c0
LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
Call Trace:
ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
__handle_irq_event_percpu+0x98/0x270
---[ end trace ]---
Therefore, do not reset the dql stats when performing a NON_FATAL reset.
Fixes: 0d97338818 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 915057ae79 ]
On systems without MAE permission efx->mae is not initialised,
and trying to lookup an mport results in a NULL pointer
dereference.
Fixes: 25414b2a64 ("sfc: add devlink port support for ef100")
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b9545dc9f ]
When reconfiguring CIG after disconnection of the last CIS, LE Remove
CIG shall be sent before LE Set CIG Parameters. Otherwise, it fails
because CIG is in the inactive state and not configurable (Core v5.3
Vol 6 Part B Sec. 4.5.14.3). This ordering is currently wrong under
suitable timing conditions, because LE Remove CIG is sent via the
hci_sync queue and may be delayed, but Set CIG Parameters is via
hci_send_cmd.
Make the ordering well-defined by sending also Set CIG Parameters via
hci_sync.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cb7365850 ]
Devices that lack persistent storage for the device address can indicate
this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller
to be marked as unconfigured until user space has set a valid address.
Once configured, the device address must be set on every setup for
controllers with HCI_QUIRK_NON_PERSISTENT_SETUP to avoid marking the
controller as unconfigured and requiring the address to be set again.
Fixes: 740011cfe9 ("Bluetooth: Add new quirk for non-persistent setup settings")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1ae02d876 ]
Currently the sja1105 tagging protocol prefers using the source port
information from the VLAN header if that is available, falling back to
the INCL_SRCPT option if it isn't. The VLAN header is available for all
frames except for META frames initiated by the switch (containing RX
timestamps), and thus, the "if (is_link_local)" branch is practically
dead.
The tag_8021q source port identification has become more loose
("imprecise") and will report a plausible rather than exact bridge port,
when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local
traffic always needs to know the precise source port. With incorrect
source port reporting, for example PTP traffic over 2 bridged ports will
all be seen on sockets opened on the first such port, which is incorrect.
Now that the tagging protocol has been changed to make link-local frames
always contain source port information, we can reverse the order of the
checks so that we always give precedence to that information (which is
always precise) in lieu of the tag_8021q VID which is only precise for a
standalone port.
Fixes: d7f9787a76 ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID")
Fixes: 91495f21fc ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4638af888 ]
Link-local traffic on bridged SJA1105 ports is sometimes tagged by the
hardware with source port information (when the port is under a VLAN
aware bridge).
The tag_8021q source port identification has become more loose
("imprecise") and will report a plausible rather than exact bridge port,
when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local
traffic always needs to know the precise source port.
Modify the driver logic (and therefore: the tagging protocol itself) to
always include the source port information with link-local packets,
regardless of whether the port is standalone, under a VLAN-aware or
VLAN-unaware bridge. This makes it possible for the tagging driver to
give priority to that information over the tag_8021q VLAN header.
The big drawback with INCL_SRCPT is that it makes it impossible to
distinguish between an original MAC DA of 01:80:C2:XX:YY:ZZ and
01:80:C2:AA:BB:ZZ, because the tagger just patches MAC DA bytes 3 and 4
with zeroes. Only if PTP RX timestamping is enabled, the switch will
generate a META follow-up frame containing the RX timestamp and the
original bytes 3 and 4 of the MAC DA. Those will be used to patch up the
original packet. Nonetheless, in the absence of PTP RX timestamping, we
have to live with this limitation, since it is more important to have
the more precise source port information for link-local traffic.
Fixes: d7f9787a76 ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID")
Fixes: 91495f21fc ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2edcfcbb3c ]
The driver implements a workaround for the fact that it doesn't have an
IRQ source to tell it whether PTP frames are available through the
extraction registers, for those frames to be processed and passed
towards the network stack. That workaround is to configure the switch,
through felix_hwtstamp_set() -> felix_update_trapping_destinations(),
to create two copies of PTP packets: one sent over Ethernet to the DSA
master, and one to be consumed through the aforementioned CPU extraction
queue registers.
The reason why we want PTP packets to be consumed through the CPU
extraction registers in the first place is because we want to see their
hardware RX timestamp. With tag_8021q, that is only visible that way,
and it isn't visible with the copy of the packet that's transmitted over
Ethernet.
The problem with the workaround implementation is that it drops the
packet received over Ethernet, in expectation of its copy being present
in the CPU extraction registers. However, if felix_hwtstamp_set() hasn't
run (aka PTP RX timestamping is disabled), the driver will drop the
original PTP frame and there will be no copy of it in the CPU extraction
registers. So, the network stack will simply not see any PTP frame.
Look at the port's trapping configuration to see whether the driver has
previously enabled the CPU extraction registers. If it hasn't, just
don't RX timestamp the frame and let it be passed up the stack by DSA,
which is perfectly fine.
Fixes: 0a6f17c6ae ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45d0fcb5bc ]
In a future change, the driver will need to determine whether PTP RX
timestamping is enabled on a port (including whether traps were set up
on that port in particular) and that is currently not possible.
The driver supports different RX filters (L2, L4) and kinds of TX
timestamping (one-step, two-step) on its ports, but it saves all
configuration in a single struct hwtstamp_config that is global to the
switch. So, the latest timestamping configuration on one port
(including a request to disable timestamping) affects what gets reported
for all ports, even though the configuration itself is still individual
to each port.
The port timestamping configurations are only coupled because of the
common structure, so replace the hwtstamp_config with a mask of trapped
protocols saved per port. We also have the ptp_cmd to distinguish
between one-step and two-step PTP timestamping, so with those 2 bits of
information we can fully reconstruct a descriptive struct
hwtstamp_config for each port, during the SIOCGHWTSTAMP ioctl.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Fixes: 96ca08c058 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fd44b82b7 ]
PTP RX timestamping should be enabled when the user requests it, not by
default. If it is enabled by default, it can be problematic when the
ocelot driver is a DSA master, and it sidesteps what DSA tries to avoid
through __dsa_master_hwtstamp_validate().
Additionally, after the change which made ocelot trap PTP packets only
to the CPU at ocelot_hwtstamp_set() time, it is no longer even true that
RX timestamping is enabled by default, because until ocelot_hwtstamp_set()
is called, the PTP traps are actually not set up. So the rx_filter field
of ocelot->hwtstamp_config reflects an incorrect reality.
Fixes: 96ca08c058 ("net: mscc: ocelot: set up traps for PTP packets")
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93d75d475c ]
xtables relies on skb being owned by ip stack, i.e. with ipv4
check in place skb->cb is supposed to be IPCB.
I don't see an immediate problem (REJECT target cannot be used anymore
now that PRE/POSTROUTING hook validation has been fixed), but better be
safe than sorry.
A much better patch would be to either mark act_ipt as
"depends on BROKEN" or remove it altogether. I plan to do this
for -next in the near future.
This tc extension is broken in the sense that tc lacks an
equivalent of NF_STOLEN verdict.
With NF_STOLEN, target function takes complete ownership of skb, caller
cannot dereference it anymore.
ACT_STOLEN cannot be used for this: it has a different meaning, caller
is allowed to dereference the skb.
At this time NF_STOLEN won't be returned by any targets as far as I can
see, but this may change in the future.
It might be possible to work around this via list of allowed
target extensions known to only return DROP or ACCEPT verdicts, but this
is error prone/fragile.
Existing selftest only validates xt_LOG and act_ipt is restricted
to ipv4 so I don't think this action is used widely.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2dc32dcba ]
Netfilter targets make assumptions on the skb state, for example
iphdr is supposed to be in the linear area.
This is normally done by IP stack, but in act_ipt case no
such checks are made.
Some targets can even assume that skb_dst will be valid.
Make a minimum effort to check for this:
- Don't call the targets eval function for non-ipv4 skbs.
- Don't call the targets eval function for POSTROUTING
emulation when the skb has no dst set.
v3: use skb_protocol helper (Davide Caratti)
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4ee93380b ]
Looks like "tc" hard-codes "mangle" as the only supported table
name, but on kernel side there are no checks.
This is wrong. Not all xtables targets are safe to call from tc.
E.g. "nat" targets assume skb has a conntrack object assigned to it.
Normally those get called from netfilter nat core which consults the
nat table to obtain the address mapping.
"tc" userspace either sets PRE or POSTROUTING as hook number, but there
is no validation of this on kernel side, so update netlink policy to
reject bogus numbers. Some targets may assume skb_dst is set for
input/forward hooks, so prevent those from being used.
act_ipt uses the hook number in two places:
1. the state hook number, this is fine as-is
2. to set par.hook_mask
The latter is a bit mask, so update the assignment to make
xt_check_target() to the right thing.
Followup patch adds required checks for the skb/packet headers before
calling the targets evaluation function.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6feb37b3b0 ]
As &net->sctp.addr_wq_lock is also acquired by the timer
sctp_addr_wq_timeout_handler() in protocal.c, the same lock acquisition
at sctp_auto_asconf_init() seems should disable irq since it is called
from sctp_accept() under process context.
Possible deadlock scenario:
sctp_accept()
-> sctp_sock_migrate()
-> sctp_auto_asconf_init()
-> spin_lock(&net->sctp.addr_wq_lock)
<timer interrupt>
-> sctp_addr_wq_timeout_handler()
-> spin_lock_bh(&net->sctp.addr_wq_lock); (deadlock here)
This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock.
The tentative patch fix the potential deadlock by spin_lock_bh().
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Fixes: 34e5b01186 ("sctp: delay auto_asconf init until binding the first addr")
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230627120340.19432-1-dg573847474@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29f96ac236 ]
Selecting only REGMAP_I2C can leave REGMAP unset, causing build errors,
so also select REGMAP to prevent the build errors.
../drivers/media/cec/i2c/ch7322.c:158:21: error: variable 'ch7322_regmap' has initializer but incomplete type
158 | static const struct regmap_config ch7322_regmap = {
../drivers/media/cec/i2c/ch7322.c:159:10: error: 'const struct regmap_config' has no member named 'reg_bits'
159 | .reg_bits = 8,
../drivers/media/cec/i2c/ch7322.c:159:21: warning: excess elements in struct initializer
159 | .reg_bits = 8,
../drivers/media/cec/i2c/ch7322.c:160:10: error: 'const struct regmap_config' has no member named 'val_bits'
160 | .val_bits = 8,
../drivers/media/cec/i2c/ch7322.c:160:21: warning: excess elements in struct initializer
160 | .val_bits = 8,
../drivers/media/cec/i2c/ch7322.c:161:10: error: 'const struct regmap_config' has no member named 'max_register'
161 | .max_register = 0x7f,
../drivers/media/cec/i2c/ch7322.c:161:25: warning: excess elements in struct initializer
161 | .max_register = 0x7f,
../drivers/media/cec/i2c/ch7322.c:162:10: error: 'const struct regmap_config' has no member named 'disable_locking'
162 | .disable_locking = true,
../drivers/media/cec/i2c/ch7322.c:162:28: warning: excess elements in struct initializer
162 | .disable_locking = true,
../drivers/media/cec/i2c/ch7322.c: In function 'ch7322_probe':
../drivers/media/cec/i2c/ch7322.c:468:26: error: implicit declaration of function 'devm_regmap_init_i2c' [-Werror=implicit-function-declaration]
468 | ch7322->regmap = devm_regmap_init_i2c(client, &ch7322_regmap);
../drivers/media/cec/i2c/ch7322.c:468:24: warning: assignment to 'struct regmap *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
468 | ch7322->regmap = devm_regmap_init_i2c(client, &ch7322_regmap);
../drivers/media/cec/i2c/ch7322.c: At top level:
../drivers/media/cec/i2c/ch7322.c:158:35: error: storage size of 'ch7322_regmap' isn't known
158 | static const struct regmap_config ch7322_regmap = {
Link: https://lore.kernel.org/linux-media/20230608025435.29249-1-rdunlap@infradead.org
Fixes: 21b9a47e0e ("media: cec: i2c: ch7322: Add ch7322 CEC controller driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jeff Chase <jnchase@google.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Joe Tessler <jrt@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mark Brown <broonie@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 582d4ad468 ]
The tc358746 driver selects CONFIG_GENERIC_PHY_MIPI_DPHY and links to
that, but this fails when CONFIG_GENERIC_PHY is disabled, because Kbuild
then never enters the drivers/phy directory for building object files:
ERROR: modpost: "phy_mipi_dphy_get_default_config_for_hsclk" [drivers/media/i2c/tc358746.ko] undefined!
Add an explicit 'select GENERIC_PHY' here to ensure that the directory
is entered, and add another dependency on that symbol so make it
more obvious what is going on if another driver has the same problem,
as this will produce a Kconfig warning.
Link: https://lore.kernel.org/linux-media/20230623152318.2276816-1-arnd@kernel.org
Fixes: 80a21da360 ("media: tc358746: add Toshiba TC358746 Parallel to CSI-2 bridge driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bec7dd1b3 ]
freeze_super() can fail, it needs to check its return value and do
error handling in f2fs_resize_fs().
Fixes: 04f0b2eaa3 ("f2fs: ioctl for removing a range from F2FS")
Fixes: b4b10061ef ("f2fs: refactor resize_fs to avoid meta updates in progress")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e49de73fb ]
The scenario being fixed here is depicted in the following sequence-
modprobe i915
echo 1 > /sys/class/drm/card0/gt/gt0/slpc_ignore_eff_freq
echo 300 > /sys/class/drm/card0/gt_min_freq_mhz (RPn)
cat /sys/class/drm/card0/gt_cur_freq_mhz --> cur == RPn as expected
echo 1 > /sys/kernel/debug/dri/0/gt0/reset --> reset
cat /sys/class/drm/card0/gt_min_freq_mhz --> cached freq is RPn
cat /sys/class/drm/card0/gt_cur_freq_mhz --> it's not RPn, but RPe!!
When SLPC reinitializes, it sets SLPC min freq to efficient frequency.
Even if we disable efficient freq post that, we should restore the cached
min freq (via H2G) for it to take effect.
v2: Clarify commit message (Ashutosh)
Fixes: 95ccf312a1 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency")
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230621014257.1769564-1-vinay.belgaumkar@intel.com
(cherry picked from commit da86b2b13f)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad7c3b41e8 ]
After commit f382fb0bce ("block: remove legacy IO schedulers"),
blkio.throttle.io_serviced and blkio.throttle.io_service_bytes become
the only stable io stats interface of cgroup v1, and these statistics
are done in the blk-throttle code. But the current code only counts the
bios that are actually throttled. When the user does not add the throttle
limit, the io stats for cgroup v1 has nothing. I fix it according to the
statistical method of v2, and made it count all ios accurately.
Fixes: a7b36ee6ba ("block: move blk-throtl fast path inline")
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230507170631.89607-1-hanjinke.666@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 125bfc7cd7 ]
/sys/block/[device]/queue/iostats is used to control whether to count io
stat. Write 0 to it will clear queue_flags QUEUE_FLAG_IO_STAT which means
iostats is disabled. If we disable iostats and later endable it, the io
issued during this period will be counted incorrectly, inflight will be
decreased to -1.
//T1 set iostats
echo 0 > /sys/block/md0/queue/iostats
clear QUEUE_FLAG_IO_STAT
//T2 issue io
if (QUEUE_FLAG_IO_STAT) -> false
bio_start_io_acct
inflight++
echo 1 > /sys/block/md0/queue/iostats
set QUEUE_FLAG_IO_STAT
//T3 io end
if (QUEUE_FLAG_IO_STAT) -> true
bio_end_io_acct
inflight-- -> -1
Also, if iostats is enabled while issuing io but disabled while io end,
inflight will never be decreased.
Fix it by checking start_time when io end. If start_time is not 0, call
bio_end_io_acct().
Fixes: 528bc2cf2f ("md/raid10: enable io accounting")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230609094320.2397604-1-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38ba835986 ]
If the PWM is exported but not enabled, do not call pwm_class_apply_state().
First of all, in this case, period may still be unconfigured and this would
make pwm_class_apply_state() return -EINVAL, and then suspend would fail.
Second, it makes little sense to apply state onto PWM that is not enabled
before suspend.
Failing case:
"
$ echo 1 > /sys/class/pwm/pwmchip4/export
$ echo mem > /sys/power/state
...
pwm pwmchip4: PM: dpm_run_callback(): pwm_class_suspend+0x1/0xa8 returns -22
pwm pwmchip4: PM: failed to suspend: error -22
PM: Some devices failed to suspend, or early wake event detected
"
Working case:
"
$ echo 1 > /sys/class/pwm/pwmchip4/export
$ echo 100 > /sys/class/pwm/pwmchip4/pwm1/period
$ echo 10 > /sys/class/pwm/pwmchip4/pwm1/duty_cycle
$ echo mem > /sys/power/state
...
"
Do not call pwm_class_apply_state() in case the PWM is disabled
to fix this issue.
Fixes: 7fd4edc57b ("pwm: sysfs: Add suspend/resume support")
Signed-off-by: Marek Vasut <marex@denx.de>
Fixes: ef2bf4997f ("pwm: Improve args checking in pwm_apply_state()")
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 661dfb7f46 ]
During suspend, all the tpm registers will lose values.
So the 'real_period' value of struct 'imx_tpm_pwm_chip'
should be forced to be zero to force the period update
code can be executed after system resume back.
Signed-off-by: Fancy Fang <chen.fang@nxp.com>
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 738a1cfec2 ("pwm: Add i.MX TPM PWM driver support")
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1d2ba10f5 ]
bitmap_{from,to}_arr64() optimization is overly optimistic on 32-bit LE
architectures when it's wired to bitmap_copy_clear_tail().
bitmap_copy_clear_tail() takes care of unused bits in the bitmap up to
the next word boundary. But on 32-bit machines when copying bits from
bitmap to array of 64-bit words, it's expected that the unused part of
a recipient array must be cleared up to 64-bit boundary, so the last 4
bytes may stay untouched when nbits % 64 <= 32.
While the copying part of the optimization works correct, that clear-tail
trick makes corresponding tests reasonably fail:
test_bitmap: bitmap_to_arr64(nbits == 1): tail is not safely cleared: 0xa5a5a5a500000001 (must be 0x0000000000000001)
Fix it by removing bitmap_{from,to}_arr64() optimization for 32-bit LE
arches.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/lkml/20230225184702.GA3587246@roeck-us.net/
Fixes: 0a97953fd2 ("lib: add bitmap_{from,to}_arr64")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c50384ef8 ]
We're using pci_irq_vector() to obtain the interrupt number and then
bind it to the CPU start perf under the protection of spinlock in
pmu::start(). pci_irq_vector() might sleep since [1] because it will
call msi_domain_get_virq() to get the MSI interrupt number and it
needs to acquire dev->msi.data->mutex. Getting a mutex will sleep on
contention. So use pci_irq_vector() in an atomic context is problematic.
This patch cached the interrupt number in the probe() and uses the
cached data instead to avoid potential sleep.
[1] commit 82ff8e6b78 ("PCI/MSI: Use msi_get_virq() in pci_get_vector()")
Fixes: ff0de066b4 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230621092804.15120-6-yangyicong@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f85534113f ]
The MT6380 regulator typically used together with MT7622 does not
support the current maximum processor and SRAM voltage in the cpufreq
driver (1360000uV).
For MT7622 limit processor and SRAM supply voltages to 1350000uV to
avoid having the tracking algorithm request unsupported voltages from
the regulator.
On MT7623 there is no separate SRAM supply and the maximum voltage used
is 1300000uV. Create dedicated platform data for MT7623 to cover that
case as well.
Fixes: 0883426fd0 ("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623")
Suggested-by: Jia-wei Chang <Jia-wei.Chang@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7dae593cd2 ]
In a couple of situations like
name = kstrndup(buf, count, GFP_KERNEL);
if (!name)
return -ENOSPC;
the error is not actually "No space left on device", but "Out of memory".
It is semantically correct to return -ENOMEM in all failed kstrndup()
and kzalloc() cases in this driver, as it is not a problem with disk
space, but with kernel memory allocator failing allocation.
The semantically correct should be:
name = kstrndup(buf, count, GFP_KERNEL);
if (!name)
return -ENOMEM;
Cc: Dan Carpenter <error27@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@ruslug.rutgers.edu>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Brian Norris <briannorris@chromium.org>
Fixes: c92316bf8e ("test_firmware: add batched firmware tests")
Fixes: 0a8adf5847 ("test: add firmware_class loader test")
Fixes: 548193cba2 ("test_firmware: add support for firmware_request_platform")
Fixes: eb910947c8 ("test: firmware_class: add asynchronous request trigger")
Fixes: 061132d2b9 ("test_firmware: add test custom fallback trigger")
Fixes: 7feebfa487 ("test_firmware: add support for request_firmware_into_buf")
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Message-ID: <20230606070808.9300-1-mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 095bb8ba45 ]
Smatch reports:
drivers/nvmem/sunplus-ocotp.c:205 sp_ocotp_probe()
warn: 'otp->clk' from clk_prepare() not released on lines: 196.
In the function sp_ocotp_probe(struct platform_device *pdev), otp->clk may
not be released before return.
To fix this issue, using function clk_unprepare() to release otp->clk.
Fixes: 8747ec2e97 ("nvmem: Add driver for OCOTP in Sunplus SP7021")
Signed-off-by: Yi Yingao <m202271736@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Message-ID: <20230509085237.5917-1-m202271736@hust.edu.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39d422555e ]
The fwnode_irq_get() and the fwnode_irq_get_byname() return 0 upon
device-tree IRQ mapping failure. This is contradicting the
fwnode_irq_get_byname() function documentation and can potentially be a
source of errors like:
int probe(...) {
...
irq = fwnode_irq_get_byname();
if (irq <= 0)
return irq;
...
}
Here we do correctly check the return value from fwnode_irq_get_byname()
but the driver probe will now return success. (There was already one
such user in-tree).
Change the fwnode_irq_get_byname() to work as documented and make also the
fwnode_irq_get() follow same common convention returning a negative errno
upon failure.
Fixes: ca0acb511c ("device property: Add fwnode_irq_get_byname")
Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Suggested-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-ID: <3e64fe592dc99e27ef9a0b247fc49fa26b6b8a58.1685340157.git.mazziesaccount@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20a41a6261 ]
We should not rely on autosuspend timeout for system suspend. Instead,
let's use force_suspend and force_resume functions. Otherwise the serial
port controller device may not be idled on suspend.
As we are doing a register write on suspend to configure the serial port,
we still need to runtime PM resume the port on suspend.
While at it, let's switch to pm_runtime_resume_and_get() and check for
errors returned. And let's add the missing line break before return to the
suspend function while at it.
Fixes: 09d8b2bdbc ("serial: 8250: omap: Provide ability to enable/disable UART as wakeup source")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-by: Dhruva Gole <d-gole@ti.com>
Message-ID: <20230614045922.4798-1-tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edd60d24bd ]
Currently if we bootup a device without cable connected, then
usb-conn-gpio won't call set_role() since last_role is same as
current role. This happens because during probe last_role gets
initialised to zero.
To avoid this, added a new constant in enum usb_role, last_role
is set to USB_ROLE_UNKNOWN before performing initial detection.
While at it, also handle default case for the usb_role switch
in cdns3, intel-xhci-usb-role-switch & musb/jz4740 to avoid
build warnings.
Fixes: 4602f3bff2 ("usb: common: add USB GPIO based connection detection driver")
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Message-ID: <1685544074-17337-1-git-send-email-quic_prashk@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 285cff4c04 ]
The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace
specifies a start_gfn outside of memslots.
This can occur when a VM has multiple memslots with a hole in between:
+-----+----------+--------+--------+
| ... | Slot N-1 | <hole> | Slot N |
+-----+----------+--------+--------+
^ ^ ^ ^
| | | |
GFN A A+B | |
A+B+C |
A+B+C+D
When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the
CMMA values of the first dirty page in Slot N. However, userspace may get a
start_gfn of A+B+C+D with a count of 0, hence completely skipping over any
dirty pages in slot N.
The error is in kvm_s390_next_dirty_cmma(), which assumes
gfn_to_memslot_approx() will return the memslot _below_ the specified GFN
when the specified GFN lies outside a memslot. In reality it may return
either the memslot below or above the specified GFN.
When a memslot above the specified GFN is returned this happens:
- ofs is calculated, but since the memslot's base_gfn is larger than the
specified cur_gfn, ofs will underflow to a huge number.
- ofs is passed to find_next_bit(). Since ofs will exceed the memslot's
number of pages, the number of pages in the memslot is returned,
completely skipping over all bits in the memslot userspace would be
interested in.
Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is
returned (cur_gfn < ms->base_gfn).
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: afdad61615 ("KVM: s390: Fix storage attributes migration with memory slots")
Message-Id: <20230324145424.293889-2-nrb@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 901c12d144 ]
In IRQ context, it wakes up workqueue to record errors into on-disk
superblock fields rather than in-memory fields.
Fixes: 1aa161e431 ("f2fs: fix scheduling while atomic in decompression path")
Fixes: 95fa90c9e5 ("f2fs: support recording errors into superblock")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8189834d4 ]
butt3rflyh4ck reports a bug as below:
When a thread always calls F2FS_IOC_RESIZE_FS to resize fs, if resize fs is
failed, f2fs kernel thread would invoke callback function to update f2fs io
info, it would call f2fs_write_end_io and may trigger null-ptr-deref in
NODE_MAPPING.
general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
RIP: 0010:NODE_MAPPING fs/f2fs/f2fs.h:1972 [inline]
RIP: 0010:f2fs_write_end_io+0x727/0x1050 fs/f2fs/data.c:370
<TASK>
bio_endio+0x5af/0x6c0 block/bio.c:1608
req_bio_endio block/blk-mq.c:761 [inline]
blk_update_request+0x5cc/0x1690 block/blk-mq.c:906
blk_mq_end_request+0x59/0x4c0 block/blk-mq.c:1023
lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370
blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1101
__do_softirq+0x1d4/0x8ef kernel/softirq.c:571
run_ksoftirqd kernel/softirq.c:939 [inline]
run_ksoftirqd+0x31/0x60 kernel/softirq.c:931
smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164
kthread+0x33e/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
The root cause is below race case can cause leaving dirty metadata
in f2fs after filesystem is remount as ro:
Thread A Thread B
- f2fs_ioc_resize_fs
- f2fs_readonly --- return false
- f2fs_resize_fs
- f2fs_remount
- write_checkpoint
- set f2fs as ro
- free_segment_range
- update meta_inode's data
Then, if f2fs_put_super() fails to write_checkpoint due to readonly
status, and meta_inode's dirty data will be writebacked after node_inode
is put, finally, f2fs_write_end_io will access NULL pointer on
sbi->node_inode.
Thread A IRQ context
- f2fs_put_super
- write_checkpoint fails
- iput(node_inode)
- node_inode = NULL
- iput(meta_inode)
- write_inode_now
- f2fs_write_meta_page
- f2fs_write_end_io
- NODE_MAPPING(sbi)
: access NULL pointer on node_inode
Fixes: b4b10061ef ("f2fs: refactor resize_fs to avoid meta updates in progress")
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Closes: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn
Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f082c6b205 ]
If S_NOQUOTA is cleared from inode during data page writeback of quota
file, it may miss to unlock node_write lock, result in potential
deadlock, fix to use the lock in paired.
Kworker Thread
- writepage
if (IS_NOQUOTA())
f2fs_down_read(&sbi->node_write);
- vfs_cleanup_quota_inode
- inode->i_flags &= ~S_NOQUOTA;
if (IS_NOQUOTA())
f2fs_up_read(&sbi->node_write);
Fixes: 79963d967b ("f2fs: shrink node_write lock coverage")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c8ed1b3593 ]
In gfs2_file_buffered_write(), we currently jump from the second call of
function should_fault_in_pages() to above the first call, so
should_fault_in_pages() is getting called twice in a row, causing it to
accidentally fall back to single-page writes rather than trying the more
efficient multi-page writes first.
Fix that by moving the retry label to the correct place, behind the
first call to should_fault_in_pages().
Fixes: e1fa9ea85c ("gfs2: Stop using glock holder auto-demotion for now")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d45a43225 ]
Stop using half pixelclock for binned modes this fixes:
1. The exposure being twice as high for binned mods (due to the half clk)
2. The framerate being 15 fps instead of 30 fps
The original code with fixed per mode register lists did use half pixel
clk, but this should be combined with using half for the VTS value too.
Using half VTS fixes the framerate issue, but this has the undesired
side-effect of change the exposure ctrl range (half the range, double
the amount of exposure per step).
Link: https://lore.kernel.org/r/20230604161406.69369-3-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 927e78ac8b ]
ALIGN() expects its second argument to be a power of 2, otherwise
incorrect results are produced for some inputs. The output can be
both larger or smaller than what is expected.
For example, ALIGN(304, 192) equals 320 instead of 384, and
ALIGN(65, 192) equals 256 instead of 192.
However, nestling two ALIGN() as is done in this case seem to only
produce results equal to or bigger than the expected result if ALIGN()
had handled non powers of two, and that in turn results in framesizes
that are either the correct size or too large.
Fortunately, since 192 * 4 / 3 equals 256, it turns out that one ALIGN()
is sufficient.
Fixes: ab1eda449c ("media: venus: vdec: handle 10bit bitstreams")
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d80a86a99 ]
The current coding make 'charger-enable-gpio' control as real hardware
level. This conflicts with the default binding example. For driver
behavior, no need to use real hardware level, just logic level is
enough. This change can make this flexibility keep in dts gpio active
level about this pin.
Fixes: 6f7f70e3a8 ("power: supply: rt9467: Add Richtek RT9467 charger driver")
Signed-off-by: ChiYuan Huang <cy_huang@richtek.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c45b2835e7 ]
child_fwnode should be a read only property based on the DT or ACPI. If
it's cleared on the parent device when a child is unloaded, then when
the child is loaded again the connection won't be remade.
child_dev should be cleared instead which signifies that the connection
should be remade when the child_fwnode registers a new coresight_device.
Similarly the reference count shouldn't be decremented as long as the
parent device exists. The correct place to drop the reference is in
coresight_release_platform_data() which is already done.
Reproducible on Juno with the following steps:
# load all coresight modules.
$ cd /sys/bus/coresight/devices/
$ echo 1 > tmc_etr0/enable_sink
$ echo 1 > etm0/enable_source
# Works fine ^
$ echo 0 > etm0/enable_source
$ rmmod coresight-funnel
$ modprobe coresight-funnel
$ echo 1 > etm0/enable_source
-bash: echo: write error: Invalid argument
Fixes: 37ea1ffddf ("coresight: Use fwnode handle instead of device names")
Fixes: 2af89ebacf ("coresight: Clear the connection field properly")
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-2-james.clark@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b290df0681 ]
Function ll_rw_block was removed in commit 79f5978420 ("fs/buffer:
remove ll_rw_block() helper"). There is no unified function to sumbit
read or write buffer in block layer for now. Consider similar sematics,
we can choose submit_bh() to replace ll_rw_block() as predefined crash
point. In submit_bh(), it also takes read or write flag as the first
argument and invoke submit_bio() to submit I/O request to block layer.
Fixes: 79f5978420 ("fs/buffer: remove ll_rw_block() helper")
Signed-off-by: Yue Zhao <findns94@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230503162944.3969-1-findns94@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25614735a6 ]
omap8250_irq() accesses UART_IER. This register is modified twice
by each console write (serial8250_console_write()) under the port
lock. omap8250_irq() must also take the port lock to guanentee
synchronized access to UART_IER.
Since the port lock is already being taken for the stop_rx() callback
and since it is safe to call cancel_delayed_work() while holding the
port lock, simply extend the port lock region to include UART_IER
access.
Fixes: 1fe0e1fa32 ("serial: 8250_omap: Handle optional overrun-throttle-ms property")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230525093159.223817-8-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e6bd945e6 ]
The declaration is in an #ifdef, which causes warnings when building
with 'make W=1' and without CONFIG_PM:
drivers/usb/core/devio.c:742:6: error: no previous prototype for 'usbfs_notify_suspend'
drivers/usb/core/devio.c:747:6: error: no previous prototype for 'usbfs_notify_resume'
Use the same #ifdef check around the function definitions to avoid
the warnings and slightly shrink the USB core.
Fixes: 7794f486ed ("usbfs: Add ioctls for runtime power management")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20230516202103.558301-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73346b9965 ]
Kernel documentation has to be synchronized with a code, otherwise
the validator is not happy:
Function parameter or member 'usb_bits' not described in 'extcon_cable'
Function parameter or member 'chg_bits' not described in 'extcon_cable'
Function parameter or member 'jack_bits' not described in 'extcon_cable'
Function parameter or member 'disp_bits' not described in 'extcon_cable'
Describe the fields added in the past.
Fixes: ceaa98f442 ("extcon: Add the support for the capability of each property")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e77e0b7a9 ]
Kernel documentation has to be synchronized with a code, otherwise
the validator is not happy:
Function parameter or member 'usb_propval' not described in 'extcon_cable'
Function parameter or member 'chg_propval' not described in 'extcon_cable'
Function parameter or member 'jack_propval' not described in 'extcon_cable'
Function parameter or member 'disp_propval' not described in 'extcon_cable'
Describe the fields added in the past.
Fixes: 067c1652e7 ("extcon: Add the support for extcon property according to extcon type")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f6ecb89fe ]
Consider a case where gserial_disconnect has already cleared
gser->ioport. And if gserial_suspend gets called afterwards,
it will lead to accessing of gser->ioport and thus causing
null pointer dereference.
Avoid this by adding a null pointer check. Added a static
spinlock to prevent gser->ioport from becoming null after
the newly added null pointer check.
Fixes: aba3a8d01d ("usb: gadget: u_serial: add suspend resume callbacks")
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Link: https://lore.kernel.org/r/1683278317-11774-1-git-send-email-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f9914b178 ]
This reverts commit 57ed510b05 ("soundwire: qcom: use
pm_runtime_resume_and_get()") which introduced unbalanced
pm_runtime_put(), when device did not have runtime PM enabled.
If pm_runtime_resume_and_get() failed with -EACCES, the driver continued
execution and finally called pm_runtime_put_autosuspend(). Since
pm_runtime_resume_and_get() drops the usage counter on every error, this
lead to double decrement of that counter visible in certain debugfs
actions on unattached devices (still in reset state):
$ cat /sys/kernel/debug/soundwire/master-0-0/sdw:0:0217:f001:00:0/registers
qcom-soundwire 3210000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
soundwire sdw-master-0: trf on Slave 1 failed:-5 read addr e36 count 1
soundwire sdw:0:0217:f001:00:0: Runtime PM usage count underflow!
Fixes: 57ed510b05 ("soundwire: qcom: use pm_runtime_resume_and_get()")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20230517163750.997629-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e152c58d7a ]
This function has no callers from other files, and the declaration
was removed a while ago, causing a W=1 warning:
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:465:5: error: no previous prototype for 'vchiq_platform_init'
Marking it static solves this problem but introduces a new warning
since gcc determines that 'g_fragments_base' is never initialized
in some kernel configurations:
In file included from include/linux/string.h:254,
from include/linux/bitmap.h:11,
from include/linux/cpumask.h:12,
from include/linux/mm_types_task.h:14,
from include/linux/mm_types.h:5,
from include/linux/buildid.h:5,
from include/linux/module.h:14,
from drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:8:
In function 'memcpy_to_page',
inlined from 'free_pagelist' at drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:433:4:
include/linux/fortify-string.h:57:33: error: argument 2 null where non-null expected [-Werror=nonnull]
include/linux/highmem.h:427:9: note: in expansion of macro 'memcpy'
427 | memcpy(to + offset, from, len);
| ^~~~~~
Add a NULL pointer check for this in addition to the static annotation
to avoid both.
Fixes: 89cc4218f6 ("staging: vchiq_arm: drop unnecessary declarations")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Umang Jain <umang.jain@ideasonboard.com>
Link: https://lore.kernel.org/r/20230516202603.560554-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 63d56adf04 ]
GPLL0_OUT_DIV (.fw_name = "gcc_disp_gpll0_div_clk_src") was previously
made to reuse the same parent enum entry as GPLL0_OUT_MAIN
(.fw_name = "gcc_disp_gpll0_clk_src") in parent_map_2.
Resolve it by introducing its own entry in the parent enum and
correctly assigning it in disp_cc_parent_map_2[].
Fixes: cc517ea333 ("clk: qcom: Add display clock controller driver for QCM2290")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230412-topic-qcm_dispcc-v2-2-bce7dd512fe4@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2864e304fa ]
Adding the definition of decoder status to separate different decoder
period for core hardware.
core_work_cnt is the number of core work queued to work queue, the control
is very complex, leading to some unreasonable test result.
Using parameter status to indicate whether queue core work to work queue.
Fixes: 2e0ef56d81 ("media: mediatek: vcodec: making sure queue_work successfully")
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 04fc06f6dc ]
pm_runtime_get_if_in_use() does not only return nonzero values when
the device is in use, it can return a negative errno too.
And especially during resuming from system suspend, when runtime pm
is not yet up again, -EAGAIN is being returned, so the subsequent
pm_runtime_put() call results in a refcount underflow.
Fix system-resume by handling -EAGAIN of pm_runtime_get_if_in_use().
Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm>
Fixes: e8c0882685 ("media: i2c: add driver for the SK Hynix Hi-846 8M pixel camera")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59a9597963 ]
On R-Car M2-W:
rcar_fdp1 fe940000.fdp1: FDP1 Unidentifiable (0x02010101)
rcar_fdp1 fe944000.fdp1: FDP1 Unidentifiable (0x02010101)
Although the IP Internal Data Register on R-Car Gen2 is documented to
contain all zeros, the actual register contents seem to match the FDP1
version ID of R-Car H3 ES1.*, which has just been removed.
Fortunately this version is not used for any other purposes yet.
Fix this by re-adding the ID, now using an R-Car Gen2-specific name.
Fixes: af4273b43f ("media: renesas: fdp1: remove R-Car H3 ES1.* handling")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 306c3190b3 ]
Format propagation in the st-mipid02 driver is incorrect in that when
setting format for V4L2_SUBDEV_FORMAT_TRY on the source pad, the
_active_ rather than _try_ format from the sink pad is propagated.
This causes problems with format negotiation - update the function to
propagate the correct format.
Fixes: 642bb5e88f ("media: st-mipid02: MIPID02 CSI-2 to PARALLEL bridge driver")
Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f489a966f ]
The previous commit ebad8e731c ("media: usb: siano: Fix use after
free bugs caused by do_submit_urb") adds cancel_work_sync() in
smsusb_stop_streaming(). But smsusb_stop_streaming() may be called,
even if the work_struct surb->wq has not been initialized. As a result,
the warning will occur. One of the processes that could lead to warning
is shown below:
smsusb_probe()
smsusb_init_device()
if (!dev->in_ep || !dev->out_ep || align < 0) {
smsusb_term_device(intf);
smsusb_stop_streaming()
cancel_work_sync(&dev->surbs[i].wq);
__cancel_work_timer()
__flush_work()
if (WARN_ON(!work->func)) // work->func is null
The log reported by syzbot is shown below:
WARNING: CPU: 0 PID: 897 at kernel/workqueue.c:3066 __flush_work+0x798/0xa80 kernel/workqueue.c:3063
Modules linked in:
CPU: 0 PID: 897 Comm: kworker/0:2 Not tainted 6.2.0-rc1-syzkaller #0
RIP: 0010:__flush_work+0x798/0xa80 kernel/workqueue.c:3066
...
RSP: 0018:ffffc9000464ebf8 EFLAGS: 00010246
RAX: 1ffff11002dbb420 RBX: 0000000000000021 RCX: 1ffffffff204fa4e
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff888016dda0e8
RBP: ffffc9000464ed98 R08: 0000000000000001 R09: ffffffff90253b2f
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888016dda0e8
R13: ffff888016dda0e8 R14: ffff888016dda100 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd4331efe8 CR3: 000000000b48e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__cancel_work_timer+0x315/0x460 kernel/workqueue.c:3160
smsusb_stop_streaming drivers/media/usb/siano/smsusb.c:182 [inline]
smsusb_term_device+0xda/0x2d0 drivers/media/usb/siano/smsusb.c:344
smsusb_init_device+0x400/0x9ce drivers/media/usb/siano/smsusb.c:419
smsusb_probe+0xbbd/0xc55 drivers/media/usb/siano/smsusb.c:567
...
This patch adds check before cancel_work_sync(). If surb->wq has not
been initialized, the cancel_work_sync() will not be executed.
Reported-by: syzbot+27b0b464864741b18b99@syzkaller.appspotmail.com
Fixes: ebad8e731c ("media: usb: siano: Fix use after free bugs caused by do_submit_urb")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26ae58f65e ]
VIDIOC_ENUMINPUT documentation describes the tuner field of
struct v4l2_input as index:
Documentation/userspace-api/media/v4l/vidioc-enuminput.rst
"
* - __u32
- ``tuner``
- Capture devices can have zero or more tuners (RF demodulators).
When the ``type`` is set to ``V4L2_INPUT_TYPE_TUNER`` this is an
RF connector and this field identifies the tuner. It corresponds
to struct :c:type:`v4l2_tuner` field ``index``. For
details on tuners see :ref:`tuner`.
"
Drivers I could find also use the 'tuner' field as an index, e.g.:
drivers/media/pci/bt8xx/bttv-driver.c bttv_enum_input()
drivers/media/usb/go7007/go7007-v4l2.c vidioc_enum_input()
However, the UAPI comment claims this field is 'enum v4l2_tuner_type':
include/uapi/linux/videodev2.h
This field being 'enum v4l2_tuner_type' is unlikely as it seems to be
never used that way in drivers, and documentation confirms it. It seem
this comment got in accidentally in the commit which this patch fixes.
Fix the UAPI comment to stop confusion.
This was pointed out by Dmitry while reviewing VIDIOC_ENUMINPUT
support for strace.
Fixes: 6016af82ea ("[media] v4l2: use __u32 rather than enums in ioctl() structs")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 076b6289b2 ]
The last buffer from before the change must be marked
with the V4L2_BUF_FLAG_LAST flag,
similarly to the Drain sequence above.
initiate a drain of the capture queue in dynamic resolution change
Fixes: 6de8d628df ("media: amphion: add v4l2 m2m vpu decoder stateful driver")
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f6375a2d1 ]
Use the intended pointer types for p_s32 and p_64 in the union of the
struct v4l2_ext_control.
Fixes: e77eb66342 ("videodev2.h: add p_s32 and p_s64 pointers")
Signed-off-by: Daniel Lundberg Pedersen <dlp@qtec.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2aa8ac6f9 ]
Commit in Fixes turned a BUG() into a "normal" memory allocation failure.
While at it, it introduced a memory leak.
So fix it.
Also update the comment on top of the function to reflect what has been
change by the commit in Fixes.
Fixes: 40e986c996 ("media: common: saa7146: replace BUG_ON by WARN_ON")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e1d2ccc2cd ]
For format V4L2_PIX_FMT_VC1_ANNEX_G,
the separate codec data is required only once.
The repeated codec data may introduce some decoding error.
so drop the repeated codec data.
It's amphion vpu's limitation
Fixes: e670f5d672 ("media: amphion: only insert the first sequence startcode for vc1l format")
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Tested-by: xiahong.bao <xiahong.bao@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 668ee1a3a1 ]
For format V4L2_PIX_FMT_VC1_ANNEX_L,
the codec data is replaced with startcode,
and then driver drop it, otherwise it may led to decoding error.
It's amphion vpu's limitation
Driver has dropped the first codec data,
but need to drop the repeated codec data too.
Fixes: e670f5d672 ("media: amphion: only insert the first sequence startcode for vc1l format")
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Tested-by: xiahong.bao <xiahong.bao@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdaca63186 ]
If az6007_read() returns error, there is no sence to continue.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 3af2f4f15a ("[media] az6007: Change the az6007 read/write routine parameter")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ff7aedcdc ]
Commit dd42ec8ea5 ("interconnect: qcom: rpm: Use _optional func for provider clocks")
relaxed the requirements around probing bus clocks. This was a decent
solution for making sure MSM8996 would still boot with old DTs, but
now that there's a proper fix in place that both old and new DTs
will be happy about, revert back to the safer variant of the
function.
Fixes: dd42ec8ea5 ("interconnect: qcom: rpm: Use _optional func for provider clocks")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230228-topic-qos-v8-7-ee696a2c15a9@linaro.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c024241d0 ]
Avoid extra 120ms delay during system resume.
The xHC controller may signal wake up to 120ms before showing which usb
device caused the wake on the xHC port registers.
The xhci driver therefore checks for port activity up to 120ms during
resume, making sure that the hub driver can see the port change, and
won't immediately runtime suspend back due to no port activity.
This is however only needed for runtime resume as system resume will
resume all child hubs and other child usb devices anyway.
Fixes: 253f588c70 ("xhci: Improve detection of device initiated wake signal.")
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20230428140056.1318981-3-Basavaraj.Natikar@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ab24b0486 ]
If the probe needs to be deferred, some resources still need to be
released. So branch to the error handling path instead of returning
directly.
Fixes: f41e1442ac ("cpufreq: tegra194: add OPP support and set bandwidth")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7cd2e5f75b ]
If a file has FI_COMPRESS_RELEASED, all writes for it should not be
allowed.
Fixes: 5fdb322ff2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
Signed-off-by: Qi Han <hanqi@vivo.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 83f3fcf96f ]
The __w1_remove_master_device() function calls:
list_del(&dev->w1_master_entry);
So presumably this can cause an endless loop.
Fixes: 7785925dd8 ("[PATCH] w1: cleanups.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dca5480ab7 ]
The commit 67b392f7b8 ("w1_therm: optimizing temperature read timings")
accidentially inverted the logic for lock handling of the bus mutex.
Before:
pullup -> release lock before sleep
no pullup -> release lock after sleep
After:
pullup -> release lock after sleep
no pullup -> release lock before sleep
This cause spurious measurements of 85 degree (powerup value) on the
Tarragon board with connected 1-w temperature sensor
(w1_therm.w1_strong_pull=0).
In the meantime a new feature for polling the conversion
completion has been integrated in these branches with
commit 021da53e65 ("w1: w1_therm: Add sysfs entries to control
conversion time and driver features"). But this feature isn't
available for parasite power mode, so handle this separately.
Link: https://lore.kernel.org/regressions/2023042645-attentive-amends-7b0b@gregkh/T/
Fixes: 67b392f7b8 ("w1_therm: optimizing temperature read timings")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/20230427112152.12313-1-stefan.wahren@i2se.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0303c9729a ]
Niklāvs reported a boot regression on an Alderlake machine and bisected it
to commit 9df9d2f047 ("init: Invoke arch_cpu_finalize_init() earlier").
By moving the invocation of arch_cpu_finalize_init() further down he
identified that efi_enter_virtual_mode() is the function which causes the
boot hang.
The main difference of the earlier invocation is that the boot CPU is
already fully initialized and mitigations and alternatives are applied.
But the only really interesting change turned out to be IBT, which is now
enabled before efi_enter_virtual_mode(). "ibt=off" on the kernel command
line cured the problem.
Inspection of the involved calls in efi_enter_virtual_mode() unearthed that
efi_set_virtual_address_map() is the only place in the kernel which invokes
an EFI call without the IBT safe wrapper. This went obviously unnoticed so
far as IBT was enabled later.
Use arch_efi_call_virt() instead of efi_call() to cure that.
Fixes: fe379fa4d1 ("x86/ibt: Disable IBT around firmware")
Fixes: 9df9d2f047 ("init: Invoke arch_cpu_finalize_init() earlier")
Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217602
Link: https://lore.kernel.org/r/87jzvm12q0.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 893b24181b ]
The FFR is a predicate register which can vary between 16 and 256 bits
in size depending upon the configured vector length. When saving the
SVE state in streaming SVE mode, the FFR register is inaccessible and
so commit 9f58486657 ("arm64/sve: Make access to FFR optional") simply
clears the FFR field of the in-memory context structure. Unfortunately,
it achieves this using an unconditional 8-byte store and so if the SME
vector length is anything other than 64 bytes in size we will either
fail to clear the entire field or, worse, we will corrupt memory
immediately following the structure. This has led to intermittent kfence
splats in CI [1] and can trigger kmalloc Redzone corruption messages
when running the 'fp-stress' kselftest:
| =============================================================================
| BUG kmalloc-1k (Not tainted): kmalloc Redzone overwritten
| -----------------------------------------------------------------------------
|
| 0xffff000809bf1e22-0xffff000809bf1e27 @offset=7714. First byte 0x0 instead of 0xcc
| Allocated in do_sme_acc+0x9c/0x220 age=2613 cpu=1 pid=531
| __kmalloc+0x8c/0xcc
| do_sme_acc+0x9c/0x220
| ...
Replace the 8-byte store with a store of a predicate register which has
been zero-initialised with PFALSE, ensuring that the entire field is
cleared in memory.
[1] https://lore.kernel.org/r/CA+G9fYtU7HsV0R0dp4XEH5xXHSJFw8KyDf5VQrLLfMxWfxQkag@mail.gmail.com
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: 9f58486657 ("arm64/sve: Make access to FFR optional")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20230628155605.22296-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9cedc58bdb ]
clang warns about a possible field overflow in a memcpy:
In file included from fs/smb/server/smb_common.c:7:
include/linux/fortify-string.h:583:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
__write_overflow_field(p_size_field, size);
It appears to interpret the "&out[baselen + 4]" as referring to a single
byte of the character array, while the equivalen "out + baselen + 4" is
seen as an offset into the array.
I don't see that kind of warning elsewhere, so just go with the simple
rework.
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ae872de41 ]
When having two DFS root mounts that are connected to same namespace,
same mount options but different prefix paths, we can't really use the
shared @server->origin_fullpath when chasing DFS links in them.
Move the origin_fullpath field to cifs_tcon structure so when having
shared DFS root mounts with different prefix paths, and we need to
chase any DFS links, dfs_get_automount_devname() will pick up the
correct full path out of the @tcon that will be used for the new
mount.
Before patch
mount.cifs //dom/dfs/dir /mnt/1 -o ...
mount.cifs //dom/dfs /mnt/2 -o ...
# shared server, ses, tcon
# server: origin_fullpath=//dom/dfs/dir
# @server->origin_fullpath + '/dir/link1'
$ ls /mnt/2/dir/link1
ls: cannot open directory '/mnt/2/dir/link1': No such file or directory
After patch
mount.cifs //dom/dfs/dir /mnt/1 -o ...
mount.cifs //dom/dfs /mnt/2 -o ...
# shared server & ses
# tcon_1: origin_fullpath=//dom/dfs/dir
# tcon_2: origin_fullpath=//dom/dfs
# @tcon_2->origin_fullpath + '/dir/link1'
$ ls /mnt/2/dir/link1
dir0 dir1 dir10 dir3 dir5 dir6 dir7 dir9 target2_file.txt tsub
Fixes: 8e3554150d ("cifs: fix sharing of DFS connections")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 326a8d04f1 ]
All the server credits and in-flight info is protected by req_lock.
Once the req_lock is held, and we've determined that we have enough
credits to continue, this lock cannot be dropped till we've made the
changes to credits and in-flight count.
However, we used to drop the lock in order to avoid deadlock with
the recent srv_lock. This could cause the checks already made to be
invalidated.
Fixed it by moving the server status check to before locking req_lock.
Fixes: d7d7a66aac ("cifs: avoid use of global locks for high contention data")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33f736187d ]
In smb2_compound_op we have a possible use-after-free
which can cause hard to debug problems later on.
This was revealed during stress testing with KASAN enabled
kernel. Fixing it by moving the cfile free call to
a few lines below, after the usage.
Fixes: 76894f3e2f ("cifs: improve symlink handling for smb2+")
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e28a798c3 ]
Currently, the EFI stub will disable PCI DMA as the very last thing it
does before calling ExitBootServices(), to avoid interfering with the
firmware's normal operation as much as possible.
However, the stub will invoke DisconnectController() on all endpoints
downstream of the PCI bridges it disables, and this may affect the
layout of the EFI memory map, making it substantially more likely that
ExitBootServices() will fail the first time around, and that the EFI
memory map needs to be reloaded.
This, in turn, increases the likelihood that the slack space we
allocated is insufficient (and we can no longer allocate memory via boot
services after having called ExitBootServices() once), causing the
second call to GetMemoryMap (and therefore the boot) to fail. This makes
the PCI DMA disable feature a bit more fragile than it already is, so
let's make it more robust, by allocating the space for the EFI memory
map after disabling PCI DMA.
Fixes: 4444f8541d ("efi: Allow disabling PCI busmastering on bridges during boot")
Reported-by: Glenn Washburn <development@efficientek.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1240dabe8d ]
When CONFIG_MODULES is disabled for ARCH=um, 'make (bin)deb-pkg' fails
with an error like follows:
cp: cannot create regular file 'debian/linux-image/usr/lib/uml/modules/6.4.0-rc2+/System.map': No such file or directory
Remove the CONFIG_MODULES check completely so ${pdir}/usr/lib/uml/modules
will always be created and modules.builtin.(modinfo) will be installed
under it for ARCH=um.
Fixes: b611daae5e ("kbuild: deb-pkg: split image and debug objects staging out into functions")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4243afdb93 ]
Even for a non-modular kernel, the kernel builds modules.builtin and
modules.builtin.modinfo, with information about the built-in modules.
Tools such as initramfs-tools need these files to build a working
initramfs on some systems, such as those requiring firmware.
Now that `make modules_install` works even in non-modular kernels and
installs these files, unconditionally invoke it when building a Debian
package.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Stable-dep-of: 1240dabe8d ("kbuild: deb-pkg: remove the CONFIG_MODULES check in buildeb")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ab47045ac ]
cxl_region_decode_reset() walks all the decoders associated with a given
region and disables them. Due to decoder ordering rules it is possible
that a switch in the topology notices that a given decoder can not be
shutdown before another region with a higher HPA is shutdown first. That
can leave the region in a partially committed state.
Capture that state in a new CXL_REGION_F_NEEDS_RESET flag and require
that a successful cxl_region_decode_reset() attempt must be completed
before cxl_region_probe() accepts the region.
This is a corollary for the bug that Jonathan identified in "CXL/region
: commit reset of out of order region appears to succeed." [1].
Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Link: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com [1]
Fixes: 176baefb2e ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168696507423.3590522.16254212607926684429.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1257d098a ]
Vikram raised a concern with the theoretical case of a CPU sending
MemClnEvict to a device that is not prepared to receive. MemClnEvict is
a message that is sent after a CPU has taken ownership of a cacheline
from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder
reconfiguration it is possible that the CPU is holding old contents for
a new device that has taken over the physical address range being cached
by the CPU.
To avoid this scenario, invalidate caches prior to tearing down an HDM
decoder configuration.
Now, this poses another problem that it is possible for something to
speculate into that space while the decode configuration is still up, so
to close that gap also invalidate prior to establish new contents behind
a given physical address range.
With this change the cache invalidation is now explicit and need not be
checked in cxl_region_probe(), and that obviates the need for
CXL_REGION_F_INCOHERENT.
Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Fixes: d18bc74ace ("cxl/region: Manage CPU caches relative to DPA invalidation events")
Reported-by: Vikram Sethi <vsethi@nvidia.com>
Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25a21fbb93 ]
With GCOV_PROFILE_ALL, Clang injects __llvm_gcov_* functions to each
object file, including the *.mod.o. As we filter out CC_FLAGS_CFI
for *.mod.o, the compiler won't generate type hashes for the
injected functions, and therefore indirectly calling them during
module loading trips indirect call checking.
Enabling CFI for *.mod.o isn't sufficient to fix this issue after
commit 0c3e806ec0 ("x86/cfi: Add boot time hash randomization"),
as *.mod.o aren't processed by objtool, which means any hashes
emitted there won't be randomized. Therefore, in addition to
disabling CFI for *.mod.o, also disable GCOV, as the object files
don't otherwise contain any executable code.
Fixes: cf68fffb66 ("add support for Clang CFI")
Reported-by: Joe Fradley <joefradley@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddf56288ee ]
With GCOV_PROFILE_ALL, Clang injects __llvm_gcov_* functions to
each object file, and the functions are indirectly called during
boot. However, when code is injected to object files that are not
part of vmlinux.o, it's also not processed by objtool, which breaks
CFI hash randomization as the hashes in these files won't be
included in the .cfi_sites section and thus won't be randomized.
Similarly to commit 42633ed852 ("kbuild: Fix CFI hash
randomization with KASAN"), disable GCOV for .vmlinux.export.o and
init/version-timestamp.o to avoid emitting unnecessary functions to
object files that don't otherwise have executable code.
Fixes: 0c3e806ec0 ("x86/cfi: Add boot time hash randomization")
Reported-by: Joe Fradley <joefradley@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit fc80fc2d4e upstream.
After the listener svc_sock is freed, and before invoking svc_tcp_accept()
for the established child sock, there is a window that the newsock
retaining a freed listener svc_sock in sk_user_data which cloning from
parent. In the race window, if data is received on the newsock, we will
observe use-after-free report in svc_tcp_listen_data_ready().
Reproduce by two tasks:
1. while :; do rpc.nfsd 0 ; rpc.nfsd; done
2. while :; do echo "" | ncat -4 127.0.0.1 2049 ; done
KASAN report:
==================================================================
BUG: KASAN: slab-use-after-free in svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
Read of size 8 at addr ffff888139d96228 by task nc/102553
CPU: 7 PID: 102553 Comm: nc Not tainted 6.3.0+ #18
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
<IRQ>
dump_stack_lvl+0x33/0x50
print_address_description.constprop.0+0x27/0x310
print_report+0x3e/0x70
kasan_report+0xae/0xe0
svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
tcp_data_queue+0x9f4/0x20e0
tcp_rcv_established+0x666/0x1f60
tcp_v4_do_rcv+0x51c/0x850
tcp_v4_rcv+0x23fc/0x2e80
ip_protocol_deliver_rcu+0x62/0x300
ip_local_deliver_finish+0x267/0x350
ip_local_deliver+0x18b/0x2d0
ip_rcv+0x2fb/0x370
__netif_receive_skb_one_core+0x166/0x1b0
process_backlog+0x24c/0x5e0
__napi_poll+0xa2/0x500
net_rx_action+0x854/0xc90
__do_softirq+0x1bb/0x5de
do_softirq+0xcb/0x100
</IRQ>
<TASK>
...
</TASK>
Allocated by task 102371:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x7b/0x90
svc_setup_socket+0x52/0x4f0 [sunrpc]
svc_addsock+0x20d/0x400 [sunrpc]
__write_ports_addfd+0x209/0x390 [nfsd]
write_ports+0x239/0x2c0 [nfsd]
nfsctl_transaction_write+0xac/0x110 [nfsd]
vfs_write+0x1c3/0xae0
ksys_write+0xed/0x1c0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Freed by task 102551:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x50
__kasan_slab_free+0x106/0x190
__kmem_cache_free+0x133/0x270
svc_xprt_free+0x1e2/0x350 [sunrpc]
svc_xprt_destroy_all+0x25a/0x440 [sunrpc]
nfsd_put+0x125/0x240 [nfsd]
nfsd_svc+0x2cb/0x3c0 [nfsd]
write_threads+0x1ac/0x2a0 [nfsd]
nfsctl_transaction_write+0xac/0x110 [nfsd]
vfs_write+0x1c3/0xae0
ksys_write+0xed/0x1c0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fix the UAF by simply doing nothing in svc_tcp_listen_data_ready()
if state != TCP_LISTEN, that will avoid dereferencing svsk for all
child socket.
Link: https://lore.kernel.org/lkml/20230507091131.23540-1-dinghui@sangfor.com.cn/
Fixes: fa9251afc3 ("SUNRPC: Call the default socket callbacks instead of open coding")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39020d8abc upstream.
At balance_level(), instead of doing a BUG_ON() in case we fail to record
tree mod log operations, do a transaction abort and return the error to
the callers. There's really no need for the BUG_ON() as we can release
all resources in this context, and we have to abort because other future
tree searches that use the tree mod log (btrfs_search_old_slot()) may get
inconsistent results if other operations modify the tree after that
failure and before the tree mod log based search.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b90ecc0379 upstream.
Currently, associating a loop device with a different file descriptor
does not increment its diskseq. This allows the following race
condition:
1. Program X opens a loop device
2. Program X gets the diskseq of the loop device.
3. Program X associates a file with the loop device.
4. Program X passes the loop device major, minor, and diskseq to
something.
5. Program X exits.
6. Program Y detaches the file from the loop device.
7. Program Y attaches a different file to the loop device.
8. The opener finally gets around to opening the loop device and checks
that the diskseq is what it expects it to be. Even though the
diskseq is the expected value, the result is that the opener is
accessing the wrong file.
From discussions with Christoph Hellwig, it appears that
disk_force_media_change() was supposed to call inc_diskseq(), but in
fact it does not. Adding a Fixes: tag to indicate this. Christoph's
Reported-by is because he stated that disk_force_media_change()
calls inc_diskseq(), which is what led me to discover that it should but
does not.
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Fixes: e6138dc12d ("block: add a helper to raise a media changed event")
Cc: stable@vger.kernel.org # 5.15+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95a55437dc upstream.
The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.
Use u64 as type for sector address and size to allow using disks up to
2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD
format allows to specify disk sizes up to 2^128 bytes (though native
OS limitations reduce this somewhat, to max 2^68 bytes), so check for
u64 overflow carefully to protect against overflowing sector_t.
This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted (now resubmitted as patch 1 of this series).
Patch 3 (this series) adds additional error checking and warning
messages. One of the error checks now makes use of the previously
unused rdb_CylBlocks field, which causes a 'sparse' warning
(cast to restricted __be32).
Annotate all 32 bit fields in affs_hardblocks.h as __be32, as the
on-disk format of RDB and partition blocks is always big endian.
Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin@lichtvoll.de>
Cc: <stable@vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20230620201725.7020-3-schmitzmic@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6f3f28f60 upstream.
The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.
Use u64 as type for sector address and size to allow using disks up to
2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD
format allows to specify disk sizes up to 2^128 bytes (though native
OS limitations reduce this somewhat, to max 2^68 bytes), so check for
u64 overflow carefully to protect against overflowing sector_t.
Bail out if sector addresses overflow 32 bits on kernels without LBD
support.
This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted (now resubmitted as patch 1 in this series).
This patch adds additional error checking and warning messages.
Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin@lichtvoll.de>
Cc: <stable@vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/r/20230620201725.7020-4-schmitzmic@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd55842ed9 upstream.
The PCM memory allocation helpers have a sanity check against too many
buffer allocations. However, the check is performed without a proper
lock and the allocation isn't serialized; this allows user to allocate
more memories than predefined max size.
Practically seen, this isn't really a big problem, as it's more or
less some "soft limit" as a sanity check, and it's not possible to
allocate unlimitedly. But it's still better to address this for more
consistent behavior.
The patch covers the size check in do_alloc_pages() with the
card->memory_mutex, and increases the allocated size there for
preventing the further overflow. When the actual allocation fails,
the size is decreased accordingly.
Reported-by: BassCheck <bass@buaa.edu.cn>
Reported-by: Tuo Li <islituo@gmail.com>
Link: https://lore.kernel.org/r/CADm8Tek6t0WedK+3Y6rbE5YEt19tML8BUL45N2ji4ZAz1KcN_A@mail.gmail.com
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230703112430.30634-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89dbb335cb upstream.
snd_jack_report() is supposed to be callable from an IRQ context, too,
and it's indeed used in that way from virtsnd driver. The fix for
input_dev race in commit 1b6a6fc528 ("ALSA: jack: Access input_dev
under mutex"), however, introduced a mutex lock in snd_jack_report(),
and this resulted in a potential sleep-in-atomic.
For addressing that problem, this patch changes the relevant code to
use the object get/put and removes the mutex usage. That is,
snd_jack_report(), it takes input_get_device() and leaves with
input_put_device() for assuring the input_dev being assigned.
Although the whole mutex could be reduced, we keep it because it can
be still a protection for potential races between creation and
deletion.
Fixes: 1b6a6fc528 ("ALSA: jack: Access input_dev under mutex")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/cf95f7fe-a748-4990-8378-000491b40329@moroto.mountain
Tested-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230706155357.3470-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 501e197a02 ]
The st-rng driver uses devres to register itself with the hwrng core,
the driver will be unregistered from hwrng when its device goes out of
scope. This happens after the driver's remove function is called.
However, st-rng's clock is disabled in the remove function. There's a
short timeframe where st-rng is still registered with the hwrng core
although its clock is disabled. I suppose the clock must be active to
access the hardware and serve requests from the hwrng core.
Switch to devm_clk_get_enabled and let devres disable the clock and
unregister the hwrng. This avoids the race condition.
Fixes: 3e75241be8 ("hwrng: drivers - Use device-managed registration API")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46e66dab85 ]
memory_group_register_static takes maximum number of pages as the argument
while dev_dax_kmem_probe passes total_len (in bytes) as the argument.
IIUC, I don't see any crash/panic impact as such. As,
memory_group_register_static just set the max_pages limit which is used in
auto_movable_zone_for_pfn to determine the zone.
which might cause these condition to behave differently,
This will be true always so jump will happen to kernel_zone
...
if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
goto kernel_zone;
...
kernel_zone:
return default_kernel_zone_for_pfn(nid, pfn, nr_pages);
Here, In below, zone_intersects compare range will be larger as nr_pages
will be higher (derived from total_len passed in dev_dax_kmem_probe).
...
static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
unsigned long nr_pages)
{
struct pglist_data *pgdat = NODE_DATA(nid);
int zid;
for (zid = 0; zid < ZONE_NORMAL; zid++) {
struct zone *zone = &pgdat->node_zones[zid];
if (zone_intersects(zone, start_pfn, nr_pages))
return zone;
}
return &pgdat->node_zones[ZONE_NORMAL];
}
Incorrect zone will be returned here, which in later time might cause bigger
problem.
Fixes: eedf634aac ("dax/kmem: use a single static memory group for a single probed unit")
Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
Link: https://lore.kernel.org/r/20230621155025.370672-1-tsahu@linux.ibm.com
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d24b170a9 ]
A CONFIG_DEBUG_KOBJECT_RELEASE test of removing a device-dax region
provider (like modprobe -r dax_hmem) yields:
kobject: 'mapping0' (ffff93eb460e8800): kobject_release, parent 0000000000000000 (delayed 2000)
[..]
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 23 PID: 282 at kernel/locking/lockdep.c:232 __lock_acquire+0x9fc/0x2260
[..]
RIP: 0010:__lock_acquire+0x9fc/0x2260
[..]
Call Trace:
<TASK>
[..]
lock_acquire+0xd4/0x2c0
? ida_free+0x62/0x130
_raw_spin_lock_irqsave+0x47/0x70
? ida_free+0x62/0x130
ida_free+0x62/0x130
dax_mapping_release+0x1f/0x30
device_release+0x36/0x90
kobject_delayed_cleanup+0x46/0x150
Due to attempting ida_free() on an ida object that has already been
freed. Devices typically only hold a reference on their parent while
registered. If a child needs a parent object to complete its release it
needs to hold a reference that it drops from its release callback.
Arrange for a dax_mapping to pin its parent dev_dax instance until
dax_mapping_release().
Fixes: 0b07ce872a ("device-dax: introduce 'mapping' devices")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/168577283412.1672036.16111545266174261446.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da787d5b74 ]
In case if all existing file handles are deferred handles and if all of
them gets closed due to handle lease break then we dont need to send
lease break acknowledgment to server, because last handle close will be
considered as lease break ack.
After closing deferred handels, we check for openfile list of inode,
if its empty then we skip sending lease break ack.
Fixes: 59a556aebc ("SMB3: drop reference to cfile before sending oplock break")
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c907e72f58 ]
When the client received NFS4ERR_BADSESSION, it schedules recovery
and start the state manager thread which in turn freezes the
session table and does not allow for any new requests to use the
no-longer valid session. However, it is possible that before
the state manager thread runs, a new operation would use the
released slot that received BADSESSION and was therefore not
updated its sequence number. Such re-use of the slot can lead
the application errors.
Fixes: 5c441544f0 ("NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f7ab33689 ]
Currently, the list_lru::shrinker_id corresponding to the nfs4_xattr
shrinkers is wrong:
>>> prog["nfs4_xattr_cache_lru"].shrinker_id
(int)0
>>> prog["nfs4_xattr_entry_lru"].shrinker_id
(int)0
>>> prog["nfs4_xattr_large_entry_lru"].shrinker_id
(int)0
>>> prog["nfs4_xattr_cache_shrinker"].id
(int)18
>>> prog["nfs4_xattr_entry_shrinker"].id
(int)19
>>> prog["nfs4_xattr_large_entry_shrinker"].id
(int)20
This is not what we expect, which will cause these shrinkers
not to be found in shrink_slab_memcg().
We should assign shrinker::id before calling list_lru_init_memcg(),
so that the corresponding list_lru::shrinker_id will be assigned
the correct value like below:
>>> prog["nfs4_xattr_cache_lru"].shrinker_id
(int)16
>>> prog["nfs4_xattr_entry_lru"].shrinker_id
(int)17
>>> prog["nfs4_xattr_large_entry_lru"].shrinker_id
(int)18
>>> prog["nfs4_xattr_cache_shrinker"].id
(int)16
>>> prog["nfs4_xattr_entry_shrinker"].id
(int)17
>>> prog["nfs4_xattr_large_entry_shrinker"].id
(int)18
So just do it.
Fixes: 95ad37f90c ("NFSv4.2: add client side xattr caching.")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92e2921eea ]
ASM_NL is useful not only in *.S files but also in .c files for using
inline assembler in C code.
On ARC, however, ASM_NL is evaluated inconsistently. It is expanded to
a backquote (`) in *.S files, but a semicolon (;) in *.c files because
arch/arc/include/asm/linkage.h defines it inside #ifdef __ASSEMBLY__,
so the definition for C code falls back to the default value defined in
include/linux/linkage.h.
If ASM_NL is used in inline assembler in .c files, it will result in
wrong assembly code because a semicolon is not an instruction separator,
but the start of a comment for ARC.
Move ASM_NL (also __ALIGN and __ALIGN_STR) out of the #ifdef.
Fixes: 9df62f0544 ("arch: use ASM_NL instead of ';' for assembler new line character in the macro")
Fixes: 8d92e992a7 ("ARC: define __ALIGN_STR and __ALIGN symbols for ARC")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a3f1e573a ]
The > comparison should be >= to prevent an out of bounds array
access.
Fixes: 52dc0595d5 ("modpost: handle relocations mismatch in __ex_table.")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec336aa831 ]
The backslash characters escaping '$' in the command to sed (intended to
prevent it from interpreting '$' as "end-of-line") are currently being
consumed by the Shell (where they mean that sh should not evaluate what
follows '$' as a variable name). This means that
sed -e "/ \$/d"
executes the script
/ $/d
instead of the intended
/ \$/d
So escape twice in mksysmap any '$' that actually needs to reach sed
escaped so that the backslash survives the Shell.
Fixes: c4802044a0 ("scripts/mksysmap: use sed with in-line comments")
Fixes: 320e7c9d44 ("scripts/kallsyms: move compiler-generated symbol patterns to mksysmap")
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d23659769a ]
With the update of the permanent and intermittent health errors, the
actual indicator for the health test indicates a potential error only
for the one offending time stamp gathered in the current iteration
round. The next iteration round will "overwrite" the health test result.
Thus, the entropy collection loop in jent_gen_entropy checks for
the health test failure upon each loop iteration. However, the
initialization operation checked for the APT health test once for
an APT window which implies it would not catch most errors.
Thus, the check for all health errors is now invoked unconditionally
during each loop iteration for the startup test.
With the change, the error JENT_ERCT becomes unused as all health
errors are only reported with the JENT_HEALTH return code. This
allows the removal of the error indicator.
Fixes: 3fde2fe99a ("crypto: jitter - permanent and intermittent health errors"
)
Reported-by: Joachim Vandersmissen <git@jvdsn.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efbc7764c4 ]
Commit df8fc4e934 ("kbuild: Enable -fstrict-flex-arrays=3") uncovered
a type mismatch in cesa 3des support that leads to a memcpy beyond the
end of a structure:
In function 'fortify_memcpy_chk',
inlined from 'mv_cesa_des3_ede_setkey' at drivers/crypto/marvell/cesa/cipher.c:307:2:
include/linux/fortify-string.h:583:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
583 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is probably harmless as the actual data that is copied has the correct
type, but clearly worth fixing nonetheless.
Fixes: 4ada483978 ("crypto: marvell/cesa - add Triple-DES support")
Cc: Kees Cook <keescook@chromium.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56a24b8ce6 ]
addend_arm_rel() processes R_ARM_PC24, R_ARM_CALL, R_ARM_JUMP24 in a
wrong way.
Here, test code.
[test code for R_ARM_JUMP24]
.section .init.text,"ax"
bar:
bx lr
.section .text,"ax"
.globl foo
foo:
b bar
[test code for R_ARM_CALL]
.section .init.text,"ax"
bar:
bx lr
.section .text,"ax"
.globl foo
foo:
push {lr}
bl bar
pop {pc}
If you compile it with ARM multi_v7_defconfig, modpost will show the
symbol name, (unknown).
WARNING: modpost: vmlinux.o: section mismatch in reference: foo (section: .text) -> (unknown) (section: .init.text)
(You need to use GNU linker instead of LLD to reproduce it.)
Fix the code to make modpost show the correct symbol name.
I imported (with adjustment) sign_extend32() from include/linux/bitops.h.
The '+8' is the compensation for pc-relative instruction. It is
documented in "ELF for the Arm Architecture" [1].
"If the relocation is pc-relative then compensation for the PC bias
(the PC value is 8 bytes ahead of the executing instruction in Arm
state and 4 bytes in Thumb state) must be encoded in the relocation
by the object producer."
[1]: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst
Fixes: 56a974fa2d ("kbuild: make better section mismatch reports on arm")
Fixes: 6e2e340b59 ("ARM: 7324/1: modpost: Fix section warnings for ARM for many compilers")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7c63520f6 ]
addend_arm_rel() processes R_ARM_ABS32 in a wrong way.
Here, test code.
[test code 1]
#include <linux/init.h>
int __initdata foo;
int get_foo(void) { return foo; }
If you compile it with ARM versatile_defconfig, modpost will show the
symbol name, (unknown).
WARNING: modpost: vmlinux.o: section mismatch in reference: get_foo (section: .text) -> (unknown) (section: .init.data)
(You need to use GNU linker instead of LLD to reproduce it.)
If you compile it for other architectures, modpost will show the correct
symbol name.
WARNING: modpost: vmlinux.o: section mismatch in reference: get_foo (section: .text) -> foo (section: .init.data)
For R_ARM_ABS32, addend_arm_rel() sets r->r_addend to a wrong value.
I just mimicked the code in arch/arm/kernel/module.c.
However, there is more difficulty for ARM.
Here, test code.
[test code 2]
#include <linux/init.h>
int __initdata foo;
int get_foo(void) { return foo; }
int __initdata bar;
int get_bar(void) { return bar; }
With this commit applied, modpost will show the following messages
for ARM versatile_defconfig:
WARNING: modpost: vmlinux.o: section mismatch in reference: get_foo (section: .text) -> foo (section: .init.data)
WARNING: modpost: vmlinux.o: section mismatch in reference: get_bar (section: .text) -> foo (section: .init.data)
The reference from 'get_bar' to 'foo' seems wrong.
I have no solution for this because it is true in assembly level.
In the following output, relocation at 0x1c is no longer associated
with 'bar'. The two relocation entries point to the same symbol, and
the offset to 'bar' is encoded in the instruction 'r0, [r3, #4]'.
Disassembly of section .text:
00000000 <get_foo>:
0: e59f3004 ldr r3, [pc, #4] @ c <get_foo+0xc>
4: e5930000 ldr r0, [r3]
8: e12fff1e bx lr
c: 00000000 .word 0x00000000
00000010 <get_bar>:
10: e59f3004 ldr r3, [pc, #4] @ 1c <get_bar+0xc>
14: e5930004 ldr r0, [r3, #4]
18: e12fff1e bx lr
1c: 00000000 .word 0x00000000
Relocation section '.rel.text' at offset 0x244 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0000000c 00000c02 R_ARM_ABS32 00000000 .init.data
0000001c 00000c02 R_ARM_ABS32 00000000 .init.data
When find_elf_symbol() gets into a situation where relsym->st_name is
zero, there is no guarantee to get the symbol name as written in C.
I am keeping the current logic because it is useful in many architectures,
but the symbol name is not always correct depending on the optimization.
I left some comments in find_tosym().
Fixes: 56a974fa2d ("kbuild: make better section mismatch reports on arm")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b04b076fb5 ]
Fix build warnings when DEBUG_FS is not enabled by using an empty
do-while loop instead of a value:
In file included from ../drivers/crypto/nx/nx.c:27:
../drivers/crypto/nx/nx.c: In function 'nx_register_algs':
../drivers/crypto/nx/nx.h:173:33: warning: statement with no effect [-Wunused-value]
173 | #define NX_DEBUGFS_INIT(drv) (0)
../drivers/crypto/nx/nx.c:573:9: note: in expansion of macro 'NX_DEBUGFS_INIT'
573 | NX_DEBUGFS_INIT(&nx_driver);
../drivers/crypto/nx/nx.c: In function 'nx_remove':
../drivers/crypto/nx/nx.h:174:33: warning: statement with no effect [-Wunused-value]
174 | #define NX_DEBUGFS_FINI(drv) (0)
../drivers/crypto/nx/nx.c:793:17: note: in expansion of macro 'NX_DEBUGFS_FINI'
793 | NX_DEBUGFS_FINI(&nx_driver);
Also, there is no need to build nx_debugfs.o when DEBUG_FS is not
enabled, so change the Makefile to accommodate that.
Fixes: ae0222b728 ("powerpc/crypto: nx driver code supporting nx encryption")
Fixes: aef7b31c88 ("powerpc/crypto: Build files for the nx device driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Breno Leitão <leitao@debian.org>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: Paulo Flabiano Smorigo <pfsmorigo@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0acc76a49 ]
find_extable_entry_size() is completely broken. It has awesome comments
about how to calculate sizeof(struct exception_table_entry).
It was based on these assumptions:
- struct exception_table_entry has two fields
- both of the fields have the same size
Then, we came up with this equation:
(offset of the second field) * 2 == (size of struct)
It was true for all architectures when commit 52dc0595d5 ("modpost:
handle relocations mismatch in __ex_table.") was applied.
Our mathematics broke when commit 548acf1923 ("x86/mm: Expand the
exception table logic to allow new handling options") introduced the
third field.
Now, the definition of exception_table_entry is highly arch-dependent.
For x86, sizeof(struct exception_table_entry) is apparently 12, but
find_extable_entry_size() sets extable_entry_size to 8.
I could fix it, but I do not see much value in this code.
extable_entry_size is used just for selecting a slightly different
error message.
If the first field ("insn") references to a non-executable section,
The relocation at %s+0x%lx references
section "%s" which is not executable, IOW
it is not possible for the kernel to fault
at that address. Something is seriously wrong
and should be fixed.
If the second field ("fixup") references to a non-executable section,
The relocation at %s+0x%lx references
section "%s" which is not executable, IOW
the kernel will fault if it ever tries to
jump to it. Something is seriously wrong
and should be fixed.
Merge the two error messages rather than adding even more complexity.
Change fatal() to error() to make it continue running and catch more
possible errors.
Fixes: 548acf1923 ("x86/mm: Expand the exception table logic to allow new handling options")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac52578d6e ]
The virtio rng device kicks off a new entropy request whenever the
data available reaches zero. When a new request occurs at the end
of a read operation, that is, when the result of that request is
only needed by the next reader, then there is a race between the
writing of the new data and the next reader.
This is because there is no synchronisation whatsoever between the
writer and the reader.
Fix this by writing data_avail with smp_store_release and reading
it with smp_load_acquire when we first enter read. The subsequent
reads are safe because they're either protected by the first load
acquire, or by the completion mechanism.
Also remove the redundant zeroing of data_idx in random_recv_done
(data_idx must already be zero at this point) and data_avail in
request_entropy (ditto).
Reported-by: syzbot+726dc8c62c3536431ceb@syzkaller.appspotmail.com
Fixes: f7f510ec19 ("virtio: An entropy device, as suggested by hpa.")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff598081e5 ]
The pointer to mdev_bus_compat_class is statically defined at the top
of mdev_core, and was originally (commit 7b96953bc6 ("vfio: Mediated
device Core driver") serialized by the parent_list_lock. The blamed
commit removed this mutex, leaving the pointer initialization
unserialized. As a result, the creation of multiple MDEVs in parallel
(such as during boot) can encounter errors during the creation of the
sysfs entries, such as:
[ 8.337509] sysfs: cannot create duplicate filename '/class/mdev_bus'
[ 8.337514] vfio_ccw 0.0.01d8: MDEV: Registered
[ 8.337516] CPU: 13 PID: 946 Comm: driverctl Not tainted 6.4.0-rc7 #20
[ 8.337522] Hardware name: IBM 3906 M05 780 (LPAR)
[ 8.337525] Call Trace:
[ 8.337528] [<0000000162b0145a>] dump_stack_lvl+0x62/0x80
[ 8.337540] [<00000001622aeb30>] sysfs_warn_dup+0x78/0x88
[ 8.337549] [<00000001622aeca6>] sysfs_create_dir_ns+0xe6/0xf8
[ 8.337552] [<0000000162b04504>] kobject_add_internal+0xf4/0x340
[ 8.337557] [<0000000162b04d48>] kobject_add+0x78/0xd0
[ 8.337561] [<0000000162b04e0a>] kobject_create_and_add+0x6a/0xb8
[ 8.337565] [<00000001627a110e>] class_compat_register+0x5e/0x90
[ 8.337572] [<000003ff7fd815da>] mdev_register_parent+0x102/0x130 [mdev]
[ 8.337581] [<000003ff7fdc7f2c>] vfio_ccw_sch_probe+0xe4/0x178 [vfio_ccw]
[ 8.337588] [<0000000162a7833c>] css_probe+0x44/0x80
[ 8.337599] [<000000016279f4da>] really_probe+0xd2/0x460
[ 8.337603] [<000000016279fa08>] driver_probe_device+0x40/0xf0
[ 8.337606] [<000000016279fb78>] __device_attach_driver+0xc0/0x140
[ 8.337610] [<000000016279cbe0>] bus_for_each_drv+0x90/0xd8
[ 8.337618] [<00000001627a00b0>] __device_attach+0x110/0x190
[ 8.337621] [<000000016279c7c8>] bus_rescan_devices_helper+0x60/0xb0
[ 8.337626] [<000000016279cd48>] drivers_probe_store+0x48/0x80
[ 8.337632] [<00000001622ac9b0>] kernfs_fop_write_iter+0x138/0x1f0
[ 8.337635] [<00000001621e5e14>] vfs_write+0x1ac/0x2f8
[ 8.337645] [<00000001621e61d8>] ksys_write+0x70/0x100
[ 8.337650] [<0000000162b2bdc4>] __do_syscall+0x1d4/0x200
[ 8.337656] [<0000000162b3c828>] system_call+0x70/0x98
[ 8.337664] kobject: kobject_add_internal failed for mdev_bus with -EEXIST, don't try to register things with the same name in the same directory.
[ 8.337668] kobject: kobject_create_and_add: kobject_add error: -17
[ 8.337674] vfio_ccw: probe of 0.0.01d9 failed with error -12
[ 8.342941] vfio_ccw_mdev aeb9ca91-10c6-42bc-a168-320023570aea: Adding to iommu group 2
Move the initialization of the mdev_bus_compat_class pointer to the
init path, to match the cleanup in module exit. This way the code
in mdev_register_parent() can simply link the new parent to it,
rather than determining whether initialization is required first.
Fixes: 89345d5177 ("vfio/mdev: embedd struct mdev_parent in the parent data structure")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230626133642.2939168-1-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91afbaafd6 ]
During hibernation or restoration, freeze_secondary_cpus
checks num_online_cpus via BUG_ON, and the subsequent
save_processor_state also does the checking with WARN_ON.
In the case of CONFIG_PM_SLEEP_SMP=n, freeze_secondary_cpus
is not defined, but the sole possible condition to disable
CONFIG_PM_SLEEP_SMP is !SMP where num_online_cpus is always 1.
We also don't have to check it in save_processor_state.
So remove the unnecessary checking in save_processor_state.
Fixes: c031721001 ("RISC-V: Add arch functions to support hibernation/suspend-to-disk")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230609075049.2651723-4-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b684c09f09 ]
ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving current R1, it pulls the previous stack frame pointer.
When vmcores caused by direct panic call (such as `echo c >
/proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
backtrace correctly. On further analysis, it was found that it was because
of mismatch between r1 and NIP.
GDB uses NIP to get current function symbol and uses corresponding debug
info of that function to unwind previous frames, but due to the
mismatching r1 and NIP, the unwinding does not work, and it fails to
unwind to the 2nd frame and hence does not show the backtrace.
GDB backtrace with vmcore of kernel without this patch:
---------
(gdb) bt
#0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>,
newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
#1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974
#2 0x0000000000000063 in ?? ()
#3 0xc000000003579320 in ?? ()
---------
Further analysis revealed that the mismatch occurred because
"ppc_save_regs" was saving the previous stack's SP instead of the current
r1. This patch fixes this by storing current r1 in the saved pt_regs.
GDB backtrace with vmcore of patched kernel:
--------
(gdb) bt
#0 0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8)
at ./arch/powerpc/include/asm/kexec.h:69
#1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
#2 0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n")
at kernel/panic.c:358
#3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
#4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false)
at drivers/tty/sysrq.c:602
#5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>,
count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
#6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>,
buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340
#7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>,
ppos=<optimized out>) at fs/proc/inode.c:352
#8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00,
buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>,
count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582
#9 0xc0000000005b4264 in ksys_write (fd=<optimized out>,
buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2)
at fs/read_write.c:637
#10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>)
at arch/powerpc/kernel/syscall.c:171
#11 0xc00000000000c270 in system_call_vectored_common ()
at arch/powerpc/kernel/interrupt_64.S:192
--------
Nick adds:
So this now saves regs as though it was an interrupt taken in the
caller, at the instruction after the call to ppc_save_regs, whereas
previously the NIP was there, but R1 came from the caller's caller and
that mismatch is what causes gdb's dwarf unwinder to go haywire.
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Fixes: d16a58f885 ("powerpc: Improve ppc_save_regs()")
Reivewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230615091047.90433-1-adityag@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4f913c980 ]
Currently pointer iov is being dereferenced before the null check of iov
which can lead to null pointer dereference errors. Fix this by moving the
iov null check before the dereferencing.
Detected using cppcheck static analysis:
linux/arch/powerpc/platforms/powernv/pci-sriov.c:597:12: warning: Either
the condition '!iov' is redundant or there is possible null pointer
dereference: iov. [nullPointerRedundantCheck]
num_vfs = iov->num_vfs;
^
Fixes: 052da31d45 ("powerpc/powernv/sriov: De-indent setup and teardown")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230608095849.1147969-1-colin.i.king@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0fef6bb730 ]
In MCQ mode, when a device command uses a hardware queue shared with other
commands, a race condition may occur in the following scenario:
1. A device command is completed in CQx with CQE entry "e".
2. The interrupt handler copies the "cqe" pointer to "hba->dev_cmd.cqe"
and completes "hba->dev_cmd.complete".
3. The "ufshcd_wait_for_dev_cmd()" function is awakened and retrieves the
OCS value from "hba->dev_cmd.cqe".
However, there is a possibility that the CQE entry "e" will be overwritten
by newly completed commands in CQx, resulting in an incorrect OCS value
being received by "ufshcd_wait_for_dev_cmd()".
To avoid this race condition, the OCS value should be immediately copied to
the struct "lrb" of the device command. Then "ufshcd_wait_for_dev_cmd()"
can retrieve the OCS value from the struct "lrb".
Fixes: 57b1c0ef89 ("scsi: ufs: core: mcq: Add support to allocate multiple queues")
Suggested-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Link: https://lore.kernel.org/r/20230610021553.1213-2-powen.kao@mediatek.com
Tested-by: Po-Wen Kao <powen.kao@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c3ac3b0779 ]
Test "perf script task-analyzer tests" fails in environment with missing
libtraceevent support, as perf record fails to create the perf.data
file, which further tests depend on.
Instead, when perf is not compiled with libtraceevent support, skip
those tests instead of failing them, by checking the output of `perf
record --dry-run` to see if it prints the error "libtraceevent is
necessary for tracepoint support"
For the following output, perf compiled with: `make NO_LIBTRACEEVENT=1`
Before the patch:
108: perf script task-analyzer tests :
test child forked, pid 24105
failed to open perf.data: No such file or directory (try 'perf record' first)
FAIL: "invokation of perf script report task-analyzer command failed" Error message: ""
FAIL: "test_basic" Error message: "Failed to find required string:'Comm'."
failed to open perf.data: No such file or directory (try 'perf record' first)
FAIL: "invokation of perf script report task-analyzer --ns --rename-comms-by-tids 0:random command failed" Error message: ""
FAIL: "test_ns_rename" Error message: "Failed to find required string:'Comm'."
failed to open perf.data: No such file or directory (try 'perf record' first)
<...>
perf script task-analyzer tests: FAILED!
With this patch, the script instead returns 2 signifying SKIP, and after
the patch:
108: perf script task-analyzer tests :
test child forked, pid 26010
libtraceevent is necessary for tracepoint support
WARN: Skipping tests. No libtraceevent support
test child finished with -2
perf script task-analyzer tests: Skip
Fixes: e8478b84d6 ("perf test: Add new task-analyzer tests")
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Petar Gligoric <petar.gligoric@rohde-schwarz.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230613164145.50488-18-atrajeev@linux.vnet.ibm.com
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f999e23ce6 ]
Fix issues identified in dytc_profile_refresh identified by lkp-tests.
drivers/platform/x86/thinkpad_acpi.c:10538
dytc_profile_refresh() error: uninitialized symbol 'funcmode'.
drivers/platform/x86/thinkpad_acpi.c:10531
dytc_profile_refresh() error: uninitialized symbol 'output'.
drivers/platform/x86/thinkpad_acpi.c:10537
dytc_profile_refresh() error: uninitialized symbol 'output'.
These issues should not lead to real problems in the field as the refresh
function should only be called if MMC or PSC mode enabled. But good to fix.
Thanks to Dan Carpenter and the lkp-tests project for flagging these.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202306011202.1hbgLRD4-lkp@intel.com/
Fixes: 1bc5d819f0 ("platform/x86: thinkpad_acpi: Fix profile modes on Intel platforms")
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230606151804.8819-1-mpearson-lenovo@squebb.ca
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36d3e4138e ]
When printing output we may want to generate per event files, where the
--per-event-dump option should be used, creating perf.data.EVENT.dump
files instead of printing to stdout.
The callback thar processes event thus expects that evsel->priv->fp
should point to either the per-event FILE descriptor or to stdout.
The a3af66f51b ("perf script: Fix crash because of missing
evsel->priv") changeset fixed a case where evsel->priv wasn't setup,
thus set to NULL, causing a segfault when trying to access
evsel->priv->fp.
But it did it for the non --per-event-dump case by allocating a 'struct
perf_evsel_script' just to set its ->fp to stdout.
Since evsel->priv is only freed when --per-event-dump is used, we ended
up with a memory leak, detected using ASAN.
Fix it by using the same method as perf_script__setup_per_event_dump(),
and reuse that static 'struct perf_evsel_script'.
Also check if evsel_script__new() failed.
Fixes: a3af66f51b ("perf script: Fix crash because of missing evsel->priv")
Reported-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/lkml/ZH+F0wGAWV14zvMP@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a03b1a0b19 ]
Looking at generated code for handle_signal32() shows calls to a
function called __unsafe_save_user_regs.constprop.0 while user access
is open.
And that __unsafe_save_user_regs.constprop.0 function has two nops at
the begining, allowing it to be traced, which is unexpected during
user access open window.
The solution could be to mark __unsafe_save_user_regs() no trace, but
to be on the safe side the most efficient is to flag it __always_inline
as already done for function __unsafe_restore_general_regs(). The
function is relatively small and only called twice, so the size
increase will remain in the noise.
Do the same with save_tm_user_regs_unsafe() as it may suffer the
same issue.
Fixes: ef75e73182 ("powerpc/signal32: Transform save_user_regs() and save_tm_user_regs() in 'unsafe' version")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7e469c8f01860a69c1ada3ca6a5e2aa65f0f74b2.1685955220.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0eb089a72f ]
A disassembly of interrupt_exit_kernel_prepare() shows a useless read
of MSR register. This is shown by r9 being re-used immediately without
doing anything with the value read.
c000e0e0: 60 00 00 00 nop
c000e0e4: 7d 3a c2 a6 mfmd_ap r9
c000e0e8: 7d 20 00 a6 mfmsr r9
c000e0ec: 7c 51 13 a6 mtspr 81,r2
c000e0f0: 81 3f 00 84 lwz r9,132(r31)
c000e0f4: 71 29 80 00 andi. r9,r9,32768
This is due to the use of local_irq_save(). The flags read by
local_irq_save() are never used, use local_irq_disable() instead.
Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/df36c6205ab64326fb1b991993c82057e92ace2f.1685955214.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 353e7300a1 ]
Activating KCSAN on a 32 bits architecture leads to the following
link-time failure:
LD .tmp_vmlinux.kallsyms1
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_load':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_load_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_store':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_store_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_exchange':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_exchange_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_add':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_add_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_sub':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_sub_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_and':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_and_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_or':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_or_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_xor':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_xor_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_nand':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_nand_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_strong':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_weak':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_val':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
32 bits architectures don't have 64 bits atomic builtins. Only
include DEFINE_TSAN_ATOMIC_OPS(64) on 64 bits architectures.
Fixes: 0f8ad5f2e9 ("kcsan: Add support for atomic builtins")
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9c6afc28d0855240171a4e0ad9ffcdb9d07fceb.1683892665.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 416a87c972 ]
commit c5ad454a12 ("platform/x86: intel/pmc/core: Add Meteor Lake
support to pmc core driver") was supposed to add support for Meter
Lake P/M and mistakenly added support for Meteor Lake S instead. Meteor
Lake P/M support was added later and MTL-S support needs to be removed
since its currently assigned to the wrong register maps.
Fixes: c5ad454a12 ("platform/x86: intel/pmc/core: Add Meteor Lake support to pmc core driver")
Signed-off-by: Xi Pardee <xi.pardee@intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20230601004706.871528-1-xi.pardee@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5835196a17 ]
Currently the getter returns ENOTSUPP on pin configured in
the push-pull mode. Fix this by adding the missed switch case.
Fixes: ccdf81d08d ("pinctrl: cherryview: add option to set open-drain pin config")
Fixes: 6e08d6bbeb ("pinctrl: Add Intel Cherryview/Braswell pin controller support")
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fad5723350 ]
The function table is filled with group information based on other
instance-specific data at runtime. However, the function table can be
shared between multiple instances, causing the ->probe() function for
one instance to overwrite the table of a previously probed instance.
Fix this by sharing only the function names and allocating a separate
function table for each instance.
Fixes: 5a00473607 ("pinctrl: tegra: Separate Tegra194 instances")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20230530105308.1292852-1-thierry.reding@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 549e91a9bb ]
ufshcd_queuecommand() may be called two times in a row for a SCSI command
before it is completed. Hence make the following changes:
- In the functions that submit a command, do not check the old value of
lrbp->cmd nor clear lrbp->cmd in error paths.
- In ufshcd_release_scsi_cmd(), do not clear lrbp->cmd.
See also scsi_send_eh_cmnd().
This commit prevents that the following appears if a command times out:
WARNING: at drivers/ufs/core/ufshcd.c:2965 ufshcd_queuecommand+0x6f8/0x9a8
Call trace:
ufshcd_queuecommand+0x6f8/0x9a8
scsi_send_eh_cmnd+0x2c0/0x960
scsi_eh_test_devices+0x100/0x314
scsi_eh_ready_devs+0xd90/0x114c
scsi_error_handler+0x2b4/0xb70
kthread+0x16c/0x1e0
Fixes: 5a0b0cb9be ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230524203659.1394307-3-bvanassche@acm.org
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe8637f770 ]
One UFS vendor asked to increase the UFS timeout from 1 s to 3 s. Another
UFS vendor asked to increase the UFS timeout from 1 s to 10 s. Hence this
patch that increases the UFS timeout to 10 s. This patch can cause the
total timeout to exceed 20 s, the Android shutdown timeout. This is fine
since the loop around ufshcd_execute_start_stop() exists to deal with unit
attentions and because unit attentions are reported quickly.
Fixes: dcd5b7637c ("scsi: ufs: Reduce the START STOP UNIT timeout")
Fixes: 8f2c96420c ("scsi: ufs: core: Reduce the power mode change timeout")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230524203659.1394307-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9914a3d033 ]
When NPIV ports are zoned to devices that support both initiator and target
mode, a remote device's initiated PRLI results in unintended final kref
clean up of the device's ndlp structure. This disrupts NPIV ports'
discovery for target devices that support both initiator and target mode.
Modify the NPIV lpfc_drop_node clause such that we allow the ndlp to live
so long as it was in NLP_STE_PLOGI_ISSUE, NLP_STE_REG_LOGIN_ISSUE, or
NLP_STE_PRLI_ISSUE nlp_state. This allows lpfc's issued PRLI completion
routine to determine if the final kref clean up should execute rather than
a remote device's issued PRLI.
Fixes: db651ec225 ("scsi: lpfc: Correct used_rpi count when devloss tmo fires with no recovery")
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c60738de85 ]
Smatch reported:
1. drivers/pci/controller/pci-ftpci100.c:526 faraday_pci_probe() warn:
'clk' from clk_prepare_enable() not released on lines: 442,451,462,478,512,517.
2. drivers/pci/controller/pci-ftpci100.c:526 faraday_pci_probe() warn:
'p->bus_clk' from clk_prepare_enable() not released on lines: 451,462,478,512,517.
The clock resource is obtained by devm_clk_get(), and then
clk_prepare_enable() makes the clock resource ready for use. After that,
clk_disable_unprepare() should be called to release the clock resource
when it is no longer needed. However, while doing some error handling
in faraday_pci_probe(), clk_disable_unprepare() is not called to release
clk and p->bus_clk before returning. These return lines are exactly 442,
451, 462, 478, 512, 517.
Fix this warning by replacing devm_clk_get() with devm_clk_get_enabled(),
which is equivalent to devm_clk_get() + clk_prepare_enable(). And with
devm_clk_get_enabled(), the clock will automatically be disabled,
unprepared and freed when the device is unbound from the bus.
Link: https://lore.kernel.org/r/20230508043641.23807-1-yejunyan@hust.edu.cn
Fixes: b3c433efb8 ("PCI: faraday: Fix wrong pointer passed to PTR_ERR()")
Fixes: 2eeb02b285 ("PCI: faraday: Add clock handling")
Fixes: 783a862563 ("PCI: faraday: Use pci_parse_request_of_pci_ranges()")
Fixes: d3c68e0a7e ("PCI: faraday: Add Faraday Technology FTPCI100 PCI Host Bridge driver")
Fixes: f1e8bd21e3 ("PCI: faraday: Convert IRQ masking to raw PCI config accessors")
Signed-off-by: Junyan Ye <yejunyan@hust.edu.cn>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8afd0d9fc ]
If a PCIe hotplug slot has an Attention Button, the normal hot-add flow is:
- Slot is empty and slot power is off
- User inserts card in slot and presses Attention Button
- OS blinks Power Indicator for 5 seconds
- After 5 seconds, OS turns on Power Indicator, turns on slot power, and
enumerates the device
Previously, if a user pressed the Attention Button on an *empty* slot,
pciehp logged the following messages and blinked the Power Indicator
until a second button press:
[0.000] pciehp: Button press: will power on in 5 sec
[0.001] # Power Indicator starts blinking
[5.001] # 5 second timeout; slot is empty, so we should cancel the
request to power on and turn off Power Indicator
[7.000] # Power Indicator still blinking
[8.000] # possible card insertion
[9.000] pciehp: Button press: canceling request to power on
The first button press incorrectly left the slot in BLINKINGON_STATE, so
the second was interpreted as a "cancel power on" event regardless of
whether a card was present.
If the slot is empty, turn off the Power Indicator and return from
BLINKINGON_STATE to OFF_STATE after 5 seconds, effectively canceling the
request to power on. Putting the slot in OFF_STATE also means the second
button press will correctly request a slot power on if the slot is
occupied.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20230512021518.336460-1-clementwei90@163.com
Fixes: d331710ea7 ("PCI: pciehp: Become resilient to missed events")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 456d8aa37d ]
Struct pcie_link_state->downstream is a pointer to the pci_dev of function
0. Previously we retained that pointer when removing function 0, and
subsequent ASPM policy changes dereferenced it, resulting in a
use-after-free warning from KASAN, e.g.:
# echo 1 > /sys/bus/pci/devices/0000:03:00.0/remove
# echo powersave > /sys/module/pcie_aspm/parameters/policy
BUG: KASAN: slab-use-after-free in pcie_config_aspm_link+0x42d/0x500
Call Trace:
kasan_report+0xae/0xe0
pcie_config_aspm_link+0x42d/0x500
pcie_aspm_set_policy+0x8e/0x1a0
param_attr_store+0x162/0x2c0
module_attr_store+0x3e/0x80
PCIe spec r6.0, sec 7.5.3.7, recommends that software program the same ASPM
Control value in all functions of multi-function devices.
Disable ASPM and free the pcie_link_state when any child function is
removed so we can discard the dangling pcie_link_state->downstream pointer
and maintain the same ASPM Control configuration for all functions.
[bhelgaas: commit log and comment]
Debugged-by: Zongquan Qin <qinzongquan@sangfor.com.cn>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Fixes: b5a0a9b59c ("PCI/ASPM: Read and set up L1 substate capabilities")
Link: https://lore.kernel.org/r/20230507034057.20970-1-dinghui@sangfor.com.cn
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 711bcc0cb3 ]
Ensure that both the keyboard touchscreen and the digitizer have their
driver bound after remove(). Without this modprobing lenovo-yogabook-wmi
after a rmmod fails because lenovo-yogabook-wmi defers probing until
both devices have their driver bound.
Fixes: c0549b72d9 ("platform/x86: lenovo-yogabook-wmi: Add driver for Lenovo Yoga Book")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230430165807.472798-4-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9148cd2eb4 ]
When yogabook_wmi_remove() runs yogabook_wmi_work might still be running
and using the devices which yogabook_wmi_remove() puts.
To avoid this move to explicitly cancelling the work rather then using
devm_work_autocancel().
This requires also making the yogabook_backside_hall_irq handler non
devm managed, so that it cannot re-queue the work while
yogabook_wmi_remove() runs.
Fixes: c0549b72d9 ("platform/x86: lenovo-yogabook-wmi: Add driver for Lenovo Yoga Book")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230430165807.472798-3-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f025312b08 ]
Smatch reported:
drivers/scsi/qedf/qedf_main.c:3056 qedf_alloc_global_queues()
warn: missing unwind goto?
At this point in the function, nothing has been allocated so we can return
directly. In particular the "qedf->global_queues" have not been allocated
so calling qedf_free_global_queues() will lead to a NULL dereference when
we check if (!gl[i]) and "gl" is NULL.
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Signed-off-by: Jinhong Zhu <jinhongzhu@hust.edu.cn>
Link: https://lore.kernel.org/r/20230502140022.2852-1-jinhongzhu@hust.edu.cn
Reviewed-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b61cf04c49 ]
VMD driver can disable or enable MSI remapping by changing
VMCONFIG_MSI_REMAP register. This register needs to be set to the
default value during soft reboots. Drives failed to enumerate
when Windows boots after performing a soft reboot from Linux.
Windows doesn't support MSI remapping disable feature and stale
register value hinders Windows VMD driver initialization process.
Adding vmd_shutdown function to make sure to set the VMCONFIG
register to the default value.
Link: https://lore.kernel.org/r/20230224202811.644370-1-nirmal.patel@linux.intel.com
Fixes: ee81ee84f8 ("PCI: vmd: Disable MSI-X remapping when possible")
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e12f83023 ]
The Link Retraining process is initiated to account for the Gen2 defect in
the Cadence PCIe controller in J721E SoC. The errata corresponding to this
is i2085, documented at:
https://www.ti.com/lit/er/sprz455c/sprz455c.pdf
The existing workaround implemented for the errata waits for the Data Link
initialization to complete and assumes that the link retraining process
at the Physical Layer has completed. However, it is possible that the
Physical Layer training might be ongoing as indicated by the
PCI_EXP_LNKSTA_LT bit in the PCI_EXP_LNKSTA register.
Fix the existing workaround, to ensure that the Physical Layer training
has also completed, in addition to the Data Link initialization.
Link: https://lore.kernel.org/r/20230315070800.1615527-1-s-vadapalli@ti.com
Fixes: 4740b969aa ("PCI: cadence: Retrain Link to work around Gen2 training defect")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b07d5cc93e ]
After copy up, we may need to update d_flags if upper dentry is on a
remote fs and lower dentries are not.
Add helpers to allow incremental update of the revalidate flags.
Fixes: bccece1ead ("ovl: allow remote upper")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2560114c06 ]
In case devm_clk_hw_register() fails for one of synth clocks the probe
continues. Later on, when registering output clocks which have as parents
all the synth clocks, in case there is registration failure for at least
one synth clock the information passed to clk core for registering output
clock is not right: init.num_parents is fixed but init.parents may contain
an array with less parents.
Fixes: 3044a860fd ("clk: Add Si5341/Si5340 driver")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230530093913.1656095-4-claudiu.beznea@microchip.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51821765e8 ]
In the rare case in which one of the clock drivers has divider clocks
but not composite clocks, mtk_clk_simple_probe() would not io(re)map,
hence passing a NULL pointer to mtk_clk_register_dividers().
To fix this issue, extend the `if` conditional to also check if any
divider clocks are present. While at it, also make sure the iomem
pointer is NULL if no composite/divider clocks are declared, as we
are checking for that when iounmapping it in the error path.
This hasn't been seen on any MediaTek clock driver as the current ones
always declare composite clocks along with divider clocks, but this is
still an important fix for a future potential KP.
Fixes: 1fe074b1f1 ("clk: mediatek: Add divider clocks to mtk_clk_simple_{probe,remove}()")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230615122051.546985-2-angelogioacchino.delregno@collabora.com
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe9d66cf6e ]
Since hardware revision 5.0.0 the TE configuration moved out of the
PINGPONG block into the INTF block. Writing these registers has no
effect, and is omitted downstream via the DPU/SDE_PINGPONG_TE feature
flag. This flag is only added to PINGPONG blocks used by hardware prior
to 5.0.0.
The existing PP_BLK_TE macro has been removed in favour of directly
passing this feature flag, which has thus far been the only difference
with PP_BLK. PP_BLK_DITHER has been left in place as its embedded
feature flag already excludes this DPU_PINGPONG_TE bit and differs by
setting the block length to zero, as it only contains a DITHER subblock.
The code that writes to these registers in the INTF block will follow in
subsequent patches.
Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/534240/
Link: https://lore.kernel.org/r/20230411-dpu-intf-te-v4-14-27ce1a5ab5c6@somainline.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Stable-dep-of: 0b78be614c ("drm/msm/dpu: fix sc7280 and sc7180 PINGPONG done interrupts")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a7c38ec7d ]
This autorefresh disable logic in the physical command-mode encoder
consumes three callbacks to the pingpong block, and will explode in
unnecessary complexity when the same callbacks need to be called on the
interface block instead to accommodate INTF TE support. To clean this
up, move the logic into the pingpong block under a disable_autorefresh
callback, replacing the aforementioned three get_autorefresh,
setup_autorefresh and get_vsync_info callbacks.
The same logic will have to be replicated to the interface block when it
receives INTF TE support, but it is less complex than constantly
switching on a "has_intf_te" boolean to choose a callback.
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/534230/
Link: https://lore.kernel.org/r/20230411-dpu-intf-te-v4-13-27ce1a5ab5c6@somainline.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Stable-dep-of: 0b78be614c ("drm/msm/dpu: fix sc7280 and sc7180 PINGPONG done interrupts")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71344a718a ]
The fixed commit listed in the Fixes tag below, introduced a bug in
amdgpu_ras.c::amdgpu_reserve_page_direct(), in that when introducing the new
amdgpu_umc_fill_error_record() and internally in that new function the physical
address (argument "uint64_t retired_page"--wrong name) is right-shifted by
AMDGPU_GPU_PAGE_SHIFT. Thus, in amdgpu_reserve_page_direct() when we pass
"address" to that new function, we should NOT right-shift it, since this
results, erroneously, in the page address to be 0 for first
2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses.
This commit fixes this bug.
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 400013b268 ("drm/amdgpu: add umc_fill_error_record to make code more simple")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230610113536.10621-1-luben.tuikov@amd.com
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d50dc746ff ]
Fixes the following gcc with W=1:
In file included from ./include/linux/string.h:253,
from ./include/linux/bitmap.h:11,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/cpumask.h:5,
from ./arch/x86/include/asm/msr.h:11,
from ./arch/x86/include/asm/processor.h:22,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:7,
from ./include/linux/preempt.h:78,
from ./include/linux/spinlock.h:56,
from ./include/linux/mmzone.h:8,
from ./include/linux/gfp.h:7,
from ./include/linux/firmware.h:7,
from drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/sienna_cichlid_ppt.c:26:
In function ‘fortify_memcpy_chk’,
inlined from ‘sienna_cichlid_append_powerplay_table’ at drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/sienna_cichlid_ppt.c:444:2,
inlined from ‘sienna_cichlid_setup_pptable’ at drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/sienna_cichlid_ppt.c:506:8,
inlined from ‘sienna_cichlid_setup_pptable’ at drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/sienna_cichlid_ppt.c:494:12:
./include/linux/fortify-string.h:413:4: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
the compiler complains about the size calculation in the memcpy() -
"sizeof(*smc_dpm_table) - sizeof(smc_dpm_table->table_header)" is much
larger than what fits into table_member.
Hence, reuse 'smu_memcpy_trailing' for nv1x
Fixes: 7077b19a38 ("drm/amd/pm: use macro to get pptable members")
Suggested-by: Evan Quan <Evan.Quan@amd.com>
Cc: Evan Quan <Evan.Quan@amd.com>
Cc: Chengming Gui <Jack.Gui@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3bfbff9b46 ]
The bootrom burned into the MT7986 SoC will try multiple locations on
the SPI-NAND flash to load bl2 in case the bl2 image located at the the
previously attempted offset is corrupt.
Use 0x100000 instead of 0x80000 as partition size for bl2 on SPI-NAND,
allowing for up to four redundant copies of bl2 (typically sized a
bit less than 0x40000).
Fixes: 8e01fb15b8 ("arm64: dts: mt7986: add Bananapi R3")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/ZH9UGF99RgzrHZ88@makrotopia.org
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4366b5695 ]
The capacity-dmips-mhz parameter was miscalculated: this SoC runs
the first (Cortex-A55) cluster at a maximum of 2000MHz and the
second (Cortex-A76) cluster at a maximum of 2200MHz.
In order to calculate the right capacity-dmips-mhz, the following
test was performed:
1. CPUFREQ governor was set to 'performance' on both clusters
2. Ran dhrystone with 500000000 iterations for 10 times on each cluster
3. Calculated the mean result for each cluster
4. Calculated DMIPS/MHz: dmips_mhz = dmips_per_second / cpu_mhz
5. Scaled results to 1024:
result_c0 = dmips_mhz_c0 / dmips_mhz_c1 * 1024
The mean results for this SoC are:
Cluster 0 (LITTLE): 12016411 Dhry/s
Cluster 1 (BIG): 31702034 Dhry/s
The calculated scaled results are:
Cluster 0: 426.953226899238 (rounded to 427)
Cluster 1: 1024
Fixes: 48489980e2 ("arm64: dts: Add Mediatek SoC MT8192 and evaluation board dts and Makefile")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230602183515.3778780-1-nfraprado@collabora.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7bfb2ad21 ]
Using devres to depopulate the aux bus made sure that upon a probe
deferral the EDP panel device would be destroyed and recreated upon next
attempt.
But the struct device which the devres is tied to is the DPUs
(drm_dev->dev), which may be happen after the DP controller is torn
down.
Indications of this can be seen in the commonly seen EDID-hexdump full
of zeros in the log, or the occasional/rare KASAN fault where the
panel's attempt to read the EDID information causes a use after free on
DP resources.
It's tempting to move the devres to the DP controller's struct device,
but the resources used by the device(s) on the aux bus are explicitly
torn down in the error path. The KASAN-reported use-after-free also
remains, as the DP aux "module" explicitly frees its devres-allocated
memory in this code path.
As such, explicitly depopulate the aux bus in the error path, and in the
component unbind path, to avoid these issues.
Fixes: 2b57f72661 ("drm/msm/dp: fix aux-bus EP lifetime")
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/542163/
Link: https://lore.kernel.org/r/20230612220106.1884039-1-quic_bjorande@quicinc.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 155fa3a91d ]
Currently, slice_count is being used to calculate word count and
pkt_per_line. Instead, these values should be calculated using slice per
packet, which is not the same as slice_count.
Slice count represents the number of slices per interface, and its value
will not always match that of slice per packet. For example, it is possible
to have cases where there are multiple slices per interface but the panel
specifies only one slice per packet.
Thus, use the default value of one slice per packet and remove slice_count
from the aforementioned calculations.
Fixes: 08802f515c ("drm/msm/dsi: Add support for DSC configuration")
Fixes: bc6b6ff813 ("drm/msm/dsi: Use DSC slice(s) packet size to compute word count")
Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org>
Signed-off-by: Jessica Zhang <quic_jesszhan@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/541965/
Link: https://lore.kernel.org/r/20230405-add-dsc-support-v6-5-95eab864d1b6@quicinc.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b38c6ced4e ]
main_i2c0 is aliased as i2c0 which creates a problem for u-boot R5
SPL attempting to reuse the same definition in the common board
detection logic as it looks for the first i2c instance as the bus on
which to detect the eeprom to understand the board variant involved.
Switch main_i2c0 to i2c3 alias allowing us to introduce wkup_i2c0
and potentially space for mcu_i2c instances in the gap for follow on
patches.
Fixes: 635fb18ba0 ("arch: arm64: dts: Add support for AM69 Starter Kit")
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Udit Kumar <u-kumar1@ti.com>
Link: https://lore.kernel.org/r/20230602214937.2349545-5-nm@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c10a9df30e ]
main_i2c0 is aliased as i2c0 which creates a problem for u-boot R5
SPL attempting to reuse the same definition in the common board
detection logic as it looks for the first i2c instance as the bus on
which to detect the eeprom to understand the board variant involved.
Switch main_i2c0 to i2c3 alias allowing us to introduce wkup_i2c0
and potentially space for mcu_i2c instances in the gap for follow on
patches.
Fixes: e20a06aca5 ("arm64: dts: ti: Add support for J784S4 EVM board")
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Udit Kumar <u-kumar1@ti.com>
Link: https://lore.kernel.org/r/20230602214937.2349545-2-nm@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 155e7635ed ]
Mailbox nodes are now disabled by default. The BeagleBoard AI64 DT
addition went in at around the same time and must have missed that
change so the mailboxes are not re-enabled. Do that here.
Fixes: fae14a1cb8 ("arm64: dts: ti: Add k3-j721e-beagleboneai64")
Signed-off-by: Andrew Davis <afd@ti.com>
Reviewed-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20230515172137.474626-1-afd@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 223ce29c8b ]
The framebuffer configuration for edo pdx203, written in edo dtsi (which
is overwritten in pdx206 dts for its smaller panel) has to use a
1096x2560 configuration as this is what the panel (and framebuffer area)
has been initialized to. Downstream userspace also has access to (and
uses) this 2.5k mode by default, and only switches the panel to 4k when
requested.
This is similar to commit be8de06dc3 ("arm64: dts: qcom:
sm8150-kumano: Panel framebuffer is 2.5k instead of 4k") which fixed the
same for the previous generation Sony platform.
Fixes: 69cdb97ef6 ("arm64: dts: qcom: sm8250: Add support for SONY Xperia 1 II / 5 II (Edo platform)")
Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230606211418.587676-1-marijn.suijten@somainline.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b04cbd81b ]
The rpmh driver will cache sleep and wake votes until the cluster
power-domain is about to enter idle, to avoid unnecessary writes. So
associate the apps_rsc with the cluster pd, so that it can be notified
about this event.
Without this, only AMC votes are being commited.
Fixes: 07c8ded6e3 ("arm64: dts: qcom: add sdm670 and pixel 3a device trees")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230531-topic-rsc-v1-5-b4a985f57b8b@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3db7285e04 ]
Smatch reports:
drivers/clk/mediatek/clk-mtk.c:583 mtk_clk_simple_probe() warn:
'base' from of_iomap() not released on lines: 496.
This problem was also found in linux-next. In mtk_clk_simple_probe(),
base is not released when handling errors
if clk_data is not existed, which may cause a leak.
So free_base should be added here to release base.
Fixes: c58cd0e40f ("clk: mediatek: Add mtk_clk_simple_probe() to simplify clock providers")
Signed-off-by: Bosi Zhang <u201911157@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/r/20230422084331.47198-1-u201911157@hust.edu.cn
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 878b02d5f3 ]
Replace of_iomap() and kzalloc() with devm_of_iomap() and devm_kzalloc()
which can automatically release the related memory when the device
or driver is removed or unloaded to avoid potential memory leak.
In this case, iounmap(anatop_base) in line 427,433 are removed
as manual release is not required.
Besides, referring to clk-imx8mq.c, check the return code of
of_clk_add_hw_provider, if it returns negtive, print error info
and unregister hws, which makes the program more robust.
Fixes: 9c140d9926 ("clk: imx: Add support for i.MX8MP clock driver")
Signed-off-by: Yuxing Liu <lyx2022@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20230503070607.2462-1-lyx2022@hust.edu.cn
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e02ba11b45 ]
In function probe(), it returns directly without unregistered hws
when error occurs.
Fix this by adding 'goto unregister_hws;' on line 295 and
line 310.
Use devm_kzalloc() instead of kzalloc() to automatically
free the memory using devm_kfree() when error occurs.
Replace of_iomap() with devm_of_iomap() to automatically
handle the unused ioremap region and delete 'iounmap(anatop_base);'
in unregister_hws.
Fixes: 24defbe194 ("clk: imx: add i.MX93 clk")
Signed-off-by: Zhanhao Hu <zero12113@hust.edu.cn>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20230601033825.336558-1-zero12113@hust.edu.cn
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b280598ab ]
Use devm_of_iomap() instead of of_iomap() to automatically
handle the unused ioremap region. If any error occurs, regions allocated by
kzalloc() will leak, but using devm_kzalloc() instead will automatically
free the memory using devm_kfree().
Also, fix error handling of hws by adding unregister_hws label, which
unregisters remaining hws when iomap failed.
Fixes: 7154b046d8 ("clk: imx: Add initial support for i.MXRT1050 clock driver")
Signed-off-by: Kai Ma <kaima@hust.edu.cn>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Jesse Taube <Mr.Bossman075@gmail.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20230418113451.151312-1-kaima@hust.edu.cn
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8208181fe5 ]
Currently, certain clocks are derrived as a divider from their
parent clock. For some clocks, even when CLK_SET_RATE_PARENT
is set, the parent clock is not properly set which can lead
to some relatively inaccurate clock values.
Unlike imx/clk-composite-93 and imx/clk-divider-gate, it
cannot rely on calling a standard determine_rate function,
because the 8m composite clocks have a pre-divider and
post-divider. Because of this, a custom determine_rate
function is necessary to determine the maximum clock
division which is equivalent to pre-divider * the
post-divider.
With this added, the system can attempt to adjust the parent rate
when the proper flags are set which can lead to a more precise clock
value.
On the imx8mplus, no clock changes are present.
On the Mini and Nano, this can help achieve more accurate
lcdif clocks. When trying to get a pixel clock of 31.500MHz
on an imx8m Nano, the clocks divided the 594MHz down, but
left the parent rate untouched which caused a calulation error.
Before:
video_pll 594000000
video_pll_bypass 594000000
video_pll_out 594000000
disp_pixel 31263158
disp_pixel_clk 31263158
Variance = -236842 Hz
After this patch:
video_pll 31500000
video_pll_bypass 31500000
video_pll_out 31500000
disp_pixel 31500000
disp_pixel_clk 31500000
Variance = 0 Hz
All other clocks rates and parent were the same.
Similar results on imx8mm were found.
Fixes: 690dccc4a0 ("Revert "clk: imx: composite-8m: Add support to determine_rate"")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Tested-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20230506195325.876871-1-aford173@gmail.com
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3099bcdc19 ]
bnxt_qplib_service_creq can be called from interrupt or tasklet or
process context. So the function take irq variant of spin_lock.
But when wake_up is invoked with the lock held, it is putting the
calling context to sleep.
[exception RIP: __wake_up_common+190]
RIP: ffffffffb7539d7e RSP: ffffa73300207ad8 RFLAGS: 00000083
RAX: 0000000000000001 RBX: ffff91fa295f69b8 RCX: dead000000000200
RDX: ffffa733344af940 RSI: ffffa73336527940 RDI: ffffa73336527940
RBP: 000000000000001c R8: 0000000000000002 R9: 00000000000299c0
R10: 0000017230de82c5 R11: 0000000000000002 R12: ffffa73300207b28
R13: 0000000000000000 R14: ffffa733341bf928 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
Call the wakeup after releasing the lock.
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1686308514-11996-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0af91306e1 ]
Driver is not handling the wraparound of the mbox producer index correctly.
Currently the wraparound happens once u32 max is reached.
Bit 31 of the producer index register is special and should be set
only once for the first command. Because the producer index overflow
setting bit31 after a long time, FW goes to initialization sequence
and this causes FW hang.
Fix is to wraparound the mbox producer index once it reaches u16 max.
Fixes: cee0c7bba4 ("RDMA/bnxt_re: Refactor command queue management code")
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1686308514-11996-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 736a932736 ]
The commit 010c8bbad2 ("drm: msm: adreno: Disable preemption on Adreno
510") added special handling for a510 (this SKU doesn't seem to support
preemption, so the driver should clamp nr_rings to 1). However the
gpu->revn is not yet set (it is set later, in adreno_gpu_init()) and
thus the condition is always false. Check config->rev instead.
Fixes: 010c8bbad2 ("drm: msm: adreno: Disable preemption on Adreno 510")
Reported-by: Adam Skladowski <a39.skl@gmail.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Tested-by: Adam Skladowski <a39.skl@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/531511/
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f0bcf49e9 ]
This is motivated by OOB access in amdgpu_vm_update_range when
offset_in_bo+map_size overflows.
v2: keep the validations in amdgpu_vm_bo_map
v3: add the validations to amdgpu_vm_bo_map/amdgpu_vm_bo_replace_map
rather than to amdgpu_gem_va_ioctl
Fixes: 9f7eb5367d ("drm/amdgpu: actually use the VM map parameters")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49904a0ebf ]
While KUnit tests that cannot be built as a loadable module must depend
on "KUNIT=y", this is not true for modular tests, where it adds an
unnecessary limitation.
Fix this by relaxing the dependency to "KUNIT".
Fixes: 08809e482a ("HID: uclogic: KUnit best practices and naming conventions")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1becc57cd1 ]
Function rv740_get_decoded_reference_divider() may return 0 due to
unpredictable reference divider value calculated in
radeon_atom_get_clock_dividers(). This will lead to
division-by-zero error once that value is used as a divider
in calculating 'clk_s'.
While unlikely, this issue should nonetheless be prevented so add a
sanity check for such cases by testing 'decoded_ref' value against 0.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
v2: minor coding style fixes (Alex)
In practice this should actually happen as the vbios should be
properly populated.
Fixes: 66229b2005 ("drm/radeon/kms: add dpm support for rv7xx (v4)")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b18f05a066 ]
[Why]
When freesync video mode is enabled, switching resolution from native
mode to one of the freesync video compatible modes can trigger continous
artifacts on some eDP panels when running under KDE. The articating can be seen in the
attached bug report.
[How]
Fix this by restricting updates that require full commit by using the same checks
for stream and scaling changes in the the enable pass of dm_update_crtc_state()
along with the check for compatible timings for freesync vide mode.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162
Fixes: da5e149097 ("drm/amd/display: Fix hang when skipping modeset")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cabbdea1f1 ]
Pointer mqd_mem_obj can be deallocated in kfd_gtt_sa_allocate().
The function then returns non-zero value, which causes the second deallocation.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d1f8f0d17d ("drm/amdkfd: Move non-sdma mqd allocation out of init_mqd")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b153a0bb41 ]
The PMON_CONFIG register on ADM1272 is a 16 bit register. Writing a 8 bit
value into it clears the upper 8 bits of the register, resulting in
unexpected side effects. Fix by writing the 16 bit register value.
Also, it has been reported that temperature readings are sometimes widely
inaccurate, to the point where readings may result in device shutdown due
to errant overtemperature faults. Improve by enabling temperature sampling.
While at it, move the common code for ADM1272 and ADM1278 into a separate
function, and clarify in the error message that an attempt was made to
enable both VOUT and temperature monitoring.
Last but not least, return the error code reported by the underlying I2C
controller and not -ENODEV if updating the PMON_CONFIG register fails.
After all, this does not indicate that the chip is not present, but an
error in the communication with the chip.
Fixes: 4ff0ce227a ("hwmon: (pmbus/adm1275) Add support for ADM1272")
Fixes: 9da9c2dc57 ("hwmon: (adm1275) enable adm1272 temperature reporting")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230602213447.3557346-1-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ec7639b5e ]
The gaudi2_get_tpc_idle_status() function returned the incorrect variable
so it always returned true.
Fixes: d85f0531b9 ("accel/habanalabs: break is_idle function into per-engine sub-routines")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3f2778b1b ]
The audio routing flow is not correct, the flow should be from source
(second element in the pair) to sink (first element in the pair). The
flow now is from "HP_OUT" to "Playback", where "Playback" is source
and "HP_OUT" is sink, i.e. the direction is swapped and there is no
direct link between the two either.
Fill in the correct routing, where "HP_OUT" supplies the "Headphone Jack",
"Line In Jack" supplies "LINE_IN" input, "Microphone Jack" supplies "MIC_IN"
input and "Mic Bias" supplies "Microphone Jack".
Fixes: 34e0c7847d ("ARM: dts: stm32: Add DH Electronics DHCOM STM32MP1 SoM and PDK2 board")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a325956fa7 ]
The realtek Bluetooth module uses the same driver as the
realtek,rtl8822cs-bt and the realtek,rtl8723bs-bt, however by selecting
the 8723bs advanced power saving features are disabled that appear
to interfere with normal operation of the bluetooth module. This
change switches the compatible string to disable power saving. Without
this patch evtest of a paired bluetooth controller fails, with this
patch the controller operates as expected.
Fixes: b6986b7920 ("arm64: dts: rockchip: Update compatible for bluetooth")
Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://lore.kernel.org/r/20230508160811.3568213-3-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28ee08cef4 ]
The I2S0_8CH_MCLKOUT clock rate on Rock 5B board defaults to 12 MHz and
it is used to provide the master clock (MCLK) for the ES8316 audio
codec.
On sound card initialization, this limits the allowed sample rates
according to the MCLK/LRCK ratios supported by the codec, which results
in the following non-standard rates: 15625, 30000, 31250, 46875.
Hence, the very first access of the sound card fails:
Broken configuration for playback: no configurations available: Invalid argument
Setting of hwparams failed: Invalid argument
However, all subsequent attempts will succeed, as the audio graph card
will request a correct clock frequency, based on the stream sample rate
and the multiplication factor.
Assign MCLK to 12.288 MHz, which allows the codec to advertise most of
the standard sample rates.
Fixes: 55529fe3f3 ("arm64: dts: rockchip: Add rk3588-rock-5b analog audio")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20230530181140.483936-4-cristian.ciocaltea@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69d439818f ]
Rather than selecting the display IP and feature flags at the same time
the general PCI probing happens, move this step into the display code
itself so that it can be more easily re-used outside of i915 (i.e., by
the Xe driver).
v2:
- Make intel_display_device_probe() always return a non-NULL pointer
and simplify copying of runtime_defaults. (Andrzej)
v3:
- Redefine INTEL_VGA_DEVICE/INTEL_QUANTA_DEVICE to eliminate a cast and
an include of linux/mod_devicetable.h. (Jani)
- Keep explicit memcpy for runtime defaults. (Jani)
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230523195609.73627-5-matthew.d.roper@intel.com
Stable-dep-of: 19db206209 ("drm/i915: No 10bit gamma on desktop gen3 parts")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5af5169d75 ]
Rather than embeddeding the display's device info within the main device
info structure, just provide a pointer to the display-specific
structure. This is in preparation for moving the display device info
definitions into the display code itself and for eventually allowing the
pointer to be assigned at runtime on platforms that use GMD_ID for
device identification.
In the future, this will also eventually allow the same display device
info structures to be used outside the current i915 code (e.g., from the
Xe driver).
v2:
- Move introduction of DISPLAY_INFO() to this patch. (Andrzej)
v3:
- Also use DISPLAY_INFO() in intel_display_reg_defs.h. (Andrzej)
- Use "{}" instead of "{ 0 }" for empty struct init. (Jani)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230523195609.73627-3-matthew.d.roper@intel.com
Stable-dep-of: 19db206209 ("drm/i915: No 10bit gamma on desktop gen3 parts")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 446a20c9ba ]
The goal has been to just make device info a pointer to static const
data, i.e. the static const structs in i915_pci.c. See [1]. However,
there were issues with intel_device_info_runtime_init() clearing the
display sub-struct of device info on the !HAS_DISPLAY() path, which
consequently disables a lot of display functionality, like it
should. Looks like we'd have to cover all those paths, and maybe
sprinkle HAS_DISPLAY() checks in them, which we haven't gotten around
to.
In the mean time, hide mkwrite_device_info() better within
intel_device_info.c by adding a intel_device_info_driver_create() for
the very early initialization of the device info and initial runtime
info. This also lets us declutter i915_drv.h a bit, and stops promoting
mkwrite_device_info() as something that could be used.
[1] https://lore.kernel.org/r/a0422f0a8ac055f65b7922bcd3119b180a41e79e.1655712106.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230411105643.292416-1-jani.nikula@intel.com
Stable-dep-of: 19db206209 ("drm/i915: No 10bit gamma on desktop gen3 parts")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 404c3acda4 ]
Our current limited range matrix is a bit off. I think it
was originally calculated with rounding, as if we wanted
the normal pixel replication type of behaviour.
That is, since the 8bpc max value is 0xeb we assumed the
16bpc max value should be 0xebeb, but what the HDMI spec
actually says it should be is 0xeb00.
So to get what we want we make the formula
out = in * (235-16) << (12-8) / in_max + 16 << (12-8),
with 12 being precision of the csc, 8 being the precision
of the constants we used.
The hardware takes its coefficients as floating point
values, but the (235−16)/255 = ~.86, so exponent 0
is what we want anyway, so it works out perfectly without
having to hardcode it in hex or start playing with floats.
In terms of raw numbers we are feeding the hardware the
post offset changes from 0x101 to 0x100, and the coefficient
changes from 0xdc0 to 0xdb0 (~.860->~.855). So this should
make everything come out just a tad darker.
I already used better constants in lut_limited_range() earlier
so the output of the two paths should be closer now.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230329135002.3096-2-ville.syrjala@linux.intel.com
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Stable-dep-of: 19db206209 ("drm/i915: No 10bit gamma on desktop gen3 parts")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0501fdec10 ]
make dtbs_check:
arch/arm/boot/dts/renesas/r8a7743-iwg20d-q7.dtb: backlight: pwms: [[58, 0, 5000000], [0]] is too long
From schema: Documentation/devicetree/bindings/leds/backlight/pwm-backlight.yaml
arch/arm/boot/dts/renesas/r8a7743-iwg20d-q7-dbcm-ca.dtb: backlight: pwms: [[67, 0, 5000000], [0]] is too long
From schema: Documentation/devicetree/bindings/leds/backlight/pwm-backlight.yaml
arch/arm/boot/dts/renesas/r8a7744-iwg20d-q7-dbcm-ca.dtb: backlight: pwms: [[67, 0, 5000000], [0]] is too long
From schema: Documentation/devicetree/bindings/leds/backlight/pwm-backlight.yaml
arch/arm/boot/dts/renesas/r8a7744-iwg20d-q7.dtb: backlight: pwms: [[58, 0, 5000000], [0]] is too long
From schema: Documentation/devicetree/bindings/leds/backlight/pwm-backlight.yaml
PWM specifiers referring to R-Car PWM Timer Controllers should contain
only two cells.
Fix this by dropping the bogus third cell.
Fixes: 6f89dd9e93 ("ARM: dts: iwg20d-q7-common: Add LCD support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/6e5c3167424a43faf8c1fa68d9667b3d87dc86d8.1684855911.git.geert+renesas@glider.be
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9358de193 ]
The hfi1 user SDMA pinned-page cache will leave a stale cache entry when
the cache-entry's virtual address range is invalidated but that cache
entry is in-use by an outstanding SDMA request.
Subsequent user SDMA requests with buffers in or spanning the virtual
address range of the stale cache entry will result in packets constructed
from the wrong memory, the physical pages pointed to by the stale cache
entry.
To fix this, remove mmu_rb_node cache entries from the mmu_rb_handler
cache independent of the cache entry's refcount. Add 'struct kref
refcount' to struct mmu_rb_node and manage mmu_rb_node lifetime with
kref_get() and kref_put().
mmu_rb_node.refcount makes sdma_mmu_node.refcount redundant. Remove
'atomic_t refcount' from struct sdma_mmu_node and change sdma_mmu_node
code to use mmu_rb_node.refcount.
Move the mmu_rb_handler destructor call after a
wait-for-SDMA-request-completion call so mmu_rb_nodes that need
mmu_rb_handler's workqueue to queue themselves up for destruction from an
interrupt context may do so.
Fixes: f48ad614c1 ("IB/hfi1: Move driver out of staging")
Fixes: 00cbce5cbf ("IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests")
Link: https://lore.kernel.org/r/168451393605.3700681.13493776139032178861.stgit@awfm-02.cornelisnetworks.com
Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b002760f87 ]
Commit df8fc4e934 ("kbuild: Enable -fstrict-flex-arrays=3") triggers a
warning for fortified memset():
In function 'fortify_memset_chk',
inlined from 'irdma_clr_wqes' at drivers/infiniband/hw/irdma/uk.c:103:4:
include/linux/fortify-string.h:493:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
493 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The problem here isthat the inner array only has four 8-byte elements, so
clearing 4096 bytes overflows that. As this structure is part of an outer
array, change the code to pass a pointer to the irdma_qp_quanta instead,
and change the size argument for readability, matching the comment above
it.
Fixes: 551c46edc7 ("RDMA/irdma: Add user/kernel shared libraries")
Link: https://lore.kernel.org/r/20230523111859.2197825-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b1a78babd ]
Fix build errors in soc/fsl/qe/usb.c when QUICC_ENGINE is not set.
This happens when PPC_EP88XC is set, which selects CPM1 & CPM.
When CPM is set, USB_FSL_QE can be set without QUICC_ENGINE
being set. When USB_FSL_QE is set, QE_USB deafults to y, which
causes build errors when QUICC_ENGINE is not set. Making
QE_USB depend on QUICC_ENGINE prevents QE_USB from defaulting to y.
Fixes these build errors:
drivers/soc/fsl/qe/usb.o: in function `qe_usb_clock_set':
usb.c:(.text+0x1e): undefined reference to `qe_immr'
powerpc-linux-ld: usb.c:(.text+0x2a): undefined reference to `qe_immr'
powerpc-linux-ld: usb.c:(.text+0xbc): undefined reference to `qe_setbrg'
powerpc-linux-ld: usb.c:(.text+0xca): undefined reference to `cmxgcr_lock'
powerpc-linux-ld: usb.c:(.text+0xce): undefined reference to `cmxgcr_lock'
Fixes: 5e41486c40 ("powerpc/QE: add support for QE USB clocks routing")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202301101500.pillNv6R-lkp@intel.com/
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Leo Li <leoyang.li@nxp.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Qiang Zhao <qiang.zhao@nxp.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Nicolas Schier <nicolas@jasle.eu>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60413129ee ]
When using the codec through the generic audio graph card, there are at
least two calls of es8316_set_dai_sysclk(), with the effect of limiting
the allowed sample rates according to the MCLK/LRCK ratios supported by
the codec:
1. During audio card setup, to set the initial MCLK - see
asoc_simple_init_dai().
2. Before opening a stream, to update MCLK, according to the stream
sample rate and the multiplication factor - see
asoc_simple_hw_params().
In some cases the initial MCLK might be set to a frequency that doesn't
match any of the supported ratios, e.g. 12287999 instead of 12288000,
which is only 1 Hz below the supported clock, as that is what the
hardware reports. This creates an empty list of rate constraints, which
is further passed to snd_pcm_hw_constraint_list() via
es8316_pcm_startup(), and causes the following error on the very first
access of the sound card:
$ speaker-test -D hw:Analog,0 -F S16_LE -c 2 -t wav
Broken configuration for playback: no configurations available: Invalid argument
Setting of hwparams failed: Invalid argument
Note that all subsequent retries succeed thanks to the updated MCLK set
at point 2 above, which uses a computed frequency value instead of a
reading from the hardware registers. Normally this would have mitigated
the issue, but es8316_pcm_startup() executes before the 2nd call to
es8316_set_dai_sysclk(), hence it cannot make use of the updated
constraints.
Since es8316_pcm_hw_params() performs anyway a final validation of MCLK
against the stream sample rate and the supported MCLK/LRCK ratios, fix
the issue by ensuring that sysclk_constraints list is only set when at
least one supported sample rate is autodetected by the codec.
Fixes: b8b88b7087 ("ASoC: add es8316 codec driver")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20230530181140.483936-3-cristian.ciocaltea@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f07342903 ]
The following error occurs when trying to restore a previously saved
ALSA mixer state (tested on a Rock 5B board):
$ alsactl --no-ucm -f /tmp/asound.state store hw:Analog
$ alsactl --no-ucm -I -f /tmp/asound.state restore hw:Analog
alsactl: set_control:1475: Cannot write control '2:0:0:ALC Capture Target Volume:0' : Invalid argument
According to ES8316 datasheet, the register at address 0x2B, which is
related to the above mixer control, contains by default the value 0xB0.
Considering the corresponding ALC target bits (ALCLVL) are 7:4, the
control is initialized with 11, which is one step above the maximum
value allowed by the driver:
ALCLVL | dB gain
-------+--------
0000 | -16.5
0001 | -15.0
0010 | -13.5
.... | .....
0111 | -6.0
1000 | -4.5
1001 | -3.0
1010 | -1.5
.... | .....
1111 | -1.5
The tests performed using the VU meter feature (--vumeter=TYPE) of
arecord/aplay confirm the specs are correct and there is no measured
gain if the 1011-1111 range would have been mapped to 0 dB:
dB gain | VU meter %
--------+-----------
-6.0 | 30-31
-4.5 | 35-36
-3.0 | 42-43
-1.5 | 50-51
0.0 | 50-51
Increment the max value allowed for ALC Capture Target Volume control,
so that it matches the hardware default. Additionally, update the
related TLV to prevent an artificial extension of the dB gain range.
Fixes: b8b88b7087 ("ASoC: add es8316 codec driver")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20230530181140.483936-2-cristian.ciocaltea@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cf765e598 ]
Fix the following error in kernel log due to too long sound card name:
"
asoc-audio-graph-card sound: ASoC: driver name too long 'STM32MP1-AV96-HDMI' -> 'STM32MP1-AV96-H'
"
Fixes: e027da3427 ("ARM: dts: stm32: Add bindings for audio on AV96")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 861bc1d288 ]
omap2 contains a hack to define tick_broadcast() on non-SMP
configurations in place of the normal SMP definition. This one
causes a warning because of a missing prototype:
arch/arm/mach-omap2/board-generic.c:44:6: error: no previous prototype for 'tick_broadcast'
Make sure to always include the header with the declaration.
Fixes: d86ad463d6 ("ARM: OMAP2+: Fix regression for using local timer on non-SMP SoCs")
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Link: https://lore.kernel.org/r/20230516153109.514251-9-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 419013740e ]
ep93xx_clocksource_read() is only called from the file it is declared in,
while ep93xx_timer_init() is declared in a header that is not included here.
arch/arm/mach-ep93xx/timer-ep93xx.c:120:13: error: no previous prototype for 'ep93xx_timer_init'
arch/arm/mach-ep93xx/timer-ep93xx.c:63:5: error: no previous prototype for 'ep93xx_clocksource_read'
Fixes: 000bc17817 ("ARM: ep93xx: switch to GENERIC_CLOCKEVENTS")
Acked-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Link: https://lore.kernel.org/r/20230516153109.514251-3-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f24b495508 ]
The previous setting was related to the overall dimension and not to the
active display area.
In the "PHYSICAL SPECIFICATIONS" section, the datasheet shows the
following parameters:
----------------------------------------------------------
| Item | Specifications | unit |
----------------------------------------------------------
| Display area | 98.7 (W) x 57.5 (H) | mm |
----------------------------------------------------------
| Overall dimension | 105.5(W) x 67.2(H) x 4.96(D) | mm |
----------------------------------------------------------
Fixes: 966fea78ad ("drm/panel: simple: Add support for Ampire AM-480272H3TMQW-T01H")
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
[narmstrong: fixed Fixes commit id length]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230516085039.3797303-1-dario.binacchi@amarulasolutions.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54f1a83c72 ]
According to Table 13-45 of the i.MX8M Mini Reference Manual, the min
and max values for M and the frequency range for the VCO_out
calculator were incorrect. This information was contradicted in other
parts of the mini, nano and plus manuals. After reaching out to my
NXP Rep, when confronting him about discrepencies in the Nano manual,
he responded with:
"Yes it is definitely wrong, the one that is part
of the NOTE in MIPI_DPHY_M_PLLPMS register table against PMS_P,
PMS_M and PMS_S is not correct. I will report this to Doc team,
the one customer should be take into account is the Table 13-40
DPHY PLL Parameters and the Note above."
These updated values also match what is used in the NXP downstream
kernel.
To fix this, make new variables to hold the min and max values of m
and the minimum value of VCO_out, and update the PMS calculator to
use these new variables instead of using hard-coded values to keep
the backwards compatibility with other parts using this driver.
Fixes: 4d562c70c4 ("drm: bridge: samsung-dsim: Add i.MX8M Mini/Nano support")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Jagan Teki <jagan@amarulasolutions.com>
Tested-by: Jagan Teki <jagan@amarulasolutions.com> # imx8mm-icore
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230526030559.326566-3-aford173@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd9e329af7 ]
The datasheet describes the following initialization flow including
minimum delay times between each step:
1. DSI data lanes need to be in LP-11 and the clock lane in HS mode
2. toggle EN signal
3. initialize registers
4. enable PLL
5. soft reset
6. enable DSI stream
7. check error status register
To meet this requirement we need to make sure the host bridge's
pre_enable() is called first by using the pre_enable_prev_first
flag.
Furthermore we need to split enable() into pre_enable() which covers
steps 2-5 from above and enable() which covers step 7 and is called
after the host bridge's enable().
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Fixes: ceb515ba29 ("drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver")
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> #TQMa8MxML/MBa8Mx
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230503163313.2640898-3-frieder@fris.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5500f823db ]
The 96Boards specification expects a 1.8V power rail on the low-speed
expansion connector that is able to provide at least 0.18W / 100 mA.
According to the DB410c hardware user manual this is done by connecting
both L15 and L16 in parallel with up to 55mA each (for 110 mA total) [1].
Unfortunately the current regulator setup in the DB410c device tree
does not implement the specification correctly and only provides 5 mA:
- Only L15 is marked always-on, so L16 is never enabled.
- Without specifying a load the regulator is put into LPM where
it can only provide 5 mA.
Fix this by:
- Adding proper voltage constraints for L16.
- Making L16 always-on.
- Adding regulator-system-load for both L15 and L16. 100 mA should be
available in total, so specify 50 mA for each. (The regulator
hardware can only be in normal (55 mA) or low-power mode (5 mA) so
this will actually result in the expected 110 mA total...)
[1]: https://www.96boards.org/documentation/consumer/dragonboard/dragonboard410c/hardware-docs/hardware-user-manual.md.html#power-supplies
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: 828dd5d66f ("arm64: dts: apq8016-sbc: make 1.8v available on LS expansion")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230510-msm8916-regulators-v1-2-54d4960a05fc@gerhold.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e27654df20 ]
For some reason DB410c has completely bogus regulator constraints that
actually just correspond to the programmable voltages which are already
provided by the regulator driver. Some of them are not just outside the
recommended operating conditions of the APQ8016E SoC but even exceed
the absolute maximum ratings, potentially risking permanent device
damage.
In practice it's not quite as dangerous thanks to the RPM firmware:
It turns out that it has its own voltage constraints and silently
clamps all regulator requests. For example, requesting 3.3V for L5
(allowed by the current regulator constraints!) still results in 1.8V
being programmed in the actual regulator hardware.
Experimentation with various voltages shows that the internal RPM
voltage constraints roughly correspond to the safe "specified range"
in the PM8916 Device Specification (rather than the "programmable
range" used inside apq8016-sbc.dtsi right now).
Combine those together with some fixed voltages used in the old
msm-3.10 device tree from Qualcomm to give DB410c some actually valid
voltage constraints.
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: 4c7d53d16d ("arm64: dts: apq8016-sbc: add regulators support")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230510-msm8916-regulators-v1-1-54d4960a05fc@gerhold.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e0285ab95 ]
The TUSB6010 (MUSB) device is picking up some GPIO lines
hardcoded by number and passing on to the TUSB6010 device
when registering it.
Instead of nasty workarounds, provide a GPIO descriptor
table and then make the TUSB6010 MUSB glue driver pick up
the GPIO lines directly, convert it to an IRQ and pass down
to the MUSB driver. OMAP2 is the only system using the
TUSB6010.
Stash the GPIO descriptors in the glue layer and use
then to power up and down the TUSB6010 on-demand, instead
of using boardfile callbacks.
Since the OMAP2 boards are the only boards using the
.set_power() and .board_set_power() callbacks, we can
just delete them as the power is now handled directly
in the TUSB6010 glue code.
Cc: Bin Liu <b-liu@ti.com>
Cc: linux-usb@vger.kernel.org
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 078dc5194c ]
The OMAP2 platform data quirk is using the global GPIO numberspace
to obtain two WLAN GPIOs to drive power and xcvr reset GPIO
lines during start-up.
Rewrite the quirk to use a GPIO descriptor table so we avoid using
global GPIO numbers.
This gets rid of the final dependency on the legacy <linux/gpio.h>
header from the OMAP2/3 platforms.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94075d16be ]
This switches the USB hub GPIO reset line handling in the
OMAP2 pdata quirks over to using GPIO descriptors to avoid using
the global GPIO numberspace.
Since the GPIOs are exported and assumedly used by some kind
of userspace we cannot simply use hogs in the device tree.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5f4fa60d6 ]
The TWL4030 GPIO driver has a custom platform data .set_up()
callback to call back into the platform and do misc stuff such
as hog and export a GPIO for WLAN PWR on a specific OMAP3 board.
Avoid all the kludgery in the platform data and the boardfile
and just put the quirks right into the driver. Make it
conditional on OMAP3.
I think the exported GPIO is used by some kind of userspace
so ordinary DTS hogs will probably not work.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c729baa860 ]
After fixing all the offending users referencing the global GPIO
numberspace in OMAP1, a few sites still remain including the
legacy <linus/gpio.h> header for no reason.
Delete the last remaining users, and OMAP1 is free from legacy
GPIO dependencies.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df89de979f ]
The code in serial.c looks up GPIOs corresponding to a line
on the UART when muxed in as GPIO to use this as a wakeup
on serial activity for OMAP1.
Utilize the NULL device to define some board-specific
GPIO lookups and use these to immediately look up the
same GPIOs, set as input and convert to IRQ numbers,
then set these to wakeup IRQs. This is ugly but should work.
This is only needed on the OSK1 and Nokia 770 devices that
use the OMAP16xx.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 084b6f2167 ]
The platform devices on the Nokia 770 is using some
board-specific IRQs that get statically assigned to platform
devices in the boardfile.
This does not work with dynamic IRQ chip bases.
Utilize the NULL device to define some board-specific
GPIO lookups and use these to immediately look up the
same GPIOs, convert to IRQ numbers and pass as resources
to the devices. This is ugly but should work.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e519f0bb64 ]
A recent change to the OMAP driver making it use a dynamic GPIO
base created problems with some old OMAP1 board files, among
them Nokia 770, SX1 and also the OMAP2 Nokia n8x0.
Fix up all instances of GPIOs being used for the MMC driver
by pushing the handling of power, slot selection and MMC
"cover" into the driver as optional GPIOs.
This is maybe not the most perfect solution as the MMC
framework have some central handlers for some of the
stuff, but it at least makes the situtation better and
solves the immediate issue.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 767d83361a ]
The Nokia 770 is using GPIOs from the global numberspace on the
CBUS node to pass down to the LCD controller. This regresses when we
let the OMAP GPIO driver use dynamic GPIO base.
The Nokia 770 now has dynamic allocation of IRQ numbers, so this
needs to be fixed for it to work.
As this is the only user of LCD MIPID we can easily augment the
driver to use a GPIO descriptor instead and resolve the issue.
The platform data .shutdown() callback wasn't even used in the
code, but we encode a shutdown asserting RESET in the remove()
callback for completeness sake.
The CBUS also has the ADS7846 touchscreen attached.
Populate the devices on the Nokia 770 CBUS I2C using software
nodes instead of platform data quirks. This includes the LCD
and the ADS7846 touchscreen so the conversion just brings the LCD
along with it as software nodes is an all-or-nothing design
pattern.
The ADS7846 has some limited support for using GPIO descriptors,
let's convert it over completely to using device properties and then
fix all remaining boardfile users to provide all platform data using
software nodes.
Dump the of includes and of_match_ptr() in the ADS7846 driver as part
of the job.
Since we have to move ADS7846 over to obtaining the GPIOs it is
using exclusively from descriptors, we provide descriptor tables
for the two remaining in-kernel boardfiles using ADS7846:
- PXA Spitz
- MIPS Alchemy DB1000 development board
It was too hard for me to include software node conversion of
these two remaining users at this time: the spitz is using a
hscync callback in the platform data that would require further
GPIO descriptor conversion of the Spitz, and moving the hsync
callback down into the driver: it will just become too big of
a job, but it can be done separately.
The MIPS Alchemy DB1000 is simply something I cannot test, so take
the easier approach of just providing some GPIO descriptors in
this case as I don't want the patch to grow too intrusive.
As we see that several device trees have incorrect polarity flags
and just expect to bypass the gpiolib polarity handling, fix up
all device trees too, in a separate patch.
Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 480c82daa3 ]
It appears this happens because the OMAP driver now
allocates GPIO numbers dynamically, so all that is
references by number is a bit up in the air.
Utilize the NULL device to define some board-specific
GPIO lookups and use these to immediately look up the
same GPIOs, convert to IRQ numbers and pass as resources
to the devices. This is ugly but should work.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c40db6249 ]
It appears this happens because the OMAP driver now
allocates GPIO numbers dynamically, so all that is
references by number is a bit up in the air.
Utilize the NULL device to define some board-specific
GPIO lookups and use these to immediately look up the
same GPIOs, convert to IRQ numbers and pass as resources
to the devices. This is ugly but should work.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa1ae0cd89 ]
The AMS Delta board uses GPIO descriptors exclusively and
does not have any dependencies on the legacy <linux/gpio.h>
header, so just drop it.
Acked-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c32c81f3db ]
Aaro reports problems on the OSK1 board after we altered
the dynamic base for GPIO allocations.
It appears this happens because the OMAP driver now
allocates GPIO numbers dynamically, so all that is
references by number is a bit up in the air.
Let's bite the bullet and try to just move the gpio_chip
in the tps65010 MFD driver over to using dynamic allocations.
Alter everything in the OSK1 board file to use a GPIO
descriptor table and lookups.
Utilize the NULL device to define some board-specific
GPIO lookups and use these to immediately look up the
same GPIOs, convert to IRQ numbers and pass as resources
to the devices. This is ugly but should work.
The .setup() callback for tps65010 was used for some GPIO
hogging, but since the OSK1 is the only user in the entire
kernel we can alter the signatures to something that
is helpful and make a clean transition.
Fixes: 92bf78b33b ("gpio: omap: use dynamic allocation of base")
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: andy.shevchenko@gmail.com
Cc: Andreas Kemnade <andreas@kemnade.info>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Lee Jones <lee@kernel.org>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1464e48d69 ]
During probe, the driver registers i2c dummy devices and populates the
aux bus, which registers a device for the panel. After doing that, the
driver can still defer probe if needed. This ordering of operations is
troublesome however, because the deferred probe work will retry probing
all pending devices every time a new device is registered. Therefore, if
modules need to be loaded in order to satisfy the dependencies for this
driver to complete probe, the kernel will stall, since it'll keep trying
to probe the anx7625 driver, but never succeed, given that modules would
only be loaded after the deferred probe work completes.
Two changes are required to avoid this issue:
* Move of_find_mipi_dsi_host_by_node(), which can defer probe, to before
anx7625_register_i2c_dummy_clients() and
devm_of_dp_aux_populate_ep_devices(), which register devices.
* Make use of the done_probing callback when populating the aux bus,
so that the bridge registration is only done after the panel is
probed. This is required because the panel might need to defer probe,
but the aux bus population needs the i2c dummy devices working, so
this call couldn't just be moved to an earlier point in probe.
One caveat is that if the panel is described outside the aux bus, the
probe loop issue can still happen, but we don't have a way to avoid
it in that case since there's no callback available.
With this patch applied, it's possible to boot on
mt8192-asurada-spherion with
CONFIG_DRM_ANALOGIX_ANX7625=y
CONFIG_MTK_MMSYS=m
CONFIG_BACKLIGHT_PWM=y
and also with
CONFIG_DRM_ANALOGIX_ANX7625=y
CONFIG_MTK_MMSYS=y
CONFIG_BACKLIGHT_PWM=m
Fixes: adca62ec37 ("drm/bridge: anx7625: Support reading edid through aux channel")
Fixes: 269332997a ("drm/bridge: anx7625: Return -EPROBE_DEFER if the dsi host was not found")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: Robert Foss <rfoss@kernel.org>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518193902.891121-1-nfraprado@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ffec92e70 ]
The model property should be at the top level, let's move it out
of the pinctrl node.
Fixes: d2eaf949d2 ("ARM: dts: omap3-gta04a5one: define GTA04A5 variant with OneNAND")
Cc: Andreas Kemnade <andreas@kemnade.info>
Cc: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7061b6af34 ]
When map() is called on a detached domain, the domain does not exist in
the device so we do not send a MAP request, but we do update the
internal mapping tree, to be replayed on the next attach. Since this
constitutes a successful iommu_map() call, return *mapped in this case
too.
Fixes: 7e62edd7a3 ("iommu/virtio: Add map/unmap_pages() callbacks implementation")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230515113946.1017624-3-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff2e4bfd16 ]
bnxt_re currently uses the names "bnxt_qplib_creq" and "bnxt_qplib_nq-0"
while registering IRQs. There is no way to distinguish the IRQs of
different device ports when there are multiple IB devices registered.
This could make the scenarios worse where one want to pin IRQs of a device
port to certain CPUs.
Fixed the code to use unique names which has PCI BDF information while
registering interrupts like: "bnxt_re-nq-0@pci:0000:65:00.0" and
"bnxt_re-creq@pci:0000:65:00.1".
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/1684478897-12247-4-git-send-email-selvin.xavier@broadcom.com
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab112ee789 ]
When the ulp hook to start the IRQ fails because the rings are not
available, tasklets are not enabled. In this case when the driver is
unloaded, driver calls CREQ tasklet_kill. This causes an indefinite hang
as the tasklet is not enabled.
Driver shouldn't call tasklet_kill if it is not enabled. So using the
creq->requested and nq->requested flags to identify if both tasklets/irqs
are registered. Checking this flag while scheduling the tasklet from
ISR. Also, added a cleanup for disabling tasklet, in case request_irq
fails during start_irq.
Check for return value for bnxt_qplib_rcfw_start_irq and in case the
bnxt_qplib_rcfw_start_irq fails, return bnxt_re_start_irq without
attempting to start NQ IRQs.
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/1684478897-12247-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0babf89c9c ]
In the unlikely event that something goes wrong with the device and
its registers, the fan_from_reg() function may return 0. This value
will cause a division-by-zero error in the show_pwm() function.
To prevent this, test the value of
fan_from_reg(data->fan_full_speed[nr]) against 0 before performing
the division. If the division-by-zero error is avoided, assign 0 to
the val variable.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: df9ec2dae0 ("hwmon: (f71882fg) Reorder symbols to get rid of a few forward declarations")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Link: https://lore.kernel.org/r/20230510143537.145060-1-n.zhandarovich@fintech.ru
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 70be83708c ]
PSCI is not implemented on SparX-5 at all, there is no ATF and U-boot that
is shipped does not implement it as well.
I have tried flashing the latest BSP 2022.12 U-boot which did not work.
After contacting Microchip, they confirmed that there is no ATF for the
SoC nor PSCI implementation which is unfortunate in 2023.
So, disable PSCI as otherwise kernel crashes as soon as it tries probing
PSCI with, and the crash is only visible if earlycon is used.
Since PSCI is not implemented, switch core bringup to use spin-tables
which are implemented in the vendor U-boot and actually work.
Tested on PCB134 with eMMC (VSC5640EV).
Fixes: 6694aee00a ("arm64: dts: sparx5: Add basic cpu support")
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Acked-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20230221105039.316819-1-robert.marko@sartura.hr
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9660efc2af ]
The ethernet MAC EEPROM is not populated on the SoM itself, it has to be
populated on each carrier board. Move the EEPROM into the correct place
in DTs, i.e. the carrier board DTs. Add label to the EEPROM too.
Fixes: 7e76f82acd ("ARM: dts: stm32: Split Avenger96 into DHCOR SoM and Avenger96 board")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab87f558dc ]
Currently, the pixel conversion isn't rounding the fixed-point values
before assigning it to the RGB coefficients, which is causing the IGT
pixel-format tests to fail. So, use the drm_fixp2int_round() fixed-point
helper to round the values when assigning it to the RGB coefficients.
Tested with igt@kms_plane@pixel-format and igt@kms_plane@pixel-format-source-clamping.
[v2]:
* Use drm_fixp2int_round() to fix the pixel conversion instead of
casting the values to s32 (Melissa Wen).
Fixes: 89b03aeaef ("drm/vkms: fix 32bit compilation error by replacing macros")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Arthur Grillo <arthurgrillo@riseup.net>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20230512104044.65034-2-mcanal@igalia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 123ee07ba5 ]
Smatch reports:
drivers/gpu/drm/sun4i/sun4i_tcon.c:805 sun4i_tcon_init_clocks() warn:
'tcon->clk' from clk_prepare_enable() not released on lines: 792,801.
In the function sun4i_tcon_init_clocks(), tcon->clk and tcon->sclk0 are
not disabled in the error handling, which affects the release of
these variable. Although sun4i_tcon_bind(), which calls
sun4i_tcon_init_clocks(), use sun4i_tcon_free_clocks to disable the
variables mentioned, but the error handling branch of
sun4i_tcon_init_clocks() ignores the required disable process.
To fix this issue, use the devm_clk_get_enabled to automatically
balance enable and disabled calls. As original implementation use
sun4i_tcon_free_clocks() to disable clk explicitly, we delete the
related calls and error handling that are no longer needed.
Fixes: 9026e0d122 ("drm: Add Allwinner A10 Display Engine support")
Fixes: b14e945bda ("drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init")
Fixes: 8e92404725 ("drm/sun4i: support TCONs without channel 1")
Fixes: 34d698f6e3 ("drm/sun4i: Add has_channel_0 TCON quirk")
Signed-off-by: XuDong Liu <m202071377@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230430112347.4689-1-m202071377@hust.edu.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad527ca87e ]
The .driver_data content in i2c_device_id table must match the
.data content in of_device_id table, else device_get_match_data()
would return bogus value on i2c_device_id match. Align the two
tables.
The i2c_device_id table is now converted from of_device_id using
's@.compatible = "renesas,\([^"]\+"\), .data = \(.*\)@"\1, .driver_data = (kernel_ulong_t)\2@'
Fixes: 892e0ddea1 ("clk: rs9: Add Renesas 9-series PCIe clock generator driver")
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Link: https://lore.kernel.org/r/20230507133906.15061-3-marek.vasut+renesas@mailbox.org
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5e10beeaf ]
The .driver_data content in i2c_device_id table must match the
.data content in of_device_id table, else device_get_match_data()
would return bogus value on i2c_device_id match. Align the two
tables.
The i2c_device_id table is now converted from of_device_id using
's@.compatible = "renesas,\([^"]\+"\), .data = \(.*\)@"\1, .driver_data = (kernel_ulong_t)\2@'
Fixes: 48c5e98fed ("clk: Renesas versaclock7 ccf device driver")
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Link: https://lore.kernel.org/r/20230507133906.15061-2-marek.vasut+renesas@mailbox.org
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3c8e2c575 ]
There is no such property in the SPI controller binding documentation.
Also Linux driver doesn't look for it.
This fixes:
arch/arm/boot/dts/bcm4708-asus-rt-ac56u.dtb: spi@18029200: Unevaluated properties are not allowed ('clock-names' was unexpected)
From schema: Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.yaml
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20230503122830.3200-1-zajec5@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bac7842cd1 ]
Correct computation of THS_TRAILCNT register.
This register must be set to a value that ensure that
THS_TRAIL > 60 ns + 4 x UI
and
THS_TRAIL > 8 x UI
and
THS_TRAIL < TEOT
with
TEOT = 105 ns + (12 x UI)
with the actual value of THS_TRAIL being
(1 + THS_TRAILCNT) x ByteClk cycle + ((1 to 2) + 2) xHSBYTECLK cycle +
- (PHY output delay)
with PHY output delay being about
(8 + (5 to 6)) x MIPIBitClk cycle in the BitClk conversion.
Fixes: ff1ca6397b ("drm/bridge: Add tc358768 driver")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Robert Foss <rfoss@kernel.org>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230427142934.55435-9-francesco@dolcini.it
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77a089328d ]
Correct computation of THS_ZEROCNT register.
This register must be set to a value that ensure that
THS_PREPARE + THS_ZERO > 145ns + 10*UI
with the actual value of (THS_PREPARE + THS_ZERO) being
((1 to 2) + 1 + (TCLK_ZEROCNT + 1) + (3 to 4)) x ByteClk cycle +
+ HSByteClk x (2 + (1 to 2)) + (PHY delay)
with PHY delay being about
(8 + (5 to 6)) x MIPIBitClk cycle in the BitClk conversion.
Fixes: ff1ca6397b ("drm/bridge: Add tc358768 driver")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Robert Foss <rfoss@kernel.org>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230427142934.55435-7-francesco@dolcini.it
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75a8aeac25 ]
Always enable HS video mode setting the TXMD bit, without this change no
video output is present with DSI sinks that are setting
MIPI_DSI_MODE_LPM flag (tested with LT8912B DSI-HDMI bridge).
Previously the driver was enabling HS mode only when the DSI sink was
not explicitly setting the MIPI_DSI_MODE_LPM, however this is not
correct.
The MIPI_DSI_MODE_LPM is supposed to indicate that the sink is willing
to receive data in low power mode, however clearing the
TC358768_DSI_CONTROL_TXMD bit will make the TC358768 send video in
LP mode that is not the intended behavior.
Fixes: ff1ca6397b ("drm/bridge: Add tc358768 driver")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Robert Foss <rfoss@kernel.org>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230427142934.55435-2-francesco@dolcini.it
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd75f3694b ]
With CONFIG_DEBUG_SLAB=y:
# Subtest: input_core
1..3
input: Test input device as /devices/virtual/input/input1
8<--- cut here ---
Unable to handle kernel paging request at virtual address 6b6b6dd7 when read
...
__lock_acquire from lock_acquire+0x26c/0x300
lock_acquire from _raw_spin_lock_irqsave+0x50/0x64
_raw_spin_lock_irqsave from devres_remove+0x20/0x7c
devres_remove from devres_destroy+0x8/0x24
devres_destroy from input_free_device+0x2c/0x60
input_free_device from kunit_try_run_case+0x70/0x94 [kunit]
Without CONFIG_DEBUG_SLAB=y:
KTAP version 1
# Subtest: input_core
1..3
input: Test input device as /devices/virtual/input/input1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 694 at lib/refcount.c:28 refcount_warn_saturate+0x54/0x100
refcount_t: underflow; use-after-free.
...
Call Trace: [<0037cad4>] dump_stack+0xc/0x10
[<00377614>] __warn+0x7e/0xb4
[<0037768c>] warn_slowpath_fmt+0x42/0x62
[<001eee1c>] refcount_warn_saturate+0x54/0x100
[<000b1d34>] kfree_const+0x0/0x20
[<0036290a>] __kobject_del+0x0/0x6e
[<001eee1c>] refcount_warn_saturate+0x54/0x100
[<00362a1a>] kobject_put+0xa2/0xb6
[<11965770>] kunit_generic_run_threadfn_adapter+0x0/0x1c [kunit]
As per the comments for input_allocate_device() and
input_register_device(), input_free_device() must be called only to free
devices that have not been registered. input_unregister_device()
already calls input_put_device(), thus leading to a use-after-free.
Moreover, the kunit_suite.exit() method is called after every test case,
even on failures. As the test itself already does cleanups in its
failure paths, this may lead to a second use-after-free.
Fix the first issue by dropping the call to input_allocate_device() from
input_test_exit().
Fix the second issue by making the cleanup code conditional on a
successful test.
Fixes: fdefcbdd6f ("Input: Add KUnit tests for some of the input core helper functions")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/957b3b309a44d39fb6e38b2a526b250f69ea3d2c.1683022164.git.geert+renesas@glider.be
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dbe836576f ]
The watchdog_timer can schedule tx_timeout_task and watchdog_work
can also arm watchdog_timer. The process is shown below:
----------- timer schedules work ------------
cyttsp4_watchdog_timer() //timer handler
schedule_work(&cd->watchdog_work)
----------- work arms timer ------------
cyttsp4_watchdog_work() //workqueue callback function
cyttsp4_start_wd_timer()
mod_timer(&cd->watchdog_timer, ...)
Although del_timer_sync() and cancel_work_sync() are called in
cyttsp4_remove(), the timer and workqueue could still be rearmed.
As a result, the possible use after free bugs could happen. The
process is shown below:
(cleanup routine) | (timer and workqueue routine)
cyttsp4_remove() | cyttsp4_watchdog_timer() //timer
cyttsp4_stop_wd_timer() | schedule_work()
del_timer_sync() |
| cyttsp4_watchdog_work() //worker
| cyttsp4_start_wd_timer()
| mod_timer()
cancel_work_sync() |
| cyttsp4_watchdog_timer() //timer
| schedule_work()
del_timer_sync() |
kfree(cd) //FREE |
| cyttsp4_watchdog_work() // reschedule!
| cd-> //USE
This patch changes del_timer_sync() to timer_shutdown_sync(),
which could prevent rearming of the timer from the workqueue.
Fixes: 17fb1563d6 ("Input: cyttsp4 - add core driver for Cypress TMA4XX touchscreen devices")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20230421082919.8471-1-duoming@zju.edu.cn
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55f9720dbf ]
SLPC enables use of efficient freq at init by default. It is
possible for GuC to request frequencies that are higher than
the 'software' max if user has set it lower than the efficient
level.
Scenarios/tests that require strict fixing of freq below the efficient
level will need to disable it through this interface.
v2: Keep just one interface to toggle sysfs. With this, user will
be completely responsible for toggling efficient frequency if need
be. There will be no implicit disabling when user sets min < RP1 (Ashutosh)
v3: Remove unused label, review comments (Ashutosh)
v4: Toggle efficient freq usage in SLPC selftest and checkpatch fixes
v5: Review comments (Andi) and add a separate patch for selftest updates
Fixes: 95ccf312a1 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency")
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230426003942.1924347-1-vinay.belgaumkar@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 710cc1e7cd ]
[Why]
The bit for flip addr is being set causing the determination for
FAST vs MEDIUM to always return MEDIUM when plane info is provided
as a surface update. This causes extreme stuttering for the typical
atomic update path on Linux.
[How]
Don't use update_flags->raw for determining FAST vs MEDIUM. It's too
fragile to changes like this.
Explicitly specify the update type per update flag instead. It's not
as clever as checking the bits itself but at least it's correct.
Fixes: aa5fdb1ab5 ("drm/amd/display: Explicitly specify update type per plane info change")
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20c3dffdcc ]
Several calls to ci_dpm_fini() will attempt to free resources that
either have been freed before or haven't been allocated yet. This
may lead to undefined or dangerous behaviour.
For instance, if r600_parse_extended_power_table() fails, it might
call r600_free_extended_power_table() as will ci_dpm_fini() later
during error handling.
Fix this by only freeing pointers to objects previously allocated.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: cc8dbbb4f6 ("drm/radeon: add dpm support for CI dGPUs (v2)")
Co-developed-by: Natalia Petrova <n.petrova@fintech.ru>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3306ba4b60 ]
Three functions in the amdgpu display driver cause -Wmissing-prototype
warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_resource.c:1858:6: error: no previous prototype for 'is_timing_changed' [-Werror=missing-prototypes]
is_timing_changed() is actually meant to be a global symbol, but needs
a proper name and prototype.
Fixes: 17ce8a6907 ("drm/amd/display: Add dsc pre-validation in atomic check")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d06f925f13 ]
When using the felix driver (the only one which supports UC filtering
and MC filtering) as a DSA master for a random other DSA switch, one can
see the following stack trace when the downstream switch ports join a
VLAN-aware bridge:
=============================
WARNING: suspicious RCU usage
-----------------------------
net/8021q/vlan_core.c:238 suspicious rcu_dereference_protected() usage!
stack backtrace:
Workqueue: dsa_ordered dsa_slave_switchdev_event_work
Call trace:
lockdep_rcu_suspicious+0x170/0x210
vlan_for_each+0x8c/0x188
dsa_slave_sync_uc+0x128/0x178
__hw_addr_sync_dev+0x138/0x158
dsa_slave_set_rx_mode+0x58/0x70
__dev_set_rx_mode+0x88/0xa8
dev_uc_add+0x74/0xa0
dsa_port_bridge_host_fdb_add+0xec/0x180
dsa_slave_switchdev_event_work+0x7c/0x1c8
process_one_work+0x290/0x568
What it's saying is that vlan_for_each() expects rtnl_lock() context and
it's not getting it, when it's called from the DSA master's ndo_set_rx_mode().
The caller of that - dsa_slave_set_rx_mode() - is the slave DSA
interface's dsa_port_bridge_host_fdb_add() which comes from the deferred
dsa_slave_switchdev_event_work().
We went to great lengths to avoid the rtnl_lock() context in that call
path in commit 0faf890fc5 ("net: dsa: drop rtnl_lock from
dsa_slave_switchdev_event_work"), and calling rtnl_lock() is simply not
an option due to the possibility of deadlocking when calling
dsa_flush_workqueue() from the call paths that do hold rtnl_lock() -
basically all of them.
So, when the DSA master calls vlan_for_each() from its ndo_set_rx_mode(),
the state of the 8021q driver on this device is really not protected
from concurrent access by anything.
Looking at net/8021q/, I don't think that vlan_info->vid_list was
particularly designed with RCU traversal in mind, so introducing an RCU
read-side form of vlan_for_each() - vlan_for_each_rcu() - won't be so
easy, and it also wouldn't be exactly what we need anyway.
In general I believe that the solution isn't in net/8021q/ anyway;
vlan_for_each() is not cut out for this task. DSA doesn't need rtnl_lock()
to be held per se - since it's not a netdev state change that we're
blocking, but rather, just concurrent additions/removals to a VLAN list.
We don't even need sleepable context - the callback of vlan_for_each()
just schedules deferred work.
The proposed escape is to remove the dependency on vlan_for_each() and
to open-code a non-sleepable, rtnl-free alternative to that, based on
copies of the VLAN list modified from .ndo_vlan_rx_add_vid() and
.ndo_vlan_rx_kill_vid().
Fixes: 64fdc5f341 ("net: dsa: sync unicast and multicast addresses for VLAN filters too")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230626154402.3154454-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a9922e7be ]
ipvlan_queue_xmit() should return NET_XMIT_XXX, but
ipvlan_xmit_mode_l2/l3() returns rx_handler_result_t or NET_RX_XXX
in some cases. ipvlan_rcv_frame() will only return RX_HANDLER_CONSUMED
in ipvlan_xmit_mode_l2/l3() because 'local' is true. It's equal to
NET_XMIT_SUCCESS. But dev_forward_skb() can return NET_RX_SUCCESS or
NET_RX_DROP, and returning NET_RX_DROP(NET_XMIT_DROP) will increase
both ipvlan and ipvlan->phy_dev drops counter.
The skb to forward can be treated as xmitted successfully. This patch
makes ipvlan_queue_xmit() return NET_XMIT_SUCCESS for forward skb.
Fixes: 2ad7bf3638 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230626093347.7492-1-cambda@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b389139f12 ]
Set element addition error path decrements reference counter on chains
twice: once on element release and again via nft_data_release().
Then, d6b478666f ("netfilter: nf_tables: fix underflow in object
reference counter") incorrectly fixed this by removing the stateful
object reference count decrement.
Restore the stateful object decrement as in b91d903688 ("netfilter:
nf_tables: fix leaking object reference count") and let
nft_data_release() decrement the chain reference counter, so this is
done only once.
Fixes: d6b478666f ("netfilter: nf_tables: fix underflow in object reference counter")
Fixes: 628bd3e49c ("netfilter: nf_tables: drop map element references from preparation phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e70489721 ]
Otherwise a dangling reference to a rule object that is gone remains
in the set binding list.
Fixes: 26b5a5712e ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f188d30087 ]
ct_sip_parse_numerical_param() returns only 0 or 1 now.
But process_register_request() and process_register_response() imply
checking for a negative value if parsing of a numerical header parameter
failed.
The invocation in nf_nat_sip() looks correct:
if (ct_sip_parse_numerical_param(...) > 0 &&
...) { ... }
Make the return value of the function ct_sip_parse_numerical_param()
a tristate to fix all the cases
a) return 1 if value is found; *val is set
b) return 0 if value is not found; *val is unchanged
c) return -1 on error; *val is undefined
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 0f32a40fc9 ("[NETFILTER]: nf_conntrack_sip: create signalling expectations")
Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff0a3a7d52 ]
Eric Dumazet says:
nf_conntrack_dccp_packet() has an unique:
dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
And nothing more is 'pulled' from the packet, depending on the content.
dh->dccph_doff, and/or dh->dccph_x ...)
So dccp_ack_seq() is happily reading stuff past the _dh buffer.
BUG: KASAN: stack-out-of-bounds in nf_conntrack_dccp_packet+0x1134/0x11c0
Read of size 4 at addr ffff000128f66e0c by task syz-executor.2/29371
[..]
Fix this by increasing the stack buffer to also include room for
the extra sequence numbers and all the known dccp packet type headers,
then pull again after the initial validation of the basic header.
While at it, mark packets invalid that lack 48bit sequence bit but
where RFC says the type MUST use them.
Compile tested only.
v2: first skb_header_pointer() now needs to adjust the size to
only pull the generic header. (Eric)
Heads-up: I intend to remove dccp conntrack support later this year.
Fixes: 2bc780499a ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f67fbf819 ]
The `shift` variable which indicates the offset in the string at which
to start matching the pattern is initialized to `bm->patlen - 1`, but it
is not reset when a new block is retrieved. This means the implemen-
tation may start looking at later and later positions in each successive
block and miss occurrences of the pattern at the beginning. E.g.,
consider a HTTP packet held in a non-linear skb, where the HTTP request
line occurs in the second block:
[... 52 bytes of packet headers ...]
GET /bmtest HTTP/1.1\r\nHost: www.example.com\r\n\r\n
and the pattern is "GET /bmtest".
Once the first block comprising the packet headers has been examined,
`shift` will be pointing to somewhere near the end of the block, and so
when the second block is examined the request line at the beginning will
be missed.
Reinitialize the variable for each new block.
Fixes: 8082e4ed0a ("[LIB]: Boyer-Moore extension for textsearch infrastructure strike #2")
Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1390
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6709d4b7bc ]
This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.
// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|
and the crash trace for the one of the discussed UAF like:
BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>
Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)
The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
In summary, this patch solves those use-after-free by
1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:
// llcp_sock_bind()
local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B
There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.
2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.
3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.
Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.
Fixes: 52feb444a9 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1b355438b ]
efx_net_stats() (.ndo_get_stats64) can be called during an ethtool
selftest, during which time nic_data->mc_stats is NULL as the NIC has
been fini'd. In this case do not attempt to fetch the latest stats
from the hardware, else we will crash on a NULL dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000038
RIP efx_nic_update_stats
abridged calltrace:
efx_ef10_update_stats_pf
efx_net_stats
dev_get_stats
dev_seq_printf_stats
Skipping the read is safe, we will simply give out stale stats.
To ensure that the free in efx_ef10_fini_nic() does not race against
efx_ef10_update_stats_pf(), which could cause a TOCTTOU bug, take the
efx->stats_lock in fini_nic (it is already held across update_stats).
Fixes: d3142c193d ("sfc: refactor EF10 stats handling")
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1bc9fc4a0 ]
64-bit DMA detection will fail if axienet was started before (by boot
loader, boot ROM, etc). In this state axienet will not start properly.
XAXIDMA_TX_CDESC_OFFSET + 4 register (MM2S_CURDESC_MSB) is used to detect
64-bit DMA capability here. But datasheet says: When DMACR.RS is 1
(axienet is in enabled state), CURDESC_PTR becomes Read Only (RO) and
is used to fetch the first descriptor. So iowrite32()/ioread32() trick
to this register to detect 64-bit DMA will not work.
So move axienet reset before 64-bit DMA detection.
Fixes: f735c40ed9 ("net: axienet: Autodetect 64-bit DMA capability")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/20230622192245.116864-1-fido_max@inbox.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11b73313c1 ]
In blamed commit, I missed that get_dist_table() was allocating
memory using GFP_KERNEL, and acquiring qdisc lock to perform
the swap of newly allocated table with current one.
In this patch, get_dist_table() is allocating memory and
copy user data before we acquire the qdisc lock.
Then we perform swap operations while being protected by the lock.
Note that after this patch netem_change() no longer can do partial changes.
If an error is returned, qdisc conf is left unchanged.
Fixes: 2174a08db8 ("sch_netem: acquire qdisc lock in netem_change()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230622181503.2327695-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d61f926d4 ]
syzbot reported a possible deadlock in netlink_set_err() [1]
A similar issue was fixed in commit 1d482e666b ("netlink: disable IRQs
for netlink_lock_table()") in netlink_lock_table()
This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
which were not covered by cited commit.
[1]
WARNING: possible irq lock inversion dependency detected
6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
syz-executor.2/23011 just changed the state of lock:
ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
but this lock was taken by another, SOFTIRQ-safe lock in the past:
(&local->queue_stop_reason_lock){..-.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(nl_table_lock);
local_irq_disable();
lock(&local->queue_stop_reason_lock);
lock(nl_table_lock);
<Interrupt>
lock(&local->queue_stop_reason_lock);
*** DEADLOCK ***
Fixes: 1d482e666b ("netlink: disable IRQs for netlink_lock_table()")
Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c
Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230621154337.1668594-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c789ad7cbe ]
There's an hardware issue that can cause missing timestamps. The bug
is that the interrupt is only cleared if the IGC_TXSTMPH_0 register is
read.
The bug can cause a race condition if a timestamp is captured at the
wrong time, and we will miss that timestamp. To reduce the time window
that the problem is able to happen, in case no timestamp was ready, we
read the "previous" value of the timestamp registers, and we compare
with the "current" one, if it didn't change we can be reasonably sure
that no timestamp was captured. If they are different, we use the new
value as the captured timestamp.
The HW bug is not easy to reproduce, got to reproduce it when smashing
the NIC with timestamping requests from multiple applications (e.g.
multiple ntpperf instances + ptp4l), after 10s of minutes.
This workaround has more impact when multiple timestamp registers are
used, and the IGC_TXSTMPH_0 register always need to be read, so the
interrupt is cleared.
Fixes: 2c344ae245 ("igc: Add support for TX timestamping")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce58c7cc8b ]
Before requesting a packet transmission to be hardware timestamped,
check if the user has TX timestamping enabled. Fixes an issue that if
a packet was internally forwarded to the NIC, and it had the
SKBTX_HW_TSTAMP flag set, the driver would mark that timestamp as
skipped.
In reality, that timestamp was "not for us", as TX timestamp could
never be enabled in the NIC.
Checking if the TX timestamping is enabled earlier has a secondary
effect that when TX timestamping is disabled, there's no need to check
for timestamp timeouts.
We should only take care to free any pending timestamp when TX
timestamping is disabled, as that skb would never be released
otherwise.
Fixes: 2c344ae245 ("igc: Add support for TX timestamping")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c50e2b150 ]
Currently, the igc driver supports timestamping only one tx packet at a
time. During the transmission flow, the skb that requires hardware
timestamping is saved in adapter->ptp_tx_skb. Once hardware has the
timestamp, an interrupt is delivered, and adapter->ptp_tx_work is
scheduled. In igc_ptp_tx_work(), we read the timestamp register, update
adapter->ptp_tx_skb, and notify the network stack.
While the thread executing the transmission flow (the user process
running in kernel mode) and the thread executing ptp_tx_work don't
access adapter->ptp_tx_skb concurrently, there are two other places
where adapter->ptp_tx_skb is accessed: igc_ptp_tx_hang() and
igc_ptp_suspend().
igc_ptp_tx_hang() is executed by the adapter->watchdog_task worker
thread which runs periodically so it is possible we have two threads
accessing ptp_tx_skb at the same time. Consider the following scenario:
right after __IGC_PTP_TX_IN_PROGRESS is set in igc_xmit_frame_ring(),
igc_ptp_tx_hang() is executed. Since adapter->ptp_tx_start hasn't been
written yet, this is considered a timeout and adapter->ptp_tx_skb is
cleaned up.
This patch fixes the issue described above by adding the ptp_tx_lock to
protect access to ptp_tx_skb and ptp_tx_start fields from igc_adapter.
Since igc_xmit_frame_ring() called in atomic context by the networking
stack, ptp_tx_lock is defined as a spinlock, and the irq safe variants
of lock/unlock are used.
With the introduction of the ptp_tx_lock, the __IGC_PTP_TX_IN_PROGRESS
flag doesn't provide much of a use anymore so this patch gets rid of it.
Fixes: 2c344ae245 ("igc: Add support for TX timestamping")
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fde4c557f ]
The Stuff Bit Count is always coded on 4 bits [1]. Update the Stuff
Bit Count size accordingly.
In addition, the CRC fields of CAN FD Frames contain stuff bits at
fixed positions called fixed stuff bits [2]. The CRC field starts with
a fixed stuff bit and then has another fixed stuff bit after each
fourth bit [2], which allows us to derive this formula:
FSB count = 1 + round_down(len(CRC field)/4)
The length of the CRC field is [1]:
len(CRC field) = len(Stuff Bit Count) + len(CRC)
= 4 + len(CRC)
with len(CRC) either 17 or 21 bits depending of the payload length.
In conclusion, for CRC17:
FSB count = 1 + round_down((4 + 17)/4)
= 6
and for CRC 21:
FSB count = 1 + round_down((4 + 21)/4)
= 7
Add a Fixed Stuff bits (FSB) field with above values and update
CANFD_FRAME_OVERHEAD_SFF and CANFD_FRAME_OVERHEAD_EFF accordingly.
[1] ISO 11898-1:2015 section 10.4.2.6 "CRC field":
The CRC field shall contain the CRC sequence followed by a recessive
CRC delimiter. For FD Frames, the CRC field shall also contain the
stuff count.
Stuff count
If FD Frames, the stuff count shall be at the beginning of the CRC
field. It shall consist of the stuff bit count modulo 8 in a 3-bit
gray code followed by a parity bit [...]
[2] ISO 11898-1:2015 paragraph 10.5 "Frame coding":
In the CRC field of FD Frames, the stuff bits shall be inserted at
fixed positions; they are called fixed stuff bits. There shall be a
fixed stuff bit before the first bit of the stuff count, even if the
last bits of the preceding field are a sequence of five consecutive
bits of identical value, there shall be only the fixed stuff bit,
there shall not be two consecutive stuff bits. A further fixed stuff
bit shall be inserted after each fourth bit of the CRC field [...]
Fixes: 85d99c3e2a ("can: length: can_skb_get_frame_len(): introduce function to get data length of frame in data link layer")
Suggested-by: Thomas Kopp <Thomas.Kopp@microchip.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Thomas Kopp <Thomas.Kopp@microchip.com>
Link: https://lore.kernel.org/all/20230611025728.450837-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a5cb79762 ]
When calling bpf_sk_lookup_tcp(), bpf_sk_lookup_udp() or
bpf_skc_lookup_tcp() from tc/xdp ingress, VRF socket bindings aren't
respoected, i.e. unbound sockets are returned, and bound sockets aren't
found.
VRF binding is determined by the sdif argument to sk_lookup(), however
when called from tc the IP SKB control block isn't initialized and thus
inet{,6}_sdif() always returns 0.
Fix by calculating sdif for the tc/xdp flows by observing the device's
l3 enslaved state.
The cg/sk_skb hooking points which are expected to support
inet{,6}_sdif() pass sdif=-1 which makes __bpf_skc_lookup() use the
existing logic.
Fixes: 6acc9b432e ("bpf: Add helper to retrieve socket in BPF")
Signed-off-by: Gilad Sever <gilad9366@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/20230621104211.301902-4-gilad9366@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e98730bc0 ]
Change BPF helper socket lookup functions to use TC specific variants:
bpf_tc_sk_lookup_tcp() / bpf_tc_sk_lookup_udp() / bpf_tc_skc_lookup_tcp()
instead of sharing implementation with the cg / sk_skb hooking points.
This allows introducing a separate logic for the TC flow.
The tc functions are identical to the original code.
Signed-off-by: Gilad Sever <gilad9366@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230621104211.301902-2-gilad9366@gmail.com
Stable-dep-of: 9a5cb79762 ("bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c467c8f081 ]
This microSD card never clears Flush Cache bit after cache flush has
been started in sd_flush_cache(). This leads e.g. to failure to mount
file system. Add a quirk which disables the SD cache for this specific
card from specific manufacturing date of 11/2019, since on newer dated
cards from 05/2023 the cache flush works correctly.
Fixes: 08ebf903af ("mmc: core: Fixup support for writeback-cache for eMMC and SD")
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20230620102713.7701-1-marex@denx.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ec272c586 ]
Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews".
This patch series attempts to finish resolving the feedback received
from Petr Mladek on the v5 series I posted.
Probably the only thing that wasn't fully as clean as Petr requested was
the Kconfig stuff. I couldn't find a better way to express it without a
more major overhaul. In the very least, I renamed "NON_ARCH" to
"PERF_OR_BUDDY" in the hopes that will make it marginally better.
Nothing in this series is terribly critical and even the bugfixes are
small. However, it does cleanup a few things that were pointed out in
review.
This patch (of 10):
The permissions for the kernel.nmi_watchdog sysctl have always been set at
compile time despite the fact that a watchdog can fail to probe. Let's
fix this and set the permissions based on whether the hardlockup detector
actually probed.
Link: https://lkml.kernel.org/r/20230527014153.2793931-1-dianders@chromium.org
Link: https://lkml.kernel.org/r/20230526184139.1.I0d75971cc52a7283f495aac0bd5c3041aadc734e@changeid
Fixes: a994a3147e ("watchdog/hardlockup/perf: Implement init time detection of perf")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reported-by: Petr Mladek <pmladek@suse.com>
Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81972551df ]
The perf hardlockup detector works by looking at interrupt counts and
seeing if they change from run to run. The interrupt counts are managed
by the common watchdog code via its watchdog_timer_fn().
Currently the API between the perf detector and the common code is a
function: is_hardlockup(). When the hard lockup detector sees that
function return true then it handles printing out debug info and inducing
a panic if necessary.
Let's change the API a little bit in preparation for the buddy hardlockup
detector. The buddy hardlockup detector wants to print nearly the same
debug info and have nearly the same panic behavior. That means we want to
move all that code to the common file. For now, the code in the common
file will only be there if the perf hardlockup detector is enabled, but
eventually it will be selected by a common config.
Right now, this _just_ moves the code from the perf detector file to the
common file and changes the names. It doesn't make the changes that the
buddy hardlockup detector will need and doesn't do any style cleanups. A
future patch will do cleanup to make it more obvious what changed.
With the above, we no longer have any callers of is_hardlockup() outside
of the "watchdog.c" file, so we can remove it from the header, make it
static, and move it to the same "#ifdef" block as our new
watchdog_hardlockup_check(). While doing this, it can be noted that even
if no hardlockup detectors were configured the existing code used to still
have the code for counting/checking "hrtimer_interrupts" even if the perf
hardlockup detector wasn't configured. We didn't need to do that, so move
all the "hrtimer_interrupts" counting to only be there if the perf
hardlockup detector is configured as well.
This change is expected to be a no-op.
Link: https://lkml.kernel.org/r/20230519101840.v5.8.Id4133d3183e798122dc3b6205e7852601f289071@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 9ec272c586 ("watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1753fd02a ]
The mm_struct mm_count field is frequently updated by mmgrab/mmdrop
performed by context switch. This causes false-sharing for surrounding
mm_struct fields which are read-mostly.
This has been observed on a 2sockets/112core/224cpu Intel Sapphire Rapids
server running hackbench, and by the kernel test robot will-it-scale
testcase.
Move the mm_count field into its own cache line to prevent false-sharing
with other mm_struct fields.
Move mm_count to the first field of mm_struct to minimize the amount of
padding required: rather than adding padding before and after the mm_count
field, padding is only added after mm_count.
Note that I noticed this odd comment in mm_struct:
commit 2e3025434a ("mm: relocate 'write_protect_seq' in struct mm_struct")
/*
* With some kernel config, the current mmap_lock's offset
* inside 'mm_struct' is at 0x120, which is very optimal, as
* its two hot fields 'count' and 'owner' sit in 2 different
* cachelines, and when mmap_lock is highly contended, both
* of the 2 fields will be accessed frequently, current layout
* will help to reduce cache bouncing.
*
* So please be careful with adding new fields before
* mmap_lock, which can easily push the 2 fields into one
* cacheline.
*/
struct rw_semaphore mmap_lock;
This comment is rather odd for a few reasons:
- It requires addition/removal of mm_struct fields to carefully consider
field alignment of _other_ fields,
- It expresses the wish to keep an "optimal" alignment for a specific
kernel config.
I suspect that the author of this comment may want to revisit this topic
and perhaps introduce a split-struct approach for struct rw_semaphore,
if the need is to place various fields of this structure in different
cache lines.
Link: https://lkml.kernel.org/r/20230515143536.114960-1-mathieu.desnoyers@efficios.com
Fixes: 223baf9d17 ("sched: Fix performance regression introduced by mm_cid")
Fixes: af7f588d8f ("sched: Introduce per-memory-map concurrency ID")
Link: https://lore.kernel.org/lkml/7a0c1db1-103d-d518-ed96-1584a28fbf32@efficios.com
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202305151017.27581d75-yujie.liu@intel.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Olivier Dion <odion@efficios.com>
Cc: <michael.christie@oracle.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39432f8a37 ]
The removed code ran for any BSS that was not included in the MBSSID
element in order to update it. However, instead of using the correct
inheritance rules, it would simply copy the elements from the
transmitting AP. The result is that we would report incorrect elements
in this case.
After some discussions, it seems that there are likely not even APs
actually using this feature. Either way, removing the code decreases
complexity and makes the cfg80211 behaviour more correct.
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.cfd6d8db1f26.Ia1044902b86cd7d366400a4bfb93691b8f05d68c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dfd9aa3e7a ]
The cfg80211_gen_new_ie function merges the IEs using inheritance rules.
Rewrite this function to fix issues around inheritance rules. In
particular, vendor elements do not require any special handling, as they
are either all inherited or overridden by the subprofile.
Also, add fragmentation handling as this may be needed in some cases.
This also changes the function to not require making a copy. The new
version could be optimized a bit by explicitly tracking which IEs have
been handled already rather than looking that up again every time.
Note that a small behavioural change is the removal of the SSID special
handling. This should be fine for the MBSSID element, as the SSID must
be included in the subelement.
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.bc6152e146db.I2b5f3bc45085e1901e5b5192a674436adaf94748@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ec7291e24 ]
There's quite a bit of code accessing sband iftype data
(HE, HE 6 GHz, EHT) and we always need to remember to use
the ieee80211_vif_type_p2p() helper. Add new helpers to
directly get it from the sband/vif rather than having to
call ieee80211_vif_type_p2p().
Convert most code with the following spatch:
@@
expression vif, sband;
@@
-ieee80211_get_he_iftype_cap(sband, ieee80211_vif_type_p2p(vif))
+ieee80211_get_he_iftype_cap_vif(sband, vif)
@@
expression vif, sband;
@@
-ieee80211_get_eht_iftype_cap(sband, ieee80211_vif_type_p2p(vif))
+ieee80211_get_eht_iftype_cap_vif(sband, vif)
@@
expression vif, sband;
@@
-ieee80211_get_he_6ghz_capa(sband, ieee80211_vif_type_p2p(vif))
+ieee80211_get_he_6ghz_capa_vif(sband, vif)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.db099f49e764.Ie892966c49e22c7b7ee1073bc684f142debfdc84@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: f912959875 ("wifi: iwlwifi: mvm: correctly access HE/EHT sband capa")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa0e21fa44 ]
This filter already exists for excluding IPv6 SNMP stats. Extend its
definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
This patch constitutes a partial fix for a netlink attribute nesting
overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
requester doesn't need them, the truncation of the VF list is avoided.
While it was technically only the stats added in commit c5a9f6f0ab
("net/core: Add drop counters to VF statistics") breaking the camel's
back, the appreciable size of the stats data should never have been
included without due consideration for the maximum number of VFs
supported by PCI.
Fixes: 3b766cd832 ("net/core: Add reading VF statistics through the PF netdevice")
Fixes: c5a9f6f0ab ("net/core: Add drop counters to VF statistics")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Cc: Edwin Peer <espeer@gmail.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20230611105108.122586-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ffc85d929 ]
Make sure that the following unsafe example is rejected by verifier:
1: r9 = ... some pointer with range X ...
2: r6 = ... unbound scalar ID=a ...
3: r7 = ... unbound scalar ID=b ...
4: if (r6 > r7) goto +1
5: r6 = r7
6: if (r6 > X) goto ...
[ Upstream commit 904e6ddf41 ]
Change mark_chain_precision() to track precision in situations
like below:
r2 = unknown value
...
--- state #0 ---
...
r1 = r2 // r1 and r2 now share the same ID
...
--- state #1 {r1.id = A, r2.id = A} ---
...
if (r2 > 10) goto exit; // find_equal_scalars() assigns range to r1
...
--- state #2 {r1.id = A, r2.id = A} ---
r3 = r10
r3 += r1 // need to mark both r1 and r2
At the beginning of the processing of each state, ensure that if a
register with a scalar ID is marked as precise, all registers sharing
this ID are also marked as precise.
This property would be used by a follow-up change in regsafe().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230613153824.3324830-2-eddyz87@gmail.com
Stable-dep-of: 1ffc85d929 ("bpf: Verify scalar ids mapping in regsafe() using check_ids()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75086cc6de ]
On EDMA capable hardware, ath9k_txq_list_has_key() can enter infinite
loop if it is called while all txq_fifos have packets that use different
key that the one we are looking for. Fix it by exiting the loop if all
txq_fifos have been checked already.
Because this loop is called under spin_lock_bh() (see ath_txq_lock) it
causes the following rcu stall:
rcu: INFO: rcu_sched self-detected stall on CPU
ath10k_pci 0000:01:00.0: failed to read temperature -11
rcu: 1-....: (5254 ticks this GP) idle=189/1/0x4000000000000002 softirq=8442983/8442984 fqs=2579
(t=5257 jiffies g=17983297 q=334)
Task dump for CPU 1:
task:hostapd state:R running task stack: 0 pid: 297 ppid: 289 flags:0x0000000a
Call trace:
dump_backtrace+0x0/0x170
show_stack+0x1c/0x24
sched_show_task+0x140/0x170
dump_cpu_task+0x48/0x54
rcu_dump_cpu_stacks+0xf0/0x134
rcu_sched_clock_irq+0x8d8/0x9fc
update_process_times+0xa0/0xec
tick_sched_timer+0x5c/0xd0
__hrtimer_run_queues+0x154/0x320
hrtimer_interrupt+0x120/0x2f0
arch_timer_handler_virt+0x38/0x44
handle_percpu_devid_irq+0x9c/0x1e0
handle_domain_irq+0x64/0x90
gic_handle_irq+0x78/0xb0
call_on_irq_stack+0x28/0x38
do_interrupt_handler+0x54/0x5c
el1_interrupt+0x2c/0x4c
el1h_64_irq_handler+0x14/0x1c
el1h_64_irq+0x74/0x78
ath9k_txq_has_key+0x1bc/0x250 [ath9k]
ath9k_set_key+0x1cc/0x3dc [ath9k]
drv_set_key+0x78/0x170
ieee80211_key_replace+0x564/0x6cc
ieee80211_key_link+0x174/0x220
ieee80211_add_key+0x11c/0x300
nl80211_new_key+0x12c/0x330
genl_family_rcv_msg_doit+0xbc/0x11c
genl_rcv_msg+0xd8/0x1c4
netlink_rcv_skb+0x40/0x100
genl_rcv+0x3c/0x50
netlink_unicast+0x1ec/0x2c0
netlink_sendmsg+0x198/0x3c0
____sys_sendmsg+0x210/0x250
___sys_sendmsg+0x78/0xc4
__sys_sendmsg+0x4c/0x90
__arm64_sys_sendmsg+0x28/0x30
invoke_syscall.constprop.0+0x60/0x100
do_el0_svc+0x48/0xd0
el0_svc+0x14/0x50
el0t_64_sync_handler+0xa8/0xb0
el0t_64_sync+0x158/0x15c
This rcu stall is hard to reproduce as is, but changing ATH_TXFIFO_DEPTH
from 8 to 2 makes it reasonably easy to reproduce.
Fixes: ca2848022c ("ath9k: Postpone key cache entry deletion for TXQ frames reference it")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Tested-by: Nicolas Escande <nico.escande@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230609093744.1985-1-repk@triplefau.lt
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b23ed4d74c ]
Dan Carpenter reported invalid check for calloc() result in
test_verifier.c:get_xlated_program():
./tools/testing/selftests/bpf/test_verifier.c:1365 get_xlated_program()
warn: variable dereferenced before check 'buf' (see line 1364)
./tools/testing/selftests/bpf/test_verifier.c
1363 *cnt = xlated_prog_len / buf_element_size;
1364 *buf = calloc(*cnt, buf_element_size);
1365 if (!buf) {
This should be if (!*buf) {
1366 perror("can't allocate xlated program buffer");
1367 return -ENOMEM;
This commit refactors the get_xlated_program() to avoid using double
pointer type.
Fixes: 933ff53191 ("selftests/bpf: specify expected instructions in test_verifier tests")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/bpf/ZH7u0hEGVB4MjGZq@moroto/
Link: https://lore.kernel.org/bpf/20230609221637.2631800-1-eddyz87@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 434587df9f ]
There are no other files referencing this function, apparently
it was left global to avoid an 'unused function' warning when
the only caller is left out. With a 'W=1' build, it causes
a 'missing prototype' warning though:
drivers/memstick/host/r592.c:47:13: error: no previous prototype for 'memstick_debug_get_tpc_name' [-Werror=missing-prototypes]
Annotate the function as 'static __maybe_unused' to avoid both
problems.
Fixes: 9263412501 ("memstick: add driver for Ricoh R5C592 card reader")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230516202714.560929-1-arnd@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cba6c4309 ]
Patch series "kexec: enable kexec_crash_size to support two crash kernel
regions".
When crashkernel=X fails to reserve region under 4G, it will fall back to
reserve region above 4G and a region of the default size will also be
reserved under 4G. Unfortunately, /sys/kernel/kexec_crash_size only
supports one crash kernel region now, the user cannot sense the low memory
reserved by reading /sys/kernel/kexec_crash_size. Also, low memory cannot
be freed by writing this file.
For example:
resource_size(crashk_res) = 512M
resource_size(crashk_low_res) = 256M
The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be
768M. When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size
of crashk_res becomes 0 and resource_size(crashk_low_res) is still 256 MB,
which is incorrect.
Since crashk_res manages the memory with high address and crashk_low_res
manages the memory with low address, crashk_low_res is shrunken only when
all crashk_res is shrunken. And because when there is only one crash
kernel region, crashk_res is always used. Therefore, if all crashk_res is
shrunken and crashk_low_res still exists, swap them.
This patch (of 6):
If the value of parameter 'new_size' is in the semi-open and semi-closed
interval (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the
calculation result of ram_res is:
ram_res->start = crashk_res.end + 1
ram_res->end = crashk_res.end
The operation of insert_resource() fails, and ram_res is not added to
iomem_resource. As a result, the memory of the control block ram_res is
leaked.
In fact, on all architectures, the start address and size of crashk_res
are already aligned by KEXEC_CRASH_MEM_ALIGN. Therefore, we do not need
to round up crashk_res.start again. Instead, we should round up
'new_size' in advance.
Link: https://lkml.kernel.org/r/20230527123439.772-1-thunder.leizhen@huawei.com
Link: https://lkml.kernel.org/r/20230527123439.772-2-thunder.leizhen@huawei.com
Fixes: 6480e5a092 ("kdump: add missing RAM resource in crash_shrink_memory()")
Fixes: 06a7f71124 ("kexec: premit reduction of the reserved memory size")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e8b2c88fc ]
The ice_ptp_extts_work() and ice_ptp_periodic_work() functions are both
scheduled on the same kthread worker, pf.ptp.kworker. The
ice_ptp_periodic_work() function sends to the firmware to interact with the
PHY, and must block to wait for responses.
This can cause delay in responding to the PFINT_OICR_TSYN_EVNT interrupt
cause, ultimately resulting in disruption to processing an input signal of
the frequency is high enough. In our testing, even 100 Hz signals get
disrupted.
Fix this by instead processing the signal inside the miscellaneous
interrupt thread prior to handling Tx timestamps.
Use atomic bits in a new pf->misc_thread bitmap in order to safely
communicate which tasks require processing within the
ice_misc_intr_thread_fn(). This ensures the communication of desired tasks
from the ice_misc_intr() are correctly processed without racing even in the
event that the interrupt triggers again before the thread function exits.
Fixes: 172db5f91d ("ice: add support for auxiliary input/output pins")
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b241e26082 ]
In case WoWlan was never configured during the operation of the system,
the hw->wiphy->wowlan_config will be NULL. rsi_config_wowlan() checks
whether wowlan_config is non-NULL and if it is not, then WARNs about it.
The warning is valid, as during normal operation the rsi_config_wowlan()
should only ever be called with non-NULL wowlan_config. In shutdown this
rsi_config_wowlan() should only ever be called if WoWlan was configured
before by the user.
Add checks for non-NULL wowlan_config into the shutdown hook. While at it,
check whether the wiphy is also non-NULL before accessing wowlan_config .
Drop the single-use wowlan_config variable, just inline it into function
call.
Fixes: 16bbc3eb83 ("rsi: fix null pointer dereference during rsi_shutdown()")
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230527222833.273741-1-marex@denx.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a76c7ca9e ]
The spi geni driver in SE DMA mode, unlike GSI DMA, is not making use of
DMA mapping functionality available in the framework.
The driver does mapping internally which makes dma buffer fields available
in spi_transfer struct superfluous while requiring additional members in
spi_geni_master struct.
Conform to the design by having framework handle map/unmap and do only
DMA transfer in the driver; this also simplifies code a bit.
Fixes: e5f0dfa78a ("spi: spi-geni-qcom: Add support for SE DMA mode")
Suggested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/1684325894-30252-3-git-send-email-quic_vnivarth@quicinc.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edd75c8028 ]
Building BPF selftests with custom HOSTCFLAGS yields an error:
# make HOSTCFLAGS="-O2"
[...]
HOSTCC ./tools/testing/selftests/bpf/tools/build/resolve_btfids/main.o
main.c:73:10: fatal error: linux/rbtree.h: No such file or directory
73 | #include <linux/rbtree.h>
| ^~~~~~~~~~~~~~~~
The reason is that tools/bpf/resolve_btfids/Makefile passes header
include paths by extending HOSTCFLAGS which is overridden by setting
HOSTCFLAGS in the make command (because of Makefile rules [1]).
This patch fixes the above problem by passing the include paths via
`HOSTCFLAGS_resolve_btfids` which is used by tools/build/Build.include
and can be combined with overridding HOSTCFLAGS.
[1] https://www.gnu.org/software/make/manual/html_node/Overriding.html
Fixes: 56a2df7615 ("tools/resolve_btfids: Compile resolve_btfids as host program")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230530123352.1308488-1-vmalik@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7793fc3bab ]
This patch fixes an incorrect assumption made in the original
bpf_refcount series [0], specifically that the BPF program calling
bpf_refcount_acquire on some node can always guarantee that the node is
alive. In that series, the patch adding failure behavior to rbtree_add
and list_push_{front, back} breaks this assumption for non-owning
references.
Consider the following program:
n = bpf_kptr_xchg(&mapval, NULL);
/* skip error checking */
bpf_spin_lock(&l);
if(bpf_rbtree_add(&t, &n->rb, less)) {
bpf_refcount_acquire(n);
/* Failed to add, do something else with the node */
}
bpf_spin_unlock(&l);
It's incorrect to assume that bpf_refcount_acquire will always succeed in this
scenario. bpf_refcount_acquire is being called in a critical section
here, but the lock being held is associated with rbtree t, which isn't
necessarily the lock associated with the tree that the node is already
in. So after bpf_rbtree_add fails to add the node and calls bpf_obj_drop
in it, the program has no ownership of the node's lifetime. Therefore
the node's refcount can be decr'd to 0 at any time after the failing
rbtree_add. If this happens before the refcount_acquire above, the node
might be free'd, and regardless refcount_acquire will be incrementing a
0 refcount.
Later patches in the series exercise this scenario, resulting in the
expected complaint from the kernel (without this patch's changes):
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110
Modules linked in: bpf_testmod(O)
CPU: 1 PID: 207 Comm: test_progs Tainted: G O 6.3.0-rc7-02231-g723de1a718a2-dirty #371
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0xbc/0x110
Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7
RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680
RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7
R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388
R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048
FS: 00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
bpf_refcount_acquire_impl+0xb5/0xc0
(rest of output snipped)
The patch addresses this by changing bpf_refcount_acquire_impl to use
refcount_inc_not_zero instead of refcount_inc and marking
bpf_refcount_acquire KF_RET_NULL.
For owning references, though, we know the above scenario is not possible
and thus that bpf_refcount_acquire will always succeed. Some verifier
bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire
calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on
owning refs despite it being marked KF_RET_NULL.
Existing selftests using bpf_refcount_acquire are modified where
necessary to NULL-check its return value.
[0]: https://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/
Fixes: d2dcc67df9 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230602022647.1571784-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc0d76cafe ]
Given the pointer to struct bpf_{rb,list}_node within a local kptr and
the byte offset of that field within the kptr struct, the calculation changed
by this patch is meant to find the beginning of the kptr so that it can
be passed to bpf_obj_drop.
Unfortunately instead of doing
ptr_to_kptr = ptr_to_node_field - offset_bytes
the calculation is erroneously doing
ptr_to_ktpr = ptr_to_node_field - (offset_bytes * sizeof(struct bpf_rb_node))
or the bpf_list_node equivalent.
This patch fixes the calculation.
Fixes: d2dcc67df9 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230602022647.1571784-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2140a6e342 ]
In verifier.c, fixup_kfunc_call uses struct bpf_insn_aux_data's
kptr_struct_meta field to pass information about local kptr types to
various helpers and kfuncs at runtime. The recent bpf_refcount series
added a few functions to the set that need this information:
* bpf_refcount_acquire
* Needs to know where the refcount field is in order to increment
* Graph collection insert kfuncs: bpf_rbtree_add, bpf_list_push_{front,back}
* Were migrated to possibly fail by the bpf_refcount series. If
insert fails, the input node is bpf_obj_drop'd. bpf_obj_drop needs
the kptr_struct_meta in order to decr refcount and properly free
special fields.
Unfortunately the verifier handling of collection insert kfuncs was not
modified to actually populate kptr_struct_meta. Accordingly, when the
node input to those kfuncs is passed to bpf_obj_drop, it is done so
without the information necessary to decr refcount.
This patch fixes the issue by populating kptr_struct_meta for those
kfuncs.
Fixes: d2dcc67df9 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230602022647.1571784-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d585f48ee ]
For kfuncs like bpf_obj_drop and bpf_refcount_acquire - which take
user-defined types as input - the verifier needs to track the specific
type passed in when checking a particular kfunc call. This requires
tracking (btf, btf_id) tuple. In commit 7c50b1cb76
("bpf: Add bpf_refcount_acquire kfunc") I added an anonymous union with
inner structs named after the specific kfuncs tracking this information,
with the goal of making it more obvious which kfunc this data was being
tracked / expected to be tracked on behalf of.
In a recent series adding a new user of this tuple, Alexei mentioned
that he didn't like this union usage as it doesn't really help with
readability or bug-proofing ([0]). In an offline convo we agreed to
have the tuple be fields (arg_btf, arg_btf_id), with comments in
bpf_kfunc_call_arg_meta definition enumerating the uses of the fields by
kfunc-specific handling logic. Such a pattern is used by struct
bpf_reg_state without trouble.
Accordingly, this patch removes the anonymous union in favor of arg_btf
and arg_btf_id fields and comment enumerating their current uses. The
patch also removes struct btf_and_id, which was only being used by the
removed union's inner structs.
This is a mechanical change, existing linked_list and rbtree tests will
validate that correct (btf, btf_id) are being passed.
[0]: https://lore.kernel.org/bpf/20230505021707.vlyiwy57vwxglbka@dhcp-172-26-102-232.dhcp.thefacebook.com
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230510213047.1633612-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: 2140a6e342 ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75bd32f5ce ]
Currently, on WCN3990, the station disconnect after hardware recovery is
not working as expected. This is because of setting the
IEEE80211_SDATA_DISCONNECT_HW_RESTART flag very early in the hardware
recovery process even before the driver invokes ieee80211_hw_restart().
On the contrary, mac80211 expects this flag to be set after
ieee80211_hw_restart() is invoked for it to trigger station disconnect.
Set the IEEE80211_SDATA_DISCONNECT_HW_RESTART flag in
ath10k_reconfig_complete() instead to fix this.
The other targets are not affected by this change, since the hardware
params flag is not set.
Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
Fixes: 2c3fc50591 ("ath10k: Trigger sta disconnect on hardware restart")
Signed-off-by: Youghandhar Chintala <quic_youghand@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230518101515.3820-1-quic_youghand@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60548b825b ]
Default samples/pktgen scripts send 60 byte packets as hardware adds
4-bytes FCS checksum, which fulfils minimum Ethernet 64 bytes frame
size.
XDP layer will not necessary have access to the 4-bytes FCS checksum.
This leads to bpf_xdp_load_bytes() failing as it tries to copy 64-bytes
from an XDP packet that only have 60-bytes available.
Fixes: 7722517422 ("samples/bpf: fixup some tools to be able to support xdp multibuffer")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/168545704139.2996228.2516528552939485216.stgit@firesoul
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f1784a59c ]
When receiving more rx packets than the kernel can handle the driver
drops the packets and issues an error message. This is bad for two
reasons. The logs are flooded with myriads of messages, but then time
consumed for printing messages in that critical code path brings down
the device. After some time of excessive rx load the driver responds
with:
rtw_8822cu 1-1:1.2: failed to get tx report from firmware
rtw_8822cu 1-1:1.2: firmware failed to report density after scan
rtw_8822cu 1-1:1.2: firmware failed to report density after scan
The device stops working until being replugged.
Fix this by lowering the priority to debug level and also by
ratelimiting it.
Fixes: a82dfd33d1 ("wifi: rtw88: Add common USB chip support")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230524103934.1019096-1-s.hauer@pengutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 08880713ce ]
If CONFIG_DEBUG_FS is not set:
regulator: Failed to create debugfs directory
...
regulator-dummy: Failed to create debugfs directory
As per the comments for debugfs_create_dir(), errors returned by this
function should be expected, and ignored:
* If debugfs is not enabled in the kernel, the value -%ENODEV will be
* returned.
*
* NOTE: it's expected that most callers should _ignore_ the errors returned
* by this function. Other debugfs functions handle the fact that the "dentry"
* passed to them could be an error and they don't crash in that case.
* Drivers should generally work fine even if debugfs fails to init anyway.
Adhere to the debugfs spirit, and streamline all operations by:
1. Demoting the importance of the printed error messages to debug
level, like is already done in create_regulator(),
2. Further ignoring any returned errors, as by design, all debugfs
functions are no-ops when passed an error pointer.
Fixes: 2bf1c45be3 ("regulator: Fix error checking for debugfs_create_dir")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/2f8bb6e113359ddfab7b59e4d4274bd4c06d6d0a.1685013051.git.geert+renesas@glider.be
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b9e86d445 ]
If the probe routine fails with -EPROBE_DEFER after taking over the
OF node from its parent driver, reprobing triggers pinctrl_bind_pins()
and that will fail. Fix this by setting of_node_reused, so that the
device does not try to setup pin muxing.
For me this always happens once the driver is marked to prefer async
probing and never happens without that flag.
Fixes: 259b93b21a ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://lore.kernel.org/r/20230504173618.142075-12-sebastian.reichel@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f04a32b2c5 ]
The sign-file utility (from scripts/) is used in prog_tests/verify_pkcs7_sig.c,
but the utility should not be called as a test. Executing this utility produces
the following error:
selftests: /linux/tools/testing/selftests/bpf: urandom_read
ok 16 selftests: /linux/tools/testing/selftests/bpf: urandom_read
selftests: /linux/tools/testing/selftests/bpf: sign-file
not ok 17 selftests: /linux/tools/testing/selftests/bpf: sign-file # exit=2
Also, urandom_read is mistakenly used as a test. It does not lead to an error,
but should be moved over to TEST_GEN_FILES as well. The empty TEST_CUSTOM_PROGS
can then be removed.
Fixes: fc97590668 ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/ZEuWFk3QyML9y5QQ@example.org
Link: https://lore.kernel.org/bpf/88e3ab23029d726a2703adcf6af8356f7a2d3483.1684316821.git.legion@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 108598c39e ]
If it fails to attach fentry, the allocated bpf trampoline image will be
left in the system. That can be verified by checking /proc/kallsyms.
This meamleak can be verified by a simple bpf program as follows:
SEC("fentry/trap_init")
int fentry_run()
{
return 0;
}
It will fail to attach trap_init because this function is freed after
kernel init, and then we can find the trampoline image is left in the
system by checking /proc/kallsyms.
$ tail /proc/kallsyms
ffffffffc0613000 t bpf_trampoline_6442453466_1 [bpf]
ffffffffc06c3000 t bpf_trampoline_6442453466_1 [bpf]
$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep "FUNC 'trap_init'"
[2522] FUNC 'trap_init' type_id=119 linkage=static
$ echo $((6442453466 & 0x7fffffff))
2522
Note that there are two left bpf trampoline images, that is because the
libbpf will fallback to raw tracepoint if -EINVAL is returned.
Fixes: e21aa34178 ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/bpf/20230515130849.57502-2-laoar.shao@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47e79cbeea ]
After commit e21aa34178 ("bpf: Fix fexit trampoline."), the selector is only
used to indicate how many times the bpf trampoline image are updated and been
displayed in the trampoline ksym name. After the trampoline is freed, the
selector will start from 0 again. So the selector is a useless value to the
user. We can remove it.
If the user want to check whether the bpf trampoline image has been updated
or not, the user can compare the address. Each time the trampoline image is
updated, the address will change consequently. Jiri also pointed out another
issue that perf is still using the old name "bpf_trampoline_%lu", so this
change can fix the issue in perf.
Fixes: e21aa34178 ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/bpf/ZFvOOlrmHiY9AgXE@krava
Link: https://lore.kernel.org/bpf/20230515130849.57502-3-laoar.shao@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 04cb8453a9 ]
On aarch64, "bpftool feature" reports an incorrect BPF JIT limit:
$ sudo /sbin/bpftool feature
Scanning system configuration...
bpf() syscall restricted to privileged users
JIT compiler is enabled
JIT compiler hardening is disabled
JIT compiler kallsyms exports are enabled for root
skipping kernel config, can't open file: No such file or directory
Global memory limit for JIT compiler for unprivileged users is -201326592 bytes
This is because /proc/sys/net/core/bpf_jit_limit reports
$ sudo cat /proc/sys/net/core/bpf_jit_limit
68169519595520
...and an int is assumed in read_procfs(). Change read_procfs()
to return a long to avoid negative value reporting.
Fixes: 7a4522bbef ("tools: bpftool: add probes for /proc/ eBPF parameters")
Reported-by: Nicky Veitch <nicky.veitch@oracle.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230512113134.58996-1-alan.maguire@oracle.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d9b41daa5 ]
If sock->service_name is NULL, the local variable
service_name_tlv_length will not be assigned by nfc_llcp_build_tlv(),
later leading to using value frmo the stack. Smatch warning:
net/nfc/llcp_commands.c:442 nfc_llcp_send_connect() error: uninitialized symbol 'service_name_tlv_length'.
Fixes: de9e5aeb4f ("NFC: llcp: Fix usage of llcp_add_tlv()")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f34baf67e ]
n_bytes variable in the driver represents the number of bytes per word
that needs to be sent/copied to fifo. Bits/word can be between 8 and 32
bits from the client but in memory they are a power of 2, same is mentioned
in spi.h header:
"
* @bits_per_word: Data transfers involve one or more words; word sizes
* like eight or 12 bits are common. In-memory wordsizes are
* powers of two bytes (e.g. 20 bit samples use 32 bits).
* This may be changed by the device's driver, or left at the
* default (0) indicating protocol words are eight bit bytes.
* The spi_transfer.bits_per_word can override this for each transfer.
"
Hence, round of n_bytes to a power of 2 to avoid values like 3 which
would generate unalligned/odd accesses to memory/fifo.
* tested on Baikal-T1 based system with DW SPI-looped back interface
transferring a chunk of data with DFS:8,12,16.
Fixes: a51acc2400 ("spi: dw: Add support for 32-bits max xfer size")
Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com
Signed-off-by: Joy Chakraborty <joychakr@google.com
Reviewed-by: Serge Semin <fancer.lancer@gmail.com
Tested-by: Serge Semin <fancer.lancer@gmail.com
Link: https://lore.kernel.org/r/20230512104746.1797865-4-joychakr@google.com
Signed-off-by: Mark Brown <broonie@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29ebbba7d4 ]
With the way the hooks implemented right now, we have a special
condition: optval larger than PAGE_SIZE will expose only first 4k into
BPF; any modifications to the optval are ignored. If the BPF program
doesn't handle this condition by resetting optlen to 0,
the userspace will get EFAULT.
The intention of the EFAULT was to make it apparent to the
developers that the program is doing something wrong.
However, this inadvertently might affect production workloads
with the BPF programs that are not too careful (i.e., returning EFAULT
for perfectly valid setsockopt/getsockopt calls).
Let's try to minimize the chance of BPF program screwing up userspace
by ignoring the output of those BPF programs (instead of returning
EFAULT to the userspace). pr_info_once those cases to
the dmesg to help with figuring out what's going wrong.
Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bdeeed3498 ]
It seems like __builtin_offset() doesn't preserve CO-RE field
relocations properly. So if offsetof() macro is defined through
__builtin_offset(), CO-RE-enabled BPF code using container_of() will be
subtly and silently broken.
To avoid this problem, redefine offsetof() and container_of() in the
form that works with CO-RE relocations more reliably.
Fixes: 5fbc220862 ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h")
Reported-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230509065502.2306180-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28fa3ac487 ]
When force-freeing leftover entries from our match_action_ht, call
efx_tc_delete_rule(), which releases all the rule's resources, rather
than open-coding it. The open-coded version was missing a call to
release the rule's encap match (if any).
It probably doesn't matter as everything's being torn down anyway, but
it's cleaner this way and prevents further error messages potentially
being logged by efx_tc_encap_match_free() later on.
Move efx_tc_flow_free() further down the file to avoid introducing a
forward declaration of efx_tc_delete_rule().
Fixes: 17654d84b4 ("sfc: add offloading of 'foreign' TC (decap) rules")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ce4bb0912 ]
Mandatory WFA testcase
CT_Security_WPA2Personal_STA_RSNEBoundsVerification-AbsentRSNCap,
performs bounds verfication on Beacon and/or Probe response frames. It
failed and observed the reason to be absence of cipher suite and AKM
suite in RSN information. To fix this, enable the RSN flag before extracting RSN
capabilities.
Fixes: cd21d99e59 ("wifi: wilc1000: validate pairwise and authentication suite offsets")
Signed-off-by: Amisha Patel <amisha.patel@microchip.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230421181005.4865-1-amisha.patel@microchip.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f655badf2a ]
Fix propagate_precision() logic to perform propagation of all necessary
registers and stack slots across all active frames *in one batch step*.
Doing this for each register/slot in each individual frame is wasteful,
but the main problem is that backtracking of instruction in any frame
except the deepest one just doesn't work. This is due to backtracking
logic relying on jump history, and available jump history always starts
(or ends, depending how you view it) in current frame. So, if
prog A (frame #0) called subprog B (frame #1) and we need to propagate
precision of, say, register R6 (callee-saved) within frame #0, we
actually don't even know where jump history that corresponds to prog
A even starts. We'd need to skip subprog part of jump history first to
be able to do this.
Luckily, with struct backtrack_state and __mark_chain_precision()
handling bitmasks tracking/propagation across all active frames at the
same time (added in previous patch), propagate_precision() can be both
fixed and sped up by setting all the necessary bits across all frames
and then performing one __mark_chain_precision() pass. This makes it
unnecessary to skip subprog parts of jump history.
We also improve logging along the way, to clearly specify which
registers' and slots' precision markings are propagated within which
frame. Each frame will have dedicated line and all registers and stack
slots from that frame will be reported in format similar to precision
backtrack regs/stack logging. E.g.:
frame 1: propagating r1,r2,r3,fp-8,fp-16
frame 0: propagating r3,r9,fp-120
Fixes: 529409ea92 ("bpf: propagate precision across all frames, not just the last one")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ef22b6865 ]
Teach __mark_chain_precision logic to maintain register/stack masks
across all active frames when going from child state to parent state.
Currently this should be mostly no-op, as precision backtracking usually
bails out when encountering subprog entry/exit.
It's not very apparent from the diff due to increased indentation, but
the logic remains the same, except everything is done on specific `fr`
frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced
with frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(),
where frame index is passed explicitly, instead of using current frame
number.
We also adjust logging to emit affected frame number. And we also add
better logging of human-readable register and stack slot masks, similar
to previous patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: f655badf2a ("bpf: fix propagate_precision() logic for inner frames")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9439c21a9 ]
Add helper to format register and stack masks in more human-readable
format. Adjust logging a bit during backtrack propagation and especially
during forcing precision fallback logic to make it clearer what's going
on (with log_level=2, of course), and also start reporting affected
frame depth. This is in preparation for having more than one active
frame later when precision propagation between subprog calls is added.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: f655badf2a ("bpf: fix propagate_precision() logic for inner frames")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 407958a0e9 ]
Add struct backtrack_state and straightforward API around it to keep
track of register and stack masks used and maintained during precision
backtracking process. Having this logic separately allow to keep
high-level backtracking algorithm cleaner, but also it sets us up to
cleanly keep track of register and stack masks per frame, allowing (with
some further logic adjustments) to perform precision backpropagation
across multiple frames (i.e., subprog calls).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: f655badf2a ("bpf: fix propagate_precision() logic for inner frames")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c39028b333 ]
The btf_dump/struct_data selftest is failing with:
[...]
test_btf_dump_struct_data:FAIL:unexpected return value dumping fs_context unexpected unexpected return value dumping fs_context: actual -7 != expected 264
[...]
The reason is in btf_dump_type_data_check_overflow(). It does not use
BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead,
it is using the enum size which is 4. It had been working till the recent
commit 4e04143c86 ("fs_context: drop the unused lsm_flags member")
removed an integer member which also removed the 4 bytes padding at the
end of the fs_context. Missing this 4 bytes padding exposed this bug. In
particular, when btf_dump_type_data_check_overflow() reaches the member
'phase', -E2BIG is returned.
The fix is to pass bit_sz to btf_dump_type_data_check_overflow(). In
btf_dump_type_data_check_overflow(), it does a different size check when
bit_sz is not zero.
The current fs_context:
[3600] ENUM 'fs_context_purpose' encoding=UNSIGNED size=4 vlen=3
'FS_CONTEXT_FOR_MOUNT' val=0
'FS_CONTEXT_FOR_SUBMOUNT' val=1
'FS_CONTEXT_FOR_RECONFIGURE' val=2
[3601] ENUM 'fs_context_phase' encoding=UNSIGNED size=4 vlen=7
'FS_CONTEXT_CREATE_PARAMS' val=0
'FS_CONTEXT_CREATING' val=1
'FS_CONTEXT_AWAITING_MOUNT' val=2
'FS_CONTEXT_AWAITING_RECONF' val=3
'FS_CONTEXT_RECONF_PARAMS' val=4
'FS_CONTEXT_RECONFIGURING' val=5
'FS_CONTEXT_FAILED' val=6
[3602] STRUCT 'fs_context' size=264 vlen=21
'ops' type_id=3603 bits_offset=0
'uapi_mutex' type_id=235 bits_offset=64
'fs_type' type_id=872 bits_offset=1216
'fs_private' type_id=21 bits_offset=1280
'sget_key' type_id=21 bits_offset=1344
'root' type_id=781 bits_offset=1408
'user_ns' type_id=251 bits_offset=1472
'net_ns' type_id=984 bits_offset=1536
'cred' type_id=1785 bits_offset=1600
'log' type_id=3621 bits_offset=1664
'source' type_id=42 bits_offset=1792
'security' type_id=21 bits_offset=1856
's_fs_info' type_id=21 bits_offset=1920
'sb_flags' type_id=20 bits_offset=1984
'sb_flags_mask' type_id=20 bits_offset=2016
's_iflags' type_id=20 bits_offset=2048
'purpose' type_id=3600 bits_offset=2080 bitfield_size=8
'phase' type_id=3601 bits_offset=2088 bitfield_size=8
'need_free' type_id=67 bits_offset=2096 bitfield_size=1
'global' type_id=67 bits_offset=2097 bitfield_size=1
'oldapi' type_id=67 bits_offset=2098 bitfield_size=1
Fixes: 920d16af9b ("libbpf: BTF dumper support for typed data")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230428013638.1581263-1-martin.lau@linux.dev
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f24292e827 ]
For the reasons also described in commit b383e8abed ("wifi: ath9k: avoid
uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should
validate pkt_len before accessing the SKB.
For example, the obtained SKB may have been badly constructed with
pkt_len = 8. In this case, the SKB can only contain a valid htc_frame_hdr
but after being processed in ath9k_htc_rx_msg() and passed to
ath9k_wmi_ctrl_rx() endpoint RX handler, it is expected to have a WMI
command header which should be located inside its data payload.
Implement sanity checking inside ath9k_wmi_ctrl_rx(). Otherwise, uninit
memory can be referenced.
Tested on Qualcomm Atheros Communications AR9271 802.11n .
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: fb9987d0f7 ("ath9k_htc: Support for AR9271 chipset.")
Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230424183348.111355-1-pchelkin@ispras.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84214ab468 ]
When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a03
("igc: Add transmit and receive fastpath and interrupt handlers"), the
hardware wasn't configured to provide RSS hash, thus it made sense to not
enable net_device NETIF_F_RXHASH feature bit.
The NIC hardware was configured to enable RSS hash info in v5.2 via commit
2121c2712f ("igc: Add multiple receive queues control supporting"), but
forgot to set the NETIF_F_RXHASH feature bit.
The original implementation of igc_rx_hash() didn't extract the associated
pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
this patch are about extracting the RSS Type from the hardware and mapping
this to enum pkt_hash_types. This was based on Foxville i225 software user
manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).
For UDP it's worth noting that RSS (type) hashing have been disabled both for
IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
because hardware RSS doesn't handle fragmented pkts well when enabled (can
cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
the effect that netstack will do a software based hash calc calling into
flow_dissect, but only when code calls skb_get_hash(), which doesn't
necessary happen for local delivery.
For QA verification testing I wrote a small bpftrace prog:
[0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt
Fixes: 2121c2712f ("igc: Add multiple receive queues control supporting")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182464270.616355.11391652654430626584.stgit@firesoul
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ebb83d84e4 ]
After commit 8ad075c2eb ("sched: Async unthrottling for cfs
bandwidth"), we may update the rq clock multiple times in the loop of
__cfsb_csd_unthrottle().
A prior (although less common) instance of this problem exists in
unthrottle_offline_cfs_rqs().
Cure both by ensuring update_rq_clock() is called before the loop and
setting RQCF_ACT_SKIP during the loop, to supress further updates.
The alternative would be pulling update_rq_clock() out of
unthrottle_cfs_rq(), but that gives an even bigger mess.
Fixes: 8ad075c2eb ("sched: Async unthrottling for cfs bandwidth")
Reviewed-By: Ben Segall <bsegall@google.com>
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20230613082012.49615-4-jiahao.os@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dda5f312bb ]
The sync_*() ops on arch/arm are defined in terms of the regular bitops
with no special handling. This is not correct, as UP kernels elide
barriers for the fully-ordered operations, and so the required ordering
is lost when such UP kernels are run under a hypervsior on an SMP
system.
Fix this by defining sync ops with the required barriers.
Note: On 32-bit arm, the sync_*() ops are currently only used by Xen,
which requires ARMv7, but the semantics can be implemented for ARMv6+.
Fixes: e54d2f6152 ("xen/arm: sync_bitops")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-2-mark.rutland@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f88130e8a ]
Normally __swp_entry_to_pte() is never called with a value translating
to a valid PTE. The only known exception is pte_swap_tests(), resulting
in a WARN splat in Xen PV guests, as __pte_to_swp_entry() did
translate the PFN of the valid PTE to a guest local PFN, while
__swp_entry_to_pte() doesn't do the opposite translation.
Fix that by using __pte() in __swp_entry_to_pte() instead of open
coding the native variant of it.
For correctness do the similar conversion for __swp_entry_to_pmd().
Fixes: 05289402d7 ("mm/debug_vm_pgtable: add tests validating arch helpers for core MM features")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230306123259.12461-1-jgross@suse.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2fad201fe3 ]
Although, IBS pmus can be invoked via their own interface, indirect
IBS invocation via core pmu events is also supported with fixed set
of events: cpu-cycles:p, r076:p (same as cpu-cycles:p) and r0C1:p
(micro-ops) for user convenience.
This indirect IBS invocation is broken since commit 66d258c5b0
("perf/core: Optimize perf_init_event()"), which added RAW pmu under
'pmu_idr' list and thus if event_init() fails with RAW pmu, it started
returning error instead of trying other pmus.
Forward precise events from core pmu to IBS by overwriting 'type' and
'config' in the kernel copy of perf_event_attr. Overwriting will cause
perf_init_event() to retry with updated 'type' and 'config', which will
automatically forward event to IBS pmu.
Without patch:
$ sudo ./perf record -C 0 -e r076:p -- sleep 1
Error:
The r076:p event is not supported.
With patch:
$ sudo ./perf record -C 0 -e r076:p -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.341 MB perf.data (37 samples) ]
Fixes: 66d258c5b0 ("perf/core: Optimize perf_init_event()")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230504110003.2548-3-ravi.bangoria@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8cd0d8633e ]
The KTAP parser I used to test the KTAP output for ftracetest was overly
robust and did not notice that the test number and pass/fail result were
reversed. Fix this.
Fixes: dbcf76390e ("selftests/ftrace: Improve integration with kselftest runner")
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 375b9ff53c ]
In the unlikely case that CLOCK_REALTIME is not defined, variable ret is
not initialized and further accumulation of return values to ret can leave
ret in an undefined state. Fix this by initialized ret to zero and changing
the assignment of ret to an accumulation for the CLOCK_REALTIME case.
Fixes: 03f55c7952 ("kselftest: Extend vDSO selftest to clock_getres")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a153f254e5 ]
When running as Xen PV initial domain (aka dom0), MTRRs are disabled
by the hypervisor, but the system should nevertheless use correct
cache memory types. This has always kind of worked, as disabled MTRRs
resulted in disabled PAT, too, so that the kernel avoided code paths
resulting in inconsistencies. This bypassed all of the sanity checks
the kernel is doing with enabled MTRRs in order to avoid memory
mappings with conflicting memory types.
This has been changed recently, leading to PAT being accepted to be
enabled, while MTRRs stayed disabled. The result is that
mtrr_type_lookup() no longer is accepting all memory type requests,
but started to return WB even if UC- was requested. This led to
driver failures during initialization of some devices.
In reality MTRRs are still in effect, but they are under complete
control of the Xen hypervisor. It is possible, however, to retrieve
the MTRR settings from the hypervisor.
In order to fix those problems, overwrite the MTRR state via
mtrr_overwrite_state() with the MTRR data from the hypervisor, if the
system is running as a Xen dom0.
Fixes: 72cbc8f04f ("x86/PAT: Have pat_enabled() properly reflect state when running on Xen")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230502120931.20719-6-jgross@suse.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d053b481a5 ]
Replace size_or_mask and size_and_mask with the much easier concept of
high reserved bits.
While at it, instead of using constants in the MTRR code, use some new
[ bp:
- Drop mtrr_set_mask()
- Unbreak long lines
- Move struct mtrr_state_type out of the uapi header as it doesn't
belong there. It also fixes a HDRTEST breakage "unknown type name ‘bool’"
as Reported-by: kernel test robot <lkp@intel.com>
- Massage.
]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230502120931.20719-3-jgross@suse.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Stable-dep-of: a153f254e5 ("x86/xen: Set MTRR state when running as Xen PV initial domain")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c957f1f3c4 ]
In order to avoid mappings using the UC- cache attribute, set the
MTRR state to use WB caching as the default.
This is needed in order to cope with the fact that PAT is enabled,
while MTRRs are not supported by the hypervisor.
Fixes: 90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230502120931.20719-5-jgross@suse.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29055dc742 ]
When running virtualized, MTRR access can be reduced (e.g. in Xen PV
guests or when running as a SEV-SNP guest under Hyper-V). Typically, the
hypervisor will not advertize the MTRR feature in CPUID data, resulting
in no MTRR memory type information being available for the kernel.
This has turned out to result in problems (Link tags below):
- Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
- Xen PV dom0 mapping memory as WB which should be UC- instead
Solve those problems by allowing an MTRR static state override,
overwriting the empty state used today. In case such a state has been
set, don't call get_mtrr_state() in mtrr_bp_init().
The set state will only be used by mtrr_type_lookup(), as in all other
cases mtrr_enabled() is being checked, which will return false. Accept
the overwrite call only for selected cases when running as a guest.
Disable X86_FEATURE_MTRR in order to avoid any MTRR modifications by
just refusing them.
[ bp: Massage. ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/all/4fe9541e-4d4c-2b2a-f8c8-2d34a7284930@nerdbynature.de/
Link: https://lore.kernel.org/lkml/BYAPR21MB16883ABC186566BD4D2A1451D7FE9@BYAPR21MB1688.namprd21.prod.outlook.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Stable-dep-of: c957f1f3c4 ("x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6b980646b ]
The physical address width calculation in mtrr_bp_init() can easily be
replaced with using the already available value x86_phys_bits from
struct cpuinfo_x86.
The same information source can be used in mtrr/cleanup.c, removing the
need to pass that value on to mtrr_cleanup().
In print_mtrr_state() use x86_phys_bits instead of recalculating it
from size_or_mask.
Move setting of size_or_mask and size_and_mask into a dedicated new
function in mtrr/generic.c, enabling to make those 2 variables static,
as they are used in generic.c only now.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230502120931.20719-2-jgross@suse.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Stable-dep-of: c957f1f3c4 ("x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf5ddd7365 ]
This code-movement-only commit moves the rcu_scale_cleanup() and
rcu_scale_shutdown() functions to follow kfree_scale_cleanup().
This is code movement is in preparation for a bug-fix patch that invokes
kfree_scale_cleanup() from rcu_scale_cleanup().
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Stable-dep-of: 23fc8df26d ("rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b409afe026 ]
The BUSTED-BOOST and TREE03 scenarios specify a mythical tree.use_softirq
module parameter, which means a failure to get full test coverage. This
commit therefore corrects the name to rcutree.use_softirq.
Fixes: e2b949d543 ("rcutorture: Make TREE03 use real-time tree.use_softirq setting")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 401b0de3ae ]
The rcu_tasks_invoke_cbs() function relies on queue_work_on() to silently
fall back to WORK_CPU_UNBOUND when the specified CPU is offline. However,
the queue_work_on() function's silent fallback mechanism relies on that
CPU having been online at some time in the past. When queue_work_on()
is passed a CPU that has never been online, workqueue lockups ensue,
which can be bad for your kernel's general health and well-being.
This commit therefore checks whether a given CPU has ever been online,
and, if not substitutes WORK_CPU_UNBOUND in the subsequent call to
queue_work_on(). Why not simply omit the queue_work_on() call entirely?
Because this function is flooding callback-invocation notifications
to all CPUs, and must deal with possibilities that include a sparse
cpu_possible_mask.
This commit also moves the setting of the rcu_data structure's
->beenonline field to rcu_cpu_starting(), which executes on the
incoming CPU before that CPU has ever enabled interrupts. This ensures
that the required workqueues are present. In addition, because the
incoming CPU has not yet enabled its interrupts, there cannot yet have
been any softirq handlers running on this CPU, which means that the
WARN_ON_ONCE(!rdp->beenonline) within the RCU_SOFTIRQ handler cannot
have triggered yet.
Fixes: d363f833c6 ("rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15d44dfa40 ]
Currently, rcu_cpu_starting() is written so that it might be invoked
with interrupts enabled. However, it is always called when interrupts
are disabled, either by rcu_init(), notify_cpu_starting(), or from a
call point prior to the call to notify_cpu_starting().
But why bother requiring that interrupts be disabled? The purpose is
to allow the rcu_data structure's ->beenonline flag to be set after all
early processing has completed for the incoming CPU, thus allowing this
flag to be used to determine when workqueues have been set up for the
incoming CPU, while still allowing this flag to be used as a diagnostic
within rcu_core().
This commit therefore makes rcu_cpu_starting() rely on interrupts being
disabled.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Stable-dep-of: 401b0de3ae ("rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e30f65c4b3 ]
Writing `subprocess.Popen[str]` requires python 3.9+.
kunit.py has an assertion that the python version is 3.7+, so we should
try to stay backwards compatible.
This conflicts a bit with commit 1da2e6220e ("kunit: tool: fix
pre-existing `mypy --strict` errors and update run_checks.py"), since
mypy complains like so
> kunit_kernel.py:95: error: Missing type parameters for generic type "Popen" [type-arg]
Note: `mypy --strict --python-version 3.7` does not work.
We could annotate each file with comments like
`# mypy: disable-error-code="type-arg"
but then we might still get nudged to break back-compat in other files.
This patch adds a `mypy.ini` file since it seems like the only way to
disable specific error codes for all our files.
Note: run_checks.py doesn't need to specify `--config_file mypy.ini`,
but I think being explicit is better, particularly since most kernel
devs won't be familiar with how mypy works.
Fixes: 695e260308 ("kunit: tool: add subscripts for type annotations where appropriate")
Reported-by: SeongJae Park <sj@kernel.org>
Link: https://lore.kernel.org/linux-kselftest/20230501171520.138753-1-sj@kernel.org
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Tested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89382022b3 ]
Should an error occur after calling sun8i_ths_resource_init() in the probe
function, some resources need to be released, as already done in the
.remove() function.
Switch to the devm_clk_get_enabled() helper and add a new devm_action to
turn sun8i_ths_resource_init() into a fully managed function.
Move the place where reset_control_deassert() is called so that the
recommended order of reset release/clock enable steps is kept.
A64 manual states that:
3.3.6.4. Gating and reset
Make sure that the reset signal has been released before the release of
module clock gating;
This fixes the issue and removes some LoC at the same time.
Fixes: dccc5c3b6f ("thermal/drivers/sun8i: Add thermal driver for H6/H5/H3/A64/A83T/R40")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/a8ae84bd2dc4b55fe428f8e20f31438bf8bb6762.1684089931.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03f44ffb3d ]
If the intel_pstate driver is set to passive mode, then writing the
same value to the energy_performance_preference sysfs twice will fail.
This is caused by the wrong return value used (index of the matched
energy_perf_string), instead of the length of the passed in parameter.
Fix by forcing the internal return value to zero when the same
preference is passed in by user. This same issue is not present when
active mode is used for the driver.
Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Niklas Neronin <niklas.neronin@intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b9c3ddcec ]
checker_stack_use_t32strd() and kprobe_handler() can be made static since
they are not used from other files, while coverage_start_registers()
and __kprobes_test_case() are used from assembler code, and just need
a declaration to avoid a warning with the global definition.
arch/arm/probes/kprobes/checkers-common.c:43:18: error: no previous prototype for 'checker_stack_use_t32strd'
arch/arm/probes/kprobes/core.c:236:16: error: no previous prototype for 'kprobe_handler'
arch/arm/probes/kprobes/test-core.c:723:10: error: no previous prototype for 'coverage_start_registers'
arch/arm/probes/kprobes/test-core.c:918:14: error: no previous prototype for '__kprobes_test_case_start'
arch/arm/probes/kprobes/test-core.c:952:14: error: no previous prototype for '__kprobes_test_case_end_16'
arch/arm/probes/kprobes/test-core.c:967:14: error: no previous prototype for '__kprobes_test_case_end_32'
Fixes: 6624cf651f ("ARM: kprobes: collects stack consumption for store instructions")
Fixes: 454f3e132d ("ARM/kprobes: Remove jprobe arm implementation")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4384a70c88 ]
Commit f38d1a6d00 ("PM: domains: Allocate governor data dynamically
based on a genpd governor") started to use the in-parameters in
genpd_add_device(), without first doing a verification of them.
This isn't really a big problem, as most callers do a verification already.
Therefore, let's drop the verification from genpd_add_device() and make
sure all the callers take care of it instead.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f38d1a6d00 ("PM: domains: Allocate governor data dynamically based on a genpd governor")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4658fe81b3 ]
After commit 3382388d71 ("intel_rapl: abstract RAPL common code"),
accessing to IOSF_MBI interface is done in the RAPL common code.
Thus it is the CONFIG_INTEL_RAPL_CORE that has dependency of
CONFIG_IOSF_MBI, while CONFIG_INTEL_RAPL_MSR does not.
This problem was not exposed previously because all the previous RAPL
common code users, aka, the RAPL MSR and MMIO I/F drivers, have
CONFIG_IOSF_MBI selected.
Fix the CONFIG_IOSF_MBI dependency in RAPL code. This also fixes a build
time failure when the RAPL TPMI I/F driver is introduced without
selecting CONFIG_IOSF_MBI.
x86_64-linux-ld: vmlinux.o: in function `set_floor_freq_atom':
intel_rapl_common.c:(.text+0x2dac9b8): undefined reference to `iosf_mbi_write'
x86_64-linux-ld: intel_rapl_common.c:(.text+0x2daca66): undefined reference to `iosf_mbi_read'
Reference to iosf_mbi.h is also removed from the RAPL MSR I/F driver.
Fixes: 3382388d71 ("intel_rapl: abstract RAPL common code")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20230601213246.3271412-1-arnd@kernel.org
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d05b5e0baf ]
The current initialization of the struct x86_cpu_id via
pl4_support_ids[] is partial and wrong. It is initializing
"stepping" field with "X86_FEATURE_ANY" instead of "feature" field.
Use X86_MATCH_INTEL_FAM6_MODEL macro instead of initializing
each field of the struct x86_cpu_id for pl4_supported list of CPUs.
This X86_MATCH_INTEL_FAM6_MODEL macro internally uses another macro
X86_MATCH_VENDOR_FAM_MODEL_FEATURE for X86 based CPU matching with
appropriate initialized values.
Reported-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/lkml/28ead36b-2d9e-1a36-6f4e-04684e420260@intel.com
Fixes: eb52bc2ae5 ("powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC")
Fixes: b08b95cf30 ("powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P")
Fixes: 5157559069 ("powercap: RAPL: Add Power Limit4 support for RaptorLake")
Fixes: 1cc5b9a411 ("powercap: Add Power Limit4 support for Alder Lake SoC")
Fixes: 8365a898fe ("powercap: Add Power Limit4 support")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9368aa1882 ]
Since 315bada690 ("EDAC: Check for GHES preference in the
chipset-specific EDAC drivers"), vendor specific EDAC driver will not
probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device
is present. Make ghes_get_devices() return NULL when the GHES device
list is empty to fix the problem.
Fixes: 9057a3f7ac ("EDAC/ghes: Prepare to make ghes_edac a proper module")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a6a9f1c5a ]
The driver needs to migrate the perf context if the current using CPU going
to teardown. By the time calling the cpuhp::teardown() callback the
cpu_online_mask() hasn't updated yet and still includes the CPU going to
teardown. In current driver's implementation we may migrate the context
to the teardown CPU and leads to the below calltrace:
...
[ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
[ 368.113699][ T932] Call trace:
[ 368.116834][ T932] __switch_to+0x7c/0xbc
[ 368.120924][ T932] __schedule+0x338/0x6f0
[ 368.125098][ T932] schedule+0x50/0xe0
[ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
[ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
[ 368.144573][ T932] mutex_lock+0x50/0x60
[ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
[ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
[ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
[ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
[ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
[ 368.175099][ T932] kthread+0x108/0x13c
[ 368.179012][ T932] ret_from_fork+0x10/0x18
...
Use function cpumask_any_but() to find one correct active cpu to fixes
this issue.
Fixes: 8404b0fbc7 ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230608114326.27649-1-hejunhao3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 195edce08b ]
tl;dr: There is a race in the TDX private<=>shared conversion code
which could kill the TDX guest. Fix it by changing conversion
ordering to eliminate the window.
TDX hardware maintains metadata to track which pages are private and
shared. Additionally, TDX guests use the guest x86 page tables to
specify whether a given mapping is intended to be private or shared.
Bad things happen when the intent and metadata do not match.
So there are two thing in play:
1. "the page" -- the physical TDX page metadata
2. "the mapping" -- the guest-controlled x86 page table intent
For instance, an unrecoverable exit to VMM occurs if a guest touches a
private mapping that points to a shared physical page.
In summary:
* Private mapping => Private Page == OK (obviously)
* Shared mapping => Shared Page == OK (obviously)
* Private mapping => Shared Page == BIG BOOM!
* Shared mapping => Private Page == OK-ish
(It will read generate a recoverable #VE via handle_mmio())
Enter load_unaligned_zeropad(). It can touch memory that is adjacent but
otherwise unrelated to the memory it needs to touch. It will cause one
of those unrecoverable exits (aka. BIG BOOM) if it blunders into a
shared mapping pointing to a private page.
This is a problem when __set_memory_enc_pgtable() converts pages from
shared to private. It first changes the mapping and second modifies
the TDX page metadata. It's moving from:
* Shared mapping => Shared Page == OK
to:
* Private mapping => Shared Page == BIG BOOM!
This means that there is a window with a shared mapping pointing to a
private page where load_unaligned_zeropad() can strike.
Add a TDX handler for guest.enc_status_change_prepare(). This converts
the page from shared to private *before* the page becomes private. This
ensures that there is never a private mapping to a shared page.
Leave a guest.enc_status_change_finish() in place but only use it for
private=>shared conversions. This will delay updating the TDX metadata
marking the page private until *after* the mapping matches the metadata.
This also ensures that there is never a private mapping to a shared page.
[ dhansen: rewrite changelog ]
Fixes: 7dbde76316 ("x86/mm/cpa: Add support for TDX shared memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/all/20230606095622.1939-3-kirill.shutemov%40linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8be3593b9e ]
Sidharth reports that on M2, the PMU never generates any interrupt
when using 'perf record', which is a annoying as you get no sample.
I'm temped to say "no sample, no problem", but others may have
a different opinion.
Upon investigation, it appears that the counters on M2 are
significantly different from the ones on M1, as they count on
64 bits instead of 48. Which of course, in the fine M1 tradition,
means that we can only use 63 bits, as the top bit is used to signal
the interrupt...
This results in having to introduce yet another flag to indicate yet
another odd counter width. Who knows what the next crazy implementation
will do...
With this, perf can work out the correct offset, and 'perf record'
works as intended.
Tested on M2 and M2-Pro CPUs.
Cc: Janne Grunau <j@jannau.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Fixes: 7d0bfb7c99 ("drivers/perf: apple_m1: Add Apple M2 support")
Reported-by: Sidharth Kshatriya <sid.kshatriya@gmail.com>
Tested-by: Sidharth Kshatriya <sid.kshatriya@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230528080205.288446-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71746c995c ]
It turns out that my naive DTC reset logic fails to work as intended,
since, after checking with the hardware designers, the PMU actually
needs to be fully enabled in order to correctly clear any pending
overflows. Therefore, invert the sequence to start with turning on both
enables so that we can reliably get the DTCs into a known state, then
moving to our normal counters-stopped state from there. Since all the
DTM counters have already been unpaired during the initial discovery
pass, we just need to additionally reset the cycle counters to ensure
that no other unexpected overflows occur during this period.
Fixes: 0ba64770a2 ("perf: Add Arm CMN-600 PMU driver")
Reported-by: Geoff Blake <blakgeof@amazon.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0ea4559261ea394f827c9aee5168c77a60aaee03.1684946389.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e5d1c87220 ]
Currently, while calculating residency and latency values, right
operands may overflow if resulting values are big enough.
To prevent this, albeit unlikely case, play it safe and convert
right operands to left ones' type s64.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 30f604283e ("PM / Domains: Allow domain power states to be read from DT")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b5bf64c89 ]
Smatch reports:
drivers/clocksource/timer-cadence-ttc.c:529 ttc_timer_probe()
warn: 'timer_baseaddr' from of_iomap() not released on lines: 498,508,516.
timer_baseaddr may have the problem of not being released after use,
I replaced it with the devm_of_iomap() function and added the clk_put()
function to cleanup the "clk_ce" and "clk_cs".
Fixes: e932900a32 ("arm: zynq: Use standard timer binding")
Fixes: 70504f311d ("clocksource/drivers/cadence_ttc: Convert init function to return error")
Signed-off-by: Feng Mingxi <m202271825@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Acked-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230425065611.702917-1-m202271825@hust.edu.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2293cae703 ]
In case of real io scheduler, q->elevator is set, so blk_mq_run_hw_queue()
may just check if scheduler queue has request to dispatch, see
__blk_mq_sched_dispatch_requests(). Then IO hang may be caused because
all passthorugh requests may stay in sw queue.
And any passthrough request should have been inserted to hctx->dispatch
always.
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Fixes: d97217e7f0 ("blk-mq: don't queue plugged passthrough requests into scheduler")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230621132208.1142318-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c731cd0b6d ]
If a bio gets split, it needs to have a proper file_offset for checksum
validation and repair to work properly.
Based on feedback from Josef, commit 852eee62d3 ("btrfs: allow
btrfs_submit_bio to split bios") skipped this adjustment for ONE_ORDERED
bios. But if we actually ever need to split a ONE_ORDERED read bio, this
will lead to a wrong file offset in the repair code. Right now the only
user of the file_offset is logging of an error message so this is mostly
harmless, but the wrong offset might be more problematic for additional
users in the future.
Fixes: 852eee62d3 ("btrfs: allow btrfs_submit_bio to split bios")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7027f87108 ]
When extent_write_locked_range was originally added, it was only used
writing back compressed pages from an async helper thread. But it is
now also used for writing back pages on zoned devices, where it is
called directly from the ->writepage context. In this case we want to
be able to pass on the writeback_control instead of creating a new one,
and more importantly want to use all the normal cgroup interaction
instead of potentially deferring writeback to another helper.
Fixes: 898793d992 ("btrfs: zoned: write out partially allocated region")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eb34dceace ]
__extent_writepage does a lot of things that make no sense for
extent_write_locked_range, given that extent_write_locked_range itself is
called from __extent_writepage either directly or through a workqueue,
and all this work has already been done in the first invocation and the
pages haven't been unlocked since. Call __extent_writepage_io directly
instead and open code the logic tracked in
btrfs_bio_ctrl::extent_locked.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 7027f87108 ("btrfs: don't treat zoned writeback as being from an async helper thread")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 973fb26e81 ]
If cow_file_range_async fails to allocate the asynchronous writeback
context, it currently returns an error and entirely fails the writeback.
This is not a good idea as a writeback failure is a non-temporary error
condition that will make the file system unusable. Just fall back to
synchronous uncompressed writeback instead. This requires us to delay
setting the BTRFS_INODE_HAS_ASYNC_EXTENT flag until we've committed to
the async writeback.
The compression checks INODE_NOCOMPRESS and FORCE_COMPRESS are moved
from cow_file_range_async to the preceding checks btrfs_run_delalloc_range().
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 7027f87108 ("btrfs: don't treat zoned writeback as being from an async helper thread")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36614a3beb ]
The range_end field in struct writeback_control is inclusive, just like
the end parameter passed to extent_write_locked_range. Not doing this
could cause extra writeout, which is harmless but suboptimal.
Fixes: 771ed689d2 ("Btrfs: Optimize compressed writeback and reads")
CC: stable@vger.kernel.org # 5.9+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 7027f87108 ("btrfs: don't treat zoned writeback as being from an async helper thread")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50b21d7a06 ]
Stop trying to cluster writes of multiple extent_buffers into a single
bio. There is no need for that as the blk_plug mechanism used all the
way up in writeback_inodes_wb gives us the same I/O pattern even with
multiple bios. Removing the clustering simplifies
lock_extent_buffer_for_io a lot and will also allow passing the eb
as private data to the end I/O handler.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 7027f87108 ("btrfs: don't treat zoned writeback as being from an async helper thread")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fdd160160 ]
lock_extent_buffer_for_io never returns a negative error value, so switch
the return value to a simple bool.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ keep noinline_for_stack ]
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 7027f87108 ("btrfs: don't treat zoned writeback as being from an async helper thread")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b78b98e06f ]
The btrfs_bio_ctrl machinery is overkill for reading extent_buffers
as we always operate on PAGE_SIZE chunks (or one smaller one for the
subpage case) that are contiguous and are guaranteed to fit into a
single bio. Replace it with open coded btrfs_bio_alloc, __bio_add_page
and btrfs_submit_bio calls in a helper function shared between
the subpage and node size >= PAGE_SIZE cases.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 7027f87108 ("btrfs: don't treat zoned writeback as being from an async helper thread")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e95382834c ]
Currently read_extent_buffer_pages skips pages that are already uptodate
when reading in an extent_buffer. While this reduces the amount of data
read, it increases the number of I/O operations as we now need to do
multiple I/Os when reading an extent buffer with one or more uptodate
pages in the middle of it. On any modern storage device, be that hard
drives or SSDs this actually decreases I/O performance. Fortunately
this case is pretty rare as the pages are always initially read together
and then aged the same way. Besides simplifying the code a bit as-is
this will allow for major simplifications to the I/O completion handler
later on.
Note that the case where all pages are uptodate is still handled by an
optimized fast path that does not read any data from disk.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 7027f87108 ("btrfs: don't treat zoned writeback as being from an async helper thread")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d9e522010 ]
itimer_delete() has a retry loop when the timer is concurrently expired. On
non-RT kernels this just spin-waits until the timer callback has completed,
except for posix CPU timers which have HAVE_POSIX_CPU_TIMERS_TASK_WORK
enabled.
In that case and on RT kernels the existing task could live lock when
preempting the task which does the timer delivery.
Replace spin_unlock() with an invocation of timer_wait_running() to handle
it the same way as the other retry loops in the posix timer code.
Fixes: ec8f954a40 ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87v8g7c50d.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 001b8ccd06 ]
In compact 4B, two adjacent lclusters are packed together as a unit to
form on-disk indexes for effective random access, as below:
(amortized = 4, vcnt = 2)
_____________________________________________
|___@_____ encoded bits __________|_ blkaddr _|
0 . amortized * vcnt = 8
. .
. . amortized * vcnt - 4 = 4
. .
.____________________________.
|_type (2 bits)_|_clusterofs_|
Therefore, encoded bits for each pack are 32 bits (4 bytes). IOWs,
since each lcluster can get 16 bits for its type and clusterofs, the
maximum supported lclustersize for compact 4B format is 16k (14 bits).
Fix this to enable compact 4B format for 16k lclusters (blocks), which
is tested on an arm64 server with 16k page size.
Fixes: 152a333a58 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230601112341.56960-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit baf6d18b11 ]
I noticed that svc_rqst_release_pages() was still unnecessarily
releasing a page when svc_rdma_recvfrom() returns zero.
Fixes: a53d5cb064 ("svcrdma: Avoid releasing a page in svc_xprt_release()")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48f31e4964 ]
While compiling with W=1, both gcc and clang complain about a
tricky way to initialize an array by filling it with a non-zero
value and then overrride some of the array elements.
In this case the override is intentional, so just disable the
specific warning for only this part of the code.
Note: the flag "-Woverride-init" is recognized by both compilers,
but the warning msg from clang reports "-Winitializer-overrides".
The doc of clang clarifies that the two flags are synonyms, so use
here only the flag name common on both compilers.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Fixes: c297493336 ("irqchip/stm32-exti: Simplify irq description table")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230601155614.34490-1-antonio.borneo@foss.st.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fb07b8f834 ]
The hierarchy of PCH PIC, PCH PCI MSI and EIONTC is as following:
PCH PIC ------->|
|---->EIOINTC
PCH PCI MSI --->|
so the irq_data list of irq_desc for IRQs on PCH PIC and PCH PCI MSI
is like this:
irq_desc->irq_data(domain: PCH PIC)->parent_data(domain: EIOINTC)
irq_desc->irq_data(domain: PCH PCI MSI)->parent_data(domain: EIOINTC)
In eiointc_resume(), the irq_data passed into eiointc_set_irq_affinity()
should be matched to EIOINTC domain instead of PCH PIC or PCH PCI MSI
domain, so fix it.
Fixes: a90335c2df ("irqchip/loongson-eiointc: Add suspend/resume support")
Reported-by: yangqiming <yangqiming@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614115936.5950-6-lvjianmin@loongson.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd7de3704a ]
Commit 99d055b4fd ("block: remove per-disk debugfs files in
blk_unregister_queue") moves blk_trace_shutdown() from
blk_release_queue() to blk_unregister_queue(), this is safe if blktrace
is created through sysfs, however, there is a regression in corner
case.
blktrace can still be enabled after del_gendisk() through ioctl if
the disk is opened before del_gendisk(), and if blktrace is not shutdown
through ioctl before closing the disk, debugfs entries will be leaked.
Fix this problem by shutdown blktrace in disk_release(), this is safe
because blk_trace_remove() is reentrant.
Fixes: 99d055b4fd ("block: remove per-disk debugfs files in blk_unregister_queue")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230610022003.2557284-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7db922bae3 ]
Commit 6cce3b23f6 ("[PATCH] md: write intent bitmap support for raid10")
add bitmap support, and it changed that write io is submitted through
daemon thread because bitmap need to be updated before write io. And
later, plug is used to fix performance regression because all the write io
will go to demon thread, which means io can't be issued concurrently.
However, if bitmap is not enabled, the write io should not go to daemon
thread in the first place, and plug is not needed as well.
Fixes: 6cce3b23f6 ("[PATCH] md: write intent bitmap support for raid10")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230529131106.2123367-5-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ae6aaf769 ]
When removing a disk with replacement, the replacement will be used to
replace rdev. During this process, there is a brief window in which both
rdev and replacement are read as NULL in raid10_write_request(). This
will result in io not being submitted but it should be.
//remove //write
raid10_remove_disk raid10_write_request
mirror->rdev = NULL
read rdev -> NULL
mirror->rdev = mirror->replacement
mirror->replacement = NULL
read replacement -> NULL
Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
to prevent memory reordering.
Fixes: 475b0321a4 ("md/raid10: writes should get directed to replacement as well as original.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230602091839.743798-3-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34817a2441 ]
There are two check of 'mreplace' in raid10_sync_request(). In the first
check, 'need_replace' will be set and 'mreplace' will be used later if
no-Faulty 'mreplace' exists, In the second check, 'mreplace' will be
set to NULL if it is Faulty, but 'need_replace' will not be changed
accordingly. null-ptr-deref occurs if Faulty is set between two check.
Fix it by merging two checks into one. And replace 'need_replace' with
'mreplace' because their values are always the same.
Fixes: ee37d7314a ("md/raid10: Fix raid10 replace hang when new added disk faulty")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230527072218.2365857-2-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 301867b1c1 ]
If we write a large number to md/bitmap_set_bits, md_bitmap_checkpage()
will return -EINVAL because 'page >= bitmap->pages', but the return value
was not checked immediately in md_bitmap_get_counter() in order to set
*blocks value and slab-out-of-bounds occurs.
Move check of 'page >= bitmap->pages' to md_bitmap_get_counter() and
return directly if true.
Fixes: ef42567335 ("md/bitmap: optimise scanning of empty bitmaps.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230515134808.3936750-2-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 665e89ab7c ]
The below-mentioned patch was intended to simplify refcounting on the
svc_serv used by locked. The goal was to only ever have a single
reference from the single thread. To that end we dropped a call to
lockd_start_svc() (except when creating thread) which would take a
reference, and dropped the svc_put(serv) that would drop that reference.
Unfortunately we didn't also remove the svc_get() from
lockd_create_svc() in the case where the svc_serv already existed.
So after the patch:
- on the first call the svc_serv was allocated and the one reference
was given to the thread, so there are no extra references
- on subsequent calls svc_get() was called so there is now an extra
reference.
This is clearly not consistent.
The inconsistency is also clear in the current code in lockd_get()
takes *two* references, one on nlmsvc_serv and one by incrementing
nlmsvc_users. This clearly does not match lockd_put().
So: drop that svc_get() from lockd_get() (which used to be in
lockd_create_svc().
Reported-by: Ido Schimmel <idosch@idosch.org>
Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u
Fixes: b73a297204 ("lockd: move lockd_start_svc() call into lockd_create_svc()")
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f1731df60 ]
In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating
'wake_batch' is not atomic:
t1: t2:
_blk_mq_tag_busy blk_mq_tag_busy
inc active_queues
// assume 1->2
inc active_queues
// 2 -> 3
blk_mq_update_wake_batch
// calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */
Fix this problem by protecting them wih 'tags->lock', this is not a hot
path, so performance should not be concerned. And now that all writers
are inside the lock, switch 'actives_queues' from atomic to unsigned
int.
Fixes: 180dccb0db ("blk-mq: fix tag_get wait task can't be awakened")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84b9b44b99 ]
This driver fails to link when CRYPTO is disabled, or in a loadable
module:
WARNING: unmet direct dependencies detected for CRYPTO_GCM
WARNING: unmet direct dependencies detected for CRYPTO_AEAD2
Depends on [m]: CRYPTO [=m]
Selected by [y]:
- SEV_GUEST [=y] && VIRT_DRIVERS [=y] && AMD_MEM_ENCRYPT [=y]
x86_64-linux-ld: crypto/aead.o: in function `crypto_register_aeads':
Fixes: fce96cf044 ("virt: Add SEV-SNP guest driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230117171416.2715125-1-arnd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d2af77e31 ]
When blkg_alloc() is called to allocate a blkcg_gq structure
with the associated blkg_iostat_set's, there are 2 fields within
blkg_iostat_set that requires proper initialization - blkg & sync.
The former field was introduced by commit 3b8cc62987 ("blk-cgroup:
Optimize blkcg_rstat_flush()") while the later one was introduced by
commit f733164829 ("blk-cgroup: reimplement basic IO stats using
cgroup rstat").
Unfortunately those fields in the blkg_iostat_set's are not properly
re-initialized when they are cleared in v1's blkcg_reset_stats(). This
can lead to a kernel panic due to NULL pointer access of the blkg
pointer. The missing initialization of sync is less problematic and
can be a problem in a debug kernel due to missing lockdep initialization.
Fix these problems by re-initializing them after memory clearing.
Fixes: 3b8cc62987 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Fixes: f733164829 ("blk-cgroup: reimplement basic IO stats using cgroup rstat")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230606180724.2455066-1-longman@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5dee19b6b2 ]
When calculating an end address based on an unsigned int number of pages,
any value greater than or equal to 0x100000 that is shift PAGE_SHIFT bits
results in a 0 value, resulting in an invalid end address. Change the
number of pages variable in various routines from an unsigned int to an
unsigned long to calculate the end address correctly.
Fixes: 5e5ccff60a ("x86/sev: Add helper for validating pages in early enc attribute changes")
Fixes: dc3f3d2474 ("x86/mm: Validate memory when changing the C-bit")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/6a6e4eea0e1414402bac747744984fa4e9c01bb6.1686063086.git.thomas.lendacky@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1f0c5ea04 ]
bio_first_page_all(bio)->mapping->host is not compatible with large
folios, since the first page of the bio is not necessarily the head page
of the folio, and therefore it might not have the mapping pointer set.
Therefore, move the dereference of ->mapping->host into
verify_data_blocks(), which works with a folio.
(Like the commit that this Fixes, this hasn't actually been tested with
large folios yet, since the filesystems that use fs/verity/ don't
support that yet. But based on code review, I think this is needed.)
Fixes: 5d0f0e57ed ("fsverity: support verifying data from large folios")
Link: https://lore.kernel.org/r/20230604022101.48342-1-ebiggers@kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8fcd94add6 ]
The "ahash" API, like the other scatterlist-based crypto APIs such as
"skcipher", comes with some well-known limitations. First, it can't
easily be used with vmalloc addresses. Second, the request struct can't
be allocated on the stack. This adds complexity and a possible failure
point that needs to be worked around, e.g. using a mempool.
The only benefit of ahash over "shash" is that ahash is needed to access
traditional memory-to-memory crypto accelerators, i.e. drivers/crypto/.
However, this style of crypto acceleration has largely fallen out of
favor and been superseded by CPU-based acceleration or inline crypto
engines. Also, ahash needs to be used asynchronously to take full
advantage of such hardware, but fs/verity/ has never done this.
On all systems that aren't actually using one of these ahash-only crypto
accelerators, ahash just adds unnecessary overhead as it sits between
the user and the underlying shash algorithms.
Also, XFS is planned to cache fsverity Merkle tree blocks in the
existing XFS buffer cache. As a result, it will be possible for a
single Merkle tree block to be split across discontiguous pages
(https://lore.kernel.org/r/20230405233753.GU3223426@dread.disaster.area).
This data will need to be hashed. It is easiest to work with a vmapped
address in this case. However, ahash is incompatible with this.
Therefore, let's convert fs/verity/ from ahash to shash. This
simplifies the code, and it should also slightly improve performance for
everyone who wasn't actually using one of these ahash-only crypto
accelerators, i.e. almost everyone (or maybe even everyone)!
Link: https://lore.kernel.org/r/20230516052306.99600-1-ebiggers@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Stable-dep-of: d1f0c5ea04 ("fsverity: don't use bio_first_page_all() in fsverity_verify_bio()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 967c28b23f ]
After heavily stressing EROFS with several images which include a
hand-crafted image of repeated patterns for more than 46 days, I found
two chains could be linked with each other almost simultaneously and
form a loop so that the entire loop won't be submitted. As a
consequence, the corresponding file pages will remain locked forever.
It can be _only_ observed on data-deduplicated compressed images.
For example, consider two chains with five pclusters in total:
Chain 1: 2->3->4->5 -- The tail pcluster is 5;
Chain 2: 5->1->2 -- The tail pcluster is 2.
Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link
to Chain 2 at the same time with pcluster 2.
Since hooked chains are all linked locklessly now, I have no idea how
to simply avoid the race. Instead, let's avoid hooked chains completely
until I could work out a proper way to fix this and end users finally
tell us that it's needed to add it back.
Actually, this optimization can be found with multi-threaded workloads
(especially even more often on deduplicated compressed images), yet I'm
not sure about the overall system impacts of not having this compared
with implementation complexity.
Fixes: 267f2492c8 ("erofs: introduce multi-reference pclusters (fully-referenced)")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20230526201459.128169-4-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a13bd91be2 ]
commit 50e34d7881 ("block: disable the elevator int del_gendisk")
move rq_qos_exit() from disk_release() to del_gendisk(), this will
introduce some problems:
1) If rq_qos_add() is triggered by enabling iocost/iolatency through
cgroupfs, then it can concurrent with del_gendisk(), it's not safe to
write 'q->rq_qos' concurrently.
2) Activate cgroup policy that is relied on rq_qos will call
rq_qos_add() and blkcg_activate_policy(), and if rq_qos_exit() is
called in the middle, null-ptr-dereference will be triggered in
blkcg_activate_policy().
3) blkg_conf_open_bdev() can call blkdev_get_no_open() first to find the
disk, then if rq_qos_exit() from del_gendisk() is done before
rq_qos_add(), then memory will be leaked.
This patch add a new disk level mutex 'rq_qos_mutex':
1) The lock will protect rq_qos_exit() directly.
2) For wbt that doesn't relied on blk-cgroup, rq_qos_add() can only be
called from disk initialization for now because wbt can't be
destructed until rq_qos_exit(), so it's safe not to protect wbt for
now. Hoever, in case that rq_qos dynamically destruction is supported
in the furture, this patch also protect rq_qos_add() from wbt_init()
directly, this is enough because blk-sysfs already synchronize
writers with disk removal.
3) For iocost and iolatency, in order to synchronize disk removal and
cgroup configuration, the lock is held after blkdev_get_no_open()
from blkg_conf_open_bdev(), and is released in blkg_conf_exit().
In order to fix the above memory leak, disk_live() is checked after
holding the new lock.
Fixes: 50e34d7881 ("block: disable the elevator int del_gendisk")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230414084008.2085155-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ed8b50975 ]
Since commit 3b5c3f000c2e ("s390/kasan: move shadow mapping
to decompressor") the decompressor establishes mappings for
the shadow memory and sets initial protection attributes to
RWX. The decompressed kernel resets protection to RW+NX
later on.
In case a shadow memory range is not aligned on page boundary
(e.g. as result of mem= kernel command line parameter use),
the "Checked W+X mappings: FAILED, 1 W+X pages found" warning
hits.
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 557b19709d ("s390/kasan: move shadow mapping to decompressor")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 247c8d2f98 ]
A couple of functions from fs/pipe.c are used both internally
and for the watch queue code, but the declaration is only
visible when the latter is enabled:
fs/pipe.c:1254:5: error: no previous prototype for 'pipe_resize_ring'
fs/pipe.c:758:15: error: no previous prototype for 'account_pipe_buffers'
fs/pipe.c:764:6: error: no previous prototype for 'too_many_pipe_buffers_soft'
fs/pipe.c:771:6: error: no previous prototype for 'too_many_pipe_buffers_hard'
fs/pipe.c:777:6: error: no previous prototype for 'pipe_is_unprivileged_user'
Make the visible unconditionally to avoid these warnings.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Message-Id: <20230516195629.551602-1-arnd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a4cba07e64 upstream.
Apparently, despite the name Digital Input Configuration Register, the
settings in the DIN_CONFIGx registers also affect other channel
functions. In particular, setting a non-zero value in the DIN_SINK
field breaks the resistance measurement function.
Now, one can of course argue that specifying a drive-strength-microamp
property along with a adi,ch-func which is not one of the digital
input functions is a bug in the device tree. However, we have a rather
complicated setup with instances of ad74412r on external hardware
modules, and have set a default drive-strength-microamp in our DT
fragments describing those, merely modifying the adi,ch-func settings
to reflect however the modules have been wired up. And restricting
this setting to just being done for digital input doesn't make the
driver any more complex.
Fixes: 504eb48558 (iio: ad74413r: wire up support for drive-strength-microamp property)
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20230503105042.453755-1-linux@rasmusvillemoes.dk
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3adbaa30d9 upstream.
The driver can register a typec port if suitable firmware properties are
present. But if the driver is removed through sysfs unbind, rmmod or
similar, then it does not clean up after itself and the typec port
device remains registered. This can be seen in sysfs, where stale typec
ports get left over in /sys/class/typec.
In order to fix this we have to add an i2c_driver remove function and
call typec_unregister_port(), which is a no-op in the case where no
typec port is created and the pointer remains NULL.
In the process we should also put the fwnode_handle when the typec port
isn't registered anymore, including if an error occurs during probe. The
typec subsystem does not increase or decrease the reference counter for
us, so we track it in the driver's private data.
Note that the conditional check on TYPEC_PWR_MODE_PD was removed in the
probe path because a call to tusb320_set_adv_pwr_mode() will perform an
even more robust validation immediately after, hence there is no
functional change here.
Fixes: bf7571c00d ("extcon: usbc-tusb320: Add USB TYPE-C support")
Cc: stable@vger.kernel.org
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0aabed9ca upstream.
In scenarios where pullup relies on resume (get sync) to initialize
the controller and set the run stop bit, then core_init is followed by
gadget_resume which will eventually set run stop bit.
But in cases where the core_init fails, the return value is not sent
back to udc appropriately. So according to UDC the controller has
started but in reality we never set the run stop bit.
On systems like Android, there are uevents sent to HAL depending on
whether the configfs_bind / configfs_disconnect were invoked. In the
above mentioned scnenario, if the core init fails, the run stop won't
be set and the cable plug-out won't result in generation of any
disconnect event and userspace would never get any uevent regarding
cable plug out and we never call pullup(0) again. Furthermore none of
the next Plug-In/Plug-Out's would be known to configfs.
Return back the appropriate result to UDC to let the userspace/
configfs know that the pullup failed so they can take appropriate
action.
Fixes: 77adb8bdf4 ("usb: dwc3: gadget: Allow runtime suspend if UDC unbinded")
Cc: stable <stable@kernel.org>
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Message-ID: <20230618120949.14868-1-quic_kriskura@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ffa5f7a3bf upstream.
The new LARA-R6 product variant identified by the "01B" string can be
configured (by AT interface) in three different USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial
interface and 1 CDC-ECM virtual network interface
The first 4 interfaces of all the 3 USB configurations (default, RmNet,
CDC-ECM) are the same.
In default mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parser/alternative functions
In RmNet mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parser/alternative functions
If 4: RMNET interface
In CDC-ECM mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parser/alternative functions
If 4: CDC-ECM interface
Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com>
Link: https://lore.kernel.org/r/20230622092921.12651-1-davide.tronchin.94@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb49c45532 upstream.
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c7873e336 upstream.
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33313a747e upstream.
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c137381f71 upstream.
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24be4d0b46 upstream.
Commit ae870a68b5 ("arm64/mm: Convert to using
lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if
CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the
ifdef.
As a result, building kernel without the config fails with undeclared
variable error as below:
arch/arm64/mm/fault.c: In function 'do_page_fault':
arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'?
624 | vma = lock_mm_and_find_vma(mm, addr, regs);
| ^~~
| vmap
Fix it by moving the declaration out of the ifdef.
Fixes: ae870a68b5 ("arm64/mm: Convert to using lock_mm_and_find_vma()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 249bed821b upstream.
The version is fetched once in check_version(), which then does some
validation and then overwrites the version in userspace with the API
version supported by the kernel. copy_params() then fetches the version
from userspace *again*, and this time no validation is done. The result
is that the kernel's version number is completely controllable by
userspace, provided that userspace can win a race condition.
Fix this flaw by not copying the version back to the kernel the second
time. This is not exploitable as the version is not further used in the
kernel. However, it could become a problem if future patches start
relying on the version field.
Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1b37563ca upstream.
gtags considers any file outside of its current working directory
"outside the source tree" and refuses to index it. For O= kernel builds,
or when "make" is invoked from a directory other then the kernel source
tree, gtags ignores the entire kernel source and generates an empty
index.
Force-set gtags current working directory to the kernel source tree.
Due to commit 9da0763bdd ("kbuild: Use relative path when building in
a subdir of the source tree"), if the kernel build is done in a
sub-directory of the kernel source tree, the kernel Makefile will set
the kernel's $srctree to ".." for shorter compile-time and run-time
warnings. Consequently, the list of files to be indexed will be in the
"../*" form, rendering all such paths invalid once gtags switches to the
kernel source tree as its current working directory.
If gtags indexing is requested and the build directory is not the kernel
source tree, index all files in absolute-path form.
Note, indexing in absolute-path form will not affect the generated
index, as paths in gtags indices are always relative to the gtags "root
directory" anyway (as evidenced by "gtags --dump").
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f0220af58 upstream.
commit eb0764b822 ("cxl/port: Enable the HDM decoder capability for switch ports")
...was added on the observation of CXL memory not being accessible after
setting up a region on a "cold-plugged" device. A "cold-plugged" CXL
device is one that was not present at boot, so platform-firmware/BIOS
has no chance to set it up.
While it is true that the debug found the enable bit clear in the
host-bridge's instance of the global control register (CXL 3.0
8.2.4.19.2 CXL HDM Decoder Global Control Register), that bit is
described as:
"This bit is only applicable to CXL.mem devices and shall
return 0 on CXL Host Bridges and Upstream Switch Ports."
So it is meant to be zero, and further testing confirmed that this "fix"
had no effect on the failure. Revert it, and be more vigilant about
proposed fixes in the future. Since the original copied stable@, flag
this revert for stable@ as well.
Cc: <stable@vger.kernel.org>
Fixes: eb0764b822 ("cxl/port: Enable the HDM decoder capability for switch ports")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168685882012.3475336.16733084892658264991.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f66066bc51 upstream.
While our user stacks can grow either down (all common architectures) or
up (parisc and the ia64 register stack), the initial stack setup when we
copy the argument and environment strings to the new stack at execve()
time is always done by extending the stack downwards.
But it turns out that in commit 8d7071af89 ("mm: always expand the
stack with the mmap write lock held"), as part of making the stack
growing code more robust, 'expand_downwards()' was now made to actually
check the vma flags:
if (!(vma->vm_flags & VM_GROWSDOWN))
return -EFAULT;
and that meant that this execve-time stack expansion started failing on
parisc, because on that architecture, the stack flags do not contain the
VM_GROWSDOWN bit.
At the same time the new check in expand_downwards() is clearly correct,
and simplified the callers, so let's not remove it.
The solution is instead to just codify the fact that yes, during
execve(), the stack grows down. This not only matches reality, it ends
up being particularly simple: we already have special execve-time flags
for the stack (VM_STACK_INCOMPLETE_SETUP) and use those flags to avoid
page migration during this setup time (see vma_is_temporary_stack() and
invalid_migration_vma()).
So just add VM_GROWSDOWN to that set of temporary flags, and now our
stack flags automatically match reality, and the parisc stack expansion
works again.
Note that the VM_STACK_INCOMPLETE_SETUP bits will be cleared when the
stack is finalized, so we only add the extra VM_GROWSDOWN bit on
CONFIG_STACK_GROWSUP architectures (ie parisc) rather than adding it in
general.
Link: https://lore.kernel.org/all/612eaa53-6904-6e16-67fc-394f4faa0e16@bell.net/
Link: https://lore.kernel.org/all/5fd98a09-4792-1433-752d-029ae3545168@gmx.de/
Fixes: 8d7071af89 ("mm: always expand the stack with the mmap write lock held")
Reported-by: John David Anglin <dave.anglin@bell.net>
Reported-and-tested-by: Helge Deller <deller@gmx.de>
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 112a7f9c8e upstream.
ACPI r6.5, sec 6.5.4, describes how AML is unable to access an
OperationRegion unless _REG has been called to connect a handler:
The OS runs _REG control methods to inform AML code of a change in the
availability of an operation region. When an operation region handler is
unavailable, AML cannot access data fields in that region. (Operation
region writes will be ignored and reads will return indeterminate data.)
The PCI core does not call _REG at any time, leading to the undefined
behavior mentioned in the spec.
The spec explains that _REG should be executed to indicate whether a
given region can be accessed:
Once _REG has been executed for a particular operation region, indicating
that the operation region handler is ready, a control method can access
fields in the operation region. Conversely, control methods must not
access fields in operation regions when _REG method execution has not
indicated that the operation region handler is ready.
An example included in the spec demonstrates calling _REG when devices are
turned off: "when the host controller or bridge controller is turned off
or disabled, PCI Config Space Operation Regions for child devices are
no longer available. As such, ETH0’s _REG method will be run when it
is turned off and will again be run when PCI1 is turned off."
It is reported that ASMedia PCIe GPIO controllers fail functional tests
after the system has returning from suspend (S3 or s2idle). This is because
the BIOS checks whether the OSPM has called the _REG method to determine
whether it can interact with the OperationRegion assigned to the device as
part of the other AML called for the device.
To fix this issue, call acpi_evaluate_reg() when devices are transitioning
to D3cold or D0.
[bhelgaas: split pci_power_t checking to preliminary patch]
Link: https://uefi.org/specs/ACPI/6.5/06_Device_Configuration.html#reg-region
Link: https://lore.kernel.org/r/20230620140451.21007-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5557b62634 upstream.
Previously acpi_pci_set_power_state() assumed the requested power state was
valid (PCI_D0 ... PCI_D3cold). If a caller supplied something else, we
could index outside the state_conv[] array and pass junk to
acpi_device_set_power().
Validate the pci_power_t parameter and return -EINVAL if it's invalid.
Link: https://lore.kernel.org/r/20230621222857.GA122930@bhelgaas
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 03f889378f upstream.
MMU version of lock_mm_and_find_vma releases the mm lock before
returning when VMA is not found. Do the same in noMMU version.
This fixes hang on an attempt to handle protection fault.
Fixes: d85a143b69 ("xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d85a143b69 upstream.
It turns out that xtensa has a really odd configuration situation: you
can do a no-MMU config, but still have the page fault code enabled.
Which doesn't sound all that sensible, but it turns out that xtensa can
have protection faults even without the MMU, and we have this:
config PFAULT
bool "Handle protection faults" if EXPERT && !MMU
default y
help
Handle protection faults. MMU configurations must enable it.
noMMU configurations may disable it if used memory map never
generates protection faults or faults are always fatal.
If unsure, say Y.
which completely violated my expectations of the page fault handling.
End result: Guenter reports that the xtensa no-MMU builds all fail with
arch/xtensa/mm/fault.c: In function ‘do_page_fault’:
arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’
because I never exposed the new lock_mm_and_find_vma() function for the
no-MMU case.
Doing so is simple enough, and fixes the problem.
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: a050ba1e74 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e55e5df193 upstream.
As already mentioned in my merge message for the 'expand-stack' branch,
we have something like 24 different versions of the page fault path for
all our different architectures, all just _slightly_ different due to
various historical reasons (usually related to exactly when they
branched off the original i386 version, and the details of the other
architectures they had in their history).
And a few of them had some silly mistake in the conversion.
Most of the architectures call the faulting address 'address' in the
fault path. But not all. Some just call it 'addr'. And if you end up
doing a bit too much copy-and-paste, you end up with the wrong version
in the places that do it differently.
In this case it was csky.
Fixes: a050ba1e74 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea3f827287 upstream.
In commit 8d7071af89 ("mm: always expand the stack with the mmap write
lock held") I tried to deal with the remaining odd page fault handling
cases. The oddest one is ia64, which has stacks that grow both up and
down. And because ia64 was _so_ odd, I asked people to verify the end
result.
But a close second oddity is parisc, which is the only one that has a
main stack growing up (our "CONFIG_STACK_GROWSUP" config option). But
it looked obvious enough that I didn't worry about it.
I should have worried a bit more. Not because it was particularly
complex, but because I just used the wrong variable name.
The previous vma isn't called "prev", it's called "prev_vma". Blush.
Fixes: 8d7071af89 ("mm: always expand the stack with the mmap write lock held")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b26eadbf2 upstream.
The sparc32 conversion to lock_mm_and_find_vma() in commit a050ba1e74
("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
missed the fact that we didn't actually have a 'regs' pointer available
in the 'force_user_fault()' case.
It's there in the regular page fault path ("do_sparc_fault()"), but not
the window underflow/overflow paths.
Which is all fine - we can just pass in a NULL pointer. The register
state is only used to avoid deadlock with kernel faults, which is not
the case for any of these register window faults.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a050ba1e74 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8c716bc68 upstream.
There is no xas_pause(&xas) in collapse_file()'s main loop, at the points
where it does xas_unlock_irq(&xas) and then continues.
That would explain why, once two weeks ago and twice yesterday, I have
hit the VM_BUG_ON_PAGE(page != xas_load(&xas), page) since "mm/khugepaged:
fix iteration in collapse_file" removed the xas_set(&xas, index) just
before it: xas.xa_node could be left pointing to a stale node, if there
was concurrent activity on the file which transformed its xarray.
I tried inserting xas_pause()s, but then even bootup crashed on that
VM_BUG_ON_PAGE(): there appears to be a subtle "nextness" implicit in
xas_pause().
xas_next() and xas_pause() are good for use in simple loops, but not in
this one: xas_set() worked well until now, so use xas_set(&xas, index)
explicitly at the head of the loop; and change that VM_BUG_ON_PAGE() not
to need its own xas_set(), and not to interfere with the xa_state (which
would probably stop the crashes from xas_pause(), but I trust that less).
The user-visible effects of this bug (if VM_BUG_ONs are configured out)
would be data loss and data leak - potentially - though in practice I
expect it is more likely that a subsequent check (e.g. on mapping or on
nr_none) would notice an inconsistency, and just abandon the collapse.
Link: https://lore.kernel.org/linux-mm/f18e4b64-3f88-a8ab-56cc-d1f5f9c58d4@google.com/
Fixes: c8a8f3b4a9 ("mm/khugepaged: fix iteration in collapse_file")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Stevens <stevensd@chromium.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a425ac5365 upstream.
It feels very unlikely that anybody would want to do a GUP in an
unmapped area under the stack pointer, but real users sometimes do some
really strange things. So add a (temporary) warning for the case where
a GUP fails and expanding the stack might have made it work.
It's trivial to do the expansion in the caller as part of getting the mm
lock in the first place - see __access_remote_vm() for ptrace, for
example - it's just that it's unnecessarily painful to do it deep in the
guts of the GUP lookup when we might have to drop and re-take the lock.
I doubt anybody actually does anything quite this strange, but let's be
proactive: adding these warnings is simple, and will make debugging it
much easier if they trigger.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d7071af89 upstream.
This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.
For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks. Let's see if any
strange users really wanted that.
It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma. This makes it fairly
straightforward to convert the remaining architectures.
As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid. So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64
Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f313c51d26 upstream.
This is a small step towards a model where GUP itself would not expand
the stack, and any user that needs GUP to not look up existing mappings,
but actually expand on them, would have to do so manually before-hand,
and with the mm lock held for writing.
It turns out that execve() already did almost exactly that, except it
didn't take the mm lock at all (it's single-threaded so no locking
technically needed, but it could cause lockdep errors). And it only did
it for the CONFIG_STACK_GROWSUP case, since in that case GUP has
obviously never expanded the stack downwards.
So just make that CONFIG_STACK_GROWSUP case do the right thing with
locking, and enable it generally. This will eventually help GUP, and in
the meantime avoids a special case and the lockdep issue.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f440fa1ac9 upstream.
Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.
To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked". That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the common
cases until they have been fully converted to the new world order.
Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cd76c50d0 upstream.
This is one of the simple cases, except there's no pt_regs pointer.
Which is fine, as lock_mm_and_find_vma() is set up to work fine with a
NULL pt_regs.
Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting,
so we can just use the helper without any extra work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a050ba1e74 upstream.
This does the simple pattern conversion of alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa to the lock_mm_and_find_vma()
helper. They all have the regular fault handling pattern without odd
special cases.
The remaining architectures all have something that keeps us from a
straightforward conversion: ia64 and parisc have stacks that can grow
both up as well as down (and ia64 has special address region checks).
And m68k, microblaze, openrisc, sparc64, and um end up having extra
rules about only expanding the stack down a limited amount below the
user space stack pointer. That is something that x86 used to do too
(long long ago), and it probably could just be skipped, but it still
makes the conversion less than trivial.
Note that this conversion was done manually and with the exception of
alpha without any build testing, because I have a fairly limited cross-
building environment. The cases are all simple, and I went through the
changes several times, but...
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b35ca3e45 upstream.
arm has an additional check for address < FIRST_USER_ADDRESS before
expanding the stack. Since FIRST_USER_ADDRESS is defined everywhere
(generally as 0), move that check to the generic expand_downwards().
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae870a68b5 upstream.
This converts arm64 to use the new page fault helper. It was very
straightforward, but still needed a fix for the "obvious" conversion I
initially did. Thanks to Suren for the fix and testing.
Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com>
Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eda0047296 upstream.
This is done as a separate patch from introducing the new
lock_mm_and_find_vma() helper, because while it's an obvious change,
it's not what x86 used to do in this area.
We already abort the page fault on fatal signals anyway, so why should
we wait for the mmap lock only to then abort later? With the new helper
function that returns without the lock held on failure anyway, this is
particularly easy and straightforward.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2508ec5a5 upstream.
.. and make x86 use it.
This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.
We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.
That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.
So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.
Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.
It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.
But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd00dd2585 upstream.
Check the write offset end bounds before using it as the offset into the
pivot array. This avoids a possible out-of-bounds access on the pivot
array if the write extends to the last slot in the node, in which case the
node maximum should be used as the end pivot.
akpm: this doesn't affect any current callers, but new users of mapletree
may encounter this problem if backported into earlier kernels, so let's
fix it in -stable kernels in case of this.
Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e38910c007 upstream.
With commit d674a8f123 ("can: isotp: isotp_sendmsg(): fix return
error on FC timeout on TX path") the missing correct return value in
the case of a protocol error was introduced.
But the way the error value has been read and sent to the user space
does not follow the common scheme to clear the error after reading
which is provided by the sock_error() function. This leads to an error
report at the following write() attempt although everything should be
working.
Fixes: d674a8f123 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path")
Reported-by: Carsten Schmidt <carsten.schmidt-achim@t-online.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7893093a7 upstream.
TLDR: It's a mess.
When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.
The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.
Cure this by bringing the offlined CPUs out of MWAIT into HLT.
Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.
That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as
those make it come out of HLT.
A follow up change will put them into INIT, which protects at least against
NMI and SMI.
Fixes: ea53069231 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9c9987bf5 upstream.
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all what is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f5e7eb786 upstream.
Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.
That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.
That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:
CPU0 CPU1
stop_other_cpus()
send_IPIs(REBOOT); stop_this_cpu()
while (num_online_cpus() > 1); set_online(false);
proceed... -> hang
wbinvd()
WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.
But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.
This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.
Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs and CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().
The cpumask cannot plug all holes either, but it's better than a raw
counter and allows to restrict the NMI fallback IPI to be sent only the
CPUs which have not reported within the timeout window.
Fixes: 08f253ec37 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
description:Create a report to help us fix your issue
body:
- type:markdown
attributes:
value:|
**Is this the right place for my bug report?**
This repository contains the Linux kernel used on the Raspberry Pi.
If you believe that the issue you are seeing is kernel-related, this is the right place.
If not, we have other repositories for the GPU firmware at [github.com/raspberrypi/firmware](https://github.com/raspberrypi/firmware) and Raspberry Pi userland applications at [github.com/raspberrypi/userland](https://github.com/raspberrypi/userland).
If you have problems with the Raspbian distribution packages, report them in the [github.com/RPi-Distro/repo](https://github.com/RPi-Distro/repo).
If you simply have a question, then [the Raspberry Pi forums](https://www.raspberrypi.org/forums) are the best place to ask it.
- type:textarea
id:description
attributes:
label:Describe the bug
description:|
Add a clear and concise description of what you think the bug is.
validations:
required:true
- type:textarea
id:reproduce
attributes:
label:Steps to reproduce the behaviour
description:|
List the steps required to reproduce the issue.
validations:
required:true
- type:dropdown
id:model
attributes:
label:Device (s)
description:Onwhich device you are facing the bug?
multiple:true
options:
- Raspberry Pi Zero
- Raspberry Pi Zero W/WH
- Raspberry Pi Zero 2 W
- Raspberry Pi 1 Mod. A
- Raspberry Pi 1 Mod. A+
- Raspberry Pi 1 Mod. B
- Raspberry Pi 1 Mod. B+
- Raspberry Pi 2 Mod. B
- Raspberry Pi 2 Mod. B v1.2
- Raspberry Pi 3 Mod. A+
- Raspberry Pi 3 Mod. B
- Raspberry Pi 3 Mod. B+
- Raspberry Pi 4 Mod. B
- Raspberry Pi 400
- Raspberry Pi CM1
- Raspberry Pi CM3
- Raspberry Pi CM3 Lite
- Raspberry Pi CM3+
- Raspberry Pi CM3+ Lite
- Raspberry Pi CM4
- Raspberry Pi CM4 Lite
- Other
validations:
required:true
- type:textarea
id:system
attributes:
label:System
description:|
Copy and paste the results of the raspinfo command in to this section.
Alternatively, copy and paste a pastebin link, or add answers to the following questions:
* Which OS and version (`cat /etc/rpi-issue`)?
* Which firmware version (`vcgencmd version`)?
* Which kernel version (`uname -a`)?
validations:
required:true
- type:textarea
id:logs
attributes:
label:Logs
description:|
If applicable, add the relevant output from `dmesg` or similar.
about:"Please do not use GitHub for asking questions. If you simply have a question, then the Raspberry Pi forums are the best place to ask it. Thanks in advance for helping us keep the issue tracker clean!"
- name:"⛔ Problems with the Raspbian distribution packages"
url:https://github.com/RPi-Distro/repo
about:"If you have problems with the Raspbian distribution packages, please report them in the github.com/RPi-Distro/repo."
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.