Compare commits

..

367 Commits

Author SHA1 Message Date
Hui Wang
66d86f0e76 vc_sm: Let it support to build in the non-src folder
If we build the kernel with "-O=$non-src-folder", this driver will
introdcue a building error because of the header's location.

Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:15 +01:00
Hui Wang
ded9db62f4 rtl8192cu: Let it support to build in the non-src folder
If we build the kernel with "-O=$non-src-folder", this driver will
introdcue a building error because of the header's location.

Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:15 +01:00
Phil Elwell
618efe6f07 drm/vc4: Prevent load tracking from breaking FKMS
Firmware KMS uses a mixture of VC4 processing and dedicated code. The
load tracking support in VC4 assumes it is dealing with vc4_plane_state
objects when up-casting with container_of, but FKMS uses unadorned
drm_plane_state structures causing the VC4 code to read off the end
into random portions of memory. Work around the problem in a minimally-
invasive way by over-allocating the FKMS plane state structures to be
large enough to contain a vc4_plane_state, filling the remainder with
zeroes.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:15 +01:00
Phil Elwell
38f2d0d117 Revert "drm/vc4: Disable load tracking by default"
This reverts commit 124fba550eeed6ef766e65759a6d8dfdc436d68e.
2019-09-17 11:20:15 +01:00
Phil Elwell
fa6f425274 Revert "configs: Nobble I2S sound cards for now due to modern dai_link api breakage"
This reverts commit f574dc066e.
2019-09-17 11:20:15 +01:00
Hui Wang
ad021fc01f ASoC: pisound: fix the parameter for spi_device_match
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:15 +01:00
Hui Wang
a744810f7e ASoC: allo-boss-dac: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:15 +01:00
Hui Wang
62c935517d ASoC: allo-piano-dac-plus: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
f436efe19c ASoC: allo-piano-dac: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
869a5f1c4c ASoC: audioinjector-octo-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
a2da654b12 ASoC: audioinjector-pi-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
4d25ac4149 ASoC: audiosense-pi: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
ef0fea4e31 ASoC: digidac1-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
4fd717ceb0 ASoC: dionaudio_loco-v2: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
0d60b662c8 ASoC: dionaudio_loco: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
88b2c146db ASoC: fe-pi-audio: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
7ce33e9dd9 ASoC: hifiberry_dacplus: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:14 +01:00
Hui Wang
64b06d8f64 ASoC: i-sabre-q2m: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:13 +01:00
Hui Wang
d608e4d709 ASoC: iqaudio-codec: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:13 +01:00
Hui Wang
9c5f0eb104 ASoC: pisound: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:13 +01:00
Hui Wang
0643358c2f ASoC: rpi-proto: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:13 +01:00
Hui Wang
247f7797c1 ASoC: rpi-simple-soundcard: use modern dai_link style
Signed-off-by: Hui Wang <hui.wang@canonical.com>
2019-09-17 11:20:13 +01:00
Phil Elwell
c8588a0e4b net: bcmgenet: Workaround for Pi 4B network issue
Some combinations of Pi 4Bs and Ethernet switches don't reliably get a
DCHP-assigned IP address, leaving the unit with a self=assigned 169.254
address.

Forcing renegotiation has been found to be an effective workaround, so
add an automatic renegotiation after the link comes up for the first
time; enable it with genet.force_reneg=y - by default it is disabled.

See: https://github.com/raspberrypi/linux/issues/3108

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:13 +01:00
Phil Elwell
ae7c6e7b2d drm/vc4: Disable load tracking by default
The load tracking support appears to be broken which is causing
updates to be rejected (and duplicate mouse pointers!), so disable
it for now.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:13 +01:00
Phil Elwell
a336c4fa74 bcm2835_mmc: Remove vestigial threaded IRQ
With SDIO processing now managed by the MMC framework with a
workqueue, the bcm2835_mmc driver no longer needs a threaded
IRQ.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:13 +01:00
Matthias Reichl
ff803f1cca clk: clk-hifiberry-dacpro: fix kconfig
Make the clock driver and independent configuration option
and change the HifiBerry DAC+ and HifiBerry DAC+ADC kconfig
to select it.

This allows building only the Hifiberry DAC+ADC driver.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:20:13 +01:00
Matthias Reichl
561c35285f ASoC: hifiberry_dacplusadc: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:20:13 +01:00
Matthias Reichl
a031353aa6 ASoC: hifiberry_dacplusadc: fix DAI link setup
The driver only defines a single DAI link and the code that tries
to setup the second (non-existent) DAI link looks wrong - using dmic
as a CPU/platform driver doesn't make any sense.

The DT overlay doesn't define a dmic property, so the code was never
executed (otherwise it would have resulted in a memory corruption).

So drop the offending code to prevent issues if a dmic property
should be added to the DT overlay.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:20:12 +01:00
Matthias Reichl
b65d5c3a4d ASoC: rpi-wm8804-soundcard: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:20:12 +01:00
Matthias Reichl
338626941f ASoC: justboom-dac: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:20:12 +01:00
Phil Elwell
67c4745f8f pcie-brcmstb: Don't set DMA ops for root complex
A change to arm_get_dma_map_ops has stopped get_dma_ops from working
on the root complex, causing an error to be logged. However, there is
no need to override the DMA ops in that case, so skip it and
eliminate the error message.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:12 +01:00
Matthias Reichl
4f6f2754a1 ASoC: iqaudio-dac: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:20:12 +01:00
Matthias Reichl
3a3f99def6 ASoC: rpi-cirrus: use modern dai_link style
Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:20:12 +01:00
popcornmix
07aa13c6f0 configs: Nobble I2S sound cards for now due to modern dai_link api breakage 2019-09-17 11:20:12 +01:00
popcornmix
390d5a6a3c clk-bcm2835: Avoid null pointer exception
clk_desc_array[BCM2835_PLLB] doesn't exist so we dereference null when iterating
2019-09-17 11:20:12 +01:00
popcornmix
5533659af3 bcm2835_mmc: fixup 2019-09-17 11:20:12 +01:00
popcornmix
497803890b bcm2835-cpufreq: fixup 2019-09-17 11:20:12 +01:00
Phil Elwell
845d0322ef bcm2835-dma: Add proper 40-bit DMA support
The 40-bit additions are not fully tested, but it should be
capable of supporting both 40-bit memcpy on BCM2711 and regular
Lite channels on BCM2835.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:12 +01:00
Marek Behún
aa393f3dee staging: vc04_services: fix compiling in separate directory
The vc04_services Makefiles do not respect the O=path argument
correctly: include paths in CFLAGS are given relatively to object path,
not source path. Compiling in a separate directory yields #include
errors.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
2019-09-17 11:20:11 +01:00
Andrei Gherzan
f2ee4f0fd5 arm64/mm: Limit the DMA zone for arm64
On RaspberryPi, only the first 1Gb can be used for DMA[1].

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2019-July/665986.html

Signed-off-by: Andrei Gherzan <andrei@balena.io>
2019-09-17 11:20:11 +01:00
Phil Elwell
a01d088483 i2c: bcm2835: Set clock-stretch timeout to 35ms
The BCM2835 I2C blocks have a register to set the clock-stretch
timeout - how long the device is allowed to hold SCL low - in bus
cycles. The current driver doesn't write to the register, therefore
the default value of 64 cycles is being used for all devices.

Set the timeout to the value recommended for SMBus - 35ms.

See: https://github.com/raspberrypi/linux/issues/3064

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Jonathan Bell
043e900fdb xhci: add quirk for host controllers that don't update endpoint DCS
Seen on a VLI VL805 PCIe to USB controller. For non-stream endpoints
at least, if the xHC halts on a particular TRB due to an error then
the DCS field in the Out Endpoint Context maintained by the hardware
is not updated with the current cycle state.

Using the quirk XHCI_EP_CTX_BROKEN_DCS and instead fetch the DCS bit
from the TRB that the xHC stopped on.

See: https://github.com/raspberrypi/linux/issues/3060

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Phil Elwell
5a302425d4 tty: amba-pl011: Make TX optimisation conditional
pl011_tx_chars takes a "from_irq" parameter to reduce the number of
register accesses. When from_irq is true the function assumes that the
FIFO is half empty and writes up to half a FIFO's worth of bytes
without polling the FIFO status register, the reasoning being that
the function is being called as a result of the TX interrupt being
raised. This logic would work were it not for the fact that
pl011_rx_chars, called from pl011_int before pl011_tx_chars, releases
the spinlock before calling tty_flip_buffer_push.

A user thread writing to the UART claims the spinlock and ultimately
calls pl011_tx_chars with from_irq set to false. This reverts to the
older logic that polls the FIFO status register before sending every
byte. If this happen on an SMP system during the section of the IRQ
handler where the spinlock has been released, then by the time the TX
interrupt handler is called, the FIFO may already be full, and any
further writes are likely to be lost.

The fix involves adding a per-port flag that is true iff running from
within the interrupt handler and the spinlock has not yet been released.
This flag is then used as the value for the from_irq parameter of
pl011_tx_chars, causing polling to be used in the unsafe case.

Fixes: 1e84d22322 ("serial/amba-pl011: Refactor and simplify TX FIFO handling")

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Dave Stevenson
343c1bd880 staging: vc-sm-cma: Fix the few remaining coding style issues
Fix a few minor checkpatch complaints to make the driver clean

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Dave Stevenson
e91510e492 staging: vcsm-cma: Rework to use dma APIs, not CMA
Due to a misunderstanding of the DMA mapping APIs, I made
the wrong decision on how to implement this.

Rework to use dma_alloc_coherent instead of the CMA
API. This also allows it to be built as a module easily.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Dave Stevenson
48e37e9927 staging: vcsm-cma: Remove cache manipulation ioctl from ARM64
The cache flushing ioctls are used by the Pi3 HEVC hw-assisted
decoder as it needs finer grained flushing control than dma_ops
allow.
These cache calls are not present for ARM64, therefore disable
them. We are not actively supporting 64bit kernels at present,
and the use case of the HEVC decoder is fairly limited.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Dave Stevenson
63ab0a41f9 drm/vc4: Add support for color encoding on YUV planes
Adds signalling for BT601/709/2020, and limited/full range
(on BT601).

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Chris Miller
3785873ee2 drm: vc4_dsi: Fix DMA channel and memory leak in vc4 (#3012)
Signed-off-by: Chris G Miller <chris@creative-electronics.net>
2019-09-17 11:20:11 +01:00
Phil Elwell
da5e9d74f4 drm/vc4: Ignore HVS unless initialised
An upstream commit to report HVS underruns causes VC4 in firmware KMS
mode to cross into the HVS side, where it crashes due to a NULL hvs
pointer.

Make the underrun masking conditional on the hvs pointer being
initialised.

Fixes: 531a1b622d ("drm/vc4: Report HVS underrun errors")

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Dave Stevenson
28365453f2 drm/vc4: Limit fkms to modes <= 85Hz
Selecting 1080p100 and 120 has very limited gain, but don't want
to block VGA85 and similar.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:11 +01:00
Dave Stevenson
0ddf0d6683 drm/vc4: In FKMS look at the modifiers correctly for SAND
Incorrect masking was used in the switch for the modifier,
therefore for SAND (which puts the column pitch in the
modifier) it didn't match.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
0540b8343e drm: vc4: Add status of which display is updated through vblank
Previously multiple  displays were slaved off the same SMI
interrupt, triggered by HVS channel 1 (HDMI0).
This doesn't work if you only have a DPI or DSI screen (HVS channel
0), and gives slightly erroneous results with dual HDMI as the
events for HDMI1 are incorrect.

Use SMIDSW0 and SMIDSW1 registers to denote which display has
triggered the vblank.
Handling should be backwards compatible with older firmware.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
61678e5095 drm/vc4: Remove 340MHz clock limit from FKMS now scrambling issues resolved
Firmware TMDS scrambling is now being correctly configured, so
we can use it.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
cc60ff4e84 drm/vc4: Fix T-format modifiers in FKMS.
The wrong vc_image formats were being checked for in the switch
statement. Correct these.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
211de22f65 drm/vc4: Max resolution of 7680 is conditional on being Pi4
The max resolution had been increased from 2048 to 7680 for all
platforms. This code is common with Pi0-3 which have a max render
target for GL of 2048, therefore the increased resolution has to
be conditional on the platform.
Switch based on whether the bcm2835-v3d node is found, as that is
not present on Pi4. (There is a potential configuration on Pi0-3
with no v3d, but this is very unlikely).

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
eadec28cce drm/vc4: fkms to query the VPU for HDMI clock limits
The VPU has configured clocks for 4k (or not) via config.txt,
and will limit the choice of video modes based on that.
Make fkms query it for these limits too to avoid selecting modes
that can not be handled by the current clock setup.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
e739ec2598 drm/vc4: Correct SAND support for FKMS.
It was accepting NV21 which doesn't map through, but
also wasn't advertising the modifier so nothing would know
to request it.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
4811319abe drm: vc4: Fixup typo when setting HDMI aspect ratio
Assignment was to the wrong structure.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
75c64a2d80 drm/vc4: Support the VEC in FKMS
Extends the DPI/DSI support to also report the VEC output
which supports interlacing too.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Jonathan Bell
8f5e1acf41 drm: vc4: handle the case where there are no available displays
It's reasonable for the firmware to return zero as the number of
attached displays. Handle this case as otherwise drm thinks that
the DSI panel is attached, which is nonsense.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
febbddb9d1 drm: vc4: Probe DPI/DSI timings from the firmware
For DPI and DSI displays query the firmware as to the configuration
and add it as the only mode for DRM.

In theory we can add plumbing for setting the DPI/DSI mode from
KMS, but this is not being added at present as the support frameworks
aren't present in the firmware.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
b49ab1ff08 drm: vc4-firmware-kms: Fix DSI display support
The mode was incorrectly listed as interlaced, which was then
rejected.
Correct this and FKMS works with the DSI display.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:10 +01:00
Dave Stevenson
5ec20dd570 drm: vc4: Log flags in fkms mode set
The flags contain info such as limited/full range RGB, aspect
ratio, and a fwe other useful things.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
f4acb58b8c drm: vc4-firmware-kms: Remove incorrect overscan support.
The overscan support was required for the old mailbox API
in order to match up the cursor and frame buffer planes.
With the newer API directly talking to dispmanx there is no
difference, therefore FKMS does not need to make any
adjustments.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
307eb6fb52 drm: vc4: FKMS reads the EDID from fw, and supports mode setting
This extends FKMS to read the EDID from the display, and support
requesting a particular mode via KMS.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
099bbf0362 drm: vc4: Increase max_width/height to 7680.
There are some limits still being investigated that stop
us going up to 8192, but 7680 is sufficient for dual 4k
displays.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
eee1dc05dd drm: vc4: Bring fkms into line with kms in blocking doublescan modes
Implement vc4_crtc_mode_valid so that it blocks doublescan modes

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
a689afa501 drm: vc4: Iterate over all planes in vc4_crtc_[dis|en]able
Fixes a FIXME where the overlay plane wouldn't be restored.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
f788940a5b drm: vc4: Remove unused vc4_fkms_cancel_page_flip function
"32a3dbe drm/vc4: Nuke preclose hook" removed vc4_cancel_page_flip,
but vc4_fkms_cancel_page_flip was still be added to with the
fkms driver, even though it was never called.
Nuke it too.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
3b1745681b drm: vc4: Add support for H & V flips on each plane for FKMS
They are near zero cost options for the HVS, therefore they
may as well be implemented, and it allows us to invert the
DSI display.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
e8356982f3 drm: vc4: Need to call drm_crtc_vblank_[on|off] from vc4_crtc_[en|dis]able
vblank needs to be enabled and disabled by the driver to avoid the
DRM framework complaining in the kernel log.

vc4_fkms_disable_vblank needs to signal that we don't want vblank
callbacks too.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
5cb569dc91 drm/vc4: Set the display number when querying the display resolution
Without this the two displays got set to the same resolution.
(Requires a firmware bug fix to work).

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
346a76050a drm: vc4: Query the display ID for each display in FKMS
Replace the hard coded list of display IDs for a mailbox call
that returns the display ID for each display that has been
detected.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
51734288cc drm: vc4: Remove now unused structure.
Cleaning up structure that was unused after
fbb59a2 drm: vc4: Add an overlay plane to vc4-firmware-kms

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:09 +01:00
Dave Stevenson
7878e7115c drm: vc4: Select display to blank during initialisation
Otherwise the rainbow splash screen remained in the display list

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:08 +01:00
Dave Stevenson
66b56be0b3 drm: vc4: Fix build warning
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:08 +01:00
Dave Stevenson
0e2dfc4f87 drm: vc4: Add support for multiple displays to fkms
There is a slightly nasty hack in that all crtcs share the
same SMI interrupt from the firmware. This seems to currently
work well enough, but ought to be fixed at a later date.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:08 +01:00
Dave Stevenson
078a9a1b54 drm: vc4: Increase max screen size to 4096x4096.
We now should support 4k screens, therefore this limit needs to
be increased.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:08 +01:00
Dave Stevenson
e9cdfef029 drm: vc4: Add an overlay plane to vc4-firmware-kms
This uses a new API that is exposed via the mailbox service
to stick an element straight on the screen using DispmanX.

The primary and cursor planes have also been switched to using
the new plane API, and it supports layering based on the DRM
zpos parameter.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:08 +01:00
Dave Stevenson
c7f54ad58e gpu: vc4-fkms: Switch to the newer mailbox frame buffer API.
The old mailbox FB API was ideally deprecated but still used by
the FKMS driver.
Update to the newer API.

NB This needs current firmware that accepts ARM allocated buffers
through the newer API.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:08 +01:00
Eric Anholt
d01d5df752 drm/vc4: Fix vblank timestamping for firmwarekms.
The core doesn't expect a false return from the scanoutpos function in
normal usage, so we were doing the precise vblank timestamping path
and thus "immediate" vblank disables (even though firmwarekms can't
actually disable vblanks interrupts, sigh), and the kernel would get
confused when getting timestamp info when also turning vblanks back
on.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:08 +01:00
Eric Anholt
bd70099ca3 drm/vc4: Expose the format modifiers for firmware kms.
This should technically not expose VC4_T_TILED on pi4.  However, if we
don't expose anything, then userspace will assume that display can
handle whatever modifiers 3d can do (UIF on 2711).  By exposing a
list, that will get intersected with what 3D can do so that we get T
tiling for display on 2710 and linear on 2711.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:08 +01:00
Eric Anholt
6a87f0b017 drm/vc4: Fix synchronization firmwarekms against GL rendering.
We would present the framebuffer immediately without waiting for
rendering to finish first, resulting in stuttering and flickering as a
window was dragged around when the GPU was busy enough to not just win
the race.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:08 +01:00
Eric Anholt
e6d373bb54 drm/v3d: Hook up the runtime PM ops.
In translating the runtime PM code from vc4, I missed the ".pm"
assignment to actually connect them up.  Fixes missing MMU setup if
runtime PM resets V3D.

Signed-off-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit ca197699af29baa8236c74c53d4904ca8957ee06)
2019-09-17 11:20:08 +01:00
Eric Anholt
067e1d0aff drm/v3d: Skip MMU flush if the device is currently off.
If it's off, we know it will be reset on poweron, so the MMU won't
have any TLB cached from before this point.  Avoids failed waits for
MMU flush to reply.

Signed-off-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3ee4e2e0a9e9587eacbb69b067bbc72ab2cdc47b)
2019-09-17 11:20:08 +01:00
Eric Anholt
a772e9dac4 drm/v3d: Add support for 2711.
Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:08 +01:00
Eric Anholt
e6064eee5a drm/vc4: Fix oops at boot with firmwarekms on 4.19.
Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:07 +01:00
Phil Elwell
7ec36c1f84 arm: bcm2835: Add bcm2838 compatible string.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:07 +01:00
Jonathan Bell
0347d40d47 usbhid: call usb_fixup_endpoint after mangling intervals
Lets the mousepoll override mechanism work with xhci.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:07 +01:00
Jonathan Bell
8facff2b38 xhci: implement xhci_fixup_endpoint for interval adjustments
Must be called in a non-atomic context, after the endpoint
has been registered with the hardware via xhci_add_endpoint
and before the first URB is submitted for the endpoint.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:07 +01:00
Jonathan Bell
222cd59552 usb: add plumbing for updating interrupt endpoint interval state
xHCI caches device and endpoint data after the interface is configured,
so an explicit command needs to be issued for any device driver wanting
to alter the polling interval of an endpoint.

Add usb_fixup_endpoint() to allow drivers to do this. The fixup must be
called after calculating endpoint bandwidth requirements but before any
URBs are submitted.

If polling intervals are shortened, any bandwidth reservations are no
longer valid but in practice polling intervals are only ever relaxed.

Limit the scope to interrupt transfers for now.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:07 +01:00
Stefan Wahren
b8bf68251a HACK: clk-bcm2835: Add BCM2838_CLOCK_EMMC2 support
The new BCM2838 supports an additional emmc2 clock. So add a new
compatible to register this clock only for BCM2838.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
2019-09-17 11:20:07 +01:00
Eric Anholt
34027be5a1 clk: bcm2835: Allow reparenting leaf clocks while they're running.
This falls under the same "we can reprogram glitch-free as long as we
pause generation" rule as updating the div/frac fields.  This can be
used for runtime reclocking of V3D to manage power leakage.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:07 +01:00
Eric Anholt
47ad87c736 clk: bcm2835: Add support for setting leaf clock rates while running.
As long as you wait for !BUSY, you can do glitch-free updates of clock
rate while the clock is running.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:07 +01:00
Phil Elwell
b55ec03bef config: Permit LPAE and PCIE_BRCMSTB on BCM2835 2019-09-17 11:20:07 +01:00
James Hughes
1ff942f696 Pulled in the multi frame buffer support from the Pi3 repo 2019-09-17 11:20:07 +01:00
Dave Stevenson
9d17706df6 staging: vcsm-cma: Fixup the alloc code handling of kernel_id
The allocation code had been copied in from an old branch prior
to having added the IDR for 64bit support. It was therefore pushing
a pointer into the kernel_id field instead of an IDR handle, the
lookup therefore failed, and we never released the buffer.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:07 +01:00
Dave Stevenson
7d70fd8ecc staging: vcsm-cma: Drop logging level on messages in vc_sm_release_resource
They weren't errors but were logged as such.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:07 +01:00
Dave Stevenson
7d7d7bcb49 staging: vcsm-cma: Alter dev node permissions to 0666
Until the udev rules are updated, open up access to this node by
default.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Dave Stevenson
61f54bef7b staging: vcsm-cma: Add cache control ioctls
The old driver allowed for direct cache manipulation and that
was used by various clients. Replicate here.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Dave Stevenson
a24a46689c staging: vc-sm-cma: Add in userspace allocation API
Replacing the functionality from the older vc-sm driver,
add in a userspace API that allows allocation of buffers,
and importing of dma-bufs.
The driver hands out dma-buf fds, therefore much of the
handling around lifespan and odd mmaps from the old driver
goes away.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Dave Stevenson
d88b49ef7c staging: vc-sm-cma: Update TODO.
The driver is already a platform driver, so that can be
deleted from the TODO.
There are no known issues that need to be resolved.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Dave Stevenson
09d575425c staging: vc-sm-cma: Add in allocation for VPU requests.
Module has to change from tristate to bool as all CMA functions
are boolean.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Dave Stevenson
44e0fdf041 staging: vc-sm-cma: Remove obsolete comment and make function static
Removes obsolete comment about wanting to pass a function
pointer into mmal-vchiq as we now do.
As the function is passed as a function pointer, the function itself
can be static.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Eric Anholt
d374e5f1bc soc: bcm: bcm2835-pm: Add support for 2711.
Without the actual power management part any more, there's a lot less
to set up for V3D.  We just need to clear the RSTN field for the power
domain, and expose the reset controller for toggling it again.

This is definitely incomplete -- the old ISP and H264 is in the old
bridge, but since we have no consumers of it I've just done the
minimum to get V3D working.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:20:06 +01:00
Phil Elwell
7734f96b94 clk-bcm2835: Don't wait for pllh lock
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Jonathan Bell
ccc3318ad6 drivers: char: add chardev for mmap'ing Argon control registers
Based on the gpiomem driver, allow mapping of the decoder register
spaces such that userspace can access control/status registers.
This driver is intended for use with a custom ffmpeg backend accelerator
prior to a v4l2 driver being written.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Martin Sperl
4a7c1b19a4 spi: bcm2835: enable shared interrupt support
Add shared interrupt support for this driver.

Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
2019-09-17 11:20:06 +01:00
Tim Gover
6f37cd65db pinctrl-bcm2835: Add support for BCM2838
GPIO configuration on BCM2838 is largely the same as BCM2835 except for
the pull up/down configuration. The old mechanism has been replaced
by new registers which don't require the fixed delay.

Detect BCN2838 at run-time and use the new mechanism. Backwards
compatibility for the device-tree configuration has been retained.
2019-09-17 11:20:06 +01:00
Phil Elwell
c864b9ac47 usb: xhci: Show that the VIA VL805 supports LPM
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:06 +01:00
Tim Gover
8334a2e05c usb: xhci: Disable the XHCI 5 second timeout
If the VL805 EEPROM has not been programmed then boot will hang for five
seconds. The timeout seems to be arbitrary and is an unecessary
delay on the first boot. Remove the timeout.

This is common code and probably can't be upstreamed unless the timeout
can be configurable somehow or perhaps the XHCI driver can be skipped
on the first boot.
2019-09-17 11:20:06 +01:00
Jonathan Bell
4b5f41ab6e phy: bcm54213pe: configure the LED outputs to be more user-friendly
The default state was both LEDs indicating link speed.

Change the default configuration to
- Amber: 1000/100 link speed indication
- Green: link present + activity indication

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:05 +01:00
Jonathan Bell
dead1f7b1d phy: broadcom: split out the BCM54213PE from the BCM54210E IDs
The last nibble is a revision ID, and the 54213pe is a later rev
than the 54210e. Running the 54210e setup code on a 54213pe results
in a broken RGMII interface.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:05 +01:00
Jonathan Bell
e78d71574b net: genet: enable link energy detect powerdown for external PHYs
There are several warts surrounding bcmgenet_mii_probe() as this
function is called from ndo_open, but it's calling registration-type
functions. The probe should be called at probe time and refactored
such that the PHY device data can be extracted to limit the scope
of this flag to Broadcom PHYs.

For now, pass this flag in as it puts our attached PHY into a low-power
state when disconnected.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
2019-09-17 11:20:05 +01:00
Phil Elwell
b60f233a48 bcmgenet: Better coalescing parameter defaults
Set defaults for TX and RX packet coalescing to be equivalent to:

  # ethtool -C eth0 tx-frames 10
  # ethtool -C eth0 rx-usecs 50

This may be something we want to set via DT parameters in the
future.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:05 +01:00
Jonathan Bell
c5b2a9b52c bcmgenet: constrain max DMA burst length 2019-09-17 11:20:05 +01:00
popcornmix
c4253739cd bcm2835-pcm.c: Support multichannel audio 2019-09-17 11:20:05 +01:00
Phil Elwell
bc771e77c6 vchiq: Add 36-bit address support
Conditional on a new compatible string, change the pagelist encoding
such that the top 24 bits are the pfn, leaving 8 bits for run length
(-1).

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:05 +01:00
Stefan Wahren
85c850dbfb thermal: brcmstb_thermal: Add BCM2838 support
The BCM2838 has an AVS TMON hardware block. This adds the necessary
support to the brcmstb_thermal driver ( no trip handling ).

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
2019-09-17 11:20:05 +01:00
Stefan Wahren
780ebe5641 hwrng: iproc-rng200: Add BCM2838 support
The HWRNG on the BCM2838 is compatible to iproc-rng200, so add the
support to this driver instead of bcm2835-rng.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
2019-09-17 11:20:05 +01:00
Stefan Wahren
1aa159351e mmc: sdhci-iproc: Add support for emmc2 of the BCM2838
The emmc2 interface of the BCM2838 should be integrated in sdhci-iproc
to avoid code redundancy. Except 32 bit only access no other quirks are
known yet. Add an additional compatible string for upstream proposal.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
2019-09-17 11:20:05 +01:00
Phil Elwell
cc979fdcae mmc: sdhci: Mask "spurious" interrupts
Add a filter for "spurious" Transfer Complete interrupts, attempting
to make it as specific as possible:
* INT_DATA_END (transfer complete) is set
* There is a stop command in progress
* There is no data transfer in progress

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:05 +01:00
Phil Elwell
8d48dfa964 mmc: bcm2835-sdhost: Support 64-bit physical addresses
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:04 +01:00
Phil Elwell
98437a9a04 arm: bcm2835: DMA can only address 1GB
The legacy peripherals can only address the first gigabyte of RAM, so
ensure that DMA allocations are restricted to that region.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:04 +01:00
Phil Elwell
a75622fa27 pcie-brcmstb: Changes for BCM2711
The initial brcmstb PCIe driver - originally taken from the V3(?)
patch set - has been modified significantly for the BCM2711.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:04 +01:00
Jim Quinlan
d852e15503 dt-bindings: pci: Add DT docs for Brcmstb PCIe device
The DT bindings description of the Brcmstb PCIe device is described.  This
node can be used by almost all Broadcom settop box chips, using
ARM, ARM64, or MIPS CPU architectures.

Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
2019-09-17 11:20:04 +01:00
Phil Elwell
2c7a440999 PCI: brcmstb: Add MSI capability
This commit adds MSI to the Broadcom STB PCIe host controller. It does
not add MSIX since that functionality is not in the HW.  The MSI
controller is physically located within the PCIe block, however, there
is no reason why the MSI controller could not be moved elsewhere in
the future.

Since the internal Brcmstb MSI controller is intertwined with the PCIe
controller, it is not its own platform device but rather part of the
PCIe platform device.

Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
2019-09-17 11:20:04 +01:00
Phil Elwell
9f7d09014d PCI: brcmstb: Add dma-range mapping for inbound traffic
The Broadcom STB PCIe host controller is intimately related to the
memory subsystem.  This close relationship adds complexity to how cpu
system memory is mapped to PCIe memory.  Ideally, this mapping is an
identity mapping, or an identity mapping off by a constant.  Not so in
this case.

Consider the Broadcom reference board BCM97445LCC_4X8 which has 6 GB
of system memory.  Here is how the PCIe controller maps the
system memory to PCIe memory:

  memc0-a@[        0....3fffffff] <=> pci@[        0....3fffffff]
  memc0-b@[100000000...13fffffff] <=> pci@[ 40000000....7fffffff]
  memc1-a@[ 40000000....7fffffff] <=> pci@[ 80000000....bfffffff]
  memc1-b@[300000000...33fffffff] <=> pci@[ c0000000....ffffffff]
  memc2-a@[ 80000000....bfffffff] <=> pci@[100000000...13fffffff]
  memc2-b@[c00000000...c3fffffff] <=> pci@[140000000...17fffffff]

Although there are some "gaps" that can be added between the
individual mappings by software, the permutation of memory regions for
the most part is fixed by HW.  The solution of having something close
to an identity mapping is not possible.

The idea behind this HW design is that the same PCIe module can
act as an RC or EP, and if it acts as an EP it concatenates all
of system memory into a BAR so anything can be accessed.  Unfortunately,
when the PCIe block is in the role of an RC it also presents this
"BAR" to downstream PCIe devices, rather than offering an identity map
between its system memory and PCIe space.

Suppose that an endpoint driver allocs some DMA memory.  Suppose this
memory is located at 0x6000_0000, which is in the middle of memc1-a.
The driver wants a dma_addr_t value that it can pass on to the EP to
use.  Without doing any custom mapping, the EP will use this value for
DMA: the driver will get a dma_addr_t equal to 0x6000_0000.  But this
won't work; the device needs a dma_addr_t that reflects the PCIe space
address, namely 0xa000_0000.

So, essentially the solution to this problem must modify the
dma_addr_t returned by the DMA routines routines.  There are two
ways (I know of) of doing this:

(a) overriding/redefining the dma_to_phys() and phys_to_dma() calls
that are used by the dma_ops routines.  This is the approach of

	arch/mips/cavium-octeon/dma-octeon.c

In ARM and ARM64 these two routines are defiend in asm/dma-mapping.h
as static inline functions.

(b) Subscribe to a notifier that notifies when a device is added to a
bus.  When this happens, set_dma_ops() can be called for the device.
This method is mentioned in:

    http://lxr.free-electrons.com/source/drivers/of/platform.c?v=3.16#L152

where it says as a comment

    "In case if platform code need to use own special DMA
    configuration, it can use Platform bus notifier and
    handle BUS_NOTIFY_ADD_DEVICE event to fix up DMA
    configuration."

Solution (b) is what this commit does.  It uses its own set of
dma_ops which are wrappers around the arch_dma_ops.  The
wrappers translate the dma addresses before/after invoking
the arch_dma_ops, as appropriate.

Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
2019-09-17 11:20:04 +01:00
Phil Elwell
aef7cf9ea4 PCI: brcmstb: Add Broadcom STB PCIe host controller driver
This commit adds the basic Broadcom STB PCIe controller.  Missing is
the ability to process MSI and also handle dma-ranges for inbound
memory accesses.  These two functionalities are added in subsequent
commits.

The PCIe block contains an MDIO interface.  This is a local interface
only accessible by the PCIe controller.  It cannot be used or shared
by any other HW.  As such, the small amount of code for this
controller is included in this driver as there is little upside to put
it elsewhere.

Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
2019-09-17 11:20:04 +01:00
Tim Gover
3293fe6db9 Fix copy_from_user if BCM2835_FAST_MEMCPY=n
The change which introduced CONFIG_BCM2835_FAST_MEMCPY unconditionally
changed the behaviour of arm_copy_from_user. The page pinning code
is not safe on ARMv7 if LPAE & high memory is enabled and causes
crashes which look like PTE corruption.

Make __copy_from_user_memcpy conditional on CONFIG_2835_FAST_MEMCPY=y
which is really an ARMv6 / Pi1 optimization and not necessary on newer
ARM processors.
2019-09-17 11:20:04 +01:00
Phil Elwell
e8b93bbd7a arm: bcm2835: Fix FIQ early ioremap
The ioremapping creates mappings within the vmalloc area. The
equivalent early function, create_mapping, now checks that the
requested explicit virtual address is between VMALLOC_START and
VMALLOC_END. As there is no reason to have any correlation between
the physical and virtual addresses, put the required mappings at
VMALLOC_START and above.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:04 +01:00
Phil Elwell
a83835496d bcm2835-sdhost: Fix DMA channel leak on error/remove
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:04 +01:00
Phil Elwell
8fc06681e3 w1: w1-gpio: Make GPIO an output for strong pullup
The logic to drive the data line high to implement a strong pullup
assumed that the pin was already an output - setting a value does
not change an input.

See: https://github.com/raspberrypi/firmware/issues/1143

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:20:04 +01:00
Dave Stevenson
e52b546fd5 staging: bcm2835-codec: Add support for setting S_PARM and G_PARM
Video encode can use the frame rate for rate control calculations,
therefore plumb it through from V4L2's [S|G]_PARM ioctl.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:04 +01:00
Dave Stevenson
acc1a57de6 staging: bcm2835-codec: Convert V4L2 nsec timestamps to MMAL usec
V4L2 uses nsecs, whilst MMAL uses usecs, but the code wasn't converting
between them. This upsets video encode rate control.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Marcel Holtmann
268014c824 Bluetooth: Check key sizes only when Secure Simple Pairing is enabled
The encryption is only mandatory to be enforced when both sides are using
Secure Simple Pairing and this means the key size check makes only sense
in that case.

On legacy Bluetooth 2.0 and earlier devices like mice the encryption was
optional and thus causing an issue if the key size check is not bound to
using Secure Simple Pairing.

Fixes: d5bb334a8e ("Bluetooth: Align minimum encryption key size for LE and BR/EDR connections")
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2019-09-17 11:20:03 +01:00
Dave Stevenson
f464a6cd07 staging: mmal-vchiq: Fix memory leak in error path
On error, vchiq_mmal_component_init could leave the
event context allocated for ports.
Clean them up in the error path.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
24caf23d4e staging: mmal-vchiq: Free the event context for control ports
vchiq_mmal_component_init calls init_event_context for the
control port, but vchiq_mmal_component_finalise didn't free
it, causing a memory leak..

Add the free call.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
9cb88f124b staging: bcm2835-codec: Remove height padding for ISP role
The ISP has no need for heights to be a multiple of macroblock
sizes, therefore doesn't require the align on the height.
Remove it for the ISP role. (It is required for the codecs).

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
85730558b7 staging: bcm2835-codec: Correct port width calc for truncation
The calculation converting from V4L2 bytesperline to MMAL
width had an operator ordering issue that lead to Bayer raw 10
(and 12 and 14) setting an incorrect stride for the buffer.
Correct this operation ordering issue.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
e6bfec5bd0 staging: bcm2835-codec: Refactor default resolution code
The default resolution code was different for each role
as compressed formats need to pass bytesperline as 0 and
set up customised buffer sizes.
This is common setup, therefore amend get_sizeimage and
get_bytesperline to do the correct thing whether compressed
or uncompressed.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
d1a96bd0bd media: bcm2835-unicam: Add support for enum framesizes and frameintervals
vidioc_enum_framesizes and vidioc_enum_frameintervals weren't implemented,
therefore clients couldn't enumerate the supported resolutions.

Implement them by forwarding on to the sensor driver.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
75702e3397 staging: bcm2835_codec: Clean up logging on unloading the driver
The log line was missing a closing \n, so wasn't added to the
log immediately.
Adds the function of the V4L2 device that is being unregistered
too.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
e0c4cf259f staging: vc-sm-cma: Ensure mutex and idr are destroyed
map_lock and kernelid_map are created in probe, but not released
in release should the vcsm service not connect (eg running the
cutdown firmware).

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
e26fa78054 staging: vc-sm-cma: Don't fail if debugfs calls fail.
Return codes from debugfs calls should never alter the
flow of the main code.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
59035556b8 staging: vc-sm-cma: Use devm_ allocs for sm_state.
Use managed allocations for sm_state, removing reliance on
manual management.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:03 +01:00
Dave Stevenson
84d46c189f staging: vc-sm-cma: Remove the debugfs directory on remove
Without removing that, reloading the driver fails.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
e1ce5fc02b staging: bcm2835-codec: NULL component handle on queue_setup failure
queue_setup tries creating the relevant MMAL component and configures
the input and output ports as we're expecting to start streaming.
If the port configuration failed then it destroyed the component,
but failed to clear the component handle, therefore release tried
destroying the component again.
Adds some logging should the port config fail as well.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
b119fd93e0 staging: vc_sm_cma: Remove erroneous misc_deregister
Code from the misc /dev node was still present in
bcm2835_vc_sm_cma_remove, which caused a NULL deref.
Remove it.

See #2885.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
b7f4b8217d staging: bcm2835_codec: Include timing info in SPS headers
Inserting timing information into the VUI block of the SPS is
optional with the VPU encoder.
GStreamer appears to require them when using V4L2 M2M, therefore
set the option to enable them from the encoder.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
888104b806 staging: mmal-vchiq: Update mmal_parameters.h with recently defined params
mmal_parameters.h hasn't been updated to reflect additions made
over the last few years. Update it to reflect the currently
supported parameters.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
3c6cc7f6e0 staging: bcm2835_codec: Fix handling of VB2_MEMORY_DMABUF buffers
If the queue is configured as VB2_MEMORY_DMABUF then vb2_core_expbuf
fails as it ensures the queue is defined as VB2_MEMORY_MMAP.

Correct the handling so that we unmap the buffer from vcsm and the
VPU on cleanup, and then correctly get the dma buf of the new buffer.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
a540003ab2 staging: bcm2835_codec: Add an option for ignoring Bayer formats.
This is a workaround for GStreamer currently not identifying Bayer
as a raw format, therefore any device that supports it does not
match the criteria for v4l2convert.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
430d73d62b staging: bcm2835_codec: Add support for the ISP as an M2M device
The MMAL ISP component can also use this same V4L2 wrapper to
provide a M2M format conversion and resizer.
Instantiate 3 V4L2 devices now, one for each of decode, encode,
and isp.
The ISP currently doesn't expose any controls via V4L2, but this
can be extended in the future.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
f8a0dca7d2 staging: bcm2835_codec: Query supported formats from the component
The driver was previously working with hard coded tables of
which video formats were supported by each component.
The components advertise this information via a MMAL parameter,
so retrieve the information from there during probe, and store
in the state structure for that device.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
ee14ce49c0 staging: mmal-vchiq: If the VPU returns an error, don't negate it
There is an enum for the errors that the VPU can return.
port_parameter_get was negating that value, but also using -EINVAL
from the Linux error codes.
Pass the VPU error code as positive values. Should the function
need to pass a Linux failure, then return that as negative.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
28eab80594 staging: mmal-vchiq: Always return the param size from param_get
mmal-vchiq is a reimplementation of the userland library for MMAL.
When getting a parameter, the client provides the storage and
the size of the storage. The VPU then returns the size of the
parameter that it wished to return, and as much as possible of
that parameter is returned to the client.

The implementation previously only returned the size provided
by the VPU should it exceed the buffer size. So for parameters
such as the supported encodings list the client had no idea
how much of the provided storage had been populated.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
1fb948a143 staging: mmal_vchiq: Add in the Bayer encoding formats
The list of formats was copied before Bayer support was added.
The ISP supports Bayer and is being supported by the bcm2835_codec
driver, so add in the encodings for them.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:02 +01:00
Dave Stevenson
2ecdad2324 staging: vc-sm-cma: Fix up for 64bit builds
There were a number of logging lines that were using
inappropriate formatting under 64bit kernels.

The kernel_id field passed to/from the VPU was being
abused for storing the struct vc_sm_buffer *.
This breaks with 64bit kernels, so change to using an IDR.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
633bf98917 staging: vc-sm-cma: Use a void* pointer as the handle within the kernel
The driver was using an unsigned int as the handle to the outside world,
and doing a nasty cast to the struct dmabuf when handed it back.
This breaks badly with a 64 bit kernel where the pointer doesn't fit
in an unsigned int.

Switch to using a void* within the kernel. Reality is that it is
a struct dma_buf*, but advertising it as such to other drivers seems
to encourage the use of it as such, and I'm not sure on the implications
of that.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
ef9b82917e staging: vc-sm-cma: Correct DMA configuration.
Now that VCHIQ is setting up the DMA configuration as our
parent device, don't try to configure it during probe.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
b120fe4fc7 staging: bcm2835-codec: Fix potentially uninitialised vars
src_m2m_buf and dst_m2m_buf were printed in log messages
when there are code paths that don't initialise them.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
a3d5b7ee32 staging: bcm2835-codec: variable vb2 may be used uninitialised
In op_buffer_cb, the failure path checked whether there was
an associated vb2 buffer before the variable vb2 had been
assigned.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
c10c137852 media:bcm2835-unicam: Power on subdev on open/release, not streaming
The driver was powering on the source subdevice as part of STREAMON,
and powering it off in STREAMOFF. This isn't so great if there is a
significant amount of setup required for your device.

Copy the approach taken in the Atmel ISC driver where s_power(1) is called
on first file handle open, and s_power(0) is called on the last release.

See https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=232437

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
5caf58c478 media: ov5647: Use gpiod_set_value_cansleep
All calls to the gpio library are in contexts that can sleep,
therefore there is no issue with having those GPIOs controlled
by controllers which require sleeping (eg I2C GPIO expanders).

Switch to using gpiod_set_value_cansleep instead of gpiod_set_value
to avoid triggering the warning in gpiolib should the GPIO
controller need to sleep.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
8c4fdd297d clk: clk-bcm2835: Use %zd when printing size_t
The debug text for how many clocks have been registered
uses "%d" with a size_t. Correct it to "%zd".

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
94133e729a char: vc_mem: Fix all coding style issues.
Cleans up all checkpatch errors in vc_mem.c and vc_mem.h
No functional change to the code.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
b4ad2de807 char: vc_mem: Fix up compat ioctls for 64bit kernel
compat_ioctl wasn't defined, so 32bit user/64bit kernel
always failed.
VC_MEM_IOC_MEM_PHYS_ADDR was defined with parameter size
unsigned long, so the ioctl cmd changes between sizes.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
39832be043 staging: mmal-vchiq: Fix client_component for 64 bit kernel
The MMAL client_component field is used with the event
mechanism to allow the client to identify the component for
which the event is generated.
The field is only 32bits in size, therefore we can't use a
pointer to the component in a 64 bit kernel.

Component handles are already held in an array per VCHI
instance, so use the array index as the client_component handle
to avoid having to create a new IDR for this purpose.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
d2f5289585 char: vcio: Fail probe if rpi_firmware is not found.
Device Tree is now the only supported config mechanism, therefore
uncomment the block of code that fails the probe if the
firmware node can't be found.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:01 +01:00
Dave Stevenson
f5c9e11f58 char: vcio: Add compat ioctl handling
There was no compat ioctl handler, so 32 bit userspace on a
64 bit kernel failed as IOCTL_MBOX_PROPERTY used the size
of char*.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
dfa374404f staging: vc04_services: Add a V4L2 M2M codec driver
This adds a V4L2 memory to memory device that wraps the MMAL
video decode and video_encode components for H264 and MJPEG encode
and decode, MPEG4, H263, and VP8 decode (and MPEG2 decode
if the appropriate licence has been purchased).

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
5df11de31d media: videobuf2: Allow exporting of a struct dmabuf
videobuf2 only allowed exporting a dmabuf as a file descriptor,
but there are instances where having the struct dma_buf is
useful within the kernel.

Split the current implementation into two, one step which
exports a struct dma_buf, and the second which converts that
into an fd.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
5e28b4f9e7 staging: vc04_services: Use vc-sm-cma to support zero copy
With the vc-sm-cma driver we can support zero copy of buffers between
the kernel and VPU. Add this support to vchiq-mmal.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
e0b687518d staging: vc04_services: Add new vc-sm-cma driver
This new driver allows contiguous memory blocks to be imported
into the VideoCore VPU memory map, and manages the lifetime of
those objects, only releasing the source dmabuf once the VPU has
confirmed it has finished with it.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
81ef7c66c3 staging: vc04_services: Fixup vchiq-mmal include ordering
There were dependencies on including the headers in the correct
order. Fix up the headers so that they include the other
headers that they depend on themselves.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
9212780222 staging: vc04_services: Support sending data to MMAL ports
Add the ability to send data to ports. This only supports
zero copy mode as the required bulk transfer setup calls
are not done.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
30bbcf9012 staging: mmal-vchiq: Add support for event callbacks.
(Preparation for the codec driver).
The codec uses the event mechanism to report things such as
resolution changes. It is signalled by the cmd field of the buffer
being non-zero.

Add support for passing this information out to the client.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
e988837704 staging: mmal-vchiq: Make a mmal_buf struct for passing parameters
The callback from vchi_mmal to the client was growing lots of extra
parameters. Consolidate them into a single struct instead of
growing the list further.
The struct is associated with the client buffer, therefore there
are various changes to setup various containers for the struct,
and pass the appropriate members.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
3cc44b7f39 staging: mmal-vchiq: Make timeout a defined parameter
The timeout period for VPU communications is a useful thing
to extend when debugging.
Set it via a define, rather than a magic number buried in the code.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
4115c74e3d staging: mmal-vchiq: Avoid use of bool in structures
Fixes up a checkpatch error "Avoid using bool structure members
because of possible alignment issues".

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
f978275c99 staging: mmal-vchiq: Allocate and free components as required
The existing code assumed that there would only ever be 4 components,
and never freed the entries once used.
Allow arbitrary creation and destruction of components.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:20:00 +01:00
Dave Stevenson
9865da7342 staging: vc04_services: Split vchiq-mmal into a module
In preparation for adding a video codec V4L2 module which also
wants to use vchiq-mmal functions, split it out into an
independent module.
The minimum number of changes have been made to achieve this
(eg straight moves where possible) so existing checkpatch
errors will still be present.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
c7744300ae staging: bcm2835-camera: Ensure timestamps never go backwards.
There is an awkward situation with H264 header bytes. Currently
they are returned with a PTS of 0 because they aren't associated
with a timestamped frame to encode. These are handled by either
returning the timestamp of the last buffer to have been received,
or in the case of the first buffer the timestamp taken at
start_streaming.
This results in a race where the current frame may have started
before we take the start time, which results in the first encoded
frame having an earlier timestamp than the header bytes.

Ensure that we never return a negative delta to the user by checking
against the previous timestamp.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
665f41ba52 staging: bcm2835-camera: Fix logical continuation splits
Fix checkpatch errors for "Logical continuations should be
on the previous line".

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Stefan Wahren
1ba9ed60ce staging: vchiq_arm: Fix platform device unregistration
In error case platform_device_register_data would return an ERR_PTR
instead of NULL. So we better check this before unregistration.

Fixes: 37b7b3087a ("staging/vc04_services: Register a platform device for the camera driver.")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
2019-09-17 11:19:59 +01:00
Dave Stevenson
00d7c97d0a media: tc358743: Return an appropriate colorspace from tc358743_set_fmt
When calling tc358743_set_fmt, the code was calling tc358743_get_fmt
to choose a valid format. However that sets the colorspace
based on what was read back from the chip. When you set the format,
then the driver would choose and program the colorspace based
on the format code.

The result was that if you called try or set format for UYVY
when the current format was RGB3 then you would get told sRGB,
and try RGB3 when current was UYVY and you would get told
SMPTE170M.

The value programmed into the chip is determined by this driver,
therefore there is no need to read back the value. Return the
colorspace based on the format set/tried instead.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
cb6e7e21e7 media: bcm2835-unicam: Pass through the colorspace on try_fmt
The current colorspace was always returned from try_fmt for no
good reason.
Return what the source subdevice returns instead.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
ecb83c60e5 MAINTAINERS: Add entry for BCM2835 Unicam driver
Adds entry for the new BCM2835 Unicam (CSI-2 receiver) driver

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
77a6f5413c media: bcm2835-unicam: Driver for CCP2/CSI2 camera interface
Add driver for the Unicam camera receiver block on
BCM283x processors.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
a5c25dec96 dt-bindings: Document BCM283x CSI2/CCP2 receiver
Document the DT bindings for the CSI2/CCP2 receiver peripheral
(known as Unicam) on BCM283x SoCs.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Acked-by: Rob Herring <robh@kernel.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
b2f4c94175 media: videodev2: Add helper defines for printing FOURCCs
New helper defines that allow printing of a FOURCC using
printf(V4L2_FOURCC_CONV, V4L2_FOURCC_CONV_ARGS(fourcc));

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
91d0ea6964 media: adv7180: Add YPrPb support for ADV7282M
The ADV7282M can support YPbPr on AIN1-3, but this was
not selectable from the driver. Add it to the list of
supported input modes.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
cb2a0c534f media: adv7180: Default to the first valid input
The hardware default is differential CVBS on AIN1 & 2, which
isn't very useful.

Select the first input that is defined as valid for the
chip variant (typically CVBS_AIN1).

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:59 +01:00
Dave Stevenson
dd6df047cc media: tc358743: Check I2C succeeded during probe.
The probe for the TC358743 reads the CHIPID register from
the device and compares it to the expected value of 0.
If the I2C request fails then that also returns 0, so
the driver loads thinking that the device is there.

Generally I2C communications are reliable so there is
limited need to check the return value on every transfer,
therefore only amend the one read during probe to check
for I2C errors.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:58 +01:00
Dave Stevenson
86c358cc27 media: tc358743: Add support for 972Mbit/s link freq.
Adds register setups for running the CSI lanes at 972Mbit/s,
which allows 1080P50 UYVY down 2 lanes.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:58 +01:00
Philipp Zabel
9acccafc0c media: tc358743: fix connected/active CSI-2 lane reporting
g_mbus_config was supposed to indicate all supported lane numbers, not
only the number of those currently in active use. Since the TC358743
can dynamically reduce the number of active lanes if the required
bandwidth allows for it, report all lane numbers up to the connected
number of lanes as supported in pdata mode.
In device tree mode, do not report lane count and clock mode at all, as
the receiver driver can determine these from the device tree.

To allow communicating the number of currently active lanes, add a new
bitfield to the v4l2_mbus_config flags. This is a temporary fix, to be
used only until a better solution is found.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-09-17 11:19:58 +01:00
Dave Stevenson
598bb0c643 media: tc358743: Increase FIFO level to 374.
The existing fixed value of 16 worked for UYVY 720P60 over
2 lanes at 594MHz, or UYVY 1080P60 over 4 lanes. (RGB888
1080P60 needs 6 lanes at 594MHz).
It doesn't allow for lower resolutions to work as the FIFO
underflows.

374 is required for 1080P24-30 UYVY over 2 lanes @ 972Mbit/s, but
>374 means that the FIFO underflows on 1080P50 UYVY over 2 lanes
@ 972Mbit/s.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:58 +01:00
Dave Stevenson
6acb9f331e media: ov5647: Add support for non-continuous clock mode
The driver was only supporting continuous clock mode
although this was not stated anywhere.
Non-continuous clock saves a small amount of power and
on some SoCs is easier to interface with.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:58 +01:00
Dave Stevenson
9828b3d610 media: ov5647: Add support for PWDN GPIO.
Add support for an optional GPIO connected to PWDN on the sensor.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:58 +01:00
Dave Stevenson
1071ab8313 [media] Documentation: DT: add device tree for PWDN control
Add optional GPIO pwdn to connect to the PWDN line on the sensor.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:58 +01:00
Dave Stevenson
cc75ca0967 media: ov5647: Add set_fmt and get_fmt calls.
There's no way to query the subdevice for the supported
resolutions.
Add set_fmt and get_fmt implementations.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:58 +01:00
Klaus Schulz
518eebb54a sound: pcm512x-codec: Adding 352.8kHz samplerate support 2019-09-17 11:19:58 +01:00
IQaudIO
edb26a1704 Added IQaudIO Pi-Codec board support (#2969)
Add support for the IQaudIO Pi-Codec board.

Signed-off-by: Gordon <gordon@iqaudio.com>

Fixed 48k timing issue
2019-09-17 11:19:58 +01:00
P33M
f7ff81f08b lan78xx: use default alignment for rx buffers
The lan78xx uses a 12-byte hardware rx header, so there is no need
to allocate SKBs with NET_IP_ALIGN set. Removes alignment faults
in both dwc_otg and in ipv6 processing.
2019-09-17 11:19:58 +01:00
Phil Elwell
545bc4b031 sound: Fixes for audioinjector-octo under 4.19
1. Move the DT alias declaration to the I2C shim in the cases
where the shim is enabled. This works around a problem caused by a
4.19 commit [1] that generates DT/OF uevents for I2C drivers.

2. Fix the diagnostics in an error path of the soundcard driver to
correctly identify the reason for the failure to load.

3. Move the declaration of the clock node in the overlay outside
the I2C node to avoid warnings.

4. Sort the overlay nodes so that dependencies are only to earlier
fragments, in an attempt to get runtime dtoverlay application to
work (it still doesn't...)

See: https://github.com/Audio-Injector/Octo/issues/14
Signed-off-by: Phil Elwell <phil@raspberrypi.org>

[1] af503716ac ("i2c: core: report OF style module alias for devices registered via OF")
2019-09-17 11:19:58 +01:00
FERHAT Nicolas
271da4a156 Audiophonics I-Sabre 9038Q2M DAC driver
Signed-off-by: Audiophonics <contact@audiophonics.fr>
2019-09-17 11:19:57 +01:00
Phil Howard
124ffc3b1f rtc: rv3028: Add backup switchover mode support
Signed-off-by: Phil Howard <phil@pimoroni.com>
2019-09-17 11:19:57 +01:00
Dave Stevenson
1ca156c875 drm: vc4: Programming the CTM is conditional on running full KMS
vc4_ctm_commit writes to HVS registers, so this is only applicable
when in full KMS mode, not in firmware KMS mode. Add this conditional.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:57 +01:00
Phil Elwell
e82fcb7768 bcm2835-dma: Add support for per-channel flags
Add the ability to interpret the high bits of the dreq specifier as
flags to be included in the DMA_CS register. The motivation for this
change is the ability to set the DISDEBUG flag for SD card transfers
to avoid corruption when using the VPU debugger.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:57 +01:00
Dave Stevenson
51382445b6 gpu: vc4_firmware_kms: Fix up 64 bit compile warnings.
Resolve two build warnings with regard using incorrectly
sized parameters in logging messages on 64 bit builds.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:57 +01:00
popcornmix
3db0bf04ed Revert "staging: bcm2835-audio: Drop DT dependency"
This reverts commit b7491a9fca.
2019-09-17 11:19:57 +01:00
Phil Elwell
1e57679f81 Revert "staging: vchiq: delete vchiq_killable.h"
This reverts commit 2da56630b1.
2019-09-17 11:19:57 +01:00
Phil Elwell
45a6af3f0a lan78xx: EEE support is now a PHY property
Now that EEE support is a property of the PHY, use the PHY's DT node
when querying the EEE-related properties.

See: https://github.com/raspberrypi/linux/issues/2882

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:57 +01:00
Phil Elwell
de66db02b6 configs: Enable the AD193x codecs
See: https://github.com/raspberrypi/linux/issues/2850

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:57 +01:00
Serge Schneider
ceef490ef1 mfd: Add rpi_sense_core of compatible string 2019-09-17 11:19:57 +01:00
HiFiBerry
e8b613d7d7 Added driver for the HiFiBerry DAC+ ADC (#2694)
Signed-off-by: Daniel Matuschek <daniel@hifiberry.com>

hifiberry_dacplusadc: switch to snd_soc_dai_set_bclk_ratio

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:57 +01:00
Phil Elwell
36861fe0d4 spi: spi-bcm2835: Disable forced software CS
With GPIO CS used by the DTBs, allow hardware CS to be selected by an
overlay.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:57 +01:00
Phil Elwell
a306498a94 spi: spi-bcm2835: Re-enable HW CS
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:57 +01:00
b-ak
fedbb9f7b4 ASoC: Add support for AudioSense-Pi add-on soundcard
AudioSense-Pi is a RPi HAT based on a TI's TLV320AIC32x4 stereo codec

This hardware provides multiple audio I/O capabilities to the RPi.
The codec connects to the RPi's SoC through the I2S Bus.

The following devices can be connected through a 3.5mm jack
	1. Line-In: Plain old audio in from mobile phones, PCs, etc.,
	2. Mic-In: Connect a microphone
	3. Line-Out: Connect the output to a speaker
	4. Headphones: Connect a Headphone w or w/o microphones

Multiple Inputs:
	It supports the following combinations
	1. Two stereo Line-Inputs and a microphone
	2. One stereo Line-Input and two microphones
	3. Two stereo Line-Inputs, a microphone and
		one mono line-input (with h/w hack)
	4. One stereo Line-Input, two microphones and
		one mono line-input (with h/w hack)

Multiple Outputs:
	Audio output can be routed to the headphones or
		speakers (with additional hardware)

Signed-off-by: b-ak <anur.bhargav@gmail.com>
2019-09-17 11:19:56 +01:00
Joshua Emele
08c58a7492 lan78xx: Debounce link events to minimize poll storm
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit
that the driver pays attention to is "link was reset". If there's a
flapping status bit in that endpoint data, (such as if PHY negotiation
needs a few tries to get a stable link) then polling at a slower rate
would act as a de-bounce.

See: https://github.com/raspberrypi/linux/issues/2447
2019-09-17 11:19:56 +01:00
Ezekiel Bethel
4cd1ceae0a bcm2835_smi: re-add dereference to fix DMA transfers 2019-09-17 11:19:56 +01:00
Dave Stevenson
519a437690 firmware: raspberrypi: Report the fw git hash during probe
The firmware can now report the git hash from which it was built
via the mailbox, so report it during probe.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:56 +01:00
Dave Stevenson
de9f0908ca firmware: raspberrypi: Report the fw variant during probe
The driver already reported the firmware build date during probe.
The mailbox calls have been extended to also report the variant
 1 = standard start.elf
 2 = start_x.elf (includes camera stack)
 3 = start_db.elf (includes assert logging)
 4 = start_cd.elf (cutdown version for smallest memory footprint).
Log the variant during probe.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:56 +01:00
Dave Stevenson
625ee0e75f staging: bcm2835-camera: Ensure H264 header bytes get a sensible timestamp
H264 header come from VC with 0 timestamps, which means they get a
strange timestamp when processed with VC/kernel start times,
particularly if used with the inline header option.
Remember the last frame timestamp and use that if set, or otherwise
use the kernel start time.

https://github.com/raspberrypi/linux/issues/1836

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:56 +01:00
Phil Elwell
5232d9ce71 net: lan78xx: Support auto-downshift to 100Mb/s
Ethernet cables with faulty or missing pairs (specifically pairs C and
D) allow auto-negotiation to 1000Mbs, but do not support the successful
establishment of a link. Add a DT property, "microchip,downshift-after",
to configure the number of auto-negotiation failures after which it
falls back to 100Mbs. Valid values are 2, 3, 4, 5 and 0, where 0 means
never downshift.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:56 +01:00
Matthias Reichl
656ae9e10f rpi-wm8804-soundcard: configure wm8804 clocks only on rate change
This should avoid clicks when stopping and immediately afterwards
starting a stream with the same samplerate as before.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:56 +01:00
Matthias Reichl
bc445e2641 rpi-wm8804-soundcard: drop PWRDN register writes
Since kernel 4.0 the PWRDN register bits are under DAPM
control from the wm8804 driver.

Drop code that modifies that register to avoid interfering
with DAPM.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:56 +01:00
Phil Elwell
f58e43e8fa lan78xx: disable interrupts for PHY irqs
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:56 +01:00
Phil Elwell
68e287cb23 gpiolib: Don't prevent IRQ usage of output GPIOs
Upstream Linux deems using output GPIOs to generate IRQs as a bogus
use case, even though the BCM2835 GPIO controller is capable of doing
so. A number of users would like to make use of this facility, so
disable the checks.

See: https://github.com/raspberrypi/linux/issues/2527

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:56 +01:00
James Hughes
ec11a8fe3d Update issue templates (#2736) 2019-09-17 11:19:56 +01:00
Serge Schneider
e77d76e805 drivers: thermal: step_wise: avoid throttling at hysteresis temperature after dropping below it
Signed-off-by: Serge Schneider <serge@raspberrypi.org>
2019-09-17 11:19:55 +01:00
Ram Chandrasekar
b56f104334 drivers: thermal: step_wise: add support for hysteresis
Step wise governor increases the mitigation level when the temperature
goes above a threshold and will decrease the mitigation when the
temperature falls below the threshold. If it were a case, where the
temperature hovers around a threshold, the mitigation will be applied
and removed at every iteration. This reaction to the temperature is
inefficient for performance.

The use of hysteresis temperature could avoid this ping-pong of
mitigation by relaxing the mitigation to happen only when the
temperature goes below this lower hysteresis value.

Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
2019-09-17 11:19:55 +01:00
Phil Elwell
cc6250f122 sc16is7xx: Don't spin if no data received
See: https://github.com/raspberrypi/linux/issues/2676

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:55 +01:00
Stefan Wahren
290efb0b0b firmware: raspberrypi: Add backward compatible get_throttled
Avoid a hard userspace ABI change by adding a compatible get_throttled
sysfs entry. Its value is now feed by the GET_THROTTLED requests of the
new hwmon driver. The first access to get_throttled will generate
a warning.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
2019-09-17 11:19:55 +01:00
Stefan Wahren
4583b08d3f hwmon: raspberrypi: Prevent voltage low warnings from filling log
Although the correct fix for low voltage warnings is to
improve the power supply, the current implementation
of the detection can fill the log if the warning
happens freqently. This replaces the logging with
slightly custom ratelimited logging.

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
2019-09-17 11:19:55 +01:00
detule
57f0a27f51 vchiq_2835_arm: Implement a DMA pool for small bulk transfers (#2699)
During a bulk transfer we request a DMA allocation to hold the
scatter-gather list.  Most of the time, this allocation is small
(<< PAGE_SIZE), however it can be requested at a high enough frequency
to cause fragmentation and/or stress the CMA allocator (think time
spent in compaction here, or during allocations elsewhere).

Implement a pool to serve up small DMA allocations, falling back
to a coherent allocation if the request is greater than
VCHIQ_DMA_POOL_SIZE.

Signed-off-by: Oliver Gjoneski <ogjoneski@gmail.com>
2019-09-17 11:19:55 +01:00
popcornmix
1b08d96dc1 cxd2880: CXD2880_SPI_DRV should select DVB_CXD2880 with MEDIA_SUBDRV_AUTOSELECT 2019-09-17 11:19:55 +01:00
Serge Schneider
1b8f5ac6d6 Add rpi-poe-fan driver
Signed-off-by: Serge Schneider <serge@raspberrypi.org>

PoE HAT driver cleanup

* Fix undeclared variable in rpi_poe_fan_suspend
* Add SPDX-License-Identifier
* Expand PoE acronym in Kconfig help
* Give clearer error message on of_property_count_u32_elems fail
* Add documentation
* Add vendor to of_device_id compatible string.
* Rename m_data_s struct to fw_data_s
* Fix typos

Fixes: #2665

Signed-off-by: Serge Schneider <serge@raspberrypi.org>
2019-09-17 11:19:55 +01:00
Phil Elwell
e1aebb45a7 lan78xx: Move enabling of EEE into PHY init code
Enable EEE mode as soon as possible after connecting to the PHY, and
before phy_start. This avoids a second link negotiation, which speeds
up booting and stops the interface failing to become ready.

See: https://github.com/raspberrypi/linux/issues/2437

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:55 +01:00
Phil Elwell
c0db3bfb19 brcmfmac: Re-enable firmware roaming support
As of 4.18, a firmware that implements the update_connect_params
method but doesn't claim to support roaming causes an error. We
disabled firmware roaming in 4.4 [1] because it appeared to
prevent disconnects, but let's try with the current firmware to see
if things have improved.

[1] dd91880117

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:55 +01:00
Dave Stevenson
706448e71c net: lan78xx: Disable TCP Segmentation Offload (TSO)
TSO seems to be having issues when packets are dropped and the
remote end uses Selective Acknowledge (SACK) to denote that
data is missing. The missing data is never resent, so the
connection eventually stalls.

There is a module parameter of enable_tso added to allow
further debugging without forcing a rebuild of the kernel.

https://github.com/raspberrypi/linux/issues/2449
https://github.com/raspberrypi/linux/issues/2482

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:55 +01:00
Phil Elwell
fe73eac1e4 of: configfs: Use of_overlay_fdt_apply API call
The published API to the dynamic overlay application mechanism now
takes a Flattened Device Tree blob as input so that it can manage the
lifetime of the unflattened tree. Conveniently, the new API call -
of_overlay_fdt_apply - is virtually a drop-in replacement for
create_overlay, which can now be deleted.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:55 +01:00
Phil Elwell
71fa23832b irqchip: irq-bcm2835: Calc. FIQ_START at boot-time
ad83c7cb2f ("irqchip/irq-bcm2836: Add support for DT interrupt polarity")
changed the way that the BCM2836/7 local interrupts are mapped; instead
of being pre-mapped they are now mapped on-demand. A side effect of this
change is that the call to irq_of_parse_and_map from armctrl_of_init
creates a new mapping, forming a gap between the IRQs and the FIQs. This
 gap breaks the FIQ<->IRQ mapping which up to now has been done by assuming:

1) that the value of FIQ_START is the same as the number of normal IRQs
that will be mapped (still true), and

2) that this value is also the offset between an IRQ and its equivalent
FIQ (which is no longer the case).

Remove both assumptions by measuring the interval between the last IRQ
and the last FIQ, passing it as the parameter to init_FIQ().

Fixes: https://github.com/raspberrypi/linux/issues/2432

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:54 +01:00
Phil Elwell
792e4b7ca1 firmware/raspberrypi: Notify firmware of a reboot
Register for reboot notifications, sending RPI_FIRMWARE_NOTIFY_REBOOT
over the mailbox interface on reception.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:54 +01:00
Nick Bulleid
ec5e2870c6 Add ability to export gpio used by gpio-poweroff
Signed-off-by: Nick Bulleid <nedbulleid@fastmail.com>

Added export feature to gpio-poweroff documentation

Signed-off-by: Nick Bulleid <nedbulleid@fastmail.com>
2019-09-17 11:19:54 +01:00
popcornmix
cd04fde19b hid: Reduce default mouse polling interval to 60Hz
Reduces overhead when using X
2019-09-17 11:19:54 +01:00
Phil Elwell
1ec4e43caa lan78xx: Read initial EEE status from DT
Add two new DT properties:
* microchip,eee-enabled  - a boolean to enable EEE
* microchip,tx-lpi-timer - time in microseconds to wait before entering
                           low power state

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:54 +01:00
hdoverobinson
2be3b49561 added capture_clear option to pps-gpio via dtoverlay (#2433) 2019-09-17 11:19:54 +01:00
Phil Elwell
1ba1643bef i2c-gpio: Also set bus numbers from reg property
I2C busses can be assigned specific bus numbers using aliases in
Device Tree - string properties where the name is the alias and the
value is the path to the node. The current DT parameter mechanism
does not allow property names to be derived from a parameter value
in any way, so it isn't possible to generate unique or matching
aliases for nodes from an overlay that can generate multiple
instances, e.g. i2c-gpio.

Work around this limitation (at least temporarily) by allowing
the i2c adapter number to be initialised from the "reg" property
if present.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:54 +01:00
Eric Anholt
a7e9dd37ed drm/vc4: Don't wait for vblank on fkms cursor updates.
We don't use the same async update path between fkms and normal kms,
and the normal kms workaround ended up making us wait.  This became a
larger problem in rpi-4.14.y, as the USB HID update rate throttling
got (accidentally?) dropped.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:54 +01:00
Dave Stevenson
7e9d86c1a5 gpu:vc4-fkms: Update driver to not use plane->crtc.
Following on from
commit 2f958af7fc ("drm/vc4: Stop updating plane->fb/crtc")
do the same in the firmwarekms driver and look at plane_state->crtc
instead.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:54 +01:00
popcornmix
7bd4c486e5 vc4_firmware_kms: fix build 2019-09-17 11:19:54 +01:00
Eric Anholt
9c681547fd drm/vc4: Remove duplicate primary/cursor fields from FKMS driver.
The CRTC has those fields and we can just use them.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:54 +01:00
Eric Anholt
6c345b4561 drm/vc4: Skip SET_CURSOR_INFO when the cursor contents didn't change.
Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:54 +01:00
Eric Anholt
50ab55d55f drm/vc4: Fix warning about vblank interrupts before DRM core is ready.
The SMICS interrupt fires continuously, but since it's 1/100 the rate
of the USB interrupts, we don't really need a way to turn it off.  We
do need to make sure that we don't tell DRM about it until DRM has
asked for the interrupt at least once, because otherwise it will throw
a warning at boot time.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:54 +01:00
popcornmix
cd7a9d3720 vc4_fkms: Apply firmware overscan offset to hardware cursor 2019-09-17 11:19:53 +01:00
Eric Anholt
a7c81b0240 drm/vc4: Add missing enable/disable vblank handlers in fkms.
Fixes hang at boot in 4.14.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:53 +01:00
Eric Anholt
173987df9f drm/vc4: Add FB modifier support to firmwarekms.
Signed-off-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 11752d7348)
2019-09-17 11:19:53 +01:00
Eric Anholt
8e2453e106 drm/vc4: Add support for setting DPMS in firmwarekms.
This ensures that the screen goes blank during DPMS (screensaver),
including the cursor.  Planes don't necessarily get disabled during
CRTC disable, so we need to be careful to not leave them on or turn
them back on early.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:53 +01:00
Eric Anholt
95c8ff2588 drm/vc4: Fix sending of page flip completion events in FKMS mode.
In the rewrite of vc4_crtc.c for fkms, I dropped the part of the
CRTC's atomic flush handler that moved the completion event from the
proposed atomic state change to the CRTC's current state.  That meant
that when full screen pageflipping happened (glxgears -fullscreen in
X, compton, por weston), the app would end up blocked firever waiting
to draw its next frame.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:53 +01:00
Eric Anholt
5c7fab1ada drm/vc4: Add DRM_DEBUG_ATOMIC for the insides of fkms.
Trying to debug weston on fkms involved figuring out what calls I was
making to the firmware.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:53 +01:00
Eric Anholt
eefd4be2a8 drm/vc4: Name the primary and cursor planes in fkms.
This makes debugging nicer, compared to trying to remember what the
IDs are.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:53 +01:00
Eric Anholt
992f490321 drm/vc4: Add a mode for using the closed firmware for display.
Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:53 +01:00
Eric Anholt
46e0182d3d raspberrypi-firmware: Export the general transaction function.
The vc4-firmware-kms module is going to be doing the MBOX FB call.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:53 +01:00
Phil Elwell
152c429f52 serial: 8250: bcm2835aux - suppress EPROBE_DEFER
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:53 +01:00
Phil Elwell
1c1f919210 ARM: Activate FIQs to avoid __irq_startup warnings
There is a new test in __irq_startup that the IRQ is activated, which
hasn't been the case for FIQs since they bypass some of the usual setup.

Augment enable_fiq to include a call to irq_activate to avoid the
warning.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:53 +01:00
Phil Elwell
ca0ee54b0c dwc-otg: FIQ: Fix "bad mode in data abort handler"
Create a semi-static mapping for the USB registers early in the boot
process, before additional kernel threads are started, so all threads
will have the mappings from the start. This avoids the need for
data aborts to lazily update them.

See: https://github.com/raspberrypi/linux/issues/2450

Signed-off-by: Floris Bos <bos@je-eigen-domein.nl>
2019-09-17 11:19:53 +01:00
Noralf Trønnes
27166ab42e ARM: bcm2835: Set Serial number and Revision
The VideoCore bootloader passes in Serial number and
Revision number through Device Tree. Make these available to
userspace through /proc/cpuinfo.

Mainline status:

There is a commit in linux-next that standardize passing the serial
number through Device Tree (string: /serial-number):
ARM: 8355/1: arch: Show the serial number from devicetree in cpuinfo

There was an attempt to do the same with the revision number, but it
didn't get in:
[PATCH v2 1/2] arm: devtree: Set system_rev from DT revision

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:52 +01:00
Phil Elwell
5f174f8b47 cgroup: Disable cgroup "memory" by default
Some Raspberry Pis have limited RAM and most users won't use the
cgroup memory support so it is disabled by default. Enable with:

    cgroup_enable=memory

See: https://github.com/raspberrypi/linux/issues/1950

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:52 +01:00
Phil Elwell
c03d95f2e0 mcp2515: Use DT-supplied interrupt flags
The MCP2515 datasheet clearly describes a level-triggered interrupt
pin. Therefore the receiving interrupt controller must also be
configured for level-triggered operation otherwise there is a danger
of a missed interrupt condition blocking all subsequent interrupts.
The ONESHOT flag ensures that the interrupt is masked until the
threaded interrupt handler exits.

Rather than change the flags globally (they must have worked for at
least one user), allow the flags to be overridden from Device Tree
in the event that the device has a DT node.

See: https://github.com/raspberrypi/linux/issues/2175
     https://github.com/raspberrypi/linux/issues/2263

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:52 +01:00
James Hughes
cdeb1e96c0 AXI performance monitor driver (#2222)
Uses the debugfs I/F to provide access to the AXI
bus performance monitors.

Requires the new mailbox peripheral access for access
to the VPU performance registers, system bus access
is done using direct register reads.

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
2019-09-17 11:19:52 +01:00
popcornmix
7fcbd1d89b cache: export clean and invalidate
hack: cache: Fix linker error
2019-09-17 11:19:52 +01:00
Phil Elwell
81039749db Revert "build/arm64: Add rules for .dtbo files for dts overlays"
DT build rules are now in the common top-level Makefile.

This reverts commit dce5b0fbdd.
2019-09-17 11:19:52 +01:00
Khem Raj
7faf64305c build/arm64: Add rules for .dtbo files for dts overlays
We now create overlays as .dtbo files.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2019-09-17 11:19:52 +01:00
Michael Zoran
fd9266513d ARM64: Force hardware emulation of deprecated instructions. 2019-09-17 11:19:52 +01:00
Michael Zoran
d10980bba5 ARM64: Round-Robin dispatch IRQs between CPUs.
IRQ-CPU mapping is round robined on ARM64 to increase
concurrency and allow multiple interrupts to be serviced
at a time.  This reduces the need for FIQ.

Signed-off-by: Michael Zoran <mzoran@crowfest.net>
2019-09-17 11:19:52 +01:00
popcornmix
74f3fa02d0 config: Add default configs 2019-09-17 11:19:52 +01:00
Phil Elwell
e3714c7dd4 hci_h5: Don't send conf_req when ACTIVE
Without this patch, a modem and kernel can continuously bombard each
other with conf_req and conf_rsp messages, in a demented game of tag.
2019-09-17 11:19:52 +01:00
Cheong2K
6bae335f53 brcm: adds support for BCM43341 wifi
brcmfmac: Disable power management

Disable wireless power saving in the brcmfmac WLAN driver. This is a
temporary measure until the connectivity loss resulting from power
saving is resolved.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

brcmfmac: Use original country code as a fallback

Commit 73345fd212:

    brcmfmac: Configure country code using device specific settings

prevents region codes from working on devices that lack a region code
translation table. In the event of an absent table, preserve the old
behaviour of using the provided code as-is.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

brcmfmac: Plug memory leak in brcmf_fill_bss_param

See: https://github.com/raspberrypi/linux/issues/1471

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

brcmfmac: do not use internal roaming engine by default

Some evidence of curing disconnects with this disabled, so make it a default.
Can be overridden with module parameter roamoff=0
See: http://projectable.me/optimize-my-pi-wi-fi/

brcmfmac: Change stop_ap sequence

Patch from Broadcom/Cypress to resolve a customer error

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:52 +01:00
Pantelis Antoniou
774a23ee85 OF: DT-Overlay configfs interface
This is a port of Pantelis Antoniou's v3 port that makes use of the
new upstreamed configfs support for binary attributes.

Original commit message:

Add a runtime interface to using configfs for generic device tree overlay
usage. With it its possible to use device tree overlays without having
to use a per-platform overlay manager.

Please see Documentation/devicetree/configfs-overlays.txt for more info.

Changes since v2:
- Removed ifdef CONFIG_OF_OVERLAY (since for now it's required)
- Created a documentation entry
- Slight rewording in Kconfig

Changes since v1:
- of_resolve() -> of_resolve_phandles().

Originally-signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Signed-off-by: Phil Elwell <phil@raspberrypi.org>

DT configfs: Fix build errors on other platforms

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

DT configfs: fix build error

There is an error when compiling rpi-4.6.y branch:
  CC      drivers/of/configfs.o
drivers/of/configfs.c:291:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
   .default_groups = of_cfs_def_groups,
                     ^
drivers/of/configfs.c:291:21: note: (near initialization for 'of_cfs_subsys.su_group.default_groups.next')

The .default_groups is linked list since commit
1ae1602de0.
This commit uses configfs_add_default_group to fix this problem.

Signed-off-by: Slawomir Stepien <sst@poczta.fm>

configfs: New of_overlay API
2019-09-17 11:19:51 +01:00
popcornmix
b908e9c989 net: Add non-mainline source for rtl8192cu wlan
We are now syncing with version from:
https://github.com/pvaret/rtl8192cu-fixes
2019-09-17 11:19:51 +01:00
popcornmix
526802f76e bcm2835-virtgpio: Virtual GPIO driver
Add a virtual GPIO driver that uses the firmware mailbox interface to
request that the VPU toggles LEDs.
2019-09-17 11:19:51 +01:00
P33M
4cb3e8181c rpi_display: add backlight driver and overlay
Add a mailbox-driven backlight controller for the Raspberry Pi DSI
touchscreen display. Requires updated GPU firmware to recognise the
mailbox request.

Signed-off-by: Gordon Hollingworth <gordon@raspberrypi.org>

Add Raspberry Pi firmware driver to the dependencies of backlight driver

Otherwise the backlight driver fails to build if the firmware
loading driver is not in the kernel

Signed-off-by: Alex Riesen <alexander.riesen@cetitec.com>
2019-09-17 11:19:51 +01:00
Tim Gover
5cc4fa2ede ASoC: Create a generic Pi Hat WM8804 driver
Reduce the amount of duplicated code by creating a generic driver for
Pi Hat digi cards using the WM8804 codec.

This replaces the
Allo DigiOne, Hifiberry Digi/Pro, JustBoom Digi and IQAudIO Digi
dedicate soundcard drivers with a generic driver.

There are no significant changes to the runtime behavior of the drivers
and end users should not have to change any configuration settings
after upgrading.

Minor changes
* Check the return value of snd_soc_component_update_bits
* Added some pr_debug tracing
* Various checkpatch tidyups
* Updated allodigi-one to use use 128FS at > 96 Khz. This appears to
  be an omission in the original driver code so followed the Hifiberry
  DAC driver approach.
2019-09-17 11:19:51 +01:00
popcornmix
21b21e30e0 ASoC: Add Kconfig and Makefile for sound/soc/bcm
Signed-off-by: popcornmix <popcornmix@gmail.com>
2019-09-17 11:19:51 +01:00
Tim Gover
90c73624b2 ASoC: Add generic RPI driver for simple soundcards.
The RPI simple sound card driver provides a generic ALSA SOC card driver
supporting a variety of Pi HAT soundcards. The intention is to avoid
the duplication of code for cards that can't be fully supported by
the soc simple/graph cards but are otherwise almost identical.

This initial commit adds support for the ADAU1977 ADC, Google VoiceHat,
HifiBerry AMP, HifiBerry DAC and RPI DAC.

Signed-off-by: Tim Gover <tim.gover@raspberrypi.org>

ASoC: Use correct card name in rpi-simple driver

Use the specific card name from drvdata instead of the snd_rpi_simple

rpi-simple-soundcard: Use nicer driver name "RPi-simple"

Rename the driver from "RPI simple soundcard" to "RPi-simple" so that
the driver name won't be mangled allowing to be used unaltered as the
card conf filename.
2019-09-17 11:19:51 +01:00
allocom
a6046c1c20 Driver and overlay for Allo Katana DAC
Allo Katana DAC: Updated default values

Signed-off-by: Jaikumar <jaikumar@cem-solutions.com>

Added mute stream func

Signed-off-by: Jaikumar <jaikumar@cem-solutions.net>
2019-09-17 11:19:51 +01:00
Peter Malkin
619d26eb0b Driver support for Google voiceHAT soundcard.
ASoC: googlevoicehat-codec: Use correct device when grabbing GPIO

The fixup for the VoiceHAT in 4.18 incorrectly tried to find the
sdmode GPIO pin under the card device, not the codec device.
This failed, and therefore caused the device probe to fail.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

ASoC: googlevoicehat-codec: Reformat for kernel coding standards

Fix all whitespace, indentation, and bracing errors.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

ASoC: googlevoicehat-codec: Make driver function structure const

Make voicehat_component_driver a const structure.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

ASoC: googlevoicehat-codec: Only convert from ms to jiffies once

Minor optimisation and allows to become checkpatch clean.
A msec value is read out of DT or from a define, and convert once to
jiffies, rather than every time that it is used.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:51 +01:00
Matt Flax
59306adf7e Add support for the AudioInjector.net Octo sound card
AudioInjector Octo: sample rates, regulators, reset

This patch adds new sample rates to the Audioinjector Octo sound card. The
new supported rates are (in kHz) :
96, 48, 32, 24, 16, 8, 88.2, 44.1, 29.4, 22.05, 14.7

Reference the bcm270x DT regulators in the overlay.

This patch adds a reset GPIO for the AudioInjector.net octo sound card.

Audioinjector octo : Make the playback and capture symmetric

This patch ensures that the sample rate and channel count of the audioinjector
octo sound card are symmetric.

audioinjector-octo: Add continuous clock feature

By user request, add a switch to prevent the clocks being stopped when
the stream is paused, stopped or shutdown. Provide access to the switch
by adding a 'non-stop-clocks' parameter to the audioinjector-addons
overlay.

See: https://github.com/raspberrypi/linux/issues/2409

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:51 +01:00
Fe-Pi
3c685ea18a Add support for Fe-Pi audio sound card. (#1867)
Fe-Pi Audio Sound Card is based on NXP SGTL5000 codec.
Mechanical specification of the board is the same the Raspberry Pi Zero.
3.5mm jacks for Headphone/Mic, Line In, and Line Out.

Signed-off-by: Henry Kupis <fe-pi@cox.net>
2019-09-17 11:19:51 +01:00
Miquel
a4062a0580 sound: Support for Dion Audio LOCO-V2 DAC-AMP HAT
Signed-off-by: Miquel Blauw <info@dionaudio.nl>

ASoC: dionaudio_loco-v2: fix S24_LE format

Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.

Also remove hw_params and ops as they are no longer needed.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:50 +01:00
Matthias Reichl
932c7dcd94 ASoC: Add driver for Cirrus Logic Audio Card
Note: due to problems with deferred probing of regulators
the following softdep should be added to a modprobe.d file

softdep arizona-spi pre: arizona-ldo1

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:50 +01:00
gtrainavicius
034efbfe01 Support for Blokas Labs pisound board
Pisound dynamic overlay (#1760)

Restructuring pisound-overlay.dts, so it can be loaded and unloaded dynamically using dtoverlay.

Print a logline when the kernel module is removed.

pisound improvements:

* Added a writable sysfs object to enable scripts / user space software
to blink MIDI activity LEDs for variable duration.
* Improved hw_param constraints setting.
* Added compatibility with S16_LE sample format.
* Exposed some simple placeholder volume controls, so the card appears
in volumealsa widget.

Add missing SND_PISOUND selects dependency to SND_RAWMIDI

Without it the Pisound module fails to compile.
See https://github.com/raspberrypi/linux/issues/2366

Updates for Pisound module code:

	* Merged 'Fix a warning in DEBUG builds' (1c8b82b).
	* Updating some strings and copyright information.
	* Fix for handling high load of MIDI input and output.
	* Use dual rate oversampling ratio for 96kHz instead of single
	  rate one.

Signed-off-by: Giedrius Trainavicius <giedrius@blokas.io>

Fixing memset call in pisound.c

Signed-off-by: Giedrius Trainavicius <giedrius@blokas.io>

Fix for Pisound's MIDI Input getting blocked for a while in rare cases.

There was a possible race condition which could lead to Input's FIFO queue
to be underflown, causing high amount of processing in the worker thread for
some period of time.

Signed-off-by: Giedrius Trainavicius <giedrius@blokas.io>

Fix for Pisound kernel module in Real Time kernel configuration.

When handler of data_available interrupt is fired, queue_work ends up
getting called and it can block on a spin lock which is not allowed in
interrupt context. The fix was to run the handler from a thread context
instead.

Pisound: Remove spinlock usage around spi_sync
2019-09-17 11:19:50 +01:00
BabuSubashChandar
dd830be9e2 Add support for Allo Boss DAC add-on board for Raspberry Pi. (#1924)
Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Deepak <deepak@zilogic.com>
Reviewed-by: BabuSubashChandar <babusubashchandar@zilogic.com>

Add support for new clock rate and mute gpios.

Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Deepak <deepak@zilogic.com>
Reviewed-by: BabuSubashChandar <babusubashchandar@zilogic.com>

ASoC: allo-boss-dac: fix S24_LE format

Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.

Signed-off-by: Matthias Reichl <hias@horus.com>

ASoC: allo-boss-dac: transmit S24_LE with 64 BCLK cycles

Signed-off-by: Matthias Reichl <hias@horus.com>

allo-boss-dac: switch to snd_soc_dai_set_bclk_ratio

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:50 +01:00
Raashid Muhammed
af9767f8c0 Add support for Allo Piano DAC 2.1 plus add-on board for Raspberry Pi.
The Piano DAC 2.1 has support for 4 channels with subwoofer.

Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Vijay Kumar B. <vijaykumar@zilogic.com>
Reviewed-by: Raashid Muhammed <raashidmuhammed@zilogic.com>

Add clock changes and mute gpios (#1938)

Also improve code style and adhere to ALSA coding conventions.

Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Reviewed-by: Vijay Kumar B. <vijaykumar@zilogic.com>
Reviewed-by: Raashid Muhammed <raashidmuhammed@zilogic.com>

PianoPlus: Dual Mono & Dual Stereo features added (#2069)

allo-piano-dac-plus: Master volume added + fixes

Master volume added, which controls both DACs volumes.

See: https://github.com/raspberrypi/linux/pull/2149

Also fix initial max volume, default mode value, and unmute.

Signed-off-by: allocom <sparky-dev@allo.com>

ASoC: allo-piano-dac-plus: fix S24_LE format

Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.

Signed-off-by: Matthias Reichl <hias@horus.com>

sound: bcm: Fix memset dereference warning

This warning appears with GCC 6.4.0 from toolchains.bootlin.com:

../sound/soc/bcm/allo-piano-dac-plus.c: In function ‘snd_allo_piano_dac_init’:
../sound/soc/bcm/allo-piano-dac-plus.c:711:30: warning: argument to ‘sizeof’ in ‘memset’ call is the same expression as the destination; did you mean to dereference it? [-Wsizeof-pointer-memaccess]
  memset(glb_ptr, 0x00, sizeof(glb_ptr));
                              ^

Suggested-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
2019-09-17 11:19:50 +01:00
Clive Messer
c15521cc28 Allo Piano DAC boards: Initial 2 channel (stereo) support (#1645)
Add initial 2 channel (stereo) support for Allo Piano DAC (2.0/2.1) boards,
using allo-piano-dac-pcm512x-audio overlay and allo-piano-dac ALSA ASoC
machine driver.

NB. The initial support is 2 channel (stereo) ONLY!
(The Piano DAC 2.1 will only support 2 channel (stereo) left/right output,
 pending an update to the upstream pcm512x codec driver, which will have
 to be submitted via upstream. With the initial downstream support,
 provided by this patch, the Piano DAC 2.1 subwoofer outputs will
 not function.)

Signed-off-by: Baswaraj K <jaikumar@cem-solutions.net>
Signed-off-by: Clive Messer <clive.messer@digitaldreamtime.co.uk>
Tested-by: Clive Messer <clive.messer@digitaldreamtime.co.uk>

ASoC: allo-piano-dac: fix S24_LE format

Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.

Also remove hw_params and ops as they are no longer needed.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:50 +01:00
DigitalDreamtime
571b9a0ea1 Add support for Dion Audio LOCO DAC-AMP HAT
Using dedicated machine driver and pcm5102a codec driver.

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
2019-09-17 11:19:50 +01:00
escalator2015
bbb897e8a8 New driver for RRA DigiDAC1 soundcard using WM8741 + WM8804 2019-09-17 11:19:50 +01:00
Matt Flax
035368cfac New AudioInjector.net Pi soundcard with low jitter audio in and out.
Contains the sound/soc/bcm ALSA machine driver and necessary alterations to the Kconfig and Makefile.
Adds the dts overlay and updates the Makefile and README.
Updates the relevant defconfig files to enable building for the Raspberry Pi.
Thanks to Phil Elwell (pelwell) for the review, simple-card concepts and discussion. Thanks to Clive Messer for overlay naming suggestions.

Added support for headphones, microphone and bclk_ratio settings.

This patch adds headphone and microphone capability to the Audio Injector sound card. The patch also sets the bit clock ratio for use in the bcm2835-i2s driver. The bcm2835-i2s can't handle an 8 kHz sample rate when the bit clock is at 12 MHz because its register is only 10 bits wide which can't represent the ch2 offset of 1508. For that reason, the rate constraint is added.
2019-09-17 11:19:50 +01:00
Aaron Shaw
c9e4f9883e Add Support for JustBoom Audio boards
justboom-dac: Adjust for ALSA API change

As of 4.4, snd_soc_limit_volume now takes a struct snd_soc_card *
rather than a struct snd_soc_codec *.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

ASoC: justboom-dac: fix S24_LE format

Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.

Also remove hw_params as it's no longer needed.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:50 +01:00
Waldemar Brodkorb
1f9f5382e9 Add driver for rpi-proto
Forward port of 3.10.x driver from https://github.com/koalo
We are using a custom board and would like to use rpi 3.18.x
kernel. Patch works fine for our embedded system.

URL to the audio chip:
http://www.mikroe.com/add-on-boards/audio-voice/audio-codec-proto/

Playback tested with devicetree enabled.

Signed-off-by: Waldemar Brodkorb <wbrodkorb@conet.de>
2019-09-17 11:19:50 +01:00
Daniel Matuschek
be87e5d640 Added driver for HiFiBerry Amp amplifier add-on board
The driver contains a low-level hardware driver for the TAS5713 and the
drivers for the Raspberry Pi I2S subsystem.

TAS5713: return error if initialisation fails

Existing TAS5713 driver logs errors during initialisation, but does not return
an error code. Therefore even if initialisation fails, the driver will still be
loaded, but won't work. This patch fixes this. I2C communication error will now
reported correctly by a non-zero return code.

HiFiBerry Amp: fix device-tree problems

Some code to load the driver based on device-tree-overlays was missing. This is added by this patch.
2019-09-17 11:19:50 +01:00
Daniel Matuschek
94f31955ef Added support for HiFiBerry DAC+
The driver is based on the HiFiBerry DAC driver. However HiFiBerry DAC+ uses
a different codec chip (PCM5122), therefore a new driver is necessary.

Add support for the HiFiBerry DAC+ Pro.

The HiFiBerry DAC+ and DAC+ Pro products both use the existing bcm sound driver with the DAC+ Pro having a special clock device driver representing the two high precision oscillators.

An addition bug fix is included for the PCM512x codec where by the physical size of the sample frame is used in the calculation of the LRCK divisor as it was found to be wrong when using 24-bit depth sample contained in a little endian 4-byte sample frame.

Limit PCM512x "Digital" gain to 0dB by default with HiFiBerry DAC+

24db_digital_gain DT param can be used to specify that PCM512x
codec "Digital" volume control should not be limited to 0dB gain,
and if specified will allow the full 24dB gain.

Add dt param to force HiFiBerry DAC+ Pro into slave mode

"dtoverlay=hifiberry-dacplus,slave"

Add 'slave' param to use HiFiBerry DAC+ Pro in slave mode,
with Pi as master for bit and frame clock.

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>

Fixed a bug when using 352.8kHz sample rate

Signed-off-by: Daniel Matuschek <daniel@hifiberry.com>

ASoC: pcm512x: revert downstream changes

This partially reverts commit 185ea05465
which was added by https://github.com/raspberrypi/linux/pull/1152

The downstream pcm512x changes caused a regression, it broke normal
use of the 24bit format with the codec, eg when using simple-audio-card.

The actual bug with 24bit playback is the incorrect usage
of physical_width in various drivers in the downstream tree
which causes 24bit data to be transmitted with 32 clock
cycles. So it's not the pcm512x that needs fixing, it's the
soundcard drivers.

Signed-off-by: Matthias Reichl <hias@horus.com>

ASoC: hifiberry_dacplus: fix S24_LE format

Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.

Signed-off-by: Matthias Reichl <hias@horus.com>

ASoC: hifiberry_dacplus: transmit S24_LE with 64 BCLK cycles

Signed-off-by: Matthias Reichl <hias@horus.com>

hifiberry_dacplus: switch to snd_soc_dai_set_bclk_ratio

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:50 +01:00
Gordon Garrity
ddefb02660 Add IQaudIO Sound Card support for Raspberry Pi
Set a limit of 0dB on Digital Volume Control

The main volume control in the PCM512x DAC has a range up to
+24dB. This is dangerously loud and can potentially cause massive
clipping in the output stages. Therefore this sets a sensible
limit of 0dB for this control.

Allow up to 24dB digital gain to be applied when using IQAudIO DAC+

24db_digital_gain DT param can be used to specify that PCM512x
codec "Digital" volume control should not be limited to 0dB gain,
and if specified will allow the full 24dB gain.

Modify IQAudIO DAC+ ASoC driver to set card/dai config from dt

Add the ability to set the card name, dai name and dai stream name, from
dt config.

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>

IQaudIO: auto-mute for AMP+ and DigiAMP+

IQAudIO amplifier mute via GPIO22. Add dt params for "one-shot" unmute
and auto mute.

Revision 2, auto mute implementing HiassofT suggestion to mute/unmute
using set_bias_level, rather than startup/shutdown....
"By default DAPM waits 5 seconds (pmdown_time) before shutting down
playback streams so a close/stop immediately followed by open/start
doesn't trigger an amp mute+unmute."

Tested on both AMP+ (via DAC+) and DigiAMP+, with both options...

dtoverlay=iqaudio-dacplus,unmute_amp
 "one-shot" unmute when kernel module loads.

dtoverlay=iqaudio-dacplus,auto_mute_amp
 Unmute amp when ALSA device opened by a client. Mute, with 5 second delay
 when ALSA device closed. (Re-opening the device within the 5 second close
 window, will cancel mute.)

Revision 4, using gpiod.

Revision 5, clean-up formatting before adding mute code.
 - Convert tab plus 4 space formatting to 2x tab
 - Remove '// NOT USED' commented code

Revision 6, don't attempt to "one-shot" unmute amp, unless card is
successfully registered.

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>

ASoC: iqaudio-dac: fix S24_LE format

Remove set_bclk_ratio call so 24-bit data is transmitted in
24 bclk cycles.

Signed-off-by: Matthias Reichl <hias@horus.com>
2019-09-17 11:19:49 +01:00
Florian Meier
5d2189399f ASoC: Add support for Rpi-DAC 2019-09-17 11:19:49 +01:00
Phil Elwell
0e1af5ace7 mfd: Add Raspberry Pi Sense HAT core driver 2019-09-17 11:19:49 +01:00
Phil Elwell
fecfb8ac9a gpio-poweroff: Allow it to work on Raspberry Pi
The Raspberry Pi firmware manages the power-down and reboot
process. To do this it installs a pm_power_off handler, causing
the gpio-poweroff module to abort the probe function.

This patch introduces a "force" DT property that overrides that
behaviour, and also adds a DT overlay to enable and control it.

Note that running in an active-low configuration (DT parameter
"active_low") requires a custom dt-blob.bin and probably won't
allow a reboot without switching off, so an external inversion
of the trigger signal may be preferable.
2019-09-17 11:19:49 +01:00
popcornmix
a417ee4ed4 Improve __copy_to_user and __copy_from_user performance
Provide a __copy_from_user that uses memcpy. On BCM2708, use
optimised memcpy/memmove/memcmp/memset implementations.

arch/arm: Add mmiocpy/set aliases for memcpy/set

See: https://github.com/raspberrypi/linux/issues/1082

copy_from_user: CPU_SW_DOMAIN_PAN compatibility

The downstream copy_from_user acceleration must also play nice with
CONFIG_CPU_SW_DOMAIN_PAN.

See: https://github.com/raspberrypi/linux/issues/1381

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:49 +01:00
popcornmix
bf2ae19364 Added Device IDs for August DVB-T 205 2019-09-17 11:19:49 +01:00
Phil Elwell
65afae07bc BCM270x_DT: Add pwr_led, and the required "input" trigger
The "input" trigger makes the associated GPIO an input.  This is to support
the Raspberry Pi PWR LED, which is driven by external hardware in normal use.

N.B. pwr_led is not available on Model A or B boards.

leds-gpio: Implement the brightness_get method

The power LED uses some clever logic that means it is driven
by a voltage measuring circuit when configured as input, otherwise
it is driven by the GPIO output value. This patch wires up the
brightness_get method for leds-gpio so that user-space can monitor
the LED value via /sys/class/gpio/led1/brightness. Using the input
trigger this returns an indication of the system power health,
otherwise it is just whatever value the trigger has written most
recently.

See: https://github.com/raspberrypi/linux/issues/1064
2019-09-17 11:19:49 +01:00
notro
17cc6cc6e8 BCM2708: Add core Device Tree support
Add the bare minimum needed to boot BCM2708 from a Device Tree.

Signed-off-by: Noralf Tronnes <notro@tronnes.org>

BCM2708: DT: change 'axi' nodename to 'soc'

Change DT node named 'axi' to 'soc' so it matches ARCH_BCM2835.
The VC4 bootloader fills in certain properties in the 'axi' subtree,
but since this is part of an upstreaming effort, the name is changed.

Signed-off-by: Noralf Tronnes notro@tronnes.org

BCM2708_DT: Correct length of the peripheral space

Use dts-dirs feature for overlays.

The kernel makefiles have a dts-dirs target that is for vendor subdirectories.

Using this fixes the install_dtbs target, which previously did not install the overlays.

BCM270X_DT: configure I2S DMA channels

Signed-off-by: Matthias Reichl <hias@horus.com>

BCM270X_DT: switch to bcm2835-i2s

I2S soundcard drivers with proper devicetree support (i.e. not linking
to the cpu_dai/platform via name but to cpu/platform via of_node)
will work out of the box without any modifications.

When the kernel is compiled without devicetree support the platform
code will instantiate the bcm2708-i2s driver and I2S soundcard drivers
will link to it via name, as before.

Signed-off-by: Matthias Reichl <hias@horus.com>

SDIO-overlay: add poll_once-boolean parameter

Add paramter to toggle sdio-device-polling
done every second or once at boot-time.

Signed-off-by: Patrick Boettcher <patrick.boettcher@posteo.de>

BCM270X_DT: Make mmc overlay compatible with current firmware

The original DT overlay logic followed a merge-then-patch procedure,
i.e. parameters are applied to the loaded overlay before the overlay
is merged into the base DTB. This sequence has been changed to
patch-then-merge, in order to support parameterised node names, and
to protect against bad overlays. As a result, overrides (parameters)
must only target labels in the overlay, but the overlay can obviously target nodes in the base DTB.

mmc-overlay.dts (that switches back to the original mmc sdcard
driver) is the only overlay violating that rule, and this patch
fixes it.

bcm270x_dt: Use the sdhost MMC controller by default

The "mmc" overlay reverts to using the other controller.

squash: Add cprman to dt

BCM270X_DT: Use clk_core for I2C interfaces

BCM270X_DT: Use bcm283x.dtsi, bcm2835.dtsi and bcm2836.dtsi

The mainline Device Tree files are quite close to downstream now.
Let's use bcm283x.dtsi, bcm2835.dtsi and bcm2836.dtsi as base files
for our dts files.

Mainline dts files are based on these files:

          bcm2835-rpi.dtsi
  bcm2835.dtsi    bcm2836.dtsi
          bcm283x.dtsi

Current downstream are based on these:

  bcm2708.dtsi    bcm2709.dtsi    bcm2710.dtsi
             bcm2708_common.dtsi

This patch introduces this dependency:

  bcm2708.dtsi    bcm2709.dtsi
          bcm2708-rpi.dtsi
          bcm270x.dtsi
  bcm2835.dtsi    bcm2836.dtsi
          bcm283x.dtsi

And:
          bcm2710.dtsi
          bcm2708-rpi.dtsi
          bcm270x.dtsi
          bcm283x.dtsi

bcm270x.dtsi contains the downstream bcm283x.dtsi diff.
bcm2708-rpi.dtsi is the downstream version of bcm2835-rpi.dtsi.

Other changes:
- The led node has moved from /soc/leds to /leds. This is not a problem
  since the label is used to reference it.
- The clk_osc reg property changes from 6 to 3.
- The gpu nodes has their interrupt property set in the base file.
- the clocks label does not point to the /clocks node anymore, but
  points to the cprman node. This is not a problem since the overlays
  that use the clock node refer to it directly: target-path = "/clocks";
- some nodes now have 2 labels since mainline and downstream differs in
  this respect: cprman/clocks, spi0/spi, gpu/vc4.
- some nodes doesn't have an explicit status = "okay" since they're not
  disabled in the base file: watchdog and random.
- gpiomem doesn't need an explicit status = "okay".
- bcm2708-rpi-cm.dts got the hpd-gpios property from bcm2708_common.dtsi,
  it's now set directly in that file.
- bcm2709-rpi-2-b.dts has the timer node moved from /soc/timer to /timer.
- Removed clock-frequency property on the bcm{2709,2710}.dtsi timer nodes.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

BCM270X_DT: Use raspberrypi-power to turn on USB power

Use the raspberrypi-power driver to turn on USB power.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

BCM270X_DT: Add a .dtbo target, use for overlays

Change the filenames and extensions to keep the pre-DDT style of
overlay (<name>-overlay.dtb) distinct from new ones that use a
different style of local fixups (<name>.dtbo), and to match other
platforms.

The RPi firmware uses the DDTK trailer atom to choose which type of
overlay to use for each kernel.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

BCM270X_DT: Don't generate "linux,phandle" props

The EPAPR standard says to use "phandle" properties to store phandles,
rather than the deprecated "linux,phandle" version. By default, dtc
generates both, but adding "-H epapr" causes it to only generate
"phandle"s, saving some space and clutter.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

BCM270X_DT: Add overlay for enc28j60 on SPI2

Works on SPI2 for compute module

BCM270X_DT: Add midi-uart0 overlay

MIDI requires 31.25kbaud, a baudrate unsupported by Linux. The
midi-uart0 overlay configures uart0 (ttyAMA0) to use a fake clock
so that requesting 38.4kbaud actually gets 31.25kbaud.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

BCM270X_DT: Add i2c-sensor overlay

The i2c-sensor overlay is a container for various pressure and
temperature sensors, currently bmp085 and bmp280. The standalone
bmp085_i2c-sensor overlay is now deprecated.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

BCM270X_DT: overlays/*-overlay.dtb -> overlays/*.dtbo (#1752)

We now create overlays as .dtbo files.

build: support for .dtbo files for dtb overlays

Kernel 4.4.6+ on RaspberryPi support .dtbo files for overlays, instead of .dtb.
Patch the kernel, which has faulty rules to generate .dtbo the way yocto does

Signed-off-by: Herve Jourdain <herve.jourdain@neuf.fr>
Signed-off-by: Khem Raj <raj.khem@gmail.com>

BCM270X: Drop position requirement for CMA in VC4 overlay.

No longer necessary since 2aefcd5761,
and will probably let peeople that want to choose a larger CMA
allocation (particularly on pi0/1).

Signed-off-by: Eric Anholt <eric@anholt.net>

BCM270X_DT: RPi Device Tree tidy

Use the upstream sdhost node, add thermal-zones, and factor out some
common elements.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

kbuild: Silence unhelpful DTC warnings

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

BCM270X_DT: DT build rules no longer arch-specific

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:49 +01:00
Phil Elwell
8a64d36852 scripts: Add mkknlimg and knlinfo scripts from tools repo
The Raspberry Pi firmware looks for a trailer on the kernel image to
determine whether it was compiled with Device Tree support enabled.
If the firmware finds a kernel without this trailer, or which has a
trailer indicating that it isn't DT-capable, it disables DT support
and reverts to using ATAGs.

The mkknlimg utility adds that trailer, having first analysed the
image to look for signs of DT support and the kernel version string.

knlinfo displays the contents of the trailer in the given kernel image.

scripts/mkknlimg: Add support for ARCH_BCM2835

Add a new trailer field indicating whether this is an ARCH_BCM2835
build, as opposed to MACH_BCM2708/9. If the loader finds this flag
is set it changes the default base dtb file name from bcm270x...
to bcm283y...

Also update knlinfo to show the status of the field.

scripts/mkknlimg: Improve ARCH_BCM2835 detection

The board support code contains sufficient strings to be able to
distinguish 2708 vs. 2835 builds, so remove the check for
bcm2835-pm-wdt which could exist in either.

Also, since the canned configuration is no longer built in (it's
a module), remove the config string checking.

See: https://github.com/raspberrypi/linux/issues/1157

scripts: Multi-platform support for mkknlimg and knlinfo

The firmware uses tags in the kernel trailer to choose which dtb file
to load. Current firmware loads bcm2835-*.dtb if the '283x' tag is true,
otherwise it loads bcm270*.dtb. This scheme breaks if an image supports
multiple platforms.

This patch adds '270X' and '283X' tags to indicate support for RPi and
upstream platforms, respectively. '283x' (note lower case 'x') is left
for old firmware, and is only set if the image only supports upstream
builds.

scripts/mkknlimg: Append a trailer for all input

Now that the firmware assumes an unsigned kernel is DT-capable, it is
helpful to be able to mark a kernel as being non-DT-capable.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

scripts/knlinfo: Decode DDTK atom

Show the DDTK atom as being a boolean.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

mkknlimg: Retain downstream-kernel detection

With the death of ARCH_BCM2708 and ARCH_BCM2709, a new way is needed to
determine if this is a "downstream" build that wants the firmware to
load a bcm27xx .dtb. The vc_cma driver is used downstream but not
upstream, making vc_cma_init a suitable predicate symbol.

mkknlimg: Find some more downstream-only strings

See: https://github.com/raspberrypi/linux/issues/1920

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

scripts: Update mkknlimg, just in case

With the removal of the vc_cma driver, mkknlimg lost an indication that
the user had built a downstream kernel. Update the script, adding a few
more key strings, in case it is still being used.

Note that mkknlimg is now deprecated, except to tag kernels as upstream
(283x), and thus requiring upstream DTBs.

See: https://github.com/raspberrypi/linux/issues/2239

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:49 +01:00
Noralf Trønnes
e6452be341 firmware: bcm2835: Support ARCH_BCM270x
Support booting without Device Tree.
Turn on USB power.
Load driver early because of lacking support for deferred probing
in many drivers.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

firmware: bcm2835: Don't turn on USB power

The raspberrypi-power driver is now used to turn on USB power.

This partly reverts commit:
firmware: bcm2835: Support ARCH_BCM270x

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:49 +01:00
Noralf Trønnes
f1a0d92609 char: broadcom: Add vcio module
Add module for accessing the mailbox property channel through
/dev/vcio. Was previously in bcm2708-vcio.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:49 +01:00
popcornmix
4267d004c4 Add Chris Boot's i2c driver
i2c-bcm2708: fixed baudrate

Fixed issue where the wrong CDIV value was set for baudrates below 3815 Hz (for 250MHz bus clock).
In that case the computed CDIV value was more than 0xffff. However the CDIV register width is only 16 bits.
This resulted in incorrect setting of CDIV and higher baudrate than intended.
Example: 3500Hz -> CDIV=0x11704 -> CDIV(16bit)=0x1704 -> 42430Hz
After correction: 3500Hz -> CDIV=0x11704 -> CDIV(16bit)=0xffff -> 3815Hz
The correct baudrate is shown in the log after the cdiv > 0xffff correction.

Perform I2C combined transactions when possible

Perform I2C combined transactions whenever possible, within the
restrictions of the Broadcomm Serial Controller.

Disable DONE interrupt during TA poll

Prevent interrupt from being triggered if poll is missed and transfer
starts and finishes.

i2c: Make combined transactions optional and disabled by default

i2c: bcm2708: add device tree support

Add DT support to driver and add to .dtsi file.
Setup pins in .dts file.
i2c is disabled by default.

Signed-off-by: Noralf Tronnes <notro@tronnes.org>

bcm2708: don't register i2c controllers when using DT

The devices for the i2c controllers are in the Device Tree.
Only register devices when not using DT.

Signed-off-by: Noralf Tronnes <notro@tronnes.org>

I2C: Only register the I2C device for the current board revision

i2c_bcm2708: Fix clock reference counting

Fix grabbing lock from atomic context in i2c driver

2 main changes:
- check for timeouts in the bcm2708_bsc_setup function as indicated by this comment:
      /* poll for transfer start bit (should only take 1-20 polls) */
  This implies that the setup function can now fail so account for this everywhere it's called
- Removed the clk_get_rate call from inside the setup function as it locks a mutex and that's not ok since we call it from under a spin lock.

i2c-bcm2708: When using DT, leave the GPIO setup to pinctrl

i2c-bcm2708: Increase timeouts to allow larger transfers

Use the timeout value provided by the I2C_TIMEOUT ioctl when waiting
for completion. The default timeout is 1 second.

See: https://github.com/raspberrypi/linux/issues/260

i2c-bcm2708/BCM270X_DT: Add support for I2C2

The third I2C bus (I2C2) is normally reserved for HDMI use. Careless
use of this bus can break an attached display - use with caution.

It is recommended to disable accesses by VideoCore by setting
hdmi_ignore_edid=1 or hdmi_edid_file=1 in config.txt.

The interface is disabled by default - enable using the
i2c2_iknowwhatimdoing DT parameter.

bcm2708-spi: Don't use static pin configuration with DT

Also remove superfluous error checking - the SPI framework ensures the
validity of the chip_select value.

i2c-bcm2708: Remove non-DT support

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

Set the BSC_CLKT clock streching timeout to 35ms as per SMBus specs.

Fixes i2c_bcm2708: Write to FIFO correctly - v2 (#1574)

* i2c: fix i2c_bcm2708: Clear FIFO before sending data

Make sure FIFO gets cleared before trying to send
data in case of a repeated start (COMBINED=Y).

* i2c: fix i2c_bcm2708: Only write to FIFO when not full

Check if FIFO can accept data before writing.
To avoid a peripheral read on the last iteration of a loop,
both bcm2708_bsc_fifo_fill and ~drain are changed as well.
2019-09-17 11:19:48 +01:00
popcornmix
5c30e2eb32 Add cpufreq driver
Signed-off-by: popcornmix <popcornmix@gmail.com>

bcm2835-cpufreq: Change licence to GPLv2

Signed-off-by: Eben Upton <eben.upton@broadcom.com>
Signed-off-by: Dom Cobley <dom@raspberrypi.com>
2019-09-17 11:19:48 +01:00
popcornmix
2f13bd7001 Revert "Add SMI NAND driver"
This reverts commit b1521a1f77c32e0754edf41064f8eac574380634.
2019-09-17 11:19:48 +01:00
Luke Wren
bd22f71dfe Add SMI NAND driver
Signed-off-by: Luke Wren <wren6991@gmail.com>
2019-09-17 11:19:48 +01:00
Martin Sperl
e2975886b2 MISC: bcm2835: smi: use clock manager and fix reload issues
Use clock manager instead of self-made clockmanager.

Also fix some error paths that showd up during development
(especially missing release of dma resources on rmmod)

Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
2019-09-17 11:19:48 +01:00
Luke Wren
ca5f2d36af Add SMI driver
Signed-off-by: Luke Wren <wren6991@gmail.com>
2019-09-17 11:19:48 +01:00
Luke Wren
b14192f81b Add /dev/gpiomem device for rootless user GPIO access
Signed-off-by: Luke Wren <luke@raspberrypi.org>

bcm2835-gpiomem: Fix for ARCH_BCM2835 builds

Build on ARCH_BCM2835, and fail to probe if no IO resource.

See: https://github.com/raspberrypi/linux/issues/1154
2019-09-17 11:19:48 +01:00
Tim Gover
c61c713a58 vcsm: VideoCore shared memory service for BCM2835
Add experimental support for the VideoCore shared memory service.
This allows user processes to allocate memory from VideoCore's
GPU relocatable heap and mmap the buffers. Additionally, the memory
handles can passed to other VideoCore services such as MMAL, OpenMax
and DispmanX

TODO
* This driver was originally released for BCM28155 which has a different
  cache architecture to BCM2835. Consequently, in this release only
  uncached mappings are supported. However, there's no fundamental
  reason which cached mappings cannot be support or BCM2835
* More refactoring is required to remove the typedefs.
* Re-enable the some of the commented out debug-fs statistics which were
  disabled when migrating code from proc-fs.
* There's a lot of code to support sharing of VCSM in order to support
  Android. This could probably done more cleanly or perhaps just
  removed.

Signed-off-by: Tim Gover <timgover@gmail.com>

config: Disable VC_SM for now to fix hang with cutdown kernel

vcsm: Use boolean as it cannot be built as module

On building the bcm_vc_sm as a module we get the following error:

v7_dma_flush_range and do_munmap are undefined in vc-sm.ko.

Fix by making it not an option to build as module

vcsm: Add ioctl for custom cache flushing

vc-sm: Move headers out of arch directory

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

vcsm: Treat EBUSY as success rather than SIGBUS

Currently if two cores access the same page concurrently one will return VM_FAULT_NOPAGE
and the other VM_FAULT_SIGBUS crashing the user code.

Also report when mapping fails.

Signed-off-by: popcornmix <popcornmix@gmail.com>

vcsm: Provide new ioctl to clean/invalidate a 2D block

vcsm: Convert to loading via device tree.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

VCSM: New option to import a DMABUF for VPU use

Takes a dmabuf, and then calls over to the VPU to wrap
it into a suitable handle.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

vcsm: fix multi-platform build

vcsm: add macros for cache functions

vcsm: use dma APIs for cache functions

* Will handle multi-platform builds

vcsm: Fix up macros to avoid breaking numbers used by existing apps

vcsm: Define cache operation constants in user header

Without this change, users have to use raw values (1, 2, 3) to specify
cache operation.

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Support for finding user/vc handle in memory pool

vmcs_sm_{usr,vc}_handle_from_pid_and_address() were failing to find
handle if specified user pointer is not exactly the one that the memory
locking call returned even if the pointer is in range of map/resource.
So fixed the functions to match the range.

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Unify cache manipulating functions

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Fix obscure conditions

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Fix memory leaking on clean_invalid2 ioctl handler

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Describe the use of cache operation constants

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Fix obscure conditions again

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Add no-op cache operation constant

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Revert to do page-table-walk-based cache manipulating on some ioctl calls

On FLUSH, INVALID, CLEAN_INVALID ioctl calls, cache operations based on
page table walk were used in case that the buffer of the cache is not
pinned.  So reverted to do page-table-based cache manipulating.

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Define cache operation constants in user header

Without this change, users have to use raw values (1, 2, 3) to specify
cache operation.

Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>

vcsm: Updates for changed vchiq interface

vcsm: Fix an NULL dereference in the import_dmabuf error path

resource was dereferenced even though it was NULL.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

vcsm: Use struct service_creation

vcsm: Fix makefile include on out-of-tree builds

The vc_sm module tries to include the 'fs' directory from the
$(srctree). $(srctree) is already provided by the build system, and
causes the include path to be duplicated.

With -Werror this fails to compile.

Remove the unnecessary variable.

Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

vcsm: Remove set but unused variable

The 'success' variable is set by the call to vchi_service_close() but never checked.
Remove it, keeping the call in place.

Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

vcsm: Reduce scope of local functions

The functions:

  vc_vchi_sm_send_msg
  vc_sm_ioctl_alloc
  vc_sm_ioctl_alloc_share
  vc_sm_ioctl_import_dmabuf

Are declared without a prototype. They are not used outside of this
module, thus - convert them to static functions.

Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
2019-09-17 11:19:48 +01:00
popcornmix
cd866377db vc_mem: Add vc_mem driver for querying firmware memory addresses
Signed-off-by: popcornmix <popcornmix@gmail.com>

BCM270x: Move vc_mem

Make the vc_mem module available for ARCH_BCM2835 by moving it.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:48 +01:00
Phil Elwell
d7a881187e Adding bcm2835-sdhost driver, and an overlay to enable it
BCM2835 has two SD card interfaces. This driver uses the other one.

bcm2835-sdhost: Error handling fix, and code clarification

bcm2835-sdhost: Adding overclocking option

Allow a different clock speed to be substitued for a requested 50MHz.
This option is exposed using the "overclock_50" DT parameter.
Note that the sdhost interface is restricted to integer divisions of
core_freq, and the highest sensible option for a core_freq of 250MHz
is 84 (250/3 = 83.3MHz), the next being 125 (250/2) which is much too
high.

Use at your own risk.

bcm2835-sdhost: Round up the overclock, so 62 works for 62.5Mhz

Also only warn once for each overclock setting.

bcm2835-sdhost: Improve error handling and recovery

1) Expose the hw_reset method to the MMC framework, removing many
   internal calls by the driver.

2) Reduce overclock setting on error.

3) Increase timeout to cope with high capacity cards.

4) Add properties and parameters to control pio_limit and debug.

5) Reduce messages at probe time.

bcm2835-sdhost: Further improve overclock back-off

bcm2835-sdhost: Clear HBLC for PIO mode

Also update pio_limit default in overlay README.

bcm2835-sdhost: Add the ERASE capability

See: https://github.com/raspberrypi/linux/issues/1076

bcm2835-sdhost: Ignore CRC7 for MMC CMD1

It seems that the sdhost interface returns CRC7 errors for CMD1,
which is the MMC-specific SEND_OP_COND. Returning these errors to
the MMC layer causes a downward spiral, but ignoring them seems
to be harmless.

bcm2835-mmc/sdhost: Remove ARCH_BCM2835 differences

The bcm2835-mmc driver (and -sdhost driver that copied from it)
contains code to handle SDIO interrupts in a threaded interrupt
handler rather than waking the MMC framework thread. The change
follows a patch from Russell King that adds the facility as the
preferred way of working.

However, the new code path is only present in ARCH_BCM2835
builds, which I have taken to be a way of testing the waters
rather than making the change across the board; I can't see
any technical reason why it wouldn't be enabled for MACH_BCM270X
builds. So this patch standardises on the ARCH_BCM2835 code,
removing the old code paths.

bcm2835-sdhost: Don't log timeout errors unless debug=1

The MMC card-discovery process generates timeouts. This is
expected behaviour, so reporting it to the user serves no purpose.
Suppress the reporting of timeout errors unless the debug flag
is on.

bcm2835-sdhost: Add workaround for odd behaviour on some cards

For reasons not understood, the sdhost driver fails when reading
sectors very near the end of some SD cards. The problem could
be related to the similar issue that reading the final sector
of any card as part of a multiple read never completes, and the
workaround is an extension of the mechanism introduced to solve
that problem which ensures those sectors are always read singly.

bcm2835-sdhost: Major revision

This is a significant revision of the bcm2835-sdhost driver. It
improves on the original in a number of ways:

1) Through the use of CMD23 for reads it appears to avoid problems
   reading some sectors on certain high speed cards.
2) Better atomicity to prevent crashes.
3) Higher performance.
4) Activity logging included, for easier diagnosis in the event
   of a problem.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: Restore ATOMIC flag to PIO sg mapping

Allocation problems have been seen in a wireless driver, and
this is the only change which might have been responsible.

SQUASH: bcm2835-sdhost: Only claim one DMA channel

With both MMC controllers enabled there are few DMA channels left. The
bcm2835-sdhost driver only uses DMA in one direction at a time, so it
doesn't need to claim two channels.

See: https://github.com/raspberrypi/linux/issues/1327

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: Workaround for "slow" sectors

Some cards have been seen to cause timeouts after certain sectors are
read. This workaround enforces a minimum delay between the stop after
reading one of those sectors and a subsequent data command.

Using CMD23 (SET_BLOCK_COUNT) avoids this problem, so good cards will
not be penalised by this workaround.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: Firmware manages the clock divisor

The bcm2835-sdhost driver hands control of the CDIV clock divisor
register to matching firmware, allowing it to adjust to a changing
core clock. This removes the need to use the performance governor or
to enable io_is_busy on the on-demand governor in order to get the
best SD performance.

N.B. As SD clocks must be an integer divisor of the core clock, it is
possible that the SD clock for "turbo" mode can be different (even
lower) than "normal" mode.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: Reset the clock in task context

Since reprogramming the clock can now involve a round-trip to the
firmware it must not be done at atomic context, and a tasklet
is not a task.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: Don't exit cmd wait loop on error

The FAIL flag can be set in the CMD register before command processing
is complete, leading to spurious "failed to complete" errors. This has
the effect of promoting harmless CRC7 errors during CMD1 processing
into errors that can delay and even prevent booting.

Also:
1) Convert the last KERN_ERROR message in the register dumping to
   KERN_INFO.
2) Remove an unnecessary reset call from  bcm2835_sdhost_add_host.

See: https://github.com/raspberrypi/linux/pull/1492

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: mmc_card_blockaddr fix

Get the definition of mmc_card_blockaddr from drivers/mmc/core/card.h.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: New timer API

mmc: bcm2835-sdhost: Support underclocking

Support underclocking of the SD bus in two ways:
1. using the max-frequency DT property (which currently has no DT
   parameter), and
2. using the exiting sd_overclock parameter.

The two methods differ slightly - in the former the MMC subsystem is
aware of the underclocking, while in the latter it isn't - but the
end results should be the same.

See: https://github.com/raspberrypi/linux/issues/2350

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

mmc: bcm2835-sdhost: Add include

highmem.h (needed for kmap_atomic) is pulled in by one of the other
include files, but only with some CONFIG settings. Make the inclusion
explicit to cater for cases where the CONFIG setting is absent.

See: https://github.com/raspberrypi/linux/issues/2366

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

mmc/bcm2835-sdhost: Recover from MMC_SEND_EXT_CSD

If the user issues an "mmc extcsd read", the SD controller receives
what it thinks is a SEND_IF_COND command with an unexpected data block.
The resulting operations leave the FSM stuck in READWAIT, a state which
persists until the MMC framework resets the controller, by which point
the root filesystem is likely to have been unmounted.

A less heavyweight solution is to detect the condition and nudge the
FSM by asserting the (self-clearing) FORCE_DATA_MODE bit.

N.B. This workaround was essentially discovered by accident and without
a full understanding the inner workings of the controller, so it is
fortunate that the "fix" only modifies error paths.

See: https://github.com/raspberrypi/linux/issues/2728

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

mmc: bcm2835-sdhost: Fix warnings on arm64

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-sdhost: Allow for sg entries that cross pages

The dma_complete handling code calculates a virtual address for a page
then adds an offset, but if the offset is more than a page and HIGHMEM
is in use then the summed address could be in an unmapped (or just
incorrect) page.

The upstream SDHOST driver allows for this possibility - copy the code
that does so.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:48 +01:00
gellert
b19858d151 MMC: added alternative MMC driver
mmc: Disable CMD23 transfers on all cards

Pending wire-level investigation of these types of transfers
and associated errors on bcm2835-mmc, disable for now. Fallback of
CMD18/CMD25 transfers will be used automatically by the MMC layer.

Reported/Tested-by: Gellert Weisz <gellert@raspberrypi.org>

mmc: bcm2835-mmc: enable DT support for all architectures

Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now.
Enable Device Tree support for all architectures.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

mmc: bcm2835-mmc: fix probe error handling

Probe error handling is broken in several places.
Simplify error handling by using device managed functions.
Replace pr_{err,info} with dev_{err,info}.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

bcm2835-mmc: Add locks when accessing sdhost registers

bcm2835-mmc: Add range of debug options for slowing things down

bcm2835-mmc: Add option to disable some delays

bcm2835-mmc: Add option to disable MMC_QUIRK_BLK_NO_CMD23

bcm2835-mmc: Default to disabling MMC_QUIRK_BLK_NO_CMD23

bcm2835-mmc: Adding overclocking option

Allow a different clock speed to be substitued for a requested 50MHz.
This option is exposed using the "overclock_50" DT parameter.
Note that the mmc interface is restricted to EVEN integer divisions of
250MHz, and the highest sensible option is 63 (250/4 = 62.5), the
next being 125 (250/2) which is much too high.

Use at your own risk.

bcm2835-mmc: Round up the overclock, so 62 works for 62.5Mhz

Also only warn once for each overclock setting.

mmc: bcm2835-mmc: Make available on ARCH_BCM2835

Make the bcm2835-mmc driver available for use on ARCH_BCM2835.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

BCM270x_DT: add bcm2835-mmc entry

Add Device Tree entry for bcm2835-mmc.
In non-DT mode, don't add the device in the board file.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

bcm2835-mmc: Don't overwrite MMC capabilities from DT

bcm2835-mmc: Don't override bus width capabilities from devicetree

Take out the force setting of the MMC_CAP_4_BIT_DATA host capability
so that the result read from devicetree via mmc_of_parse() is
preserved.

bcm2835-mmc: Only claim one DMA channel

With both MMC controllers enabled there are few DMA channels left. The
bcm2835-mmc driver only uses DMA in one direction at a time, so it
doesn't need to claim two channels.

See: https://github.com/raspberrypi/linux/issues/1327

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-mmc: New timer API

mmc: bcm2835-mmc: Support underclocking

Support underclocking of the SD bus using the max-frequency DT property
(which currently has no DT parameter). The sd_overclock parameter
already provides another way to achieve the same thing which should be
equivalent in end result, but it is a bug not to support max-frequency
as well.

See: https://github.com/raspberrypi/linux/issues/2350

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

mmc/bcm2835: Recover from MMC_SEND_EXT_CSD

If the user issues an "mmc extcsd read", the SD controller receives
what it thinks is a SEND_IF_COND command with an unexpected data block.
The resulting operations leave the FSM stuck in READWAIT, a state which
persists until the MMC framework resets the controller, by which point
the root filesystem is likely to have been unmounted.

A less heavyweight solution is to detect the condition and nudge the
FSM by asserting the (self-clearing) FORCE_DATA_MODE bit.

N.B. This workaround was essentially discovered by accident and without
a full understanding the inner workings of the controller, so it is
fortunate that the "fix" only modifies error paths.

See: https://github.com/raspberrypi/linux/issues/2728

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

bcm2835-mmc: Fix DMA channel leak

The BCM2835 MMC host driver requests a DMA channel on probe but neglects
to release the channel in the probe error path and on driver unbind.

I'm seeing this happen on every boot of the Compute Module 3: On first
driver probe, DMA channel 2 is allocated and then leaked with a "could
not get clk, deferring probe" message. On second driver probe, channel 4
is allocated.

Fix it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>

bcm2835-mmc: Fix struct mmc_host leak on probe

The BCM2835 MMC host driver requests the bus address of the host's
register map on probe.  If that fails, the driver leaks the struct
mmc_host allocated earlier.

Fix it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>

bcm2835-mmc: Fix duplicate free_irq() on remove

The BCM2835 MMC host driver requests its interrupt as a device-managed
resource, so the interrupt is automatically freed after the driver is
unbound.

However on driver unbind, bcm2835_mmc_remove() frees the interrupt
explicitly to avoid invocation of the interrupt handler after driver
structures have been torn down.

The interrupt is thus freed twice, leading to a WARN splat in
__free_irq().  Fix by not requesting the interrupt as a device-managed
resource.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>

bcm2835-mmc: Handle mmc_add_host() errors

The BCM2835 MMC host driver calls mmc_add_host() but doesn't check its
return value.  Errors occurring in that function are therefore not
handled.  Fix it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>

bcm2835-mmc: Deduplicate reset of driver data on remove

The BCM2835 MMC host driver sets the device's driver data pointer to
NULL on ->remove() even though the driver core subsequently does the
same in __device_release_driver().  Drop the duplicate assignment.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
2019-09-17 11:19:48 +01:00
Florian Meier
e294409342 dmaengine: Add support for BCM2708
Add support for DMA controller of BCM2708 as used in the Raspberry Pi.
Currently it only supports cyclic DMA.

Signed-off-by: Florian Meier <florian.meier@koalo.de>

dmaengine: expand functionality by supporting scatter/gather transfers sdhci-bcm2708 and dma.c: fix for LITE channels

DMA: fix cyclic LITE length overflow bug

dmaengine: bcm2708: Remove chancnt affectations

Mirror bcm2835-dma.c commit 9eba5536a7:
chancnt is already filled by dma_async_device_register, which uses the channel
list to know how much channels there is.

Since it's already filled, we can safely remove it from the drivers' probe
function.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dmaengine: bcm2708: overwrite dreq only if it is not set

dreq is set when the DMA channel is fetched from Device Tree.
slave_id is set using dmaengine_slave_config().
Only overwrite dreq with slave_id if it is not set.

dreq/slave_id in the cyclic DMA case is not touched, because I don't
have hardware to test with.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dmaengine: bcm2708: do device registration in the board file

Don't register the device in the driver. Do it in the board file.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dmaengine: bcm2708: don't restrict DT support to ARCH_BCM2835

Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now.
Add Device Tree support to the non ARCH_BCM2835 case.
Use the same driver name regardless of architecture.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

BCM270x_DT: add bcm2835-dma entry

Add Device Tree entry for bcm2835-dma.
The entry doesn't contain any resources since they are handled
by the arch/arm/mach-bcm270x/dma.c driver.
In non-DT mode, don't add the device in the board file.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

bcm2708-dmaengine: Add debug options

BCM270x: Add memory and irq resources to dmaengine device and DT

Prepare for merging of the legacy DMA API arch driver dma.c
with bcm2708-dmaengine by adding memory and irq resources both
to platform file device and Device Tree node.
Don't use BCM_DMAMAN_DRIVER_NAME so we don't have to include mach/dma.h

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dmaengine: bcm2708: Merge with arch dma.c driver and disable dma.c

Merge the legacy DMA API driver with bcm2708-dmaengine.
This is done so we can use bcm2708_fb on ARCH_BCM2835 (mailbox
driver is also needed).

Changes to the dma.c code:
- Use BIT() macro.
- Cutdown some comments to one line.
- Add mutex to vc_dmaman and use this, since the dev lock is locked
  during probing of the engine part.
- Add global g_dmaman variable since drvdata is used by the engine part.
- Restructure for readability:
  vc_dmaman_chan_alloc()
  vc_dmaman_chan_free()
  bcm_dma_chan_free()
- Restructure bcm_dma_chan_alloc() to simplify error handling.
- Use device irq resources instead of hardcoded bcm_dma_irqs table.
- Remove dev_dmaman_register() and code it directly.
- Remove dev_dmaman_deregister() and code it directly.
- Simplify bcm_dmaman_probe() using devm_* functions.
- Get dmachans from DT if available.
- Keep 'dma.dmachans' module argument name for backwards compatibility.

Make it available on ARCH_BCM2835 as well.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dmaengine: bcm2708: set residue_granularity field

bcm2708-dmaengine supports residue reporting at burst level
but didn't report this via the residue_granularity field.

Without this field set properly we get playback issues with I2S cards.

dmaengine: bcm2708-dmaengine: Fix memory leak when stopping a running transfer

bcm2708-dmaengine: Use more DMA channels (but not 12)

1) Only the bcm2708_fb drivers uses the legacy DMA API, and
it requires a BULK-capable channel, so all other types
(FAST, NORMAL and LITE) can be made available to the regular
DMA API.

2) DMA channels 11-14 share an interrupt. The driver can't
handle this, so don't use channels 12-14 (12 was used, probably
because it appears to have an interrupt, but in reality that
interrupt is for activity on ANY channel). This may explain
a lockup encountered when running out of DMA channels.

The combined effect of this patch is to leave 7 DMA channels
available + channel 0 for bcm2708_fb via the legacy API.

See: https://github.com/raspberrypi/linux/issues/1110
     https://github.com/raspberrypi/linux/issues/1108

dmaengine: bcm2708: Make legacy API available for bcm2835-dma

bcm2708_fb uses the legacy DMA API, so in order to start using
bcm2835-dma, bcm2835-dma has to support the legacy API. Make this
possible by exporting bcm_dmaman_probe() and bcm_dmaman_remove().

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dmaengine: bcm2708: Change DT compatible string

Both bcm2835-dma and bcm2708-dmaengine have the same compatible string.
So change compatible to "brcm,bcm2708-dma".

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dmaengine: bcm2708: Remove driver but keep legacy API

Dropping non-DT support means we don't need this driver,
but we still need the legacy DMA API.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

bcm2708-dmaengine - Fix arm64 portability/build issues

dma-bcm2708: Fix module compilation of CONFIG_DMA_BCM2708

bcm2708-dmaengine.c defines functions like bcm_dma_start which are
defined as well in dma-bcm2708.h as inline versions when
CONFIG_DMA_BCM2708 is not defined. This works fine when
CONFIG_DMA_BCM2708 is built in, but when it is selected as module build
fails with redefinition errors because in the build system when
CONFIG_DMA_BCM2708 is selected as module, the macro becomes
CONFIG_DMA_BCM2708_MODULE.

This patch makes the header use CONFIG_DMA_BCM2708_MODULE too when
available.

Fixes https://github.com/raspberrypi/linux/issues/2056

Signed-off-by: Andrei Gherzan <andrei@gherzan.com>
2019-09-17 11:19:48 +01:00
Harm Hanemaaijer
e23ccc64ce Speed up console framebuffer imageblit function
Especially on platforms with a slower CPU but a relatively high
framebuffer fill bandwidth, like current ARM devices, the existing
console monochrome imageblit function used to draw console text is
suboptimal for common pixel depths such as 16bpp and 32bpp. The existing
code is quite general and can deal with several pixel depths. By creating
special case functions for 16bpp and 32bpp, by far the most common pixel
formats used on modern systems, a significant speed-up is attained
which can be readily felt on ARM-based devices like the Raspberry Pi
and the Allwinner platform, but should help any platform using the
fb layer.

The special case functions allow constant folding, eliminating a number
of instructions including divide operations, and allow the use of an
unrolled loop, eliminating instructions with a variable shift size,
reducing source memory access instructions, and eliminating excessive
branching. These unrolled loops also allow much better code optimization
by the C compiler. The code that selects which optimized variant is used
is also simplified, eliminating integer divide instructions.

The speed-up, measured by timing 'cat file.txt' in the console, varies
between 40% and 70%, when testing on the Raspberry Pi and Allwinner
ARM-based platforms, depending on font size and the pixel depth, with
the greater benefit for 32bpp.

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
2019-09-17 11:19:47 +01:00
Siarhei Siamashka
6430d8468b fbdev: add FBIOCOPYAREA ioctl
Based on the patch authored by Ali Gholami Rudi at
    https://lkml.org/lkml/2009/7/13/153

Provide an ioctl for userspace applications, but only if this operation
is hardware accelerated (otherwide it does not make any sense).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

bcm2708_fb: Add ioctl for reading gpu memory through dma

video: bcm2708_fb: Add compat_ioctl support.

When using a 64 bit kernel with 32 bit userspace we need
compat ioctl handling for FBIODMACOPY as one of the
parameters is a pointer.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:47 +01:00
popcornmix
26e14d0592 bcm2708 framebuffer driver
Signed-off-by: popcornmix <popcornmix@gmail.com>

bcm2708_fb : Implement blanking support using the mailbox property interface

bcm2708_fb: Add pan and vsync controls

bcm2708_fb: DMA acceleration for fb_copyarea

Based on http://www.raspberrypi.org/phpBB3/viewtopic.php?p=62425#p62425
Also used Simon's dmaer_master module as a reference for tweaking DMA
settings for better performance.

For now busylooping only. IRQ support might be added later.
With non-overclocked Raspberry Pi, the performance is ~360 MB/s
for simple copy or ~260 MB/s for two-pass copy (used when dragging
windows to the right).

In the case of using DMA channel 0, the performance improves
to ~440 MB/s.

For comparison, VFP optimized CPU copy can only do ~114 MB/s in
the same conditions (hindered by reading uncached source buffer).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

bcm2708_fb: report number of dma copies

Add a counter (exported via debugfs) reporting the
number of dma copies that the framebuffer driver
has done, in order to help evaluate different
optimization strategies.

Signed-off-by: Luke Diamand <luked@broadcom.com>

bcm2708_fb: use IRQ for DMA copies

The copyarea ioctl() uses DMA to speed things along. This
was busy-waiting for completion. This change supports using
an interrupt instead for larger transfers. For small
transfers, busy-waiting is still likely to be faster.

Signed-off-by: Luke Diamand <luke@diamand.org>

bcm2708: Make ioctl logging quieter

video: fbdev: bcm2708_fb: Don't panic on error

No need to panic the kernel if the video driver fails.
Just print a message and return an error.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

fbdev: bcm2708_fb: Add ARCH_BCM2835 support

Add Device Tree support.
Pass the device to dma_alloc_coherent() in order to get the
correct bus address on ARCH_BCM2835.
Use the new DMA legacy API header file.
Including <mach/platform.h> is not necessary.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

BCM270x_DT: Add bcm2708-fb device

Add bcm2708-fb to Device Tree and don't add the
platform device when booting in DT mode.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

Cleanup of bcm2708_fb file to kernel coding standards

Some minor change to function - remove a use of
in_atomic, plus replacing various debug messages
that manually specify the function name with
("%s",.__func__)

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>

video: bcm2708_fb: Try allocating on the ARM and passing to VPU

Currently the VPU allocates the contiguous buffer for the
framebuffer.
Try an alternate path first where we use dma_alloc_coherent
and pass the buffer to the VPU. Should the VPU firmware not
support that path, then free the buffer and revert to the
old behaviour of using the VPU allocation.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
2019-09-17 11:19:47 +01:00
popcornmix
678be9f7f4 Add dwc_otg driver
Signed-off-by: popcornmix <popcornmix@gmail.com>

usb: dwc: fix lockdep false positive

Signed-off-by: Kari Suvanto <karis79@gmail.com>

usb: dwc: fix inconsistent lock state

Signed-off-by: Kari Suvanto <karis79@gmail.com>

Add FIQ patch to dwc_otg driver. Enable with dwc_otg.fiq_fix_enable=1. Should give about 10% more ARM performance.
Thanks to Gordon and Costas

Avoid dynamic memory allocation for channel lock in USB driver. Thanks ddv2005.

Add NAK holdoff scheme. Enabled by default, disable with dwc_otg.nak_holdoff_enable=0. Thanks gsh

Make sure we wait for the reset to finish

dwc_otg: fix bug in dwc_otg_hcd.c resulting in silent kernel
	 memory corruption, escalating to OOPS under high USB load.

dwc_otg: Fix unsafe access of QTD during URB enqueue

In dwc_otg_hcd_urb_enqueue during qtd creation, it was possible that the
transaction could complete almost immediately after the qtd was assigned
to a host channel during URB enqueue, which meant the qtd pointer was no
longer valid having been completed and removed. Usually, this resulted in
an OOPS during URB submission. By predetermining whether transactions
need to be queued or not, this unsafe pointer access is avoided.

This bug was only evident on the Pi model A where a device was attached
that had no periodic endpoints (e.g. USB pendrive or some wlan devices).

dwc_otg: Fix incorrect URB allocation error handling

If the memory allocation for a dwc_otg_urb failed, the kernel would OOPS
because for some reason a member of the *unallocated* struct was set to
zero. Error handling changed to fail correctly.

dwc_otg: fix potential use-after-free case in interrupt handler

If a transaction had previously aborted, certain interrupts are
enabled to track error counts and reset where necessary. On IN
endpoints the host generates an ACK interrupt near-simultaneously
with completion of transfer. In the case where this transfer had
previously had an error, this results in a use-after-free on
the QTD memory space with a 1-byte length being overwritten to
0x00.

dwc_otg: add handling of SPLIT transaction data toggle errors

Previously a data toggle error on packets from a USB1.1 device behind
a TT would result in the Pi locking up as the driver never handled
the associated interrupt. Patch adds basic retry mechanism and
interrupt acknowledgement to cater for either a chance toggle error or
for devices that have a broken initial toggle state (FT8U232/FT232BM).

dwc_otg: implement tasklet for returning URBs to usbcore hcd layer

The dwc_otg driver interrupt handler for transfer completion will spend
a very long time with interrupts disabled when a URB is completed -
this is because usb_hcd_giveback_urb is called from within the handler
which for a USB device driver with complicated processing (e.g. webcam)
will take an exorbitant amount of time to complete. This results in
missed completion interrupts for other USB packets which lead to them
being dropped due to microframe overruns.

This patch splits returning the URB to the usb hcd layer into a
high-priority tasklet. This will have most benefit for isochronous IN
transfers but will also have incidental benefit where multiple periodic
devices are active at once.

dwc_otg: fix NAK holdoff and allow on split transactions only

This corrects a bug where if a single active non-periodic endpoint
had at least one transaction in its qh, on frnum == MAX_FRNUM the qh
would get skipped and never get queued again. This would result in
a silent device until error detection (automatic or otherwise) would
either reset the device or flush and requeue the URBs.

Additionally the NAK holdoff was enabled for all transactions - this
would potentially stall a HS endpoint for 1ms if a previous error state
enabled this interrupt and the next response was a NAK. Fix so that
only split transactions get held off.

dwc_otg: Call usb_hcd_unlink_urb_from_ep with lock held in completion handler

usb_hcd_unlink_urb_from_ep must be called with the HCD lock held.  Calling it
asynchronously in the tasklet was not safe (regression in
c4564d4a1a).

This change unlinks it from the endpoint prior to queueing it for handling in
the tasklet, and also adds a check to ensure the urb is OK to be unlinked
before doing so.

NULL pointer dereference kernel oopses had been observed in usb_hcd_giveback_urb
when a USB device was unplugged/replugged during data transfer.  This effect
was reproduced using automated USB port power control, hundreds of replug
events were performed during active transfers to confirm that the problem was
eliminated.

USB fix using a FIQ to implement split transactions

This commit adds a FIQ implementaion that schedules
the split transactions using a FIQ so we don't get
held off by the interrupt latency of Linux

dwc_otg: fix device attributes and avoid kernel warnings on boot

dcw_otg: avoid logging function that can cause panics

See: https://github.com/raspberrypi/firmware/issues/21
Thanks to cleverca22 for fix

dwc_otg: mask correct interrupts after transaction error recovery

The dwc_otg driver will unmask certain interrupts on a transaction
that previously halted in the error state in order to reset the
QTD error count. The various fine-grained interrupt handlers do not
consider that other interrupts besides themselves were unmasked.

By disabling the two other interrupts only ever enabled in DMA mode
for this purpose, we can avoid unnecessary function calls in the
IRQ handler. This will also prevent an unneccesary FIQ interrupt
from being generated if the FIQ is enabled.

dwc_otg: fiq: prevent FIQ thrash and incorrect state passing to IRQ

In the case of a transaction to a device that had previously aborted
due to an error, several interrupts are enabled to reset the error
count when a device responds. This has the side-effect of making the
FIQ thrash because the hardware will generate multiple instances of
a NAK on an IN bulk/interrupt endpoint and multiple instances of ACK
on an OUT bulk/interrupt endpoint. Make the FIQ mask and clear the
associated interrupts.

Additionally, on non-split transactions make sure that only unmasked
interrupts are cleared. This caused a hard-to-trigger but serious
race condition when you had the combination of an endpoint awaiting
error recovery and a transaction completed on an endpoint - due to
the sequencing and timing of interrupts generated by the dwc_otg core,
it was possible to confuse the IRQ handler.

Fix function tracing

dwc_otg: whitespace cleanup in dwc_otg_urb_enqueue

dwc_otg: prevent OOPSes during device disconnects

The dwc_otg_urb_enqueue function is thread-unsafe. In particular the
access of urb->hcpriv, usb_hcd_link_urb_to_ep, dwc_otg_urb->qtd and
friends does not occur within a critical section and so if a device
was unplugged during activity there was a high chance that the
usbcore hub_thread would try to disable the endpoint with partially-
formed entries in the URB queue. This would result in BUG() or null
pointer dereferences.

Fix so that access of urb->hcpriv, enqueuing to the hardware and
adding to usbcore endpoint URB lists is contained within a single
critical section.

dwc_otg: prevent BUG() in TT allocation if hub address is > 16

A fixed-size array is used to track TT allocation. This was
previously set to 16 which caused a crash because
dwc_otg_hcd_allocate_port would read past the end of the array.

This was hit if a hub was plugged in which enumerated as addr > 16,
due to previous device resets or unplugs.

Also add #ifdef FIQ_DEBUG around hcd->hub_port_alloc[], which grows
to a large size if 128 hub addresses are supported. This field is
for debug only for tracking which frame an allocate happened in.

dwc_otg: make channel halts with unknown state less damaging

If the IRQ received a channel halt interrupt through the FIQ
with no other bits set, the IRQ would not release the host
channel and never complete the URB.

Add catchall handling to treat as a transaction error and retry.

dwc_otg: fiq_split: use TTs with more granularity

This fixes certain issues with split transaction scheduling.

- Isochronous multi-packet OUT transactions now hog the TT until
  they are completed - this prevents hubs aborting transactions
  if they get a periodic start-split out-of-order
- Don't perform TT allocation on non-periodic endpoints - this
  allows simultaneous use of the TT's bulk/control and periodic
  transaction buffers

This commit will mainly affect USB audio playback.

dwc_otg: fix potential sleep while atomic during urb enqueue

Fixes a regression introduced with eb1b482a. Kmalloc called from
dwc_otg_hcd_qtd_add / dwc_otg_hcd_qtd_create did not always have
the GPF_ATOMIC flag set. Force this flag when inside the larger
critical section.

dwc_otg: make fiq_split_enable imply fiq_fix_enable

Failing to set up the FIQ correctly would result in
"IRQ 32: nobody cared" errors in dmesg.

dwc_otg: prevent crashes on host port disconnects

Fix several issues resulting in crashes or inconsistent state
if a Model A root port was disconnected.

- Clean up queue heads properly in kill_urbs_in_qh_list by
  removing the empty QHs from the schedule lists
- Set the halt status properly to prevent IRQ handlers from
  using freed memory
- Add fiq_split related cleanup for saved registers
- Make microframe scheduling reclaim host channels if
  active during a disconnect
- Abort URBs with -ESHUTDOWN status response, informing
  device drivers so they respond in a more correct fashion
  and don't try to resubmit URBs
- Prevent IRQ handlers from attempting to handle channel
  interrupts if the associated URB was dequeued (and the
  driver state was cleared)

dwc_otg: prevent leaking URBs during enqueue

A dwc_otg_urb would get leaked if the HCD enqueue function
failed for any reason. Free the URB at the appropriate points.

dwc_otg: Enable NAK holdoff for control split transactions

Certain low-speed devices take a very long time to complete a
data or status stage of a control transaction, producing NAK
responses until they complete internal processing - the USB2.0
spec limit is up to 500mS. This causes the same type of interrupt
storm as seen with USB-serial dongles prior to c8edb238.

In certain circumstances, usually while booting, this interrupt
storm could cause SD card timeouts.

dwc_otg: Fix for occasional lockup on boot when doing a USB reset

dwc_otg: Don't issue traffic to LS devices in FS mode

Issuing low-speed packets when the root port is in full-speed mode
causes the root port to stop responding. Explicitly fail when
enqueuing URBs to a LS endpoint on a FS bus.

Fix ARM architecture issue with local_irq_restore()

If local_fiq_enable() is called before a local_irq_restore(flags) where
the flags variable has the F bit set, the FIQ will be erroneously disabled.

Fixup arch_local_irq_restore to avoid trampling the F bit in CPSR.

Also fix some of the hacks previously implemented for previous dwc_otg
incarnations.

dwc_otg: fiq_fsm: Base commit for driver rewrite

This commit removes the previous FIQ fixes entirely and adds fiq_fsm.

This rewrite features much more complete support for split transactions
and takes into account several OTG hardware bugs. High-speed
isochronous transactions are also capable of being performed by fiq_fsm.

All driver options have been removed and replaced with:
  - dwc_otg.fiq_enable (bool)
  - dwc_otg.fiq_fsm_enable (bool)
  - dwc_otg.fiq_fsm_mask (bitmask)
  - dwc_otg.nak_holdoff (unsigned int)

Defaults are specified such that fiq_fsm behaves similarly to the
previously implemented FIQ fixes.

fiq_fsm: Push error recovery into the FIQ when fiq_fsm is used

If the transfer associated with a QTD failed due to a bus error, the HCD
would retry the transfer up to 3 times (implementing the USB2.0
three-strikes retry in software).

Due to the masking mechanism used by fiq_fsm, it is only possible to pass
a single interrupt through to the HCD per-transfer.

In this instance host channels would fall off the radar because the error
reset would function, but the subsequent channel halt would be lost.

Push the error count reset into the FIQ handler.

fiq_fsm: Implement timeout mechanism

For full-speed endpoints with a large packet size, interrupt latency
runs the risk of the FIQ starting a transaction too late in a full-speed
frame. If the device is still transmitting data when EOF2 for the
downstream frame occurs, the hub will disable the port. This change is
not reflected in the hub status endpoint and the device becomes
unresponsive.

Prevent high-bandwidth transactions from being started too late in a
frame. The mechanism is not guaranteed: a combination of bit stuffing
and hub latency may still result in a device overrunning.

fiq_fsm: fix bounce buffer utilisation for Isochronous OUT

Multi-packet isochronous OUT transactions were subject to a few bounday
bugs. Fix them.

Audio playback is now much more robust: however, an issue stands with
devices that have adaptive sinks - ALSA plays samples too fast.

dwc_otg: Return full-speed frame numbers in HS mode

The frame counter increments on every *microframe* in high-speed mode.
Most device drivers expect this number to be in full-speed frames - this
caused considerable confusion to e.g. snd_usb_audio which uses the
frame counter to estimate the number of samples played.

fiq_fsm: save PID on completion of interrupt OUT transfers

Also add edge case handling for interrupt transports.

Note that for periodic split IN, data toggles are unimplemented in the
OTG host hardware - it unconditionally accepts any PID.

fiq_fsm: add missing case for fiq_fsm_tt_in_use()

Certain combinations of bitrate and endpoint activity could
result in a periodic transaction erroneously getting started
while the previous Isochronous OUT was still active.

fiq_fsm: clear hcintmsk for aborted transactions

Prevents the FIQ from erroneously handling interrupts
on a timed out channel.

fiq_fsm: enable by default

fiq_fsm: fix dequeues for non-periodic split transactions

If a dequeue happened between the SSPLIT and CSPLIT phases of the
transaction, the HCD would never receive an interrupt.

fiq_fsm: Disable by default

fiq_fsm: Handle HC babble errors

The HCTSIZ transfer size field raises a babble interrupt if
the counter wraps. Handle the resulting interrupt in this case.

dwc_otg: fix interrupt registration for fiq_enable=0

Additionally make the module parameter conditional for wherever
hcd->fiq_state is touched.

fiq_fsm: Enable by default

dwc_otg: Fix various issues with root port and transaction errors

Process the host port interrupts correctly (and don't trample them).
Root port hotplug now functional again.

Fix a few thinkos with the transaction error passthrough for fiq_fsm.

fiq_fsm: Implement hack for Split Interrupt transactions

Hubs aren't too picky about which endpoint we send Control type split
transactions to. By treating Interrupt transfers as Control, it is
possible to use the non-periodic queue in the OTG core as well as the
non-periodic FIFOs in the hub itself. This massively reduces the
microframe exclusivity/contention that periodic split transactions
otherwise have to enforce.

It goes without saying that this is a fairly egregious USB specification
violation, but it works.

Original idea by Hans Petter Selasky @ FreeBSD.org.

dwc_otg: FIQ support on SMP. Set up FIQ stack and handler on Core 0 only.

dwc_otg: introduce fiq_fsm_spin(un|)lock()

SMP safety for the FIQ relies on register read-modify write cycles being
completed in the correct order. Several places in the DWC code modify
registers also touched by the FIQ. Protect these by a bare-bones lock
mechanism.

This also makes it possible to run the FIQ and IRQ handlers on different
cores.

fiq_fsm: fix build on bcm2708 and bcm2709 platforms

dwc_otg: put some barriers back where they should be for UP

bcm2709/dwc_otg: Setup FIQ on core 1 if >1 core active

dwc_otg: fixup read-modify-write in critical paths

Be more careful about read-modify-write on registers that the FIQ
also touches.

Guard fiq_fsm_spin_lock with fiq_enable check

fiq_fsm: Falling out of the state machine isn't fatal

This edge case can be hit if the port is disabled while the FIQ is
in the middle of a transaction. Make the effects less severe.

Also get rid of the useless return value.

squash: dwc_otg: Allow to build without SMP

usb: core: make overcurrent messages more prominent

Hub overcurrent messages are more serious than "debug". Increase loglevel.

usb: dwc_otg: Don't use dma_to_virt()

Commit 6ce0d20 changes dma_to_virt() which breaks this driver.
Open code the old dma_to_virt() implementation to work around this.

Limit the use of __bus_to_virt() to cases where transfer_buffer_length
is set and transfer_buffer is not set. This is done to increase the
chance that this driver will also work on ARCH_BCM2835.

transfer_buffer should not be NULL if the length is set, but the
comment in the code indicates that there are situations where this
might happen. drivers/usb/isp1760/isp1760-hcd.c also has a similar
comment pointing to a possible: 'usb storage / SCSI bug'.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dwc_otg: Fix crash when fiq_enable=0

dwc_otg: fiq_fsm: Make high-speed isochronous strided transfers work properly

Certain low-bandwidth high-speed USB devices (specialist audio devices,
compressed-frame webcams) have packet intervals > 1 microframe.

Stride these transfers in the FIQ by using the start-of-frame interrupt
to restart the channel at the right time.

dwc_otg: Force host mode to fix incorrect compute module boards

dwc_otg: Add ARCH_BCM2835 support

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dwc_otg: Simplify FIQ irq number code

Dropping ATAGS means we can simplify the FIQ irq number code.
Also add error checking on the returned irq number.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dwc_otg: Remove duplicate gadget probe/unregister function

dwc_otg: Properly set the HFIR

Douglas Anderson reported:

According to the most up to date version of the dwc2 databook, the FRINT
field of the HFIR register should be programmed to:
* 125 us * (PHY clock freq for HS) - 1
* 1000 us * (PHY clock freq for FS/LS) - 1

This is opposed to older versions of the doc that claimed it should be:
* 125 us * (PHY clock freq for HS)
* 1000 us * (PHY clock freq for FS/LS)

and reported lower timing jitter on a USB analyser

dcw_otg: trim xfer length when buffer larger than allocated size is received

dwc_otg: Don't free qh align buffers in atomic context

dwc_otg: Enable the hack for Split Interrupt transactions by default

dwc_otg.fiq_fsm_mask=0xF has long been a suggestion for users with audio stutters or other USB bandwidth issues.
So far we are aware of many success stories but no failure caused by this setting.
Make it a default to learn more.

See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=70437

Signed-off-by: popcornmix <popcornmix@gmail.com>

dwc_otg: Use kzalloc when suitable

dwc_otg: Pass struct device to dma_alloc*()

This makes it possible to get the bus address from Device Tree.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

dwc_otg: fix summarize urb->actual_length for isochronous transfers

Kernel does not copy input data of ISO transfers to userspace
if actual_length is set only in ISO transfers and not summarized
in urb->actual_length. Fixes raspberrypi/linux#903

fiq_fsm: Use correct states when starting isoc OUT transfers

In fiq_fsm_start_next_periodic() if an isochronous OUT transfer
was selected, no regard was given as to whether this was a single-packet
transfer or a multi-packet staged transfer.

For single-packet transfers, this had the effect of repeatedly sending
OUT packets with bogus data and lengths.

Eventually if the channel was repeatedly enabled enough times, this
would lock up the OTG core and no further bus transfers would happen.

Set the FSM state up properly if we select a single-packet transfer.

Fixes https://github.com/raspberrypi/linux/issues/1842

dwc_otg: make nak_holdoff work as intended with empty queues

If URBs reading from non-periodic split endpoints were dequeued and
the last transfer from the endpoint was a NAK handshake, the resulting
qh->nak_frame value was stale which would result in unnecessarily long
polling intervals for the first subsequent transfer with a fresh URB.

Fixup qh->nak_frame in dwc_otg_hcd_urb_dequeue and also guard against
a case where a single URB is submitted to the endpoint, a NAK was
received on the transfer immediately prior to receiving data and the
device subsequently resubmits another URB past the qh->nak_frame interval.

Fixes https://github.com/raspberrypi/linux/issues/1709

dwc_otg: fix split transaction data toggle handling around dequeues

See https://github.com/raspberrypi/linux/issues/1709

Fix several issues regarding endpoint state when URBs are dequeued
- If the HCD is disconnected, flush FIQ-enabled channels properly
- Save the data toggle state for bulk endpoints if the last transfer
  from an endpoint where URBs were dequeued returned a data packet
- Reset hc->start_pkt_count properly in assign_and_init_hc()

dwc_otg: fix several potential crash sources

On root port disconnect events, the host driver state is cleared and
in-progress host channels are forcibly stopped. This doesn't play
well with the FIQ running in the background, so:
- Guard the disconnect callback with both the host spinlock and FIQ
  spinlock
- Move qtd dereference in dwc_otg_handle_hc_fsm() after the early-out
  so we don't dereference a qtd that has gone away
- Turn catch-all BUG()s in dwc_otg_handle_hc_fsm() into warnings.

dwc_otg: delete hcd->channel_lock

The lock serves no purpose as it is only held while the HCD spinlock
is already being held.

dwc_otg: remove unnecessary dma-mode channel halts on disconnect interrupt

Host channels are already halted in kill_urbs_in_qh_list() with the
subsequent interrupt processing behaving as if the URB was dequeued
via HCD callback.

There's no need to clobber the host channel registers a second time
as this exposes races between the driver and host channel resulting
in hcd->free_hc_list becoming corrupted.

dwcotg: Allow to build without FIQ on ARM64

Signed-off-by: popcornmix <popcornmix@gmail.com>

dwc_otg: make periodic scheduling behave properly for FS buses

If the root port is in full-speed mode, transfer times at 12mbit/s
would be calculated but matched against high-speed quotas.

Reinitialise hcd->frame_usecs[i] on each port enable event so that
full-speed bandwidth can be tracked sensibly.

Also, don't bother using the FIQ for transfers when in full-speed
mode - at the slower bus speed, interrupt frequency is reduced by
an order of magnitude.

Related issue: https://github.com/raspberrypi/linux/issues/2020

dwc_otg: fiq_fsm: Make isochronous compatibility checks work properly

Get rid of the spammy printk and local pointer mangling.
Also, there is a nominal benefit for using fiq_fsm for isochronous
transfers in FS mode (~1.1k IRQs per second vs 2.1k IRQs per second)
so remove the root port speed check.

dwc_otg: add module parameter int_ep_interval_min

Add a module parameter (defaulting to ignored) that clamps the polling rate
of high-speed Interrupt endpoints to a minimum microframe interval.

The parameter is modifiable at runtime as it is used when activating new
endpoints (such as on device connect).

dwc_otg: fiq_fsm: Add non-periodic TT exclusivity constraints

Certain hub types do not discriminate between pipe direction (IN or OUT)
when considering non-periodic transfers. Therefore these hubs get confused
if multiple transfers are issued in different directions with the same
device address and endpoint number.

Constrain queuing non-periodic split transactions so they are performed
serially in such cases.

Related: https://github.com/raspberrypi/linux/issues/2024

dwc_otg: Fixup change to DRIVER_ATTR interface

dwc_otg: Fix compilation warnings

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

USB_DWCOTG: Disable building dwc_otg as a module (#2265)

When dwc_otg is built as a module, build will fail with the following
error:

ERROR: "DWC_TASK_HI_SCHEDULE" [drivers/usb/host/dwc_otg/dwc_otg.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1199: recipe for target 'modules' failed
make: *** [modules] Error 2

Even if the error is solved by including the missing
DWC_TASK_HI_SCHEDULE function, the kernel will panic when loading
dwc_otg.

As a workaround, simply prevent user from building dwc_otg as a module
as the current kernel does not support it.

See: https://github.com/raspberrypi/linux/issues/2258

Signed-off-by: Malik Olivier Boussejra <malik@boussejra.com>

dwc_otg: New timer API

dwc_otg: Fix removed ACCESS_ONCE->READ_ONCE

dwc_otg: don't unconditionally force host mode in dwc_otg_cil_init()

Add the ability to disable force_host_mode for those that want to use
dwc_otg in both device and host modes.

dwc_otg: Fix a regression when dequeueing isochronous transfers

In 282bed95 (dwc_otg: make nak_holdoff work as intended with empty queues)
the dequeue mechanism was changed to leave FIQ-enabled transfers to run
to completion - to avoid leaving hub TT buffers with stale packets lying
around.

This broke FIQ-accelerated isochronous transfers, as this then meant that
dozens of transfers were performed after the dequeue function returned.

Restore the state machine fence for isochronous transfers.

fiq_fsm: rewind DMA pointer for OUT transactions that fail (#2288)

See: https://github.com/raspberrypi/linux/issues/2140

dwc_otg: add smp_mb() to prevent driver state corruption on boot

Occasional crashes have been seen where the FIQ code dereferences
invalid/random pointers immediately after being set up, leading to
panic on boot.

The crash occurs as the FIQ code races against hcd_init_fiq() and
the hcd_init_fiq() code races against the outstanding memory stores
from dwc_otg_hcd_init(). Use explicit barriers after touching
driver state.

usb: dwc_otg: fix memory corruption in dwc_otg driver

[Upstream commit 51b1b64917]

The move from the staging tree to the main tree exposed a
longstanding memory corruption bug in the dwc2 driver. The
reordering of the driver initialization caused the dwc2 driver
to corrupt the initialization data of the sdhci driver on the
Raspberry Pi platform, which made the bug show up.

The error is in calling to_usb_device(hsotg->dev), since ->dev
is not a member of struct usb_device. The easiest fix is to
just remove the offending code, since it is not really needed.

Thanks to Stephen Warren for tracking down the cause of this.

Reported-by: Andre Heider <a.heider@gmail.com>
Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Paul Zimmerman <paulz@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[lukas: port from upstream dwc2 to out-of-tree dwc_otg driver]
Signed-off-by: Lukas Wunner <lukas@wunner.de>

usb: dwb_otg: Fix unreachable switch statement warning

This warning appears with GCC 7.3.0 from toolchains.bootlin.com:

../drivers/usb/host/dwc_otg/dwc_otg_fiq_fsm.c: In function ‘fiq_fsm_update_hs_isoc’:
../drivers/usb/host/dwc_otg/dwc_otg_fiq_fsm.c:595:61: warning: statement will never be executed [-Wswitch-unreachable]
   st->hctsiz_copy.b.xfersize = nrpackets * st->hcchar_copy.b.mps;
                                            ~~~~~~~~~~~~~~~~~^~~~

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

dwc_otg: fiq_fsm: fix incorrect DMA register offset calculation

Rationalise the offset and update all call sites.

Fixes https://github.com/raspberrypi/linux/issues/2408

dwc_otg: fix bug with port_addr assignment for single-TT hubs

See https://github.com/raspberrypi/linux/issues/2734

The "Hub Port" field in the split transaction packet was always set
to 1 for single-TT hubs. The majority of single-TT hub products
apparently ignore this field and broadcast to all downstream enabled
ports, which masked the issue. A subset of hub devices apparently
need the port number to be exact or split transactions will fail.

usb: dwc_otg: Clean up build warnings on 64bit kernels

No functional changes. Almost all are changes to logging lines.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

usb: dwc_otg: Use dma allocation for mphi dummy_send buffer

The FIQ driver used a kzalloc'ed buffer for dummy_send,
passing a kernel virtual address to the hardware block.
The buffer is only ever used for a dummy read, so it
should be harmless, but there is the chance that it will
cause exceptions.

Use a dma allocation so that we have a genuine bus address,
and read from that.
Free the allocation when done for good measure.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

dwc_otg: only do_split when we actually need to do a split

The previous test would fail if the root port was in fullspeed mode
and there was a hub between the FS device and the root port. While
the transfer worked, the schedule mangling performed for high-speed
split transfers would break leading to an 8ms polling interval.

dwc_otg: fix locking around dequeueing and killing URBs

kill_urbs_in_qh_list() is practically only ever called with the fiq lock
already held, so don't spinlock twice in the case where we need to cancel
an isochronous transfer.

Also fix up a case where the global interrupt register could be read with
the fiq lock not held.

Fixes the deadlock seen in https://github.com/raspberrypi/linux/issues/2907

ARM64/DWC_OTG: Port dwc_otg driver to ARM64

In ARM64, the FIQ mechanism used by this driver is not current
implemented.   As a workaround, reqular IRQ is used instead
of FIQ.

In a separate change, the IRQ-CPU mapping is round robined
on ARM64 to increase concurrency and allow multiple interrupts
to be serviced at a time.  This reduces the need for FIQ.

Tests Run:

This mechanism is most likely to break when multiple USB devices
are attached at the same time.  So the system was tested under
stress.

Devices:

1. USB Speakers playing back a FLAC audio through VLC
   at 96KHz.(Higher then typically, but supported on my speakers).

2. sftp transferring large files through the buildin ethernet
   connection which is connected through USB.

3. Keyboard and mouse attached and being used.

Although I do occasionally hear some glitches, the music seems to
play quite well.

Signed-off-by: Michael Zoran <mzoran@crowfest.net>

usb: dwc_otg: Clean up interrupt claiming code

The FIQ/IRQ interrupt number identification code is scattered through
the dwc_otg driver. Rationalise it, simplifying the code and solving
an existing issue.

See: https://github.com/raspberrypi/linux/issues/2612

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

dwc_otg: Choose appropriate IRQ handover strategy

2711 has no MPHI peripheral, but the ARM Control block can fake
interrupts. Use the size of the DTB "mphi" reg block to determine
which is required.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>

usb: host: dwc_otg: fix compiling in separate directory

The dwc_otg Makefile does not respect the O=path argument correctly:
include paths in CFLAGS are given relatively to object path, not source
path. Compiling in a separate directory yields #include errors.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
2019-09-17 11:19:47 +01:00
popcornmix
24e2599fec Main bcm2708/bcm2709 linux port
Signed-off-by: popcornmix <popcornmix@gmail.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

bcm2709: Drop platform smp and timer init code

irq-bcm2836 handles this through these functions:
bcm2835_init_local_timer_frequency()
bcm2836_arm_irqchip_smp_init()

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

bcm270x: Use watchdog for reboot/poweroff

The watchdog driver already has support for reboot/poweroff.
Make use of this and remove the code from the platform files.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>

board_bcm2835: Remove coherent dma pool increase - API has gone
2019-09-17 11:19:47 +01:00
notro
957893fb56 pinctrl-bcm2835: Set base to 0 give expected gpio numbering
Signed-off-by: Noralf Tronnes <notro@tronnes.org>
2019-09-17 11:19:47 +01:00
Phil Elwell
698284259b amba_pl011: Add cts-event-workaround DT property
The BCM2835 PL011 implementation seems to have a bug that can lead to a
transmission lockup if CTS changes frequently. A workaround was added to
the driver with a vendor-specific flag to enable it, but this flag is
currently not set for ARM implementations.

Add a "cts-event-workaround" property to Pi DTBs and use the presence
of that property to force the flag to be enabled in the driver.

See: https://github.com/raspberrypi/linux/issues/1280

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:47 +01:00
Phil Elwell
ea7648cf35 amba_pl011: Insert mb() for correct FIFO handling
The pl011 register accessor functions use the _relaxed versions of the
standard readl() and writel() functions, meaning that there are no
automatic memory barriers. When polling a FIFO status register to check
for fullness, it is necessary to ensure that any outstanding writes have
completed; otherwise the flags are effectively stale, making it possible
that the next write is to a full FIFO.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:47 +01:00
Phil Elwell
992d0334c2 amba_pl011: Round input clock up
The UART clock is initialised to be as close to the requested
frequency as possible without exceeding it. Now that there is a
clock manager that returns the actual frequencies, an expected
48MHz clock is reported as 47999625. If the requested baudrate
== requested clock/16, there is no headroom and the slight
reduction in actual clock rate results in failure.

Detect cases where it looks like a "round" clock was chosen and
adjust the reported clock to match that "round" value. As the
code comment says:

/*
 * If increasing a clock by less than 0.1% changes it
 * from ..999.. to ..000.., round up.
 */

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:47 +01:00
Phil Elwell
554aeabdfe amba_pl011: Don't use DT aliases for numbering
The pl011 driver looks for DT aliases of the form "serial<n>",
and if found uses <n> as the device ID. This can cause
/dev/ttyAMA0 to become /dev/ttyAMA1, which is confusing if the
other serial port is provided by the 8250 driver which doesn't
use the same logic.
2019-09-17 11:19:47 +01:00
Phil Elwell
06b3d2fa12 lan78xx: Enable LEDs and auto-negotiation
For applications of the LAN78xx that don't have valid programmed
EEPROMs or OTPs, enabling both LEDs and auto-negotiation by default
seems reasonable.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:47 +01:00
Phil Elwell
b3f65dff69 irqchip: irq-bcm2836: Remove regmap and syscon use
The syscon node defines a register range that duplicates that used by
the local_intc node on bcm2836/7. Since irq-bcm2835 and irq-bcm2836 are
built in and always present together (both drivers are enabled by
CONFIG_ARCH_BCM2835), it is possible to replace the syscon usage with a
global variable that simplifies the code. Doing so does lose the
locking provided by regmap, but as only one side is using the regmap
interface (irq-bcm2835 uses readl and write) there is no loss of
atomicity.

See: https://github.com/raspberrypi/firmware/issues/926

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:46 +01:00
Phil Elwell
385f85e36d ASoC: Add prompt for ICS43432 codec
Without a prompt string, a config setting can't be included in a
defconfig. Give CONFIG_SND_SOC_ICS43432 a prompt so that Pi soundcards
can use the driver.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:46 +01:00
Eric Anholt
0942e223e8 mm: Remove the PFN busy warning
See commit dae803e165 -- the warning is
expected sometimes when using CMA.  However, that commit still spams
my kernel log with these warnings.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:46 +01:00
Noralf Trønnes
afa6ed3239 i2c: bcm2835: Add debug support
This adds a debug module parameter to aid in debugging transfer issues
by printing info to the kernel log. When enabled, status values are
collected in the interrupt routine and msg info in
bcm2835_i2c_start_transfer(). This is done in a way that tries to avoid
affecting timing. Having printk in the isr can mask issues.

debug values (additive):
1: Print info on error
2: Print info on all transfers
3: Print messages before transfer is started

The value can be changed at runtime:
/sys/module/i2c_bcm2835/parameters/debug

Example output, debug=3:
[  747.114448] bcm2835_i2c_xfer: msg(1/2) write addr=0x54, len=2 flags= [i2c1]
[  747.114463] bcm2835_i2c_xfer: msg(2/2) read addr=0x54, len=32 flags= [i2c1]
[  747.117809] start_transfer: msg(1/2) write addr=0x54, len=2 flags= [i2c1]
[  747.117825] isr: remain=2, status=0x30000055 : TA TXW TXD TXE  [i2c1]
[  747.117839] start_transfer: msg(2/2) read addr=0x54, len=32 flags= [i2c1]
[  747.117849] isr: remain=32, status=0xd0000039 : TA RXR TXD RXD  [i2c1]
[  747.117861] isr: remain=20, status=0xd0000039 : TA RXR TXD RXD  [i2c1]
[  747.117870] isr: remain=8, status=0x32 : DONE TXD RXD  [i2c1]

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:46 +01:00
Claggy3
c51d3c77e8 Update vfpmodule.c
Christopher Alexander Tobias Schulze - May 2, 2015, 11:57 a.m.
This patch fixes a problem with VFP state save and restore related
to exception handling (panic with message "BUG: unsupported FP
instruction in kernel mode") present on VFP11 floating point units
(as used with ARM1176JZF-S CPUs, e.g. on first generation Raspberry
Pi boards). This patch was developed and discussed on

   https://github.com/raspberrypi/linux/issues/859

A precondition to see the crashes is that floating point exception
traps are enabled. In this case, the VFP11 might determine that a FPU
operation needs to trap at a point in time when it is not possible to
signal this to the ARM11 core any more. The VFP11 will then set the
FPEXC.EX bit and store the trapped opcode in FPINST. (In some cases,
a second opcode might have been accepted by the VFP11 before the
exception was detected and could be reported to the ARM11 - in this
case, the VFP11 also sets FPEXC.FP2V and stores the second opcode in
FPINST2.)

If FPEXC.EX is set, the VFP11 will "bounce" the next FPU opcode issued
by the ARM11 CPU, which will be seen by the ARM11 as an undefined opcode
trap. The VFP support code examines the FPEXC.EX and FPEXC.FP2V bits
to decide what actions to take, i.e., whether to emulate the opcodes
found in FPINST and FPINST2, and whether to retry the bounced instruction.

If a user space application has left the VFP11 in this "pending trap"
state, the next FPU opcode issued to the VFP11 might actually be the
VSTMIA operation vfp_save_state() uses to store the FPU registers
to memory (in our test cases, when building the signal stack frame).
In this case, the kernel crashes as described above.

This patch fixes the problem by making sure that vfp_save_state() is
always entered with FPEXC.EX cleared. (The current value of FPEXC has
already been saved, so this does not corrupt the context. Clearing
FPEXC.EX has no effects on FPINST or FPINST2. Also note that many
callers already modify FPEXC by setting FPEXC.EN before invoking
vfp_save_state().)

This patch also addresses a second problem related to FPEXC.EX: After
returning from signal handling, the kernel reloads the VFP context
from the user mode stack. However, the current code explicitly clears
both FPEXC.EX and FPEXC.FP2V during reload. As VFP11 requires these
bits to be preserved, this patch disables clearing them for VFP
implementations belonging to architecture 1. There should be no
negative side effects: the user can set both bits by executing FPU
opcodes anyway, and while user code may now place arbitrary values
into FPINST and FPINST2 (e.g., non-VFP ARM opcodes) the VFP support
code knows which instructions can be emulated, and rejects other
opcodes with "unhandled bounce" messages, so there should be no
security impact from allowing reloading FPEXC.EX and FPEXC.FP2V.

Signed-off-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
2019-09-17 11:19:46 +01:00
Phil Elwell
2135d0aefb sound: Demote deferral errors to INFO level
At present there is no mechanism to specify driver load order,
which can lead to deferrals and repeated retries until successful.
Since this situation is expected, reduce the dmesg level to
INFO and mention that the operation will be retried.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:46 +01:00
Eric Anholt
005d1d0984 clk: bcm2835: Mark GPIO clocks enabled at boot as critical.
These divide off of PLLD_PER and are used for the ethernet and wifi
PHYs source PLLs.  Neither of them is currently represented by a phy
device that would grab the clock for us.

This keeps other drivers from killing the networking PHYs when they
disable their own clocks and trigger PLLD_PER's refcount going to 0.

v2: Skip marking as critical if they aren't on at boot.

Signed-off-by: Eric Anholt <eric@anholt.net>
2019-09-17 11:19:46 +01:00
Phil Elwell
e0c321afbb clk-bcm2835: Read max core clock from firmware
The VPU is responsible for managing the core clock, usually under
direction from the bcm2835-cpufreq driver but not via the clk-bcm2835
driver. Since the core frequency can change without warning, it is
safer to report the maximum clock rate to users of the core clock -
I2C, SPI and the mini UART - to err on the safe side when calculating
clock divisors.

If the DT node for the clock driver includes a reference to the
firmware node, use the firmware API to query the maximum core clock
instead of reading the divider registers.

Prior to this patch, a "100KHz" I2C bus was sometimes clocked at about
160KHz. In particular, switching to the 4.9 kernel was likely to break
SenseHAT usage on a Pi3.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:46 +01:00
Phil Elwell
15cddad023 clk-bcm2835: Add claim-clocks property
The claim-clocks property can be used to prevent PLLs and dividers
from being marked as critical. It contains a vector of clock IDs,
as defined by dt-bindings/clock/bcm2835.h.

Use this mechanism to claim PLLD_DSI0, PLLD_DSI1, PLLH_AUX and
PLLH_PIX for the vc4_kms_v3d driver.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:46 +01:00
Phil Elwell
d68b20efda clk-bcm2835: Mark used PLLs and dividers CRITICAL
The VPU configures and relies on several PLLs and dividers. Mark all
enabled dividers and their PLLs as CRITICAL to prevent the kernel from
switching them off.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:46 +01:00
popcornmix
8c489dd6e4 bcm2835-rng: Avoid initialising if already enabled
Avoids the 0x40000 cycles of warmup again if firmware has already used it
2019-09-17 11:19:46 +01:00
Martin Sperl
671165b951 Register the clocks early during the boot process, so that special/critical clocks can get enabled early on in the boot process avoiding the risk of disabling a clock, pll_divider or pll when a claiming driver fails to install propperly - maybe it needs to defer.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
2019-09-17 11:19:46 +01:00
popcornmix
7cea52c36f bcm: Make RASPBERRYPI_POWER depend on PM 2019-09-17 11:19:45 +01:00
popcornmix
2aa3a6ca53 reboot: Use power off rather than busy spinning when halt is requested 2019-09-17 11:19:45 +01:00
Noralf Trønnes
14ee958a6f watchdog: bcm2835: Support setting reboot partition
The Raspberry Pi firmware looks at the RSTS register to know which
partition to boot from. The reboot syscall command
LINUX_REBOOT_CMD_RESTART2 supports passing in a string argument.

Add support for passing in a partition number 0..63 to boot from.
Partition 63 is a special partiton indicating halt.
If the partition doesn't exist, the firmware falls back to partition 0.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:45 +01:00
Phil Elwell
e7562e99bb rtc: Add SPI alias for pcf2123 driver
Without this alias, Device Tree won't cause the driver
to be loaded.

See: https://github.com/raspberrypi/linux/pull/1510
2019-09-17 11:19:45 +01:00
popcornmix
81bbae5ce6 firmware: Updated mailbox header 2019-09-17 11:19:45 +01:00
Noralf Trønnes
3aaac6af94 dmaengine: bcm2835: Load driver early and support legacy API
Load driver early since at least bcm2708_fb doesn't support deferred
probing and even if it did, we don't want the video driver deferred.
Support the legacy DMA API which is needed by bcm2708_fb.
Don't mask out channel 2.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:45 +01:00
Phil Elwell
3011ad696f spi: spidev: Completely disable the spidev warning
An alternative strategy would be to use "rpi,spidev" instead, but that
would require many Raspberry Pi Device Tree changes.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:45 +01:00
Noralf Trønnes
add79076e6 irqchip: irq-bcm2835: Add 2836 FIQ support
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
2019-09-17 11:19:45 +01:00
Noralf Trønnes
e4b3284ac2 irqchip: bcm2835: Add FIQ support
Add a duplicate irq range with an offset on the hwirq's so the
driver can detect that enable_fiq() is used.
Tested with downstream dwc_otg USB controller driver.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
2019-09-17 11:19:45 +01:00
Phil Elwell
daa832a4ec irq-bcm2836: Avoid "Invalid trigger warning"
Initialise the level for each IRQ to avoid a warning from the
arm arch timer code.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:45 +01:00
Phil Elwell
0038c29f43 irq-bcm2836: Prevent spurious interrupts, and trap them early
The old arch-specific IRQ macros included a dsb to ensure the
write to clear the mailbox interrupt completed before returning
from the interrupt. The BCM2836 irqchip driver needs the same
precaution to avoid spurious interrupts.

Spurious interrupts are still possible for other reasons,
though, so trap them early.
2019-09-17 11:19:45 +01:00
Phil Elwell
1dcccaa7ae Protect __release_resource against resources without parents
Without this patch, removing a device tree overlay can crash here.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:45 +01:00
popcornmix
6faff3ee51 Allow mac address to be set in smsc95xx
Signed-off-by: popcornmix <popcornmix@gmail.com>
2019-09-17 11:19:44 +01:00
Sam Nazarko
a95b6ee1de smsc95xx: Experimental: Enable turbo_mode and packetsize=2560 by default
See: http://forum.kodi.tv/showthread.php?tid=285288
2019-09-17 11:19:44 +01:00
Steve Glendinning
81682c23ea smsx95xx: fix crimes against truesize
smsc95xx is adjusting truesize when it shouldn't, and following a recent patch from Eric this is now triggering warnings.

This patch stops smsc95xx from changing truesize.

Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
2019-09-17 11:19:44 +01:00
Phil Elwell
101e05de92 Revert "rtc: pcf8523: properly handle oscillator stop bit"
This reverts commit ede44c908d.

See: https://github.com/raspberrypi/firmware/issues/1065

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
2019-09-17 11:19:44 +01:00
Dan Pasanen
f090e330b1 arm: partially revert 702b94bff3
* Re-expose some dmi APIs for use in VCSM
2019-09-17 11:19:44 +01:00
25305 changed files with 880089 additions and 1945412 deletions

View File

@@ -86,8 +86,6 @@ ForEachMacros:
- 'bio_for_each_segment_all'
- 'bio_list_for_each'
- 'bip_for_each_vec'
- 'bitmap_for_each_clear_region'
- 'bitmap_for_each_set_region'
- 'blkg_for_each_descendant_post'
- 'blkg_for_each_descendant_pre'
- 'blk_queue_for_each_rl'
@@ -117,7 +115,6 @@ ForEachMacros:
- 'drm_client_for_each_connector_iter'
- 'drm_client_for_each_modeset'
- 'drm_connector_for_each_possible_encoder'
- 'drm_for_each_bridge_in_chain'
- 'drm_for_each_connector_iter'
- 'drm_for_each_crtc'
- 'drm_for_each_encoder'
@@ -139,10 +136,9 @@ ForEachMacros:
- 'for_each_bio'
- 'for_each_board_func_rsrc'
- 'for_each_bvec'
- 'for_each_card_auxs'
- 'for_each_card_auxs_safe'
- 'for_each_card_components'
- 'for_each_card_pre_auxs'
- 'for_each_card_links'
- 'for_each_card_links_safe'
- 'for_each_card_prelinks'
- 'for_each_card_rtds'
- 'for_each_card_rtds_safe'
@@ -170,7 +166,6 @@ ForEachMacros:
- 'for_each_dpcm_fe'
- 'for_each_drhd_unit'
- 'for_each_dss_dev'
- 'for_each_efi_handle'
- 'for_each_efi_memory_desc'
- 'for_each_efi_memory_desc_in_map'
- 'for_each_element'
@@ -195,7 +190,6 @@ ForEachMacros:
- 'for_each_lru'
- 'for_each_matching_node'
- 'for_each_matching_node_and_match'
- 'for_each_member'
- 'for_each_memblock'
- 'for_each_memblock_type'
- 'for_each_memcg_cache_index'
@@ -206,11 +200,9 @@ ForEachMacros:
- 'for_each_msi_entry'
- 'for_each_msi_entry_safe'
- 'for_each_net'
- 'for_each_net_continue_reverse'
- 'for_each_netdev'
- 'for_each_netdev_continue'
- 'for_each_netdev_continue_rcu'
- 'for_each_netdev_continue_reverse'
- 'for_each_netdev_feature'
- 'for_each_netdev_in_bond_rcu'
- 'for_each_netdev_rcu'
@@ -262,10 +254,10 @@ ForEachMacros:
- 'for_each_reserved_mem_region'
- 'for_each_rtd_codec_dai'
- 'for_each_rtd_codec_dai_rollback'
- 'for_each_rtd_components'
- 'for_each_rtdcom'
- 'for_each_rtdcom_safe'
- 'for_each_set_bit'
- 'for_each_set_bit_from'
- 'for_each_set_clump8'
- 'for_each_sg'
- 'for_each_sg_dma_page'
- 'for_each_sg_page'
@@ -275,7 +267,6 @@ ForEachMacros:
- 'for_each_subelement_id'
- '__for_each_thread'
- 'for_each_thread'
- 'for_each_wakeup_source'
- 'for_each_zone'
- 'for_each_zone_zonelist'
- 'for_each_zone_zonelist_nodemask'
@@ -339,7 +330,6 @@ ForEachMacros:
- 'list_for_each'
- 'list_for_each_codec'
- 'list_for_each_codec_safe'
- 'list_for_each_continue'
- 'list_for_each_entry'
- 'list_for_each_entry_continue'
- 'list_for_each_entry_continue_rcu'
@@ -361,7 +351,6 @@ ForEachMacros:
- 'llist_for_each_entry'
- 'llist_for_each_entry_safe'
- 'llist_for_each_safe'
- 'mci_for_each_dimm'
- 'media_device_for_each_entity'
- 'media_device_for_each_intf'
- 'media_device_for_each_link'
@@ -455,16 +444,10 @@ ForEachMacros:
- 'virtio_device_for_each_vq'
- 'xa_for_each'
- 'xa_for_each_marked'
- 'xa_for_each_range'
- 'xa_for_each_start'
- 'xas_for_each'
- 'xas_for_each_conflict'
- 'xas_for_each_marked'
- 'xbc_array_for_each_value'
- 'xbc_for_each_key_value'
- 'xbc_node_for_each_array_value'
- 'xbc_node_for_each_child'
- 'xbc_node_for_each_key_value'
- 'zorro_for_each_dev'
#IncludeBlocks: Preserve # Unknown to clang-format-5.0

2
.gitattributes vendored
View File

@@ -1,4 +1,2 @@
*.c diff=cpp
*.h diff=cpp
*.dtsi diff=dts
*.dts diff=dts

3
.gitignore vendored
View File

@@ -35,6 +35,7 @@
*.mod.c
*.o
*.o.*
*.order
*.patch
*.s
*.so
@@ -46,7 +47,6 @@
*.xz
Module.symvers
modules.builtin
modules.order
#
# Top-level generic files
@@ -61,7 +61,6 @@ modules.order
/System.map
/Module.markers
/modules.builtin.modinfo
/modules.nsdeps
#
# RPM spec file (make rpm-pkg)

View File

@@ -18,7 +18,6 @@ Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Aleksandar Markovic <aleksandar.markovic@mips.com> <aleksandar.markovic@imgtec.com>
Alex Shi <alex.shi@linux.alibaba.com> <alex.shi@intel.com>
Alex Shi <alex.shi@linux.alibaba.com> <alex.shi@linaro.org>
Alexandre Belloni <alexandre.belloni@bootlin.com> <alexandre.belloni@free-electrons.com>
Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com>
Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
Alexei Starovoitov <ast@kernel.org> <ast@fb.com>
@@ -28,14 +27,11 @@ Andi Shyti <andi@etezian.org> <andi.shyti@samsung.com>
Andreas Herrmann <aherrman@de.ibm.com>
Andrey Ryabinin <ryabinin.a.a@gmail.com> <a.ryabinin@samsung.com>
Andrew Morton <akpm@linux-foundation.org>
Andrew Murray <amurray@thegoodpenguin.co.uk> <andrew.murray@arm.com>
Andrew Murray <amurray@thegoodpenguin.co.uk> <amurray@embedded-bits.co.uk>
Andrew Vasquez <andrew.vasquez@qlogic.com>
Andy Adamson <andros@citi.umich.edu>
Antoine Tenart <antoine.tenart@free-electrons.com>
Antonio Ospite <ao2@ao2.it> <ao2@amarulasolutions.com>
Archit Taneja <archit@ti.com>
Ard Biesheuvel <ardb@kernel.org> <ard.biesheuvel@linaro.org>
Arnaud Patard <arnaud.patard@rtp-net.org>
Arnd Bergmann <arnd@arndb.de>
Axel Dyks <xl@xlsigned.net>
@@ -51,8 +47,6 @@ Boris Brezillon <bbrezillon@kernel.org> <b.brezillon.dev@gmail.com>
Boris Brezillon <bbrezillon@kernel.org> <b.brezillon@overkiz.com>
Brian Avery <b.avery@hp.com>
Brian King <brking@us.ibm.com>
Chao Yu <chao@kernel.org> <chao2.yu@samsung.com>
Chao Yu <chao@kernel.org> <yuchao0@huawei.com>
Christoph Hellwig <hch@lst.de>
Christophe Ricard <christophe.ricard@gmail.com>
Corey Minyard <minyard@acm.org>
@@ -69,7 +63,6 @@ Dengcheng Zhu <dzhu@wavecomp.com> <dengcheng.zhu@mips.com>
Dengcheng Zhu <dzhu@wavecomp.com> <dengcheng.zhu@imgtec.com>
Dengcheng Zhu <dzhu@wavecomp.com> <dczhu@mips.com>
Dengcheng Zhu <dzhu@wavecomp.com> <dengcheng.zhu@gmail.com>
<dev.kurt@vandijck-laurijssen.be> <kurt.van.dijck@eia.be>
Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Dmitry Safonov <0x7f454c46@gmail.com> <dsafonov@virtuozzo.com>
Dmitry Safonov <0x7f454c46@gmail.com> <d.safonov@partner.samsung.com>
@@ -77,7 +70,6 @@ Dmitry Safonov <0x7f454c46@gmail.com> <dima@arista.com>
Domen Puncer <domen@coderock.org>
Douglas Gilbert <dougg@torque.net>
Ed L. Cashin <ecashin@coraid.com>
Erik Kaneda <erik.kaneda@intel.com> <erik.schmauss@intel.com>
Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Felipe W Damasio <felipewd@terra.com.br>
Felix Kuhling <fxkuehl@gmx.de>
@@ -88,8 +80,6 @@ Frank Rowand <frowand.list@gmail.com> <frowand@mvista.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@am.sony.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@sonymobile.com>
Frank Zago <fzago@systemfabricworks.com>
Gao Xiang <xiang@kernel.org> <gaoxiang25@huawei.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@aol.com>
Greg Kroah-Hartman <greg@echidna.(none)>
Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman <greg@kroah.com>
@@ -100,27 +90,16 @@ Henrik Kretzschmar <henne@nachtwindheim.de>
Henrik Rydberg <rydberg@bitmath.org>
Herbert Xu <herbert@gondor.apana.org.au>
Jacob Shin <Jacob.Shin@amd.com>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@google.com>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@motorola.com>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk.kim@samsung.com>
Jakub Kicinski <kuba@kernel.org> <jakub.kicinski@netronome.com>
James Bottomley <jejb@mulgrave.(none)>
James Bottomley <jejb@titanic.il.steeleye.com>
James E Wilson <wilson@specifix.com>
James Hogan <jhogan@kernel.org> <james.hogan@imgtec.com>
James Hogan <jhogan@kernel.org> <james@albanarts.com>
James Ketrenos <jketreno@io.(none)>
Jan Glauber <jan.glauber@gmail.com> <jang@de.ibm.com>
Jan Glauber <jan.glauber@gmail.com> <jang@linux.vnet.ibm.com>
Jan Glauber <jan.glauber@gmail.com> <jglauber@cavium.com>
Jason Gunthorpe <jgg@ziepe.ca> <jgg@mellanox.com>
Jason Gunthorpe <jgg@ziepe.ca> <jgunthorpe@obsidianresearch.com>
Javi Merino <javi.merino@kernel.org> <javi.merino@arm.com>
<javier@osg.samsung.com> <javier.martinez@collabora.co.uk>
Jayachandran C <c.jayachandran@gmail.com> <jayachandranc@netlogicmicro.com>
Jayachandran C <c.jayachandran@gmail.com> <jchandra@broadcom.com>
Jayachandran C <c.jayachandran@gmail.com> <jchandra@digeo.com>
Jayachandran C <c.jayachandran@gmail.com> <jnair@caviumnetworks.com>
Jean Tourrilhes <jt@hpl.hp.com>
<jean-philippe@linaro.org> <jean-philippe.brucker@arm.com>
Jeff Garzik <jgarzik@pretzel.yyz.us>
@@ -142,7 +121,6 @@ Juha Yrjola <at solidboot.com>
Juha Yrjola <juha.yrjola@nokia.com>
Juha Yrjola <juha.yrjola@solidboot.com>
Julien Thierry <julien.thierry.kdev@gmail.com> <julien.thierry@arm.com>
Kamil Konieczny <k.konieczny@samsung.com> <k.konieczny@partner.samsung.com>
Kay Sievers <kay.sievers@vrfy.org>
Kenneth W Chen <kenneth.w.chen@intel.com>
Konstantin Khlebnikov <koct9i@gmail.com> <k.khlebnikov@samsung.com>
@@ -158,7 +136,6 @@ Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de>
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@ascom.ch>
Li Yang <leoyang.li@nxp.com> <leo@zh-kernel.org>
Li Yang <leoyang.li@nxp.com> <leoli@freescale.com>
Lukasz Luba <lukasz.luba@arm.com> <l.luba@partner.samsung.com>
Maciej W. Rozycki <macro@mips.com> <macro@imgtec.com>
Marc Zyngier <maz@kernel.org> <marc.zyngier@arm.com>
Marcin Nowakowski <marcin.nowakowski@mips.com> <marcin.nowakowski@imgtec.com>
@@ -166,7 +143,6 @@ Mark Brown <broonie@sirena.org.uk>
Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@puri.sm>
Mathieu Othacehe <m.othacehe@gmail.com>
Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com>
Matthew Wilcox <willy@infradead.org> <matthew@wil.cx>
@@ -202,22 +178,11 @@ Morten Welinder <welinder@darter.rentec.com>
Morten Welinder <welinder@troll.com>
Mythri P K <mythripk@ti.com>
Nguyen Anh Quynh <aquynh@gmail.com>
Nicolas Ferre <nicolas.ferre@microchip.com> <nicolas.ferre@atmel.com>
Nicolas Pitre <nico@fluxnic.net> <nicolas.pitre@linaro.org>
Nicolas Pitre <nico@fluxnic.net> <nico@linaro.org>
Oleksij Rempel <linux@rempel-privat.de> <bug-track@fisher-privat.net>
Oleksij Rempel <linux@rempel-privat.de> <external.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <fixed-term.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <o.rempel@pengutronix.de>
Oleksij Rempel <linux@rempel-privat.de> <ore@pengutronix.de>
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Patrick Mochel <mochel@digitalimplant.org>
Paul Burton <paulburton@kernel.org> <paul.burton@imgtec.com>
Paul Burton <paulburton@kernel.org> <paul.burton@mips.com>
Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.ibm.com>
Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.vnet.ibm.com>
Paul E. McKenney <paulmck@kernel.org> <paul.mckenney@linaro.org>
Paul E. McKenney <paulmck@kernel.org> <paulmck@us.ibm.com>
Paul Burton <paul.burton@mips.com> <paul.burton@imgtec.com>
Peter A Jonsson <pj@ludd.ltu.se>
Peter Oruba <peter@oruba.de>
Peter Oruba <peter.oruba@amd.com>
@@ -225,9 +190,11 @@ Pratyush Anand <pratyush.anand@gmail.com> <pratyush.anand@st.com>
Praveen BP <praveenbp@ti.com>
Punit Agrawal <punitagrawal@gmail.com> <punit.agrawal@arm.com>
Qais Yousef <qsyousef@gmail.com> <qais.yousef@imgtec.com>
Quentin Monnet <quentin@isovalent.com> <quentin.monnet@netronome.com>
Quentin Perret <qperret@qperret.net> <quentin.perret@arm.com>
Rafael J. Wysocki <rjw@rjwysocki.net> <rjw@sisk.pl>
Oleksij Rempel <linux@rempel-privat.de> <bug-track@fisher-privat.net>
Oleksij Rempel <linux@rempel-privat.de> <external.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <fixed-term.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <o.rempel@pengutronix.de>
Oleksij Rempel <linux@rempel-privat.de> <ore@pengutronix.de>
Rajesh Shah <rajesh.shah@intel.com>
Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
@@ -252,7 +219,6 @@ Shuah Khan <shuah@kernel.org> <shuahkhan@gmail.com>
Shuah Khan <shuah@kernel.org> <shuah.khan@hp.com>
Shuah Khan <shuah@kernel.org> <shuahkh@osg.samsung.com>
Shuah Khan <shuah@kernel.org> <shuah.kh@samsung.com>
Simon Arlott <simon@octiron.net> <simon@fire.lp0.eu>
Simon Kelley <simon@thekelleys.org.uk>
Stéphane Witzmann <stephane.witzmann@ubpmes.univ-bpclermont.fr>
Stephen Hemminger <shemminger@osdl.org>
@@ -263,8 +229,6 @@ Sumit Semwal <sumit.semwal@ti.com>
Tejun Heo <htejun@gmail.com>
Thomas Graf <tgraf@suug.ch>
Thomas Pedersen <twp@codeaurora.org>
Tiezhu Yang <yangtiezhu@loongson.cn> <kernelpatch@126.com>
Todor Tomov <todor.too@gmail.com> <todor.tomov@linaro.org>
Tony Luck <tony.luck@intel.com>
TripleX Chung <xxx.phy@gmail.com> <zhongyu@18mail.cn>
TripleX Chung <xxx.phy@gmail.com> <triplex@zh-kernel.org>
@@ -279,7 +243,6 @@ Vinod Koul <vkoul@kernel.org> <vkoul@infradead.org>
Viresh Kumar <vireshk@kernel.org> <viresh.kumar@st.com>
Viresh Kumar <vireshk@kernel.org> <viresh.linux@gmail.com>
Viresh Kumar <vireshk@kernel.org> <viresh.kumar2@arm.com>
Vivien Didelot <vivien.didelot@gmail.com> <vivien.didelot@savoirfairelinux.com>
Vlad Dogaru <ddvlad@gmail.com> <vlad.dogaru@intel.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@virtuozzo.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@parallels.com>
@@ -291,5 +254,3 @@ Gustavo Padovan <gustavo@las.ic.unicamp.br>
Gustavo Padovan <padovan@profusion.mobi>
Changbin Du <changbin.du@intel.com> <changbin.du@intel.com>
Changbin Du <changbin.du@intel.com> <changbin.du@gmail.com>
Steve Wise <larrystevenwise@gmail.com> <swise@chelsio.com>
Steve Wise <larrystevenwise@gmail.com> <swise@opengridcomputing.com>

View File

@@ -16,5 +16,3 @@ In addition, other licenses may also apply. Please see:
Documentation/process/license-rules.rst
for more details.
All contributions to the Linux Kernel are subject to this COPYING file.

18
CREDITS
View File

@@ -567,11 +567,6 @@ D: Original author of Amiga FFS filesystem
S: Orlando, Florida
S: USA
N: Paul Burton
E: paulburton@kernel.org
W: https://pburton.com
D: MIPS maintainer 2018-2020
N: Lennert Buytenhek
E: kernel@wantstofly.org
D: Original (2.4) rewrite of the ethernet bridging code
@@ -756,7 +751,7 @@ S: Santa Cruz, California
S: USA
N: Luis Correia
E: luisfcorreia@gmail.com
E: lfcorreia@users.sf.net
D: Ralink rt2x00 WLAN driver
S: Belas, Portugal
@@ -1642,10 +1637,6 @@ S: Panoramastrasse 18
S: D-69126 Heidelberg
S: Germany
N: Simon Horman
M: horms@verge.net.au
D: Renesas ARM/ARM64 SoC maintainer
N: Christopher Horn
E: chorn@warwick.net
D: Miscellaneous sysctl hacks
@@ -1880,9 +1871,8 @@ S: The Netherlands
N: Martin Kepplinger
E: martink@posteo.de
E: martin.kepplinger@puri.sm
E: martin.kepplinger@ginzinger.com
W: http://www.martinkepplinger.com
P: 4096R/5AB387D3 F208 2B88 0F9E 4239 3468 6E3F 5003 98DF 5AB3 87D3
D: mma8452 accelerators iio driver
D: pegasus_notetaker input driver
D: Kernel fixes and cleanups
@@ -3307,9 +3297,7 @@ S: France
N: Aleksa Sarai
E: cyphar@cyphar.com
W: https://www.cyphar.com/
D: /sys/fs/cgroup/pids
D: openat2(2)
S: Sydney, Australia
D: `pids` cgroup subsystem
N: Dipankar Sarma
E: dipankar@in.ibm.com

View File

@@ -1,26 +0,0 @@
What: /sys/fs/selinux/disable
Date: April 2005 (predates git)
KernelVersion: 2.6.12-rc2 (predates git)
Contact: selinux@vger.kernel.org
Description:
The selinuxfs "disable" node allows SELinux to be disabled at runtime
prior to a policy being loaded into the kernel. If disabled via this
mechanism, SELinux will remain disabled until the system is rebooted.
The preferred method of disabling SELinux is via the "selinux=0" boot
parameter, but the selinuxfs "disable" node was created to make it
easier for systems with primitive bootloaders that did not allow for
easy modification of the kernel command line. Unfortunately, allowing
for SELinux to be disabled at runtime makes it difficult to secure the
kernel's LSM hooks using the "__ro_after_init" feature.
Thankfully, the need for the SELinux runtime disable appears to be
gone, the default Kconfig configuration disables this selinuxfs node,
and only one of the major distributions, Fedora, supports disabling
SELinux at runtime. Fedora is in the process of removing the
selinuxfs "disable" node and once that is complete we will start the
slow process of removing this code from the kernel.
More information on /sys/fs/selinux/disable can be found under the
CONFIG_SECURITY_SELINUX_DISABLE Kconfig option.

View File

@@ -6,6 +6,6 @@ Description: Bus scanning interval, microseconds component.
control systems are attached/generate presence for as short as
100 ms - hence the tens-to-hundreds milliseconds scan intervals
are required.
see Documentation/w1/w1-generic.rst for detailed information.
see Documentation/w1/w1.generic for detailed information.
Users: any user space application which wants to know bus scanning
interval

View File

@@ -314,6 +314,25 @@ Description:
board_id: (RO) Manufacturing board ID
sysfs interface for Chelsio T3 RDMA Driver (cxgb3)
--------------------------------------------------
What: /sys/class/infiniband/cxgb3_X/hw_rev
What: /sys/class/infiniband/cxgb3_X/hca_type
What: /sys/class/infiniband/cxgb3_X/board_id
Date: Feb, 2007
KernelVersion: v2.6.21
Contact: linux-rdma@vger.kernel.org
Description:
hw_rev: (RO) Hardware revision number
hca_type: (RO) HCA type. Here it is a driver short name.
It should normally match the name in its bus
driver structure (e.g. pci_driver::name).
board_id: (RO) Manufacturing board id
sysfs interface for Mellanox ConnectX HCA IB driver (mlx4)
----------------------------------------------------------

View File

@@ -1,7 +1,7 @@
What: /sys/class/tpm/tpmX/device/
Date: April 2005
KernelVersion: 2.6.12
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The device/ directory under a specific TPM instance exposes
the properties of that TPM chip
@@ -9,7 +9,7 @@ Description: The device/ directory under a specific TPM instance exposes
What: /sys/class/tpm/tpmX/device/active
Date: April 2006
KernelVersion: 2.6.17
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "active" property prints a '1' if the TPM chip is accepting
commands. An inactive TPM chip still contains all the state of
an active chip (Storage Root Key, NVRAM, etc), and can be
@@ -21,7 +21,7 @@ Description: The "active" property prints a '1' if the TPM chip is accepting
What: /sys/class/tpm/tpmX/device/cancel
Date: June 2005
KernelVersion: 2.6.13
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "cancel" property allows you to cancel the currently
pending TPM command. Writing any value to cancel will call the
TPM vendor specific cancel operation.
@@ -29,7 +29,7 @@ Description: The "cancel" property allows you to cancel the currently
What: /sys/class/tpm/tpmX/device/caps
Date: April 2005
KernelVersion: 2.6.12
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "caps" property contains TPM manufacturer and version info.
Example output:
@@ -46,7 +46,7 @@ Description: The "caps" property contains TPM manufacturer and version info.
What: /sys/class/tpm/tpmX/device/durations
Date: March 2011
KernelVersion: 3.1
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "durations" property shows the 3 vendor-specific values
used to wait for a short, medium and long TPM command. All
TPM commands are categorized as short, medium or long in
@@ -69,7 +69,7 @@ Description: The "durations" property shows the 3 vendor-specific values
What: /sys/class/tpm/tpmX/device/enabled
Date: April 2006
KernelVersion: 2.6.17
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "enabled" property prints a '1' if the TPM chip is enabled,
meaning that it should be visible to the OS. This property
may be visible but produce a '0' after some operation that
@@ -78,7 +78,7 @@ Description: The "enabled" property prints a '1' if the TPM chip is enabled,
What: /sys/class/tpm/tpmX/device/owned
Date: April 2006
KernelVersion: 2.6.17
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "owned" property produces a '1' if the TPM_TakeOwnership
ordinal has been executed successfully in the chip. A '0'
indicates that ownership hasn't been taken.
@@ -86,7 +86,7 @@ Description: The "owned" property produces a '1' if the TPM_TakeOwnership
What: /sys/class/tpm/tpmX/device/pcrs
Date: April 2005
KernelVersion: 2.6.12
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "pcrs" property will dump the current value of all Platform
Configuration Registers in the TPM. Note that since these
values may be constantly changing, the output is only valid
@@ -109,7 +109,7 @@ Description: The "pcrs" property will dump the current value of all Platform
What: /sys/class/tpm/tpmX/device/pubek
Date: April 2005
KernelVersion: 2.6.12
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "pubek" property will return the TPM's public endorsement
key if possible. If the TPM has had ownership established and
is version 1.2, the pubek will not be available without the
@@ -161,7 +161,7 @@ Description: The "pubek" property will return the TPM's public endorsement
What: /sys/class/tpm/tpmX/device/temp_deactivated
Date: April 2006
KernelVersion: 2.6.17
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "temp_deactivated" property returns a '1' if the chip has
been temporarily deactivated, usually until the next power
cycle. Whether a warm boot (reboot) will clear a TPM chip
@@ -170,7 +170,7 @@ Description: The "temp_deactivated" property returns a '1' if the chip has
What: /sys/class/tpm/tpmX/device/timeouts
Date: March 2011
KernelVersion: 3.1
Contact: linux-integrity@vger.kernel.org
Contact: tpmdd-devel@lists.sf.net
Description: The "timeouts" property shows the 4 vendor-specific values
for the TPM's interface spec timeouts. The use of these
timeouts is defined by the TPM interface spec that the chip
@@ -183,14 +183,3 @@ Description: The "timeouts" property shows the 4 vendor-specific values
The four timeout values are shown in usecs, with a trailing
"[original]" or "[adjusted]" depending on whether the values
were scaled by the driver to be reported in usec from msecs.
What: /sys/class/tpm/tpmX/tpm_version_major
Date: October 2019
KernelVersion: 5.5
Contact: linux-integrity@vger.kernel.org
Description: The "tpm_version_major" property shows the TCG spec major version
implemented by the TPM device.
Example output:
2

View File

@@ -6,19 +6,10 @@ Description: Configures which IO port the host side of the UART
Users: OpenBMC. Proposed changes should be mailed to
openbmc@lists.ozlabs.org
What: /sys/bus/platform/drivers/aspeed-vuart/*/sirq
What: /sys/bus/platform/drivers/aspeed-vuart*/sirq
Date: April 2017
Contact: Jeremy Kerr <jk@ozlabs.org>
Description: Configures which interrupt number the host side of
the UART will appear on the host <-> BMC LPC bus.
Users: OpenBMC. Proposed changes should be mailed to
openbmc@lists.ozlabs.org
What: /sys/bus/platform/drivers/aspeed-vuart/*/sirq_polarity
Date: July 2019
Contact: Oskar Senft <osk@google.com>
Description: Configures the polarity of the serial interrupt to the
host via the BMC LPC bus.
Set to 0 for active-low or 1 for active-high.
Users: OpenBMC. Proposed changes should be mailed to
openbmc@lists.ozlabs.org

View File

@@ -1,171 +0,0 @@
What: sys/bus/dsa/devices/dsa<m>/cdev_major
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The major number that the character device driver assigned to
this device.
What: sys/bus/dsa/devices/dsa<m>/errors
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The error information for this device.
What: sys/bus/dsa/devices/dsa<m>/max_batch_size
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The largest number of work descriptors in a batch.
What: sys/bus/dsa/devices/dsa<m>/max_work_queues_size
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The maximum work queue size supported by this device.
What: sys/bus/dsa/devices/dsa<m>/max_engines
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The maximum number of engines supported by this device.
What: sys/bus/dsa/devices/dsa<m>/max_groups
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The maximum number of groups can be created under this device.
What: sys/bus/dsa/devices/dsa<m>/max_tokens
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The total number of bandwidth tokens supported by this device.
The bandwidth tokens represent resources within the DSA
implementation, and these resources are allocated by engines to
support operations.
What: sys/bus/dsa/devices/dsa<m>/max_transfer_size
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The number of bytes to be read from the source address to
perform the operation. The maximum transfer size is dependent on
the workqueue the descriptor was submitted to.
What: sys/bus/dsa/devices/dsa<m>/max_work_queues
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The maximum work queue number that this device supports.
What: sys/bus/dsa/devices/dsa<m>/numa_node
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The numa node number for this device.
What: sys/bus/dsa/devices/dsa<m>/op_cap
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The operation capability bit mask specify the operation types
supported by the this device.
What: sys/bus/dsa/devices/dsa<m>/state
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The state information of this device. It can be either enabled
or disabled.
What: sys/bus/dsa/devices/dsa<m>/group<m>.<n>
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The assigned group under this device.
What: sys/bus/dsa/devices/dsa<m>/engine<m>.<n>
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The assigned engine under this device.
What: sys/bus/dsa/devices/dsa<m>/wq<m>.<n>
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The assigned work queue under this device.
What: sys/bus/dsa/devices/dsa<m>/configurable
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: To indicate if this device is configurable or not.
What: sys/bus/dsa/devices/dsa<m>/token_limit
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The maximum number of bandwidth tokens that may be in use at
one time by operations that access low bandwidth memory in the
device.
What: sys/bus/dsa/devices/wq<m>.<n>/group_id
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The group id that this work queue belongs to.
What: sys/bus/dsa/devices/wq<m>.<n>/size
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The work queue size for this work queue.
What: sys/bus/dsa/devices/wq<m>.<n>/type
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The type of this work queue, it can be "kernel" type for work
queue usages in the kernel space or "user" type for work queue
usages by applications in user space.
What: sys/bus/dsa/devices/wq<m>.<n>/cdev_minor
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The minor number assigned to this work queue by the character
device driver.
What: sys/bus/dsa/devices/wq<m>.<n>/mode
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The work queue mode type for this work queue.
What: sys/bus/dsa/devices/wq<m>.<n>/priority
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The priority value of this work queue, it is a vlue relative to
other work queue in the same group to control quality of service
for dispatching work from multiple workqueues in the same group.
What: sys/bus/dsa/devices/wq<m>.<n>/state
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The current state of the work queue.
What: sys/bus/dsa/devices/wq<m>.<n>/threshold
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The number of entries in this work queue that may be filled
via a limited portal.
What: sys/bus/dsa/devices/engine<m>.<n>/group_id
Date: Oct 25, 2019
KernelVersion: 5.6.0
Contact: dmaengine@vger.kernel.org
Description: The group that this engine belongs to.

View File

@@ -67,8 +67,6 @@ Description: Interface for making ib_srp connect to a new target.
initiator is allowed to queue per SCSI host. The default
value for this parameter is 62. The lowest supported value
is 2.
* max_it_iu_size, a decimal number specifying the maximum
initiator to target information unit length.
What: /sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
Date: January 2, 2006

View File

@@ -1,4 +1,5 @@
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/asic_health
Date: June 2018
KernelVersion: 4.19
Contact: Vadim Pasternak <vadimpmellanox.com>
@@ -18,6 +19,7 @@ Description: These files show with which CPLD versions have been burned
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/fan_dir
Date: December 2018
KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
@@ -27,16 +29,18 @@ Description: This file shows the system fans direction:
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld3_version
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/jtag_enable
Date: November 2018
KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show with which CPLD versions have been burned
on LED or Gearbox board.
on LED board.
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/jtag_enable
Date: November 2018
KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
@@ -104,6 +108,7 @@ What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_pwr_fail
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_comex
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_system
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_voltmon_upgrade_fail
Date: November 2018
KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
@@ -116,21 +121,6 @@ Description: These files show the system reset cause, as following: ComEx
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld4_version
Date: November 2018
KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show with which CPLD versions have been burned
on LED board.
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_thermal
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_wd
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_asic
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_reload_bios
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sff_wd
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_swb_wd
Date: June 2019
KernelVersion: 5.3
Contact: Vadim Pasternak <vadimpmellanox.com>
@@ -144,65 +134,9 @@ Description: These files show the system reset cause, as following:
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/config1
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/config2
Date: January 2020
KernelVersion: 5.6
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show system static topology identification
like system's static I2C topology, number and type of FPGA
devices within the system and so on.
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_ac_pwr_fail
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_platform
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_soc
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sw_pwr_off
Date: January 2020
KernelVersion: 5.6
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show the system reset causes, as following: reset
due to AC power failure, reset invoked from software by
assertion reset signal through CPLD. reset caused by signal
asserted by SOC through ACPI register, reset invoked from
software by assertion power off signal through CPLD.
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/pcie_asic_reset_dis
Date: January 2020
KernelVersion: 5.6
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: This file allows to retain ASIC up during PCIe root complex
reset, when attribute is set 1.
The file is read/write.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/vpd_wp
Date: January 2020
KernelVersion: 5.6
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: This file allows to overwrite system VPD hardware wrtie
protection when attribute is set 1.
The file is read/write.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/voltreg_update_status
Date: January 2020
KernelVersion: 5.6
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: This file exposes the configuration update status of burnable
voltage regulator devices. The status values are as following:
0 - OK; 1 - CRC failure; 2 = I2C failure; 3 - in progress.
The file is read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/ufm_version
Date: January 2020
KernelVersion: 5.6
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: This file exposes the firmware version of burnable voltage
regulator devices.
The file is read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_thermal
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_wd
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_asic
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_reload_bios
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sff_wd
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_swb_wd

View File

@@ -2,7 +2,7 @@ What: /sys/bus/w1/devices/.../pio
Date: May 2012
Contact: Markus Franke <franm@hrz.tu-chemnitz.de>
Description: read/write the contents of the two PIO's of the DS28E04-100
see Documentation/w1/slaves/w1_ds28e04.rst for detailed information
see Documentation/w1/slaves/w1_ds28e04 for detailed information
Users: any user space application which wants to communicate with DS28E04-100
@@ -11,5 +11,5 @@ What: /sys/bus/w1/devices/.../eeprom
Date: May 2012
Contact: Markus Franke <franm@hrz.tu-chemnitz.de>
Description: read/write the contents of the EEPROM memory of the DS28E04-100
see Documentation/w1/slaves/w1_ds28e04.rst for detailed information
see Documentation/w1/slaves/w1_ds28e04 for detailed information
Users: any user space application which wants to communicate with DS28E04-100

View File

@@ -2,5 +2,5 @@ What: /sys/bus/w1/devices/.../w1_seq
Date: Apr 2015
Contact: Matt Campbell <mattrcampbell@gmail.com>
Description: Support for the DS28EA00 chain sequence function
see Documentation/w1/slaves/w1_therm.rst for detailed information
see Documentation/w1/slaves/w1_therm for detailed information
Users: any user space application which wants to communicate with DS28EA00

View File

@@ -16,10 +16,6 @@ Description:
write UDC's name found in /sys/class/udc/*
to bind a gadget, empty string "" to unbind.
max_speed - maximum speed the driver supports. Valid
names are super-speed-plus, super-speed,
high-speed, full-speed, and low-speed.
bDeviceClass - USB device class code
bDeviceSubClass - USB device subclass code
bDeviceProtocol - USB device protocol code

View File

@@ -1,57 +0,0 @@
What: /sys/kernel/debug/hisi_hpre/<bdf>/cluster[0-3]/regs
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: Dump debug registers from the HPRE cluster.
Only available for PF.
What: /sys/kernel/debug/hisi_hpre/<bdf>/cluster[0-3]/cluster_ctrl
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: Write the HPRE core selection in the cluster into this file,
and then we can read the debug information of the core.
Only available for PF.
What: /sys/kernel/debug/hisi_hpre/<bdf>/rdclr_en
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: HPRE cores debug registers read clear control. 1 means enable
register read clear, otherwise 0. Writing to this file has no
functional effect, only enable or disable counters clear after
reading of these registers.
Only available for PF.
What: /sys/kernel/debug/hisi_hpre/<bdf>/current_qm
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: One HPRE controller has one PF and multiple VFs, each function
has a QM. Select the QM which below qm refers to.
Only available for PF.
What: /sys/kernel/debug/hisi_hpre/<bdf>/regs
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: Dump debug registers from the HPRE.
Only available for PF.
What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/qm_regs
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: Dump debug registers from the QM.
Available for PF and VF in host. VF in guest currently only
has one debug register.
What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/current_q
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: One QM may contain multiple queues. Select specific queue to
show its debug registers in above qm_regs.
Only available for PF.
What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/clear_enable
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: QM debug registers(qm_regs) read clear control. 1 means enable
register read clear, otherwise 0.
Writing to this file has no functional effect, only enable or
disable counters clear after reading of these registers.
Only available for PF.

View File

@@ -1,43 +0,0 @@
What: /sys/kernel/debug/hisi_sec/<bdf>/sec_dfx
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: Dump the debug registers of SEC cores.
Only available for PF.
What: /sys/kernel/debug/hisi_sec/<bdf>/clear_enable
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: Enabling/disabling of clear action after reading
the SEC debug registers.
0: disable, 1: enable.
Only available for PF, and take no other effect on SEC.
What: /sys/kernel/debug/hisi_sec/<bdf>/current_qm
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: One SEC controller has one PF and multiple VFs, each function
has a QM. This file can be used to select the QM which below
qm refers to.
Only available for PF.
What: /sys/kernel/debug/hisi_sec/<bdf>/qm/qm_regs
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: Dump of QM related debug registers.
Available for PF and VF in host. VF in guest currently only
has one debug register.
What: /sys/kernel/debug/hisi_sec/<bdf>/qm/current_q
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: One QM of SEC may contain multiple queues. Select specific
queue to show its debug registers in above 'qm_regs'.
Only available for PF.
What: /sys/kernel/debug/hisi_sec/<bdf>/qm/clear_enable
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: Enabling/disabling of clear action after reading
the SEC's QM debug registers.
0: disable, 1: enable.
Only available for PF, and take no other effect on SEC.

View File

@@ -1,50 +0,0 @@
What: /sys/kernel/debug/hisi_zip/<bdf>/comp_core[01]/regs
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: Dump of compression cores related debug registers.
Only available for PF.
What: /sys/kernel/debug/hisi_zip/<bdf>/decomp_core[0-5]/regs
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: Dump of decompression cores related debug registers.
Only available for PF.
What: /sys/kernel/debug/hisi_zip/<bdf>/clear_enable
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: Compression/decompression core debug registers read clear
control. 1 means enable register read clear, otherwise 0.
Writing to this file has no functional effect, only enable or
disable counters clear after reading of these registers.
Only available for PF.
What: /sys/kernel/debug/hisi_zip/<bdf>/current_qm
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: One ZIP controller has one PF and multiple VFs, each function
has a QM. Select the QM which below qm refers to.
Only available for PF.
What: /sys/kernel/debug/hisi_zip/<bdf>/qm/qm_regs
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: Dump of QM related debug registers.
Available for PF and VF in host. VF in guest currently only
has one debug register.
What: /sys/kernel/debug/hisi_zip/<bdf>/qm/current_q
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: One QM may contain multiple queues. Select specific queue to
show its debug registers in above qm_regs.
Only available for PF.
What: /sys/kernel/debug/hisi_zip/<bdf>/qm/clear_enable
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: QM debug registers(qm_regs) read clear control. 1 means enable
register read clear, otherwise 0.
Writing to this file has no functional effect, only enable or
disable counters clear after reading of these registers.
Only available for PF.

View File

@@ -1,23 +0,0 @@
What: /sys/kernel/debug/hyperv/<UUID>/fuzz_test_state
Date: October 2019
KernelVersion: 5.5
Contact: Branden Bonaby <brandonbonaby94@gmail.com>
Description: Fuzz testing status of a vmbus device, whether its in an ON
state or a OFF state
Users: Debugging tools
What: /sys/kernel/debug/hyperv/<UUID>/delay/fuzz_test_buffer_interrupt_delay
Date: October 2019
KernelVersion: 5.5
Contact: Branden Bonaby <brandonbonaby94@gmail.com>
Description: Fuzz testing buffer interrupt delay value between 0 - 1000
microseconds (inclusive).
Users: Debugging tools
What: /sys/kernel/debug/hyperv/<UUID>/delay/fuzz_test_message_delay
Date: October 2019
KernelVersion: 5.5
Contact: Branden Bonaby <brandonbonaby94@gmail.com>
Description: Fuzz testing message delay value between 0 - 1000 microseconds
(inclusive).
Users: Debugging tools

View File

@@ -1,23 +0,0 @@
What: /sys/kernel/debug/moxtet/input
Date: March 2019
KernelVersion: 5.3
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) Read input from the shift registers, in hexadecimal.
Returns N+1 bytes, where N is the number of Moxtet connected
modules. The first byte is from the CPU board itself.
Example: 101214
10: CPU board with SD card
12: 2 = PCIe module, 1 = IRQ not active
14: 4 = Peridot module, 1 = IRQ not active
What: /sys/kernel/debug/moxtet/output
Date: March 2019
KernelVersion: 5.3
Contact: Marek Behún <marek.behun@nic.cz>
Description: (RW) Read last written value to the shift registers, in
hexadecimal, or write values to the shift registers, also
in hexadecimal.
Example: 0102
01: 01 was last written, or is to be written, to the
first module's shift register
02: the same for second module

View File

@@ -12,7 +12,7 @@ Description: The /dev/kmsg character device node provides userspace access
The logged line can be prefixed with a <N> syslog prefix, which
carries the syslog priority and facility. The single decimal
prefix number is composed of the 3 lowest bits being the syslog
priority and the next 8 bits the syslog facility number.
priority and the higher bits the syslog facility number.
If no prefix is given, the priority number is the default kernel
log priority and the facility number is set to LOG_USER (1). It
@@ -90,12 +90,13 @@ Description: The /dev/kmsg character device node provides userspace access
+sound:card0 - subsystem:devname
The flags field carries '-' by default. A 'c' indicates a
fragment of a line. Note, that these hints about continuation
lines are not necessarily correct, and the stream could be
interleaved with unrelated messages, but merging the lines in
the output usually produces better human readable results. A
similar logic is used internally when messages are printed to
the console, /proc/kmsg or the syslog() syscall.
fragment of a line. All following fragments are flagged with
'+'. Note, that these hints about continuation lines are not
necessarily correct, and the stream could be interleaved with
unrelated messages, but merging the lines in the output
usually produces better human readable results. A similar
logic is used internally when messages are printed to the
console, /proc/kmsg or the syslog() syscall.
By default, kernel tries to avoid fragments by concatenating
when it can and fragments are rare; however, when extended

View File

@@ -25,11 +25,10 @@ Description:
lsm: [[subj_user=] [subj_role=] [subj_type=]
[obj_user=] [obj_role=] [obj_type=]]
option: [[appraise_type=]] [template=] [permit_directio]
[appraise_flag=] [keyrings=]
base: func:= [BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK][MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
[KEXEC_CMDLINE] [KEY_CHECK]
[KEXEC_CMDLINE]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
[[^]MAY_EXEC]
fsmagic:= hex value
@@ -38,13 +37,7 @@ Description:
euid:= decimal value
fowner:= decimal value
lsm: are LSM specific
option: appraise_type:= [imasig] [imasig|modsig]
appraise_flag:= [check_blacklist]
Currently, blacklist check is only for files signed with appended
signature.
keyrings:= list of keyrings
(eg, .builtin_trusted_keys|.ima). Only valid
when action is "measure" and func is KEY_CHECK.
option: appraise_type:= [imasig]
template:= name of a defined IMA template type
(eg, ima-ng). Only valid when action is "measure".
pcr:= decimal value
@@ -112,16 +105,3 @@ Description:
measure func=KEXEC_KERNEL_CHECK pcr=4
measure func=KEXEC_INITRAMFS_CHECK pcr=5
Example of appraise rule allowing modsig appended signatures:
appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig
Example of measure rule using KEY_CHECK to measure all keys:
measure func=KEY_CHECK
Example of measure rule using KEY_CHECK to only measure
keys added to .builtin_trusted_keys or .ima keyring:
measure func=KEY_CHECK keyrings=.builtin_trusted_keys|.ima

View File

@@ -29,9 +29,4 @@ Description:
17 - sectors discarded
18 - time spent discarding
Kernel 5.5+ appends two more fields for flush requests:
19 - flush requests completed successfully
20 - time spent flushing
For more details refer to Documentation/admin-guide/iostats.rst

View File

@@ -33,14 +33,6 @@ Description:
Requires a separate RTC_PIE_ON call to enable the periodic
interrupts.
* RTC_VL_READ: Read the voltage inputs status of the RTC when
supported. The value is a bit field of RTC_VL_*, giving the
status of the main and backup voltages.
* RTC_VL_CLEAR: Clear the voltage status of the RTC. Some RTCs
need user interaction when the backup power provider is
replaced or charged to be able to clear the status.
The ioctl() calls supported by the older /dev/rtc interface are
also supported by the newer RTC class framework. However,
because the chips and systems are not standardized, some PC/AT

View File

@@ -15,12 +15,6 @@ Description:
9 - I/Os currently in progress
10 - time spent doing I/Os (ms)
11 - weighted time spent doing I/Os (ms)
12 - discards completed
13 - discards merged
14 - sectors discarded
15 - time spent discarding (ms)
16 - flush requests completed
17 - time spent flushing (ms)
For more details refer Documentation/admin-guide/iostats.rst

View File

@@ -1,4 +1,4 @@
What: /sys/bus/coresight/devices/etm<N>/enable_source
What: /sys/bus/coresight/devices/<memory_map>.etm/enable_source
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
@@ -8,82 +8,82 @@ Description: (RW) Enable/disable tracing on this specific trace entiry.
of coresight components linking the source to the sink is
configured and managed automatically by the coresight framework.
What: /sys/bus/coresight/devices/etm<N>/cpu
What: /sys/bus/coresight/devices/<memory_map>.etm/cpu
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) The CPU this tracing entity is associated with.
What: /sys/bus/coresight/devices/etm<N>/nr_pe_cmp
What: /sys/bus/coresight/devices/<memory_map>.etm/nr_pe_cmp
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of PE comparator inputs that are
available for tracing.
What: /sys/bus/coresight/devices/etm<N>/nr_addr_cmp
What: /sys/bus/coresight/devices/<memory_map>.etm/nr_addr_cmp
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of address comparator pairs that are
available for tracing.
What: /sys/bus/coresight/devices/etm<N>/nr_cntr
What: /sys/bus/coresight/devices/<memory_map>.etm/nr_cntr
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of counters that are available for
tracing.
What: /sys/bus/coresight/devices/etm<N>/nr_ext_inp
What: /sys/bus/coresight/devices/<memory_map>.etm/nr_ext_inp
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates how many external inputs are implemented.
What: /sys/bus/coresight/devices/etm<N>/numcidc
What: /sys/bus/coresight/devices/<memory_map>.etm/numcidc
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of Context ID comparators that are
available for tracing.
What: /sys/bus/coresight/devices/etm<N>/numvmidc
What: /sys/bus/coresight/devices/<memory_map>.etm/numvmidc
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of VMID comparators that are available
for tracing.
What: /sys/bus/coresight/devices/etm<N>/nrseqstate
What: /sys/bus/coresight/devices/<memory_map>.etm/nrseqstate
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of sequencer states that are
implemented.
What: /sys/bus/coresight/devices/etm<N>/nr_resource
What: /sys/bus/coresight/devices/<memory_map>.etm/nr_resource
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of resource selection pairs that are
available for tracing.
What: /sys/bus/coresight/devices/etm<N>/nr_ss_cmp
What: /sys/bus/coresight/devices/<memory_map>.etm/nr_ss_cmp
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Indicates the number of single-shot comparator controls that
are available for tracing.
What: /sys/bus/coresight/devices/etm<N>/reset
What: /sys/bus/coresight/devices/<memory_map>.etm/reset
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (W) Cancels all configuration on a trace unit and set it back
to its boot configuration.
What: /sys/bus/coresight/devices/etm<N>/mode
What: /sys/bus/coresight/devices/<memory_map>.etm/mode
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
@@ -91,349 +91,302 @@ Description: (RW) Controls various modes supported by this ETM, for example
P0 instruction tracing, branch broadcast, cycle counting and
context ID tracing.
What: /sys/bus/coresight/devices/etm<N>/pe
What: /sys/bus/coresight/devices/<memory_map>.etm/pe
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls which PE to trace.
What: /sys/bus/coresight/devices/etm<N>/event
What: /sys/bus/coresight/devices/<memory_map>.etm/event
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls the tracing of arbitrary events from bank 0 to 3.
What: /sys/bus/coresight/devices/etm<N>/event_instren
What: /sys/bus/coresight/devices/<memory_map>.etm/event_instren
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls the behavior of the events in bank 0 to 3.
What: /sys/bus/coresight/devices/etm<N>/event_ts
What: /sys/bus/coresight/devices/<memory_map>.etm/event_ts
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls the insertion of global timestamps in the trace
streams.
What: /sys/bus/coresight/devices/etm<N>/syncfreq
What: /sys/bus/coresight/devices/<memory_map>.etm/syncfreq
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls how often trace synchronization requests occur.
What: /sys/bus/coresight/devices/etm<N>/cyc_threshold
What: /sys/bus/coresight/devices/<memory_map>.etm/cyc_threshold
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Sets the threshold value for cycle counting.
What: /sys/bus/coresight/devices/etm<N>/bb_ctrl
What: /sys/bus/coresight/devices/<memory_map>.etm/bb_ctrl
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls which regions in the memory map are enabled to
use branch broadcasting.
What: /sys/bus/coresight/devices/etm<N>/event_vinst
What: /sys/bus/coresight/devices/<memory_map>.etm/event_vinst
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls instruction trace filtering.
What: /sys/bus/coresight/devices/etm<N>/s_exlevel_vinst
What: /sys/bus/coresight/devices/<memory_map>.etm/s_exlevel_vinst
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) In Secure state, each bit controls whether instruction
tracing is enabled for the corresponding exception level.
What: /sys/bus/coresight/devices/etm<N>/ns_exlevel_vinst
What: /sys/bus/coresight/devices/<memory_map>.etm/ns_exlevel_vinst
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) In non-secure state, each bit controls whether instruction
tracing is enabled for the corresponding exception level.
What: /sys/bus/coresight/devices/etm<N>/addr_idx
What: /sys/bus/coresight/devices/<memory_map>.etm/addr_idx
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Select which address comparator or pair (of comparators) to
work with.
What: /sys/bus/coresight/devices/etm<N>/addr_instdatatype
What: /sys/bus/coresight/devices/<memory_map>.etm/addr_instdatatype
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls what type of comparison the trace unit performs.
What: /sys/bus/coresight/devices/etm<N>/addr_single
What: /sys/bus/coresight/devices/<memory_map>.etm/addr_single
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Used to setup single address comparator values.
What: /sys/bus/coresight/devices/etm<N>/addr_range
What: /sys/bus/coresight/devices/<memory_map>.etm/addr_range
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Used to setup address range comparator values.
What: /sys/bus/coresight/devices/etm<N>/seq_idx
What: /sys/bus/coresight/devices/<memory_map>.etm/seq_idx
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Select which sequensor.
What: /sys/bus/coresight/devices/etm<N>/seq_state
What: /sys/bus/coresight/devices/<memory_map>.etm/seq_state
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Use this to set, or read, the sequencer state.
What: /sys/bus/coresight/devices/etm<N>/seq_event
What: /sys/bus/coresight/devices/<memory_map>.etm/seq_event
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Moves the sequencer state to a specific state.
What: /sys/bus/coresight/devices/etm<N>/seq_reset_event
What: /sys/bus/coresight/devices/<memory_map>.etm/seq_reset_event
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Moves the sequencer to state 0 when a programmed event
occurs.
What: /sys/bus/coresight/devices/etm<N>/cntr_idx
What: /sys/bus/coresight/devices/<memory_map>.etm/cntr_idx
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Select which counter unit to work with.
What: /sys/bus/coresight/devices/etm<N>/cntrldvr
What: /sys/bus/coresight/devices/<memory_map>.etm/cntrldvr
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) This sets or returns the reload count value of the
specific counter.
What: /sys/bus/coresight/devices/etm<N>/cntr_val
What: /sys/bus/coresight/devices/<memory_map>.etm/cntr_val
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) This sets or returns the current count value of the
specific counter.
What: /sys/bus/coresight/devices/etm<N>/cntr_ctrl
What: /sys/bus/coresight/devices/<memory_map>.etm/cntr_ctrl
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls the operation of the selected counter.
What: /sys/bus/coresight/devices/etm<N>/res_idx
What: /sys/bus/coresight/devices/<memory_map>.etm/res_idx
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Select which resource selection unit to work with.
What: /sys/bus/coresight/devices/etm<N>/res_ctrl
What: /sys/bus/coresight/devices/<memory_map>.etm/res_ctrl
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Controls the selection of the resources in the trace unit.
What: /sys/bus/coresight/devices/etm<N>/ctxid_idx
What: /sys/bus/coresight/devices/<memory_map>.etm/ctxid_idx
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Select which context ID comparator to work with.
What: /sys/bus/coresight/devices/etm<N>/ctxid_pid
What: /sys/bus/coresight/devices/<memory_map>.etm/ctxid_pid
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Get/Set the context ID comparator value to trigger on.
What: /sys/bus/coresight/devices/etm<N>/ctxid_masks
What: /sys/bus/coresight/devices/<memory_map>.etm/ctxid_masks
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Mask for all 8 context ID comparator value
registers (if implemented).
What: /sys/bus/coresight/devices/etm<N>/vmid_idx
What: /sys/bus/coresight/devices/<memory_map>.etm/vmid_idx
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Select which virtual machine ID comparator to work with.
What: /sys/bus/coresight/devices/etm<N>/vmid_val
What: /sys/bus/coresight/devices/<memory_map>.etm/vmid_val
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Get/Set the virtual machine ID comparator value to
trigger on.
What: /sys/bus/coresight/devices/etm<N>/vmid_masks
What: /sys/bus/coresight/devices/<memory_map>.etm/vmid_masks
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Mask for all 8 virtual machine ID comparator value
registers (if implemented).
What: /sys/bus/coresight/devices/etm<N>/addr_exlevel_s_ns
Date: December 2019
KernelVersion: 5.5
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Set the Exception Level matching bits for secure and
non-secure exception levels.
What: /sys/bus/coresight/devices/etm<N>/vinst_pe_cmp_start_stop
Date: December 2019
KernelVersion: 5.5
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Access the start stop control register for PE input
comparators.
What: /sys/bus/coresight/devices/etm<N>/addr_cmp_view
Date: December 2019
KernelVersion: 5.5
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the current settings for the selected address
comparator.
What: /sys/bus/coresight/devices/etm<N>/sshot_idx
Date: December 2019
KernelVersion: 5.5
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Select the single shot control register to access.
What: /sys/bus/coresight/devices/etm<N>/sshot_ctrl
Date: December 2019
KernelVersion: 5.5
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Access the selected single shot control register.
What: /sys/bus/coresight/devices/etm<N>/sshot_status
Date: December 2019
KernelVersion: 5.5
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the current value of the selected single shot
status register.
What: /sys/bus/coresight/devices/etm<N>/sshot_pe_ctrl
Date: December 2019
KernelVersion: 5.5
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Access the selected single show PE comparator control
register.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcoslsr
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcoslsr
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the OS Lock Status Register (0x304).
The value it taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcpdcr
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcpdcr
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Power Down Control Register
(0x310). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcpdsr
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcpdsr
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Power Down Status Register
(0x314). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trclsr
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trclsr
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the SW Lock Status Register
(0xFB4). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcauthstatus
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcauthstatus
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Authentication Status Register
(0xFB8). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcdevid
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcdevid
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Device ID Register
(0xFC8). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcdevtype
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcdevtype
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Device Type Register
(0xFCC). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcpidr0
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcpidr0
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Peripheral ID0 Register
(0xFE0). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcpidr1
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcpidr1
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Peripheral ID1 Register
(0xFE4). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcpidr2
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcpidr2
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Peripheral ID2 Register
(0xFE8). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcpidr3
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcpidr3
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the Peripheral ID3 Register
(0xFEC). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trcconfig
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trcconfig
Date: February 2016
KernelVersion: 4.07
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the trace configuration register
(0x010) as currently set by SW.
What: /sys/bus/coresight/devices/etm<N>/mgmt/trctraceid
What: /sys/bus/coresight/devices/<memory_map>.etm/mgmt/trctraceid
Date: February 2016
KernelVersion: 4.07
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Print the content of the trace ID register (0x040).
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr0
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr0
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Returns the tracing capabilities of the trace unit (0x1E0).
The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr1
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr1
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Returns the tracing capabilities of the trace unit (0x1E4).
The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr2
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr2
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
@@ -441,7 +394,7 @@ Description: (R) Returns the maximum size of the data value, data address,
VMID, context ID and instuction address in the trace unit
(0x1E8). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr3
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr3
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
@@ -450,42 +403,42 @@ Description: (R) Returns the value associated with various resources
architecture specification for more details (0x1E8).
The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr4
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr4
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Returns how many resources the trace unit supports (0x1F0).
The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr5
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr5
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Returns how many resources the trace unit supports (0x1F4).
The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr8
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr8
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Returns the maximum speculation depth of the instruction
trace stream. (0x180). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr9
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr9
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Returns the number of P0 right-hand keys that the trace unit
can use (0x184). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr10
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr10
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (R) Returns the number of P1 right-hand keys that the trace unit
can use (0x188). The value is taken directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr11
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr11
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
@@ -493,7 +446,7 @@ Description: (R) Returns the number of special P1 right-hand keys that the
trace unit can use (0x18C). The value is taken directly from
the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr12
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr12
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
@@ -501,7 +454,7 @@ Description: (R) Returns the number of conditional P1 right-hand keys that
the trace unit can use (0x190). The value is taken directly
from the HW.
What: /sys/bus/coresight/devices/etm<N>/trcidr/trcidr13
What: /sys/bus/coresight/devices/<memory_map>.etm/trcidr/trcidr13
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier@linaro.org>

View File

@@ -1,25 +1,25 @@
What: /sys/bus/platform/devices/../fsi-master/fsi0/rescan
What: /sys/bus/platform/devices/fsi-master/rescan
Date: May 2017
KernelVersion: 4.12
Contact: linux-fsi@lists.ozlabs.org
Contact: cbostic@linux.vnet.ibm.com
Description:
Initiates a FSI master scan for all connected slave devices
on its links.
What: /sys/bus/platform/devices/../fsi-master/fsi0/break
What: /sys/bus/platform/devices/fsi-master/break
Date: May 2017
KernelVersion: 4.12
Contact: linux-fsi@lists.ozlabs.org
Contact: cbostic@linux.vnet.ibm.com
Description:
Sends an FSI BREAK command on a master's communication
link to any connnected slaves. A BREAK resets connected
device's logic and preps it to receive further commands
from the master.
What: /sys/bus/platform/devices/../fsi-master/fsi0/slave@00:00/term
What: /sys/bus/platform/devices/fsi-master/slave@00:00/term
Date: May 2017
KernelVersion: 4.12
Contact: linux-fsi@lists.ozlabs.org
Contact: cbostic@linux.vnet.ibm.com
Description:
Sends an FSI terminate command from the master to its
connected slave. A terminate resets the slave's state machines
@@ -29,10 +29,10 @@ Description:
ongoing operation in case of an expired 'Master Time Out'
timer.
What: /sys/bus/platform/devices/../fsi-master/fsi0/slave@00:00/raw
What: /sys/bus/platform/devices/fsi-master/slave@00:00/raw
Date: May 2017
KernelVersion: 4.12
Contact: linux-fsi@lists.ozlabs.org
Contact: cbostic@linux.vnet.ibm.com
Description:
Provides a means of reading/writing a 32 bit value from/to a
specified FSI bus address.

View File

@@ -753,8 +753,6 @@ What: /sys/.../events/in_illuminance0_thresh_falling_value
what: /sys/.../events/in_illuminance0_thresh_rising_value
what: /sys/.../events/in_proximity0_thresh_falling_value
what: /sys/.../events/in_proximity0_thresh_rising_value
What: /sys/.../events/in_illuminance_thresh_rising_value
What: /sys/.../events/in_illuminance_thresh_falling_value
KernelVersion: 2.6.37
Contact: linux-iio@vger.kernel.org
Description:
@@ -974,7 +972,6 @@ What: /sys/.../events/in_activity_jogging_thresh_rising_period
What: /sys/.../events/in_activity_jogging_thresh_falling_period
What: /sys/.../events/in_activity_running_thresh_rising_period
What: /sys/.../events/in_activity_running_thresh_falling_period
What: /sys/.../events/in_illuminance_thresh_either_period
KernelVersion: 2.6.37
Contact: linux-iio@vger.kernel.org
Description:
@@ -1718,24 +1715,3 @@ Description:
Mass concentration reading of particulate matter in ug / m3.
pmX consists of particles with aerodynamic diameter less or
equal to X micrometers.
What: /sys/bus/iio/devices/iio:deviceX/events/in_illuminance_period_available
Date: November 2019
KernelVersion: 5.4
Contact: linux-iio@vger.kernel.org
Description:
List of valid periods (in seconds) for which the light intensity
must be above the threshold level before interrupt is asserted.
What: /sys/bus/iio/devices/iio:deviceX/in_filter_notch_center_frequency
KernelVersion: 5.5
Contact: linux-iio@vger.kernel.org
Description:
Center frequency in Hz for a notch filter. Used i.e. for line
noise suppression.
What: /sys/bus/iio/devices/iio:deviceX/in_temp_thermocouple_type
KernelVersion: 5.5
Contact: linux-iio@vger.kernel.org
Description:
One of the following thermocouple types: B, E, J, K, N, R, S, T.

View File

@@ -1,39 +0,0 @@
What: /sys/bus/iio/devices/iio:deviceX/ac_excitation_en
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
Reading gives the state of AC excitation.
Writing '1' enables AC excitation.
What: /sys/bus/iio/devices/iio:deviceX/bridge_switch_en
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
This bridge switch is used to disconnect it when there is a
need to minimize the system current consumption.
Reading gives the state of the bridge switch.
Writing '1' enables the bridge switch.
What: /sys/bus/iio/devices/iio:deviceX/in_voltagex_sys_calibration
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
Initiates the system calibration procedure. This is done on a
single channel at a time. Write '1' to start the calibration.
What: /sys/bus/iio/devices/iio:deviceX/in_voltagex_sys_calibration_mode_available
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
Reading returns a list with the possible calibration modes.
There are two available options:
"zero_scale" - calibrate to zero scale
"full_scale" - calibrate to full scale
What: /sys/bus/iio/devices/iio:deviceX/in_voltagex_sys_calibration_mode
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
Sets up the calibration mode used in the system calibration
procedure. Reading returns the current calibration mode.
Writing sets the system calibration mode.

View File

@@ -13,4 +13,4 @@ Description:
error on writing
If DFSDM input is SPI Slave:
Reading returns value previously set.
Writing value before starting conversions.
Writing value before starting conversions.

View File

@@ -1,19 +0,0 @@
What: /sys/bus/iio/devices/iio:deviceX/buffer/length_align_bytes
KernelVersion: 5.4
Contact: linux-iio@vger.kernel.org
Description:
DMA buffers tend to have a alignment requirement for the
buffers. If this alignment requirement is not met samples might
be dropped from the buffer.
This property reports the alignment requirements in bytes.
This means that the buffer size in bytes needs to be a integer
multiple of the number reported by this file.
The alignment requirements in number of sample sets will depend
on the enabled channels and the bytes per channel. This means
that the alignment requirement in samples sets might change
depending on which and how many channels are enabled. Whereas
the alignment requirement reported in bytes by this property
will remain static and does not depend on which channels are
enabled.

View File

@@ -91,6 +91,29 @@ Description:
When counting down the counter start from preset value
and fire event when reach 0.
What: /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
KernelVersion: 4.12
Contact: benjamin.gaignard@st.com
Description:
Reading returns the list possible quadrature modes.
What: /sys/bus/iio/devices/iio:deviceX/in_count0_quadrature_mode
KernelVersion: 4.12
Contact: benjamin.gaignard@st.com
Description:
Configure the device counter quadrature modes:
channel_A:
Encoder A input servers as the count input and B as
the UP/DOWN direction control input.
channel_B:
Encoder B input serves as the count input and A as
the UP/DOWN direction control input.
quadrature:
Encoder A and B inputs are mixed to get direction
and count with a scale of 0.25.
What: /sys/bus/iio/devices/iio:deviceX/in_count_enable_mode_available
KernelVersion: 4.12
Contact: benjamin.gaignard@st.com

View File

@@ -12,8 +12,7 @@ Description: (RW) Configure MSC operating mode:
- "single", for contiguous buffer mode (high-order alloc);
- "multi", for multiblock mode;
- "ExI", for DCI handler mode;
- "debug", for debug mode;
- any of the currently loaded buffer sinks.
- "debug", for debug mode.
If operating mode changes, existing buffer is deallocated,
provided there are no active users and tracing is not enabled,
otherwise the write will fail.

View File

@@ -1,63 +0,0 @@
What: /sys/bus/mdio_bus/devices/.../statistics/
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
This folder contains statistics about global and per
MDIO bus address statistics.
What: /sys/bus/mdio_bus/devices/.../statistics/transfers
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of transfers for this MDIO bus.
What: /sys/bus/mdio_bus/devices/.../statistics/errors
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of transfer errors for this MDIO bus.
What: /sys/bus/mdio_bus/devices/.../statistics/writes
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of write transactions for this MDIO bus.
What: /sys/bus/mdio_bus/devices/.../statistics/reads
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of read transactions for this MDIO bus.
What: /sys/bus/mdio_bus/devices/.../statistics/transfers_<addr>
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of transfers for this MDIO bus address.
What: /sys/bus/mdio_bus/devices/.../statistics/errors_<addr>
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of transfer errors for this MDIO bus address.
What: /sys/bus/mdio_bus/devices/.../statistics/writes_<addr>
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of write transactions for this MDIO bus address.
What: /sys/bus/mdio_bus/devices/.../statistics/reads_<addr>
Date: January 2020
KernelVersion: 5.6
Contact: netdev@vger.kernel.org
Description:
Total number of read transactions for this MDIO bus address.

View File

@@ -4,7 +4,7 @@ KernelVersion: 3.10
Contact: Samuel Ortiz <sameo@linux.intel.com>
linux-mei@linux.intel.com
Description: Stores the same MODALIAS value emitted by uevent
Format: mei:<mei device name>:<device uuid>:<protocol version>
Format: mei:<mei device name>:<device uuid>:
What: /sys/bus/mei/devices/.../name
Date: May 2015
@@ -26,24 +26,3 @@ KernelVersion: 4.3
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Stores mei client protocol version
Format: %d
What: /sys/bus/mei/devices/.../max_conn
Date: Nov 2019
KernelVersion: 5.5
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Stores mei client maximum number of connections
Format: %d
What: /sys/bus/mei/devices/.../fixed
Date: Nov 2019
KernelVersion: 5.5
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Stores mei client fixed address, if any
Format: %d
What: /sys/bus/mei/devices/.../max_len
Date: Nov 2019
KernelVersion: 5.5
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Stores mei client maximum message length
Format: %d

View File

@@ -1,17 +0,0 @@
What: /sys/bus/moxtet/devices/moxtet-<name>.<addr>/module_description
Date: March 2019
KernelVersion: 5.3
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) Moxtet module description. Format: string
What: /sys/bus/moxtet/devices/moxtet-<name>.<addr>/module_id
Date: March 2019
KernelVersion: 5.3
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) Moxtet module ID. Format: %x
What: /sys/bus/moxtet/devices/moxtet-<name>.<addr>/module_name
Date: March 2019
KernelVersion: 5.3
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) Moxtet module name. Format: string

View File

@@ -347,16 +347,3 @@ Description:
If the device has any Peer-to-Peer memory registered, this
file contains a '1' if the memory has been published for
use outside the driver that owns the device.
What: /sys/bus/pci/devices/.../link/clkpm
/sys/bus/pci/devices/.../link/l0s_aspm
/sys/bus/pci/devices/.../link/l1_aspm
/sys/bus/pci/devices/.../link/l1_1_aspm
/sys/bus/pci/devices/.../link/l1_2_aspm
/sys/bus/pci/devices/.../link/l1_1_pcipm
/sys/bus/pci/devices/.../link/l1_2_pcipm
Date: October 2019
Contact: Heiner Kallweit <hkallweit1@gmail.com>
Description: If ASPM is supported for an endpoint, these files can be
used to disable or enable the individual power management
states. Write y/1/on to enable, n/0/off to disable.

View File

@@ -80,14 +80,6 @@ Contact: thunderbolt-software@lists.01.org
Description: This attribute contains 1 if Thunderbolt device was already
authorized on boot and 0 otherwise.
What: /sys/bus/thunderbolt/devices/.../generation
Date: Jan 2020
KernelVersion: 5.5
Contact: Christian Kellner <christian@kellner.me>
Description: This attribute contains the generation of the Thunderbolt
controller associated with the device. It will contain 4
for USB4.
What: /sys/bus/thunderbolt/devices/.../key
Date: Sep 2017
KernelVersion: 4.13
@@ -112,34 +104,6 @@ Contact: thunderbolt-software@lists.01.org
Description: This attribute contains name of this device extracted from
the device DROM.
What: /sys/bus/thunderbolt/devices/.../rx_speed
Date: Jan 2020
KernelVersion: 5.5
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: This attribute reports the device RX speed per lane.
All RX lanes run at the same speed.
What: /sys/bus/thunderbolt/devices/.../rx_lanes
Date: Jan 2020
KernelVersion: 5.5
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: This attribute reports number of RX lanes the device is
using simultaneusly through its upstream port.
What: /sys/bus/thunderbolt/devices/.../tx_speed
Date: Jan 2020
KernelVersion: 5.5
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: This attribute reports the TX speed per lane.
All TX lanes run at the same speed.
What: /sys/bus/thunderbolt/devices/.../tx_lanes
Date: Jan 2020
KernelVersion: 5.5
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: This attribute reports number of TX lanes the device is
using simultaneusly through its upstream port.
What: /sys/bus/thunderbolt/devices/.../vendor
Date: Sep 2017
KernelVersion: 4.13

View File

@@ -1,26 +0,0 @@
What: /sys/class/backlight/<backlight>/scale
Date: July 2019
KernelVersion: 5.4
Contact: Daniel Thompson <daniel.thompson@linaro.org>
Description:
Description of the scale of the brightness curve.
The human eye senses brightness approximately logarithmically,
hence linear changes in brightness are perceived as being
non-linear. To achieve a linear perception of brightness changes
controls like sliders need to apply a logarithmic mapping for
backlights with a linear brightness curve.
Possible values of the attribute are:
unknown
The scale of the brightness curve is unknown.
linear
The brightness changes linearly with each step. Brightness
controls should apply a logarithmic mapping for a linear
perception.
non-linear
The brightness changes non-linearly with each step. Brightness
controls should use a linear mapping for a linear perception.

View File

@@ -7,13 +7,6 @@ Description:
The name of devfreq object denoted as ... is same as the
name of device using devfreq.
What: /sys/class/devfreq/.../name
Date: November 2019
Contact: Chanwoo Choi <cw00.choi@samsung.com>
Description:
The /sys/class/devfreq/.../name shows the name of device
of the corresponding devfreq object.
What: /sys/class/devfreq/.../governor
Date: September 2011
Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
@@ -55,15 +48,12 @@ What: /sys/class/devfreq/.../trans_stat
Date: October 2012
Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
Description:
This ABI shows or clears the statistics of devfreq behavior
on a specific device. It shows the time spent in each state
and the number of transitions between states.
This ABI shows the statistics of devfreq behavior on a
specific device. It shows the time spent in each state and
the number of transitions between states.
In order to activate this ABI, the devfreq target device
driver should provide the list of available frequencies
with its profile. If need to reset the statistics of devfreq
behavior on a specific device, enter 0(zero) to 'trans_stat'
as following:
echo 0 > /sys/class/devfreq/.../trans_stat
with its profile.
What: /sys/class/devfreq/.../userspace/set_freq
Date: September 2011

View File

@@ -1,139 +0,0 @@
What: /sys/class/leds/<led>/hw_pattern
Date: September 2019
KernelVersion: 5.5
Description:
Specify a hardware pattern for the EL15203000 LED.
The LEDs board supports only predefined patterns by firmware
for specific LEDs.
Breathing mode for Screen frame light tube:
"0 4000 1 4000"
^
|
Max-| ---
| / \
| / \
| / \ /
| / \ /
Min-|- ---
|
0------4------8--> time (sec)
Cascade mode for Pipe LED:
"1 800 2 800 4 800 8 800 16 800"
^
|
0 On -|----+ +----+ +---
| | | | |
Off-| +-------------------+ +-------------------+
|
1 On -| +----+ +----+
| | | | |
Off |----+ +-------------------+ +------------------
|
2 On -| +----+ +----+
| | | | |
Off-|---------+ +-------------------+ +-------------
|
3 On -| +----+ +----+
| | | | |
Off-|--------------+ +-------------------+ +--------
|
4 On -| +----+ +----+
| | | | |
Off-|-------------------+ +-------------------+ +---
|
0---0.8--1.6--2.4--3.2---4---4.8--5.6--6.4--7.2---8--> time (sec)
Inverted cascade mode for Pipe LED:
"30 800 29 800 27 800 23 800 15 800"
^
|
0 On -| +-------------------+ +-------------------+
| | | | |
Off-|----+ +----+ +---
|
1 On -|----+ +-------------------+ +------------------
| | | | |
Off | +----+ +----+
|
2 On -|---------+ +-------------------+ +-------------
| | | | |
Off-| +----+ +----+
|
3 On -|--------------+ +-------------------+ +--------
| | | | |
Off-| +----+ +----+
|
4 On -|-------------------+ +-------------------+ +---
| | | | |
Off-| +----+ +----+
|
0---0.8--1.6--2.4--3.2---4---4.8--5.6--6.4--7.2---8--> time (sec)
Bounce mode for Pipe LED:
"1 800 2 800 4 800 8 800 16 800 16 800 8 800 4 800 2 800 1 800"
^
|
0 On -|----+ +--------
| | |
Off-| +---------------------------------------+
|
1 On -| +----+ +----+
| | | | |
Off |----+ +-----------------------------+ +--------
|
2 On -| +----+ +----+
| | | | |
Off-|---------+ +-------------------+ +-------------
|
3 On -| +----+ +----+
| | | | |
Off-|--------------+ +---------+ +------------------
|
4 On -| +---------+
| | |
Off-|-------------------+ +-----------------------
|
0---0.8--1.6--2.4--3.2---4---4.8--5.6--6.4--7.2---8--> time (sec)
Inverted bounce mode for Pipe LED:
"30 800 29 800 27 800 23 800 15 800 15 800 23 800 27 800 29 800 30 800"
^
|
0 On -| +---------------------------------------+
| | |
Off-|----+ +--------
|
1 On -|----+ +-----------------------------+ +--------
| | | | |
Off | +----+ +----+
|
2 On -|---------+ +-------------------+ +-------------
| | | | |
Off-| +----+ +----+
|
3 On -|--------------+ +---------+ +------------------
| | | | |
Off-| +----+ +----+
|
4 On -|-------------------+ +-----------------------
| | |
Off-| +---------+
|
0---0.8--1.6--2.4--3.2---4---4.8--5.6--6.4--7.2---8--> time (sec)
What: /sys/class/leds/<led>/repeat
Date: September 2019
KernelVersion: 5.5
Description:
EL15203000 supports only indefinitely patterns,
so this file should always store -1.
For more info, please see:
Documentation/ABI/testing/sysfs-class-led-trigger-pattern

View File

@@ -80,13 +80,3 @@ Description: Display the ME device state.
DISABLED
POWER_DOWN
POWER_UP
What: /sys/class/mei/meiN/trc
Date: Nov 2019
KernelVersion: 5.5
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Display trc status register content
The ME FW writes Glitch Detection HW (TRC)
status information into trc status register
for BIOS and OS to monitor fw health.

View File

@@ -51,14 +51,6 @@ Description:
packet processing. See the network driver for the exact
meaning of this value.
What: /sys/class/<iface>/statistics/rx_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
Description:
Indicates the number of receive errors on this network device.
See the network driver for the exact meaning of this value.
What: /sys/class/<iface>/statistics/rx_fifo_errors
Date: April 2005
KernelVersion: 2.6.12
@@ -96,14 +88,6 @@ Description:
due to lack of capacity in the receive side. See the network
driver for the exact meaning of this value.
What: /sys/class/<iface>/statistics/rx_nohandler
Date: February 2016
KernelVersion: 4.6
Contact: netdev@vger.kernel.org
Description:
Indicates the number of received packets that were dropped on
an inactive device by the network core.
What: /sys/class/<iface>/statistics/rx_over_errors
Date: April 2005
KernelVersion: 2.6.12

View File

@@ -189,8 +189,7 @@ Description:
Access: Read
Valid values: "Unknown", "Good", "Overheat", "Dead",
"Over voltage", "Unspecified failure", "Cold",
"Watchdog timer expire", "Safety timer expire",
"Over current"
"Watchdog timer expire", "Safety timer expire"
What: /sys/class/power_supply/<supply_name>/precharge_current
Date: June 2017

View File

@@ -48,13 +48,3 @@ Description: Remote processor state
Writing "stop" will attempt to halt the remote processor and
return it to the "offline" state.
What: /sys/class/remoteproc/.../name
Date: August 2019
KernelVersion: 5.4
Contact: Suman Anna <s-anna@ti.com>
Description: Remote processor name
Reports the name of the remote processor. This can be used by
userspace in exactly identifying a remote processor and ease
up the usage in modifying the 'firmware' or 'state' files.

View File

@@ -1,76 +0,0 @@
What: /sys/class/wakeup/
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
The /sys/class/wakeup/ directory contains pointers to all
wakeup sources in the kernel at that moment in time.
What: /sys/class/wakeup/.../name
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the name of the wakeup source.
What: /sys/class/wakeup/.../active_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of times the wakeup source was
activated.
What: /sys/class/wakeup/.../event_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of signaled wakeup events
associated with the wakeup source.
What: /sys/class/wakeup/.../wakeup_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of times the wakeup source might
abort suspend.
What: /sys/class/wakeup/.../expire_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of times the wakeup source's
timeout has expired.
What: /sys/class/wakeup/.../active_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the amount of time the wakeup source has
been continuously active, in milliseconds. If the wakeup
source is not active, this file contains '0'.
What: /sys/class/wakeup/.../total_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the total amount of time this wakeup source
has been active, in milliseconds.
What: /sys/class/wakeup/.../max_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the maximum amount of time this wakeup
source has been continuously active, in milliseconds.
What: /sys/class/wakeup/.../last_change_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the monotonic clock time when the wakeup
source was touched last time, in milliseconds.
What: /sys/class/wakeup/.../prevent_suspend_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
The file contains the total amount of time this wakeup source
has been preventing autosleep, in milliseconds.

View File

@@ -17,13 +17,8 @@ What: /sys/class/watchdog/watchdogn/nowayout
Date: August 2015
Contact: Wim Van Sebroeck <wim@iguana.be>
Description:
It is a read/write file. While reading, it gives '1'
if the device has the nowayout feature set, otherwise
it gives '0'. Writing a '1' to the file enables the
nowayout feature. Once set, the nowayout feature
cannot be disabled, so writing a '0' either has no
effect (if the feature was already disabled) or
results in a permission error.
It is a read only file. While reading, it gives '1' if that
device supports nowayout feature else, it gives '0'.
What: /sys/class/watchdog/watchdogn/state
Date: August 2015
@@ -77,37 +72,3 @@ Description:
It is a read/write file. When read, the currently assigned
pretimeout governor is returned. When written, it sets
the pretimeout governor.
What: /sys/class/watchdog/watchdog1/access_cs0
Date: August 2019
Contact: Ivan Mikhaylov <i.mikhaylov@yadro.com>,
Alexander Amelkin <a.amelkin@yadro.com>
Description:
It is a read/write file. This attribute exists only if the
system has booted from the alternate flash chip due to
expiration of a watchdog timer of AST2400/AST2500 when
alternate boot function was enabled with 'aspeed,alt-boot'
devicetree option for that watchdog or with an appropriate
h/w strapping (for WDT2 only).
At alternate flash the 'access_cs0' sysfs node provides:
ast2400: a way to get access to the primary SPI flash
chip at CS0 after booting from the alternate
chip at CS1.
ast2500: a way to restore the normal address mapping
from (CS0->CS1, CS1->CS0) to (CS0->CS0,
CS1->CS1).
Clearing the boot code selection and timeout counter also
resets to the initial state the chip select line mapping. When
the SoC is in normal mapping state (i.e. booted from CS0),
clearing those bits does nothing for both versions of the SoC.
For alternate boot mode (booted from CS1 due to wdt2
expiration) the behavior differs as described above.
This option can be used with wdt2 (watchdog1) only.
When read, the current status of the boot code selection is
shown. When written with any non-zero value, it clears
the boot code selection and the timeout counter, which results
in chipselect reset for AST2400/AST2500.

View File

@@ -1,128 +0,0 @@
Intel Stratix10 Remote System Update (RSU) device attributes
What: /sys/devices/platform/stratix10-rsu.0/current_image
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) the address in flash of currently running image.
What: /sys/devices/platform/stratix10-rsu.0/fail_image
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) the address in flash of failed image.
What: /sys/devices/platform/stratix10-rsu.0/state
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) the state of RSU system.
The state field has two parts: major error code in
upper 16 bits and minor error code in lower 16 bits.
b[15:0]
Currently used only when major error is 0xF006
(CPU watchdog timeout), in which case the minor
error code is the value reported by CPU to
firmware through the RSU notify command before
the watchdog timeout occurs.
b[31:16]
0xF001 bitstream error
0xF002 hardware access failure
0xF003 bitstream corruption
0xF004 internal error
0xF005 device error
0xF006 CPU watchdog timeout
0xF007 internal unknown error
What: /sys/devices/platform/stratix10-rsu.0/version
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) the version number of RSU firmware. 19.3 or late
version includes information about the firmware which
reported the error.
pre 19.3:
b[31:0]
0x0 version number
19.3 or late:
b[15:0]
0x1 version number
b[31:16]
0x0 no error
0x0DCF Decision CMF error
0x0ACF Application CMF error
What: /sys/devices/platform/stratix10-rsu.0/error_location
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) the error offset inside the image that failed.
What: /sys/devices/platform/stratix10-rsu.0/error_details
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) error code.
What: /sys/devices/platform/stratix10-rsu.0/retry_counter
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) the current image's retry counter, which is used by
user to know how many times the images is still allowed
to reload itself before giving up and starting RSU
fail-over flow.
What: /sys/devices/platform/stratix10-rsu.0/reboot_image
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(WO) the address in flash of image to be loaded on next
reboot command.
What: /sys/devices/platform/stratix10-rsu.0/notify
Date: August 2019
KernelVersion: 5.4
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(WO) client to notify firmware with different actions.
b[15:0]
inform firmware the current software execution
stage.
0 the first stage bootloader didn't run or
didn't reach the point of launching second
stage bootloader.
1 failed in second bootloader or didn't get
to the point of launching the operating
system.
2 both first and second stage bootloader ran
and the operating system launch was
attempted.
b[16]
1 firmware to reset current image retry
counter.
0 no action.
b[17]
1 firmware to clear RSU log
0 no action.
b[18]
this is negative logic
1 no action
0 firmware record the notify code defined
in b[15:0].

View File

@@ -260,12 +260,3 @@ Description:
This attribute has no effect on system-wide suspend/resume and
hibernation.
What: /sys/devices/.../power/runtime_status
Date: April 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
The /sys/devices/.../power/runtime_status attribute contains
the current runtime PM status of the device, which may be
"suspended", "suspending", "resuming", "active", "error" (fatal
error), or "unsupported" (runtime PM is disabled).

View File

@@ -26,13 +26,6 @@ Description:
Read-only attribute common to all SoCs. Contains SoC family name
(e.g. DB8500).
What: /sys/devices/socX/serial_number
Date: January 2019
contact: Bjorn Andersson <bjorn.andersson@linaro.org>
Description:
Read-only attribute supported by most SoCs. Contains the SoC's
serial number, if available.
What: /sys/devices/socX/soc_id
Date: January 2012
contact: Lee Jones <lee.jones@linaro.org>

View File

@@ -196,12 +196,6 @@ Description:
does not reflect it. Likewise, if one enables a deep state but a
lighter state still is disabled, then this has no effect.
What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/default_status
Date: December 2019
KernelVersion: v5.6
Contact: Linux power management list <linux-pm@vger.kernel.org>
Description:
(RO) The default status of this state, "enabled" or "disabled".
What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/residency
Date: March 2014
@@ -492,9 +486,6 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/l1tf
/sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/srbds
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
Date: January 2018
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Information about CPU vulnerabilities
@@ -571,13 +562,3 @@ Description: Umwait control
or C0.2 state. The time is an unsigned 32-bit number.
Note that a value of zero means there is no limit.
Low order two bits must be zero.
What: /sys/devices/system/cpu/svm
Date: August 2019
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
Description: Secure Virtual Machine
If 1, it means the system is using the Protected Execution
Facility in POWER9 and newer processors. i.e., it is a Secure
Virtual Machine.

View File

@@ -57,7 +57,6 @@ KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Allows the user to set the maximum clock frequency for MME, TPC
and IC when the power management profile is set to "automatic".
This property is valid only for the Goya ASIC family
What: /sys/class/habanalabs/hl<n>/ic_clk
Date: Jan 2019
@@ -128,8 +127,8 @@ Description: Power management profile. Values are "auto", "manual". In "auto"
the max clock frequency to a low value when there are no user
processes that are opened on the device's file. In "manual"
mode, the user sets the maximum clock frequency by writing to
ic_clk, mme_clk and tpc_clk. This property is valid only for
the Goya ASIC family
ic_clk, mme_clk and tpc_clk
What: /sys/class/habanalabs/hl<n>/preboot_btl_ver
Date: Jan 2019
@@ -187,4 +186,11 @@ What: /sys/class/habanalabs/hl<n>/uboot_ver
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Version of the u-boot running on the device's CPU
Description: Version of the u-boot running on the device's CPU
What: /sys/class/habanalabs/hl<n>/write_open_cnt
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Displays the total number of user processes that are currently
opened on the device's file

View File

@@ -11,16 +11,3 @@ Description:
#echo 00:19.0-E0:2:FF > /sys/bus/pci/drivers/pciback/quirks
will allow the guest to read and write to the configuration
register 0x0E.
What: /sys/bus/pci/drivers/pciback/allow_interrupt_control
Date: Jan 2020
KernelVersion: 5.6
Contact: xen-devel@lists.xenproject.org
Description:
List of devices which can have interrupt control flag (INTx,
MSI, MSI-X) set by a connected guest. It is meant to be set
only when the guest is a stubdomain hosting device model (qemu)
and the actual device is assigned to a HVM. It is not safe
(similar to permissive attribute) to set for a devices assigned
to a PV guest. The device is automatically removed from this
list when the connected pcifront terminates.

View File

@@ -1,116 +0,0 @@
What: /sys/bus/w1/devices/.../alarms
Date: May 2020
Contact: Akira Shimahara <akira215corp@gmail.com>
Description:
(RW) read or write TH and TL (Temperature High an Low) alarms.
Values shall be space separated and in the device range
(typical -55 degC to 125 degC), if not values will be trimmed
to device min/max capabilities. Values are integer as they are
stored in a 8bit register in the device. Lowest value is
automatically put to TL. Once set, alarms could be search at
master level, refer to Documentation/w1/w1_generic.rst for
detailed information
Users: any user space application which wants to communicate with
w1_term device
What: /sys/bus/w1/devices/.../eeprom
Date: May 2020
Contact: Akira Shimahara <akira215corp@gmail.com>
Description:
(WO) writing that file will either trigger a save of the
device data to its embedded EEPROM, either restore data
embedded in device EEPROM. Be aware that devices support
limited EEPROM writing cycles (typical 50k)
* 'save': save device RAM to EEPROM
* 'restore': restore EEPROM data in device RAM
Users: any user space application which wants to communicate with
w1_term device
What: /sys/bus/w1/devices/.../ext_power
Date: May 2020
Contact: Akira Shimahara <akira215corp@gmail.com>
Description:
(RO) return the power status by asking the device
* '0': device parasite powered
* '1': device externally powered
* '-xx': xx is kernel error when reading power status
Users: any user space application which wants to communicate with
w1_term device
What: /sys/bus/w1/devices/.../resolution
Date: May 2020
Contact: Akira Shimahara <akira215corp@gmail.com>
Description:
(RW) get or set the device resolution (on supported devices,
if not, this entry is not present). Note that the resolution
will be changed only in device RAM, so it will be cleared when
power is lost. Trigger a 'save' to EEPROM command to keep
values after power-on. Read or write are :
* '9..12': device resolution in bit
or resolution to set in bit
* '-xx': xx is kernel error when reading the resolution
* Anything else: do nothing
Users: any user space application which wants to communicate with
w1_term device
What: /sys/bus/w1/devices/.../temperature
Date: May 2020
Contact: Akira Shimahara <akira215corp@gmail.com>
Description:
(RO) return the temperature in 1/1000 degC.
* If a bulk read has been triggered, it will directly
return the temperature computed when the bulk read
occurred, if available. If not yet available, nothing
is returned (a debug kernel message is sent), you
should retry later on.
* If no bulk read has been triggered, it will trigger
a conversion and send the result. Note that the
conversion duration depend on the resolution (if
device support this feature). It takes 94ms in 9bits
resolution, 750ms for 12bits.
Users: any user space application which wants to communicate with
w1_term device
What: /sys/bus/w1/devices/.../w1_slave
Date: May 2020
Contact: Akira Shimahara <akira215corp@gmail.com>
Description:
(RW) return the temperature in 1/1000 degC.
*read*: return 2 lines with the hexa output data sent on the
bus, return the CRC check and temperature in 1/1000 degC
*write* :
* '0' : save the 2 or 3 bytes to the device EEPROM
(i.e. TH, TL and config register)
* '9..12' : set the device resolution in RAM
(if supported)
* Anything else: do nothing
refer to Documentation/w1/slaves/w1_therm.rst for detailed
information.
Users: any user space application which wants to communicate with
w1_term device
What: /sys/bus/w1/devices/w1_bus_masterXX/therm_bulk_read
Date: May 2020
Contact: Akira Shimahara <akira215corp@gmail.com>
Description:
(RW) trigger a bulk read conversion. read the status
*read*:
* '-1': conversion in progress on at least 1 sensor
* '1' : conversion complete but at least one sensor
value has not been read yet
* '0' : no bulk operation. Reading temperature will
trigger a conversion on each device
*write*: 'trigger': trigger a bulk read on all supporting
devices on the bus
Note that if a bulk read is sent but one sensor is not read
immediately, the next access to temperature on this device
will return the temperature measured at the time of issue
of the bulk read command (not the current temperature).
Users: any user space application which wants to communicate with
w1_term device

View File

@@ -25,13 +25,3 @@ Description:
allocated without being in use. The time is in
seconds, 0 means indefinitely long.
The default is 60 seconds.
What: /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
Date: December 2019
KernelVersion: 5.6
Contact: SeongJae Park <sjpark@amazon.de>
Description:
When memory pressure is reported to blkback this option
controls the duration in milliseconds that blkback will not
cache any page not backed by a grant mapping.
The default is 10ms.

View File

@@ -28,11 +28,3 @@ Description: Displays the physical addresses of all EFI Configuration
versions are always printed first, i.e. ACPI20 comes
before ACPI.
Users: dmidecode
What: /sys/firmware/efi/tables/rci2
Date: July 2019
Contact: Narendra K <Narendra.K@dell.com>, linux-bugs@dell.com
Description: Displays the content of the Runtime Configuration Interface
Table version 2 on Dell EMC PowerEdge systems in binary format
Users: It is used by Dell EMC OpenManage Server Administrator tool to
populate BIOS setup page.

View File

@@ -1,37 +0,0 @@
What: /sys/firmware/turris-mox-rwtm/board_version
Date: August 2019
KernelVersion: 5.4
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) Board version burned into eFuses of this Turris Mox board.
Format: %i
What: /sys/firmware/turris-mox-rwtm/mac_address*
Date: August 2019
KernelVersion: 5.4
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) MAC addresses burned into eFuses of this Turris Mox board.
Format: %pM
What: /sys/firmware/turris-mox-rwtm/pubkey
Date: August 2019
KernelVersion: 5.4
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) ECDSA public key (in pubkey hex compressed form) computed
as pair to the ECDSA private key burned into eFuses of this
Turris Mox Board.
Format: string
What: /sys/firmware/turris-mox-rwtm/ram_size
Date: August 2019
KernelVersion: 5.4
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) RAM size in MiB of this Turris Mox board as was detected
during manufacturing and burned into eFuses. Can be 512 or 1024.
Format: %i
What: /sys/firmware/turris-mox-rwtm/serial_number
Date: August 2019
KernelVersion: 5.4
Contact: Marek Behún <marek.behun@nic.cz>
Description: (R) Serial number burned into eFuses of this Turris Mox device.
Format: %016X

View File

@@ -1,320 +1,253 @@
What: /sys/fs/f2fs/<disk>/gc_max_sleep_time
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the maximum sleep time for gc_thread. Time
is in milliseconds.
Description:
Controls the maximun sleep time for gc_thread. Time
is in milliseconds.
What: /sys/fs/f2fs/<disk>/gc_min_sleep_time
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the minimum sleep time for gc_thread. Time
is in milliseconds.
Description:
Controls the minimum sleep time for gc_thread. Time
is in milliseconds.
What: /sys/fs/f2fs/<disk>/gc_no_gc_sleep_time
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the default sleep time for gc_thread. Time
is in milliseconds.
Description:
Controls the default sleep time for gc_thread. Time
is in milliseconds.
What: /sys/fs/f2fs/<disk>/gc_idle
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the victim selection policy for garbage collection.
Setting gc_idle = 0(default) will disable this option. Setting
gc_idle = 1 will select the Cost Benefit approach & setting
gc_idle = 2 will select the greedy approach.
Description:
Controls the victim selection policy for garbage collection.
What: /sys/fs/f2fs/<disk>/reclaim_segments
Date: October 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: This parameter controls the number of prefree segments to be
reclaimed. If the number of prefree segments is larger than
the number of segments in the proportion to the percentage
over total volume size, f2fs tries to conduct checkpoint to
reclaim the prefree segments to free segments.
By default, 5% over total # of segments.
What: /sys/fs/f2fs/<disk>/main_blkaddr
Date: November 2019
Contact: "Ramon Pantin" <pantin@google.com>
Description:
Shows first block address of MAIN area.
Controls the issue rate of segment discard commands.
What: /sys/fs/f2fs/<disk>/ipu_policy
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the in-place-update policy.
updates in f2fs. User can set:
0x01: F2FS_IPU_FORCE, 0x02: F2FS_IPU_SSR,
0x04: F2FS_IPU_UTIL, 0x08: F2FS_IPU_SSR_UTIL,
0x10: F2FS_IPU_FSYNC, 0x20: F2FS_IPU_ASYNC,
0x40: F2FS_IPU_NOCACHE.
Refer segment.h for details.
Description:
Controls the in-place-update policy.
What: /sys/fs/f2fs/<disk>/min_ipu_util
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the FS utilization condition for the in-place-update
policies. It is used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.
Description:
Controls the FS utilization condition for the in-place-update
policies.
What: /sys/fs/f2fs/<disk>/min_fsync_blocks
Date: September 2014
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the dirty page count condition for the in-place-update
policies.
Description:
Controls the dirty page count condition for the in-place-update
policies.
What: /sys/fs/f2fs/<disk>/min_seq_blocks
Date: August 2018
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the dirty page count condition for batched sequential
writes in writepages.
Description:
Controls the dirty page count condition for batched sequential
writes in ->writepages.
What: /sys/fs/f2fs/<disk>/min_hot_blocks
Date: March 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the dirty page count condition for redefining hot data.
Description:
Controls the dirty page count condition for redefining hot data.
What: /sys/fs/f2fs/<disk>/min_ssr_sections
Date: October 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls the free section threshold to trigger SSR allocation.
If this is large, SSR mode will be enabled early.
Description:
Controls the fee section threshold to trigger SSR allocation.
What: /sys/fs/f2fs/<disk>/max_small_discards
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the issue rate of discard commands that consist of small
blocks less than 2MB. The candidates to be discarded are cached until
checkpoint is triggered, and issued during the checkpoint.
By default, it is disabled with 0.
Description:
Controls the issue rate of small discard commands.
What: /sys/fs/f2fs/<disk>/discard_granularity
Date: July 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls discard granularity of inner discard thread. Inner thread
What: /sys/fs/f2fs/<disk>/discard_granularity
Date: July 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
Controls discard granularity of inner discard thread, inner thread
will not issue discards with size that is smaller than granularity.
The unit size is one block(4KB), now only support configuring
in range of [1, 512]. Default value is 4(=16KB).
The unit size is one block, now only support configuring in range
of [1, 512].
What: /sys/fs/f2fs/<disk>/umount_discard_timeout
Date: January 2019
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Set timeout to issue discard commands during umount.
Default: 5 secs
What: /sys/fs/f2fs/<disk>/umount_discard_timeout
Date: January 2019
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description:
Set timeout to issue discard commands during umount.
Default: 5 secs
What: /sys/fs/f2fs/<disk>/max_victim_search
Date: January 2014
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the number of trials to find a victim segment
when conducting SSR and cleaning operations. The default value
is 4096 which covers 8GB block address range.
Description:
Controls the number of trials to find a victim segment.
What: /sys/fs/f2fs/<disk>/migration_granularity
Date: October 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls migration granularity of garbage collection on large
section, it can let GC move partial segment{s} of one section
in one GC cycle, so that dispersing heavy overhead GC to
multiple lightweight one.
Description:
Controls migration granularity of garbage collection on large
section, it can let GC move partial segment{s} of one section
in one GC cycle, so that dispersing heavy overhead GC to
multiple lightweight one.
What: /sys/fs/f2fs/<disk>/dir_level
Date: March 2014
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the directory level for large directory. If a
directory has a number of files, it can reduce the file lookup
latency by increasing this dir_level value. Otherwise, it
needs to decrease this value to reduce the space overhead.
The default value is 0.
Description:
Controls the directory level for large directory.
What: /sys/fs/f2fs/<disk>/ram_thresh
Date: March 2014
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the memory footprint used by free nids and cached
nat entries. By default, 1 is set, which indicates
10 MB / 1 GB RAM.
Description:
Controls the memory footprint used by f2fs.
What: /sys/fs/f2fs/<disk>/batched_trim_sections
Date: February 2015
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the trimming rate in batch mode.
<deprecated>
Description:
Controls the trimming rate in batch mode.
<deprecated>
What: /sys/fs/f2fs/<disk>/cp_interval
Date: October 2015
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the checkpoint timing, set to 60 seconds by default.
Description:
Controls the checkpoint timing.
What: /sys/fs/f2fs/<disk>/idle_interval
Date: January 2016
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the idle timing of system, if there is no FS operation
during given interval.
Set to 5 seconds by default.
Description:
Controls the idle timing for all paths other than
discard and gc path.
What: /sys/fs/f2fs/<disk>/discard_idle_interval
Date: September 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
Description: Controls the idle timing of discard thread given
this time interval.
Default is 5 secs.
Description:
Controls the idle timing for discard path.
What: /sys/fs/f2fs/<disk>/gc_idle_interval
Date: September 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
Description: Controls the idle timing for gc path. Set to 5 seconds by default.
Description:
Controls the idle timing for gc path.
What: /sys/fs/f2fs/<disk>/iostat_enable
Date: August 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls to enable/disable IO stat.
Description:
Controls to enable/disable IO stat.
What: /sys/fs/f2fs/<disk>/ra_nid_pages
Date: October 2015
Contact: "Chao Yu" <chao2.yu@samsung.com>
Description: Controls the count of nid pages to be readaheaded.
When building free nids, F2FS reads NAT blocks ahead for
speed up. Default is 0.
Description:
Controls the count of nid pages to be readaheaded.
What: /sys/fs/f2fs/<disk>/dirty_nats_ratio
Date: January 2016
Contact: "Chao Yu" <chao2.yu@samsung.com>
Description: Controls dirty nat entries ratio threshold, if current
ratio exceeds configured threshold, checkpoint will
be triggered for flushing dirty nat entries.
Description:
Controls dirty nat entries ratio threshold, if current
ratio exceeds configured threshold, checkpoint will
be triggered for flushing dirty nat entries.
What: /sys/fs/f2fs/<disk>/lifetime_write_kbytes
Date: January 2016
Contact: "Shuoran Liu" <liushuoran@huawei.com>
Description: Shows total written kbytes issued to disk.
Description:
Shows total written kbytes issued to disk.
What: /sys/fs/f2fs/<disk>/features
Date: July 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Shows all enabled features in current device.
Description:
Shows all enabled features in current device.
What: /sys/fs/f2fs/<disk>/inject_rate
Date: May 2016
Contact: "Sheng Yong" <shengyong1@huawei.com>
Description: Controls the injection rate of arbitrary faults.
Description:
Controls the injection rate.
What: /sys/fs/f2fs/<disk>/inject_type
Date: May 2016
Contact: "Sheng Yong" <shengyong1@huawei.com>
Description: Controls the injection type of arbitrary faults.
What: /sys/fs/f2fs/<disk>/dirty_segments
Date: October 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Shows the number of dirty segments.
Description:
Controls the injection type.
What: /sys/fs/f2fs/<disk>/reserved_blocks
Date: June 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls target reserved blocks in system, the threshold
is soft, it could exceed current available user space.
Description:
Controls target reserved blocks in system, the threshold
is soft, it could exceed current available user space.
What: /sys/fs/f2fs/<disk>/current_reserved_blocks
Date: October 2017
Contact: "Yunlong Song" <yunlong.song@huawei.com>
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Shows current reserved blocks in system, it may be temporarily
smaller than target_reserved_blocks, but will gradually
increase to target_reserved_blocks when more free blocks are
freed by user later.
Description:
Shows current reserved blocks in system, it may be temporarily
smaller than target_reserved_blocks, but will gradually
increase to target_reserved_blocks when more free blocks are
freed by user later.
What: /sys/fs/f2fs/<disk>/gc_urgent
Date: August 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Do background GC agressively when set. When gc_urgent = 1,
background thread starts to do GC by given gc_urgent_sleep_time
interval. It is set to 0 by default.
Description:
Do background GC agressively
What: /sys/fs/f2fs/<disk>/gc_urgent_sleep_time
Date: August 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls sleep time of GC urgent mode. Set to 500ms by default.
Description:
Controls sleep time of GC urgent mode
What: /sys/fs/f2fs/<disk>/readdir_ra
Date: November 2017
Contact: "Sheng Yong" <shengyong1@huawei.com>
Description: Controls readahead inode block in readdir. Enabled by default.
What: /sys/fs/f2fs/<disk>/gc_pin_file_thresh
Date: January 2018
Contact: Jaegeuk Kim <jaegeuk@kernel.org>
Description: This indicates how many GC can be failed for the pinned
file. If it exceeds this, F2FS doesn't guarantee its pinning
state. 2048 trials is set by default.
Description:
Controls readahead inode block in readdir.
What: /sys/fs/f2fs/<disk>/extension_list
Date: Feburary 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Used to control configure extension list:
- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension
Description:
Used to control configure extension list:
- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension
What: /sys/fs/f2fs/<disk>/unusable
Date April 2019
Contact: "Daniel Rosenberg" <drosen@google.com>
Description: If checkpoint=disable, it displays the number of blocks that
are unusable.
If checkpoint=enable it displays the enumber of blocks that
would be unusable if checkpoint=disable were to be set.
What: /sys/fs/f2fs/<disk>/encoding
Date July 2019
Contact: "Daniel Rosenberg" <drosen@google.com>
Description: Displays name and version of the encoding set for the filesystem.
If no encoding is set, displays (none)
What: /sys/fs/f2fs/<disk>/free_segments
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of free segments in disk.
What: /sys/fs/f2fs/<disk>/cp_foreground_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of checkpoint operations performed on demand. Available when
CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/cp_background_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of checkpoint operations performed in the background to
free segments. Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/gc_foreground_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of garbage collection operations performed on demand.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/gc_background_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of garbage collection operations triggered in background.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/moved_blocks_foreground
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of blocks moved by garbage collection in foreground.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/moved_blocks_background
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of blocks moved by garbage collection in background.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/avg_vblocks
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Average number of valid blocks.
Available when CONFIG_F2FS_STAT_FS=y.
Description:
If checkpoint=disable, it displays the number of blocks that are unusable.
If checkpoint=enable it displays the enumber of blocks that would be unusable
if checkpoint=disable were to be set.

View File

@@ -1,17 +0,0 @@
What: /sys/kernel/btf
Date: Aug 2019
KernelVersion: 5.5
Contact: bpf@vger.kernel.org
Description:
Contains BTF type information and related data for kernel and
kernel modules.
What: /sys/kernel/btf/vmlinux
Date: Aug 2019
KernelVersion: 5.5
Contact: bpf@vger.kernel.org
Description:
Read-only binary attribute exposing kernel's own BTF type
information with description of all internal kernel types. See
Documentation/bpf/btf.rst for detailed description of format
itself.

View File

@@ -429,15 +429,10 @@ KernelVersion: 2.6.22
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
Christoph Lameter <cl@linux-foundation.org>
Description:
The shrink file is used to reclaim unused slab cache
memory from a cache. Empty per-cpu or partial slabs
are freed and the partial list is sorted so the slabs
with the fewest available objects are used first.
It only accepts a value of "1" on write for shrinking
the cache. Other input values are considered invalid.
Shrinking slab caches might be expensive and can
adversely impact other running applications. So it
should be used with care.
The shrink file is written when memory should be reclaimed from
a cache. Empty partial slabs are freed and the partial list is
sorted so the slabs with the fewest available objects are used
first.
What: /sys/kernel/slab/cache/slab_size
Date: May 2007

View File

@@ -46,13 +46,3 @@ Description:
* 0 - normal,
* 1 - overboost,
* 2 - silent
What: /sys/devices/platform/<platform>/throttle_thermal_policy
Date: Dec 2019
KernelVersion: 5.6
Contact: "Leonid Maksymchuk" <leonmaxx@gmail.com>
Description:
Throttle thermal policy mode:
* 0 - default,
* 1 - overboost,
* 2 - silent

View File

@@ -21,220 +21,3 @@ Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns Bitstream (static FPGA region) meta
data, which includes the synthesis date, seed and other
information of this static FPGA region.
What: /sys/bus/platform/devices/dfl-fme.0/cache_size
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns cache size of this FPGA device.
What: /sys/bus/platform/devices/dfl-fme.0/fabric_version
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns fabric version of this FPGA device.
Userspace applications need this information to select
best data channels per different fabric design.
What: /sys/bus/platform/devices/dfl-fme.0/socket_id
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns socket_id to indicate which socket
this FPGA belongs to, only valid for integrated solution.
User only needs this information, in case standard numa node
can't provide correct information.
What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Write. Read this file for errors detected on pcie0 link.
Write this file to clear errors logged in pcie0_errors. Write
fails with -EINVAL if input parsing fails or input error code
doesn't match.
What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Write. Read this file for errors detected on pcie1 link.
Write this file to clear errors logged in pcie1_errors. Write
fails with -EINVAL if input parsing fails or input error code
doesn't match.
What: /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns non-fatal errors detected.
What: /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns catastrophic and fatal errors detected.
What: /sys/bus/platform/devices/dfl-fme.0/errors/inject_errors
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Write. Read this file to check errors injected. Write this
file to inject errors for testing purpose. Write fails with
-EINVAL if input parsing fails or input inject error code isn't
supported.
What: /sys/bus/platform/devices/dfl-fme.0/errors/fme_errors
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Write. Read this file to get errors detected on FME.
Write this file to clear errors logged in fme_errors. Write
fials with -EINVAL if input parsing fails or input error code
doesn't match.
What: /sys/bus/platform/devices/dfl-fme.0/errors/first_error
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get the first error detected by
hardware.
What: /sys/bus/platform/devices/dfl-fme.0/errors/next_error
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get the second error detected by
hardware.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/name
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. Read this file to get the name of hwmon device, it
supports values:
'dfl_fme_thermal' - thermal hwmon device name
'dfl_fme_power' - power hwmon device name
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. It returns FPGA device temperature in millidegrees
Celsius.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_max
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. It returns hardware threshold1 temperature in
millidegrees Celsius. If temperature rises at or above this
threshold, hardware starts 50% or 90% throttling (see
'temp1_max_policy').
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. It returns hardware threshold2 temperature in
millidegrees Celsius. If temperature rises at or above this
threshold, hardware starts 100% throttling.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_emergency
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. It returns hardware trip threshold temperature in
millidegrees Celsius. If temperature rises at or above this
threshold, a fatal event will be triggered to board management
controller (BMC) to shutdown FPGA.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_max_alarm
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns 1 if temperature is currently at or above
hardware threshold1 (see 'temp1_max'), otherwise 0.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit_alarm
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns 1 if temperature is currently at or above
hardware threshold2 (see 'temp1_crit'), otherwise 0.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_max_policy
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. Read this file to get the policy of hardware threshold1
(see 'temp1_max'). It only supports two values (policies):
0 - AP2 state (90% throttling)
1 - AP1 state (50% throttling)
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_input
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. It returns current FPGA power consumption in uW.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_max
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Write. Read this file to get current hardware power
threshold1 in uW. If power consumption rises at or above
this threshold, hardware starts 50% throttling.
Write this file to set current hardware power threshold1 in uW.
As hardware only accepts values in Watts, so input value will
be round down per Watts (< 1 watts part will be discarded) and
clamped within the range from 0 to 127 Watts. Write fails with
-EINVAL if input parsing fails.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_crit
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Write. Read this file to get current hardware power
threshold2 in uW. If power consumption rises at or above
this threshold, hardware starts 90% throttling.
Write this file to set current hardware power threshold2 in uW.
As hardware only accepts values in Watts, so input value will
be round down per Watts (< 1 watts part will be discarded) and
clamped within the range from 0 to 127 Watts. Write fails with
-EINVAL if input parsing fails.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_max_alarm
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns 1 if power consumption is currently at or
above hardware threshold1 (see 'power1_max'), otherwise 0.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_crit_alarm
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns 1 if power consumption is currently at or
above hardware threshold2 (see 'power1_crit'), otherwise 0.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_xeon_limit
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. It returns power limit for XEON in uW.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_fpga_limit
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Only. It returns power limit for FPGA in uW.
What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_ltr
Date: October 2019
KernelVersion: 5.5
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get current Latency Tolerance
Reporting (ltr) value. It returns 1 if all Accelerated
Function Units (AFUs) can tolerate latency >= 40us for memory
access or 0 if any AFU is latency sensitive (< 40us).

View File

@@ -14,88 +14,3 @@ Description: Read-only. User can program different PR bitstreams to FPGA
Accelerator Function Unit (AFU) for different functions. It
returns uuid which could be used to identify which PR bitstream
is programmed in this AFU.
What: /sys/bus/platform/devices/dfl-port.0/power_state
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It reports the APx (AFU Power) state, different APx
means different throttling level. When reading this file, it
returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6.
What: /sys/bus/platform/devices/dfl-port.0/ap1_event
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-write. Read this file for AP1 (AFU Power State 1) event.
It's used to indicate transient AP1 state. Write 1 to this
file to clear AP1 event.
What: /sys/bus/platform/devices/dfl-port.0/ap2_event
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-write. Read this file for AP2 (AFU Power State 2) event.
It's used to indicate transient AP2 state. Write 1 to this
file to clear AP2 event.
What: /sys/bus/platform/devices/dfl-port.0/ltr
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-write. Read or set AFU latency tolerance reporting value.
Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
to 0 if it is latency sensitive.
What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcmd
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Write-only. User writes command to this interface to set
userclock to AFU.
What: /sys/bus/platform/devices/dfl-port.0/userclk_freqsts
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get the status of issued command
to userclck_freqcmd.
What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrcmd
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Write-only. User writes command to this interface to set
userclock counter.
What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrsts
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get the status of issued command
to userclck_freqcntrcmd.
What: /sys/bus/platform/devices/dfl-port.0/errors/errors
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-Write. Read this file to get errors detected on port and
Accelerated Function Unit (AFU). Write error code to this file
to clear errors. Write fails with -EINVAL if input parsing
fails or input error code doesn't match. Write fails with
-EBUSY or -ETIMEDOUT if error can't be cleared as hardware
in low power state (-EBUSY) or not respoding (-ETIMEDOUT).
What: /sys/bus/platform/devices/dfl-port.0/errors/first_error
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get the first error detected by
hardware.
What: /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req
Date: August 2019
KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get the first malformed request
captured by hardware.

View File

@@ -1,58 +0,0 @@
What: /sys/bus/platform/devices/MLNXBF04:00/lifecycle_state
Date: Oct 2019
KernelVersion: 5.5
Contact: "Liming Sun <lsun@mellanox.com>"
Description:
The Life-cycle state of the SoC, which could be one of the
following values.
Production - Production state and can be updated to secure
GA Secured - Secure chip and not able to change state
GA Non-Secured - Non-Secure chip and not able to change state
RMA - Return Merchandise Authorization
What: /sys/bus/platform/devices/MLNXBF04:00/post_reset_wdog
Date: Oct 2019
KernelVersion: 5.5
Contact: "Liming Sun <lsun@mellanox.com>"
Description:
The watchdog setting in seconds for the next booting. It's used
to reboot the chip and recover it to the old state if the new
boot partition fails.
What: /sys/bus/platform/devices/MLNXBF04:00/reset_action
Date: Oct 2019
KernelVersion: 5.5
Contact: "Liming Sun <lsun@mellanox.com>"
Description:
The source of the boot stream for the next reset. It could be
one of the following values.
external - boot from external source (USB or PCIe)
emmc - boot from the onchip eMMC
emmc_legacy - boot from the onchip eMMC in legacy (slow) mode
What: /sys/bus/platform/devices/MLNXBF04:00/second_reset_action
Date: Oct 2019
KernelVersion: 5.5
Contact: "Liming Sun <lsun@mellanox.com>"
Description:
Update the source of the boot stream after next reset. It could
be one of the following values and will be applied after next
reset.
external - boot from external source (USB or PCIe)
emmc - boot from the onchip eMMC
emmc_legacy - boot from the onchip eMMC in legacy (slow) mode
swap_emmc - swap the primary / secondary boot partition
none - cancel the action
What: /sys/bus/platform/devices/MLNXBF04:00/secure_boot_fuse_state
Date: Oct 2019
KernelVersion: 5.5
Contact: "Liming Sun <lsun@mellanox.com>"
Description:
The state of eFuse versions with the following values.
InUse - burnt, valid and currently in use
Used - burnt and valid
Free - not burnt and free to use
Skipped - not burnt but not free (skipped)
Wasted - burnt and invalid
Invalid - not burnt but marked as valid (error state).

View File

@@ -31,23 +31,6 @@ Description:
Output will a version string be similar to the example below:
08B6
What: /sys/bus/platform/devices/GOOG000C\:00/usb_charge
Date: October 2019
KernelVersion: 5.5
Description:
Control the USB PowerShare Policy. USB PowerShare is a policy
which affects charging via the special USB PowerShare port
(marked with a small lightning bolt or battery icon) when in
low power states:
- In S0, the port will always provide power.
- In S0ix, if usb_charge is enabled, then power will be
supplied to the port when on AC or if battery is > 50%.
Else no power is supplied.
- In S5, if usb_charge is enabled, then power will be supplied
to the port when on AC. Else no power is supplied.
Input should be either "0" or "1".
What: /sys/bus/platform/devices/GOOG000C\:00/version
Date: May 2019
KernelVersion: 5.3

View File

@@ -301,122 +301,3 @@ Description:
Using this sysfs file will override any values that were
set using the kernel command line for disk offset.
What: /sys/power/suspend_stats
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats directory contains suspend related
statistics.
What: /sys/power/suspend_stats/success
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/success file contains the number
of times entering system sleep state succeeded.
What: /sys/power/suspend_stats/fail
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/fail file contains the number
of times entering system sleep state failed.
What: /sys/power/suspend_stats/failed_freeze
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_freeze file contains the
number of times freezing processes failed.
What: /sys/power/suspend_stats/failed_prepare
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_prepare file contains the
number of times preparing all non-sysdev devices for
a system PM transition failed.
What: /sys/power/suspend_stats/failed_resume
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_resume file contains the
number of times executing "resume" callbacks of
non-sysdev devices failed.
What: /sys/power/suspend_stats/failed_resume_early
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_resume_early file contains
the number of times executing "early resume" callbacks
of devices failed.
What: /sys/power/suspend_stats/failed_resume_noirq
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_resume_noirq file contains
the number of times executing "noirq resume" callbacks
of devices failed.
What: /sys/power/suspend_stats/failed_suspend
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_suspend file contains
the number of times executing "suspend" callbacks
of all non-sysdev devices failed.
What: /sys/power/suspend_stats/failed_suspend_late
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_suspend_late file contains
the number of times executing "late suspend" callbacks
of all devices failed.
What: /sys/power/suspend_stats/failed_suspend_noirq
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_suspend_noirq file contains
the number of times executing "noirq suspend" callbacks
of all devices failed.
What: /sys/power/suspend_stats/last_failed_dev
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/last_failed_dev file contains
the last device for which a suspend/resume callback failed.
What: /sys/power/suspend_stats/last_failed_errno
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/last_failed_errno file contains
the errno of the last failed attempt at entering
system sleep state.
What: /sys/power/suspend_stats/last_failed_step
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/last_failed_step file contains
the last failed step in the suspend/resume path.
What: /sys/power/sync_on_suspend
Date: October 2019
Contact: Jonas Meurer <jonas@freesources.org>
Description:
This file controls whether or not the kernel will sync()
filesystems during system suspend (after freezing user space
and before suspending devices).
Writing a "1" to this file enables the sync() and writing a "0"
disables it. Reads from the file return the current value.
The default is "1" if the build-time "SUSPEND_SKIP_SYNC" config
flag is unset, or "0" otherwise.

View File

@@ -1,46 +0,0 @@
What: /sys/firmware/secvar
Date: August 2019
Contact: Nayna Jain <nayna@linux.ibm.com>
Description: This directory is created if the POWER firmware supports OS
secureboot, thereby secure variables. It exposes interface
for reading/writing the secure variables
What: /sys/firmware/secvar/vars
Date: August 2019
Contact: Nayna Jain <nayna@linux.ibm.com>
Description: This directory lists all the secure variables that are supported
by the firmware.
What: /sys/firmware/secvar/format
Date: August 2019
Contact: Nayna Jain <nayna@linux.ibm.com>
Description: A string indicating which backend is in use by the firmware.
This determines the format of the variable and the accepted
format of variable updates.
What: /sys/firmware/secvar/vars/<variable name>
Date: August 2019
Contact: Nayna Jain <nayna@linux.ibm.com>
Description: Each secure variable is represented as a directory named as
<variable_name>. The variable name is unique and is in ASCII
representation. The data and size can be determined by reading
their respective attribute files.
What: /sys/firmware/secvar/vars/<variable_name>/size
Date: August 2019
Contact: Nayna Jain <nayna@linux.ibm.com>
Description: An integer representation of the size of the content of the
variable. In other words, it represents the size of the data.
What: /sys/firmware/secvar/vars/<variable_name>/data
Date: August 2019
Contact: Nayna Jain h<nayna@linux.ibm.com>
Description: A read-only file containing the value of the variable. The size
of the file represents the maximum size of the variable data.
What: /sys/firmware/secvar/vars/<variable_name>/update
Date: August 2019
Contact: Nayna Jain <nayna@linux.ibm.com>
Description: A write-only file that is used to submit the new value for the
variable. The size of the file represents the maximum size of
the variable data that can be written.

View File

@@ -1,46 +0,0 @@
What: Raise a uevent when a USB charger is inserted or removed
Date: 2020-01-14
KernelVersion: 5.6
Contact: linux-usb@vger.kernel.org
Description: There are two USB charger states:
USB_CHARGER_ABSENT
USB_CHARGER_PRESENT
There are five USB charger types:
USB_CHARGER_UNKNOWN_TYPE: Charger type is unknown
USB_CHARGER_SDP_TYPE: Standard Downstream Port
USB_CHARGER_CDP_TYPE: Charging Downstream Port
USB_CHARGER_DCP_TYPE: Dedicated Charging Port
USB_CHARGER_ACA_TYPE: Accessory Charging Adapter
https://www.usb.org/document-library/battery-charging-v12-spec-and-adopters-agreement
Here are two examples taken using udevadm monitor -p when
USB charger is online:
UDEV change /devices/soc0/usbphynop1 (platform)
ACTION=change
DEVPATH=/devices/soc0/usbphynop1
DRIVER=usb_phy_generic
MODALIAS=of:Nusbphynop1T(null)Cusb-nop-xceiv
OF_COMPATIBLE_0=usb-nop-xceiv
OF_COMPATIBLE_N=1
OF_FULLNAME=/usbphynop1
OF_NAME=usbphynop1
SEQNUM=2493
SUBSYSTEM=platform
USB_CHARGER_STATE=USB_CHARGER_PRESENT
USB_CHARGER_TYPE=USB_CHARGER_SDP_TYPE
USEC_INITIALIZED=227422826
USB charger is offline:
KERNEL change /devices/soc0/usbphynop1 (platform)
ACTION=change
DEVPATH=/devices/soc0/usbphynop1
DRIVER=usb_phy_generic
MODALIAS=of:Nusbphynop1T(null)Cusb-nop-xceiv
OF_COMPATIBLE_0=usb-nop-xceiv
OF_COMPATIBLE_N=1
OF_FULLNAME=/usbphynop1
OF_NAME=usbphynop1
SEQNUM=2494
SUBSYSTEM=platform
USB_CHARGER_STATE=USB_CHARGER_ABSENT
USB_CHARGER_TYPE=USB_CHARGER_UNKNOWN_TYPE

View File

@@ -204,14 +204,6 @@ Returns the maximum size of a mapping for the device. The size parameter
of the mapping functions like dma_map_single(), dma_map_page() and
others should not be larger than the returned value.
::
unsigned long
dma_get_merge_boundary(struct device *dev);
Returns the DMA merge boundary. If the device cannot merge any the DMA address
segments, the function returns 0.
Part Id - Streaming DMA mappings
--------------------------------
@@ -603,6 +595,17 @@ For reasons of efficiency, most platforms choose to track the declared
region only at the granularity of a page. For smaller allocations,
you should use the dma_pool() API.
::
void
dma_release_declared_memory(struct device *dev)
Remove the memory region previously declared from the system. This
API performs *no* in-use checking for this region and will return
unconditionally having removed all the required structures. It is the
driver's job to ensure that no parts of this memory region are
currently in use.
Part III - Debug drivers use of the DMA-API
-------------------------------------------

View File

@@ -5,6 +5,24 @@ DMA attributes
This document describes the semantics of the DMA attributes that are
defined in linux/dma-mapping.h.
DMA_ATTR_WRITE_BARRIER
----------------------
DMA_ATTR_WRITE_BARRIER is a (write) barrier attribute for DMA. DMA
to a memory region with the DMA_ATTR_WRITE_BARRIER attribute forces
all pending DMA writes to complete, and thus provides a mechanism to
strictly order DMA from a device across all intervening busses and
bridges. This barrier is not specific to a particular type of
interconnect, it applies to the system as a whole, and so its
implementation must account for the idiosyncrasies of the system all
the way from the DMA device to memory.
As an example of a situation where DMA_ATTR_WRITE_BARRIER would be
useful, suppose that a device does a DMA write to indicate that data is
ready and available in memory. The DMA of the "completion indication"
could race with data DMA. Mapping the memory used for completion
indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
DMA_ATTR_WEAK_ORDERING
----------------------

View File

@@ -13,7 +13,7 @@ endif
SPHINXBUILD = sphinx-build
SPHINXOPTS =
SPHINXDIRS = .
_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst))
_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py))
SPHINX_CONF = conf.py
PAPER =
BUILDDIR = $(obj)/output
@@ -33,6 +33,8 @@ ifeq ($(HAVE_SPHINX),0)
else # HAVE_SPHINX
export SPHINXOPTS = $(shell perl -e 'open IN,"sphinx-build --version 2>&1 |"; while (<IN>) { if (m/([\d\.]+)/) { print "-jauto" if ($$1 >= "1.7") } ;} close IN')
# User-friendly check for pdflatex and latexmk
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
@@ -65,8 +67,6 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2 && \
PYTHONDONTWRITEBYTECODE=1 \
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
$(PYTHON) $(srctree)/scripts/jobserver-exec \
$(SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
$(SPHINXBUILD) \
-b $2 \
-c $(abspath $(srctree)/$(src)) \
@@ -128,10 +128,8 @@ dochelp:
@echo ' pdfdocs - PDF'
@echo ' epubdocs - EPUB'
@echo ' xmldocs - XML'
@echo ' linkcheckdocs - check for broken external links'
@echo ' (will connect to external hosts)'
@echo ' refcheckdocs - check for references to non-existing files under'
@echo ' Documentation'
@echo ' linkcheckdocs - check for broken external links (will connect to external hosts)'
@echo ' refcheckdocs - check for references to non-existing files under Documentation'
@echo ' cleandocs - clean all generated files'
@echo
@echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2'

View File

@@ -283,5 +283,5 @@ or disabled (0). If 0 is found in any of the msi_bus files belonging
to bridges between the PCI root and the device, MSIs are disabled.
It is also worth checking the device driver to see whether it supports MSIs.
For example, it may contain calls to pci_alloc_irq_vectors() with the
For example, it may contain calls to pci_irq_alloc_vectors() with the
PCI_IRQ_MSI or PCI_IRQ_MSIX flags.

View File

@@ -421,6 +421,7 @@ That is, the recovery API only requires that:
- drivers/net/ixgbe
- drivers/net/cxgb3
- drivers/net/s2io.c
- drivers/net/qlge
The End
-------

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,668 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title>A Tour Through TREE_RCU's Expedited Grace Periods</title>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<h2>Introduction</h2>
This document describes RCU's expedited grace periods.
Unlike RCU's normal grace periods, which accept long latencies to attain
high efficiency and minimal disturbance, expedited grace periods accept
lower efficiency and significant disturbance to attain shorter latencies.
<p>
There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier
third RCU-bh flavor having been implemented in terms of the other two.
Each of the two implementations is covered in its own section.
<ol>
<li> <a href="#Expedited Grace Period Design">
Expedited Grace Period Design</a>
<li> <a href="#RCU-preempt Expedited Grace Periods">
RCU-preempt Expedited Grace Periods</a>
<li> <a href="#RCU-sched Expedited Grace Periods">
RCU-sched Expedited Grace Periods</a>
<li> <a href="#Expedited Grace Period and CPU Hotplug">
Expedited Grace Period and CPU Hotplug</a>
<li> <a href="#Expedited Grace Period Refinements">
Expedited Grace Period Refinements</a>
</ol>
<h2><a name="Expedited Grace Period Design">
Expedited Grace Period Design</a></h2>
<p>
The expedited RCU grace periods cannot be accused of being subtle,
given that they for all intents and purposes hammer every CPU that
has not yet provided a quiescent state for the current expedited
grace period.
The one saving grace is that the hammer has grown a bit smaller
over time: The old call to <tt>try_stop_cpus()</tt> has been
replaced with a set of calls to <tt>smp_call_function_single()</tt>,
each of which results in an IPI to the target CPU.
The corresponding handler function checks the CPU's state, motivating
a faster quiescent state where possible, and triggering a report
of that quiescent state.
As always for RCU, once everything has spent some time in a quiescent
state, the expedited grace period has completed.
<p>
The details of the <tt>smp_call_function_single()</tt> handler's
operation depend on the RCU flavor, as described in the following
sections.
<h2><a name="RCU-preempt Expedited Grace Periods">
RCU-preempt Expedited Grace Periods</a></h2>
<p>
<tt>CONFIG_PREEMPT=y</tt> kernels implement RCU-preempt.
The overall flow of the handling of a given CPU by an RCU-preempt
expedited grace period is shown in the following diagram:
<p><img src="ExpRCUFlow.svg" alt="ExpRCUFlow.svg" width="55%">
<p>
The solid arrows denote direct action, for example, a function call.
The dotted arrows denote indirect action, for example, an IPI
or a state that is reached after some time.
<p>
If a given CPU is offline or idle, <tt>synchronize_rcu_expedited()</tt>
will ignore it because idle and offline CPUs are already residing
in quiescent states.
Otherwise, the expedited grace period will use
<tt>smp_call_function_single()</tt> to send the CPU an IPI, which
is handled by <tt>rcu_exp_handler()</tt>.
<p>
However, because this is preemptible RCU, <tt>rcu_exp_handler()</tt>
can check to see if the CPU is currently running in an RCU read-side
critical section.
If not, the handler can immediately report a quiescent state.
Otherwise, it sets flags so that the outermost <tt>rcu_read_unlock()</tt>
invocation will provide the needed quiescent-state report.
This flag-setting avoids the previous forced preemption of all
CPUs that might have RCU read-side critical sections.
In addition, this flag-setting is done so as to avoid increasing
the overhead of the common-case fastpath through the scheduler.
<p>
Again because this is preemptible RCU, an RCU read-side critical section
can be preempted.
When that happens, RCU will enqueue the task, which will the continue to
block the current expedited grace period until it resumes and finds its
outermost <tt>rcu_read_unlock()</tt>.
The CPU will report a quiescent state just after enqueuing the task because
the CPU is no longer blocking the grace period.
It is instead the preempted task doing the blocking.
The list of blocked tasks is managed by <tt>rcu_preempt_ctxt_queue()</tt>,
which is called from <tt>rcu_preempt_note_context_switch()</tt>, which
in turn is called from <tt>rcu_note_context_switch()</tt>, which in
turn is called from the scheduler.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
Why not just have the expedited grace period check the
state of all the CPUs?
After all, that would avoid all those real-time-unfriendly IPIs.
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
Because we want the RCU read-side critical sections to run fast,
which means no memory barriers.
Therefore, it is not possible to safely check the state from some
other CPU.
And even if it was possible to safely check the state, it would
still be necessary to IPI the CPU to safely interact with the
upcoming <tt>rcu_read_unlock()</tt> invocation, which means that
the remote state testing would not help the worst-case
latency that real-time applications care about.
<p><font color="ffffff">One way to prevent your real-time
application from getting hit with these IPIs is to
build your kernel with <tt>CONFIG_NO_HZ_FULL=y</tt>.
RCU would then perceive the CPU running your application
as being idle, and it would be able to safely detect that
state without needing to IPI the CPU.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<p>
Please note that this is just the overall flow:
Additional complications can arise due to races with CPUs going idle
or offline, among other things.
<h2><a name="RCU-sched Expedited Grace Periods">
RCU-sched Expedited Grace Periods</a></h2>
<p>
<tt>CONFIG_PREEMPT=n</tt> kernels implement RCU-sched.
The overall flow of the handling of a given CPU by an RCU-sched
expedited grace period is shown in the following diagram:
<p><img src="ExpSchedFlow.svg" alt="ExpSchedFlow.svg" width="55%">
<p>
As with RCU-preempt, RCU-sched's
<tt>synchronize_rcu_expedited()</tt> ignores offline and
idle CPUs, again because they are in remotely detectable
quiescent states.
However, because the
<tt>rcu_read_lock_sched()</tt> and <tt>rcu_read_unlock_sched()</tt>
leave no trace of their invocation, in general it is not possible to tell
whether or not the current CPU is in an RCU read-side critical section.
The best that RCU-sched's <tt>rcu_exp_handler()</tt> can do is to check
for idle, on the off-chance that the CPU went idle while the IPI
was in flight.
If the CPU is idle, then <tt>rcu_exp_handler()</tt> reports
the quiescent state.
<p> Otherwise, the handler forces a future context switch by setting the
NEED_RESCHED flag of the current task's thread flag and the CPU preempt
counter.
At the time of the context switch, the CPU reports the quiescent state.
Should the CPU go offline first, it will report the quiescent state
at that time.
<h2><a name="Expedited Grace Period and CPU Hotplug">
Expedited Grace Period and CPU Hotplug</a></h2>
<p>
The expedited nature of expedited grace periods require a much tighter
interaction with CPU hotplug operations than is required for normal
grace periods.
In addition, attempting to IPI offline CPUs will result in splats, but
failing to IPI online CPUs can result in too-short grace periods.
Neither option is acceptable in production kernels.
<p>
The interaction between expedited grace periods and CPU hotplug operations
is carried out at several levels:
<ol>
<li> The number of CPUs that have ever been online is tracked
by the <tt>rcu_state</tt> structure's <tt>-&gt;ncpus</tt>
field.
The <tt>rcu_state</tt> structure's <tt>-&gt;ncpus_snap</tt>
field tracks the number of CPUs that have ever been online
at the beginning of an RCU expedited grace period.
Note that this number never decreases, at least in the absence
of a time machine.
<li> The identities of the CPUs that have ever been online is
tracked by the <tt>rcu_node</tt> structure's
<tt>-&gt;expmaskinitnext</tt> field.
The <tt>rcu_node</tt> structure's <tt>-&gt;expmaskinit</tt>
field tracks the identities of the CPUs that were online
at least once at the beginning of the most recent RCU
expedited grace period.
The <tt>rcu_state</tt> structure's <tt>-&gt;ncpus</tt> and
<tt>-&gt;ncpus_snap</tt> fields are used to detect when
new CPUs have come online for the first time, that is,
when the <tt>rcu_node</tt> structure's <tt>-&gt;expmaskinitnext</tt>
field has changed since the beginning of the last RCU
expedited grace period, which triggers an update of each
<tt>rcu_node</tt> structure's <tt>-&gt;expmaskinit</tt>
field from its <tt>-&gt;expmaskinitnext</tt> field.
<li> Each <tt>rcu_node</tt> structure's <tt>-&gt;expmaskinit</tt>
field is used to initialize that structure's
<tt>-&gt;expmask</tt> at the beginning of each RCU
expedited grace period.
This means that only those CPUs that have been online at least
once will be considered for a given grace period.
<li> Any CPU that goes offline will clear its bit in its leaf
<tt>rcu_node</tt> structure's <tt>-&gt;qsmaskinitnext</tt>
field, so any CPU with that bit clear can safely be ignored.
However, it is possible for a CPU coming online or going offline
to have this bit set for some time while <tt>cpu_online</tt>
returns <tt>false</tt>.
<li> For each non-idle CPU that RCU believes is currently online, the grace
period invokes <tt>smp_call_function_single()</tt>.
If this succeeds, the CPU was fully online.
Failure indicates that the CPU is in the process of coming online
or going offline, in which case it is necessary to wait for a
short time period and try again.
The purpose of this wait (or series of waits, as the case may be)
is to permit a concurrent CPU-hotplug operation to complete.
<li> In the case of RCU-sched, one of the last acts of an outgoing CPU
is to invoke <tt>rcu_report_dead()</tt>, which
reports a quiescent state for that CPU.
However, this is likely paranoia-induced redundancy. <!-- @@@ -->
</ol>
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
Why all the dancing around with multiple counters and masks
tracking CPUs that were once online?
Why not just have a single set of masks tracking the currently
online CPUs and be done with it?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
Maintaining single set of masks tracking the online CPUs <i>sounds</i>
easier, at least until you try working out all the race conditions
between grace-period initialization and CPU-hotplug operations.
For example, suppose initialization is progressing down the
tree while a CPU-offline operation is progressing up the tree.
This situation can result in bits set at the top of the tree
that have no counterparts at the bottom of the tree.
Those bits will never be cleared, which will result in
grace-period hangs.
In short, that way lies madness, to say nothing of a great many
bugs, hangs, and deadlocks.
<p><font color="ffffff">
In contrast, the current multi-mask multi-counter scheme ensures
that grace-period initialization will always see consistent masks
up and down the tree, which brings significant simplifications
over the single-mask method.
<p><font color="ffffff">
This is an instance of
<a href="http://www.cs.columbia.edu/~library/TR-repository/reports/reports-1992/cucs-039-92.ps.gz"><font color="ffffff">
deferring work in order to avoid synchronization</a>.
Lazily recording CPU-hotplug events at the beginning of the next
grace period greatly simplifies maintenance of the CPU-tracking
bitmasks in the <tt>rcu_node</tt> tree.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<h2><a name="Expedited Grace Period Refinements">
Expedited Grace Period Refinements</a></h2>
<ol>
<li> <a href="#Idle-CPU Checks">Idle-CPU checks</a>.
<li> <a href="#Batching via Sequence Counter">
Batching via sequence counter</a>.
<li> <a href="#Funnel Locking and Wait/Wakeup">
Funnel locking and wait/wakeup</a>.
<li> <a href="#Use of Workqueues">Use of Workqueues</a>.
<li> <a href="#Stall Warnings">Stall warnings</a>.
<li> <a href="#Mid-Boot Operation">Mid-boot operation</a>.
</ol>
<h3><a name="Idle-CPU Checks">Idle-CPU Checks</a></h3>
<p>
Each expedited grace period checks for idle CPUs when initially forming
the mask of CPUs to be IPIed and again just before IPIing a CPU
(both checks are carried out by <tt>sync_rcu_exp_select_cpus()</tt>).
If the CPU is idle at any time between those two times, the CPU will
not be IPIed.
Instead, the task pushing the grace period forward will include the
idle CPUs in the mask passed to <tt>rcu_report_exp_cpu_mult()</tt>.
<p>
For RCU-sched, there is an additional check:
If the IPI has interrupted the idle loop, then
<tt>rcu_exp_handler()</tt> invokes <tt>rcu_report_exp_rdp()</tt>
to report the corresponding quiescent state.
<p>
For RCU-preempt, there is no specific check for idle in the
IPI handler (<tt>rcu_exp_handler()</tt>), but because
RCU read-side critical sections are not permitted within the
idle loop, if <tt>rcu_exp_handler()</tt> sees that the CPU is within
RCU read-side critical section, the CPU cannot possibly be idle.
Otherwise, <tt>rcu_exp_handler()</tt> invokes
<tt>rcu_report_exp_rdp()</tt> to report the corresponding quiescent
state, regardless of whether or not that quiescent state was due to
the CPU being idle.
<p>
In summary, RCU expedited grace periods check for idle when building
the bitmask of CPUs that must be IPIed, just before sending each IPI,
and (either explicitly or implicitly) within the IPI handler.
<h3><a name="Batching via Sequence Counter">
Batching via Sequence Counter</a></h3>
<p>
If each grace-period request was carried out separately, expedited
grace periods would have abysmal scalability and
problematic high-load characteristics.
Because each grace-period operation can serve an unlimited number of
updates, it is important to <i>batch</i> requests, so that a single
expedited grace-period operation will cover all requests in the
corresponding batch.
<p>
This batching is controlled by a sequence counter named
<tt>-&gt;expedited_sequence</tt> in the <tt>rcu_state</tt> structure.
This counter has an odd value when there is an expedited grace period
in progress and an even value otherwise, so that dividing the counter
value by two gives the number of completed grace periods.
During any given update request, the counter must transition from
even to odd and then back to even, thus indicating that a grace
period has elapsed.
Therefore, if the initial value of the counter is <tt>s</tt>,
the updater must wait until the counter reaches at least the
value <tt>(s+3)&amp;~0x1</tt>.
This counter is managed by the following access functions:
<ol>
<li> <tt>rcu_exp_gp_seq_start()</tt>, which marks the start of
an expedited grace period.
<li> <tt>rcu_exp_gp_seq_end()</tt>, which marks the end of an
expedited grace period.
<li> <tt>rcu_exp_gp_seq_snap()</tt>, which obtains a snapshot of
the counter.
<li> <tt>rcu_exp_gp_seq_done()</tt>, which returns <tt>true</tt>
if a full expedited grace period has elapsed since the
corresponding call to <tt>rcu_exp_gp_seq_snap()</tt>.
</ol>
<p>
Again, only one request in a given batch need actually carry out
a grace-period operation, which means there must be an efficient
way to identify which of many concurrent reqeusts will initiate
the grace period, and that there be an efficient way for the
remaining requests to wait for that grace period to complete.
However, that is the topic of the next section.
<h3><a name="Funnel Locking and Wait/Wakeup">
Funnel Locking and Wait/Wakeup</a></h3>
<p>
The natural way to sort out which of a batch of updaters will initiate
the expedited grace period is to use the <tt>rcu_node</tt> combining
tree, as implemented by the <tt>exp_funnel_lock()</tt> function.
The first updater corresponding to a given grace period arriving
at a given <tt>rcu_node</tt> structure records its desired grace-period
sequence number in the <tt>-&gt;exp_seq_rq</tt> field and moves up
to the next level in the tree.
Otherwise, if the <tt>-&gt;exp_seq_rq</tt> field already contains
the sequence number for the desired grace period or some later one,
the updater blocks on one of four wait queues in the
<tt>-&gt;exp_wq[]</tt> array, using the second-from-bottom
and third-from bottom bits as an index.
An <tt>-&gt;exp_lock</tt> field in the <tt>rcu_node</tt> structure
synchronizes access to these fields.
<p>
An empty <tt>rcu_node</tt> tree is shown in the following diagram,
with the white cells representing the <tt>-&gt;exp_seq_rq</tt> field
and the red cells representing the elements of the
<tt>-&gt;exp_wq[]</tt> array.
<p><img src="Funnel0.svg" alt="Funnel0.svg" width="75%">
<p>
The next diagram shows the situation after the arrival of Task&nbsp;A
and Task&nbsp;B at the leftmost and rightmost leaf <tt>rcu_node</tt>
structures, respectively.
The current value of the <tt>rcu_state</tt> structure's
<tt>-&gt;expedited_sequence</tt> field is zero, so adding three and
clearing the bottom bit results in the value two, which both tasks
record in the <tt>-&gt;exp_seq_rq</tt> field of their respective
<tt>rcu_node</tt> structures:
<p><img src="Funnel1.svg" alt="Funnel1.svg" width="75%">
<p>
Each of Tasks&nbsp;A and&nbsp;B will move up to the root
<tt>rcu_node</tt> structure.
Suppose that Task&nbsp;A wins, recording its desired grace-period sequence
number and resulting in the state shown below:
<p><img src="Funnel2.svg" alt="Funnel2.svg" width="75%">
<p>
Task&nbsp;A now advances to initiate a new grace period, while Task&nbsp;B
moves up to the root <tt>rcu_node</tt> structure, and, seeing that
its desired sequence number is already recorded, blocks on
<tt>-&gt;exp_wq[1]</tt>.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
Why <tt>-&gt;exp_wq[1]</tt>?
Given that the value of these tasks' desired sequence number is
two, so shouldn't they instead block on <tt>-&gt;exp_wq[2]</tt>?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
No.
<p><font color="ffffff">
Recall that the bottom bit of the desired sequence number indicates
whether or not a grace period is currently in progress.
It is therefore necessary to shift the sequence number right one
bit position to obtain the number of the grace period.
This results in <tt>-&gt;exp_wq[1]</tt>.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<p>
If Tasks&nbsp;C and&nbsp;D also arrive at this point, they will compute the
same desired grace-period sequence number, and see that both leaf
<tt>rcu_node</tt> structures already have that value recorded.
They will therefore block on their respective <tt>rcu_node</tt>
structures' <tt>-&gt;exp_wq[1]</tt> fields, as shown below:
<p><img src="Funnel3.svg" alt="Funnel3.svg" width="75%">
<p>
Task&nbsp;A now acquires the <tt>rcu_state</tt> structure's
<tt>-&gt;exp_mutex</tt> and initiates the grace period, which
increments <tt>-&gt;expedited_sequence</tt>.
Therefore, if Tasks&nbsp;E and&nbsp;F arrive, they will compute
a desired sequence number of 4 and will record this value as
shown below:
<p><img src="Funnel4.svg" alt="Funnel4.svg" width="75%">
<p>
Tasks&nbsp;E and&nbsp;F will propagate up the <tt>rcu_node</tt>
combining tree, with Task&nbsp;F blocking on the root <tt>rcu_node</tt>
structure and Task&nbsp;E wait for Task&nbsp;A to finish so that
it can start the next grace period.
The resulting state is as shown below:
<p><img src="Funnel5.svg" alt="Funnel5.svg" width="75%">
<p>
Once the grace period completes, Task&nbsp;A
starts waking up the tasks waiting for this grace period to complete,
increments the <tt>-&gt;expedited_sequence</tt>,
acquires the <tt>-&gt;exp_wake_mutex</tt> and then releases the
<tt>-&gt;exp_mutex</tt>.
This results in the following state:
<p><img src="Funnel6.svg" alt="Funnel6.svg" width="75%">
<p>
Task&nbsp;E can then acquire <tt>-&gt;exp_mutex</tt> and increment
<tt>-&gt;expedited_sequence</tt> to the value three.
If new tasks&nbsp;G and&nbsp;H arrive and moves up the combining tree at the
same time, the state will be as follows:
<p><img src="Funnel7.svg" alt="Funnel7.svg" width="75%">
<p>
Note that three of the root <tt>rcu_node</tt> structure's
waitqueues are now occupied.
However, at some point, Task&nbsp;A will wake up the
tasks blocked on the <tt>-&gt;exp_wq</tt> waitqueues, resulting
in the following state:
<p><img src="Funnel8.svg" alt="Funnel8.svg" width="75%">
<p>
Execution will continue with Tasks&nbsp;E and&nbsp;H completing
their grace periods and carrying out their wakeups.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
What happens if Task&nbsp;A takes so long to do its wakeups
that Task&nbsp;E's grace period completes?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
Then Task&nbsp;E will block on the <tt>-&gt;exp_wake_mutex</tt>,
which will also prevent it from releasing <tt>-&gt;exp_mutex</tt>,
which in turn will prevent the next grace period from starting.
This last is important in preventing overflow of the
<tt>-&gt;exp_wq[]</tt> array.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<h3><a name="Use of Workqueues">Use of Workqueues</a></h3>
<p>
In earlier implementations, the task requesting the expedited
grace period also drove it to completion.
This straightforward approach had the disadvantage of needing to
account for POSIX signals sent to user tasks,
so more recent implemementations use the Linux kernel's
<a href="https://www.kernel.org/doc/Documentation/core-api/workqueue.rst">workqueues</a>.
<p>
The requesting task still does counter snapshotting and funnel-lock
processing, but the task reaching the top of the funnel lock
does a <tt>schedule_work()</tt> (from <tt>_synchronize_rcu_expedited()</tt>
so that a workqueue kthread does the actual grace-period processing.
Because workqueue kthreads do not accept POSIX signals, grace-period-wait
processing need not allow for POSIX signals.
In addition, this approach allows wakeups for the previous expedited
grace period to be overlapped with processing for the next expedited
grace period.
Because there are only four sets of waitqueues, it is necessary to
ensure that the previous grace period's wakeups complete before the
next grace period's wakeups start.
This is handled by having the <tt>-&gt;exp_mutex</tt>
guard expedited grace-period processing and the
<tt>-&gt;exp_wake_mutex</tt> guard wakeups.
The key point is that the <tt>-&gt;exp_mutex</tt> is not released
until the first wakeup is complete, which means that the
<tt>-&gt;exp_wake_mutex</tt> has already been acquired at that point.
This approach ensures that the previous grace period's wakeups can
be carried out while the current grace period is in process, but
that these wakeups will complete before the next grace period starts.
This means that only three waitqueues are required, guaranteeing that
the four that are provided are sufficient.
<h3><a name="Stall Warnings">Stall Warnings</a></h3>
<p>
Expediting grace periods does nothing to speed things up when RCU
readers take too long, and therefore expedited grace periods check
for stalls just as normal grace periods do.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But why not just let the normal grace-period machinery
detect the stalls, given that a given reader must block
both normal and expedited grace periods?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
Because it is quite possible that at a given time there
is no normal grace period in progress, in which case the
normal grace period cannot emit a stall warning.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
The <tt>synchronize_sched_expedited_wait()</tt> function loops waiting
for the expedited grace period to end, but with a timeout set to the
current RCU CPU stall-warning time.
If this time is exceeded, any CPUs or <tt>rcu_node</tt> structures
blocking the current grace period are printed.
Each stall warning results in another pass through the loop, but the
second and subsequent passes use longer stall times.
<h3><a name="Mid-Boot Operation">Mid-boot operation</a></h3>
<p>
The use of workqueues has the advantage that the expedited
grace-period code need not worry about POSIX signals.
Unfortunately, it has the
corresponding disadvantage that workqueues cannot be used until
they are initialized, which does not happen until some time after
the scheduler spawns the first task.
Given that there are parts of the kernel that really do want to
execute grace periods during this mid-boot &ldquo;dead zone&rdquo;,
expedited grace periods must do something else during thie time.
<p>
What they do is to fall back to the old practice of requiring that the
requesting task drive the expedited grace period, as was the case
before the use of workqueues.
However, the requesting task is only required to drive the grace period
during the mid-boot dead zone.
Before mid-boot, a synchronous grace period is a no-op.
Some time after mid-boot, workqueues are used.
<p>
Non-expedited non-SRCU synchronous grace periods must also operate
normally during mid-boot.
This is handled by causing non-expedited grace periods to take the
expedited code path during mid-boot.
<p>
The current code assumes that there are no POSIX signals during
the mid-boot dead zone.
However, if an overwhelming need for POSIX signals somehow arises,
appropriate adjustments can be made to the expedited stall-warning code.
One such adjustment would reinstate the pre-workqueue stall-warning
checks, but only during the mid-boot dead zone.
<p>
With this refinement, synchronous grace periods can now be used from
task context pretty much any time during the life of the kernel.
That is, aside from some points in the suspend, hibernate, or shutdown
code path.
<h3><a name="Summary">
Summary</a></h3>
<p>
Expedited grace periods use a sequence-number approach to promote
batching, so that a single grace-period operation can serve numerous
requests.
A funnel lock is used to efficiently identify the one task out of
a concurrent group that will request the grace period.
All members of the group will block on waitqueues provided in
the <tt>rcu_node</tt> structure.
The actual grace-period processing is carried out by a workqueue.
<p>
CPU-hotplug operations are noted lazily in order to prevent the need
for tight synchronization between expedited grace periods and
CPU-hotplug operations.
The dyntick-idle counters are used to avoid sending IPIs to idle CPUs,
at least in the common case.
RCU-preempt and RCU-sched use different IPI handlers and different
code to respond to the state changes carried out by those handlers,
but otherwise use common code.
<p>
Quiescent states are tracked using the <tt>rcu_node</tt> tree,
and once all necessary quiescent states have been reported,
all tasks waiting on this expedited grace period are awakened.
A pair of mutexes are used to allow one grace period's wakeups
to proceed concurrently with the next grace period's processing.
<p>
This combination of mechanisms allows expedited grace periods to
run reasonably efficiently.
However, for non-time-critical tasks, normal grace periods should be
used instead because their longer duration permits much higher
degrees of batching, and thus much lower per-request overheads.
</body></html>

View File

@@ -1,521 +0,0 @@
=================================================
A Tour Through TREE_RCU's Expedited Grace Periods
=================================================
Introduction
============
This document describes RCU's expedited grace periods.
Unlike RCU's normal grace periods, which accept long latencies to attain
high efficiency and minimal disturbance, expedited grace periods accept
lower efficiency and significant disturbance to attain shorter latencies.
There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier
third RCU-bh flavor having been implemented in terms of the other two.
Each of the two implementations is covered in its own section.
Expedited Grace Period Design
=============================
The expedited RCU grace periods cannot be accused of being subtle,
given that they for all intents and purposes hammer every CPU that
has not yet provided a quiescent state for the current expedited
grace period.
The one saving grace is that the hammer has grown a bit smaller
over time: The old call to ``try_stop_cpus()`` has been
replaced with a set of calls to ``smp_call_function_single()``,
each of which results in an IPI to the target CPU.
The corresponding handler function checks the CPU's state, motivating
a faster quiescent state where possible, and triggering a report
of that quiescent state.
As always for RCU, once everything has spent some time in a quiescent
state, the expedited grace period has completed.
The details of the ``smp_call_function_single()`` handler's
operation depend on the RCU flavor, as described in the following
sections.
RCU-preempt Expedited Grace Periods
===================================
``CONFIG_PREEMPT=y`` kernels implement RCU-preempt.
The overall flow of the handling of a given CPU by an RCU-preempt
expedited grace period is shown in the following diagram:
.. kernel-figure:: ExpRCUFlow.svg
The solid arrows denote direct action, for example, a function call.
The dotted arrows denote indirect action, for example, an IPI
or a state that is reached after some time.
If a given CPU is offline or idle, ``synchronize_rcu_expedited()``
will ignore it because idle and offline CPUs are already residing
in quiescent states.
Otherwise, the expedited grace period will use
``smp_call_function_single()`` to send the CPU an IPI, which
is handled by ``rcu_exp_handler()``.
However, because this is preemptible RCU, ``rcu_exp_handler()``
can check to see if the CPU is currently running in an RCU read-side
critical section.
If not, the handler can immediately report a quiescent state.
Otherwise, it sets flags so that the outermost ``rcu_read_unlock()``
invocation will provide the needed quiescent-state report.
This flag-setting avoids the previous forced preemption of all
CPUs that might have RCU read-side critical sections.
In addition, this flag-setting is done so as to avoid increasing
the overhead of the common-case fastpath through the scheduler.
Again because this is preemptible RCU, an RCU read-side critical section
can be preempted.
When that happens, RCU will enqueue the task, which will the continue to
block the current expedited grace period until it resumes and finds its
outermost ``rcu_read_unlock()``.
The CPU will report a quiescent state just after enqueuing the task because
the CPU is no longer blocking the grace period.
It is instead the preempted task doing the blocking.
The list of blocked tasks is managed by ``rcu_preempt_ctxt_queue()``,
which is called from ``rcu_preempt_note_context_switch()``, which
in turn is called from ``rcu_note_context_switch()``, which in
turn is called from the scheduler.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| Why not just have the expedited grace period check the state of all |
| the CPUs? After all, that would avoid all those real-time-unfriendly |
| IPIs. |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| Because we want the RCU read-side critical sections to run fast, |
| which means no memory barriers. Therefore, it is not possible to |
| safely check the state from some other CPU. And even if it was |
| possible to safely check the state, it would still be necessary to |
| IPI the CPU to safely interact with the upcoming |
| ``rcu_read_unlock()`` invocation, which means that the remote state |
| testing would not help the worst-case latency that real-time |
| applications care about. |
| |
| One way to prevent your real-time application from getting hit with |
| these IPIs is to build your kernel with ``CONFIG_NO_HZ_FULL=y``. RCU |
| would then perceive the CPU running your application as being idle, |
| and it would be able to safely detect that state without needing to |
| IPI the CPU. |
+-----------------------------------------------------------------------+
Please note that this is just the overall flow: Additional complications
can arise due to races with CPUs going idle or offline, among other
things.
RCU-sched Expedited Grace Periods
---------------------------------
``CONFIG_PREEMPT=n`` kernels implement RCU-sched. The overall flow of
the handling of a given CPU by an RCU-sched expedited grace period is
shown in the following diagram:
.. kernel-figure:: ExpSchedFlow.svg
As with RCU-preempt, RCU-sched's ``synchronize_rcu_expedited()`` ignores
offline and idle CPUs, again because they are in remotely detectable
quiescent states. However, because the ``rcu_read_lock_sched()`` and
``rcu_read_unlock_sched()`` leave no trace of their invocation, in
general it is not possible to tell whether or not the current CPU is in
an RCU read-side critical section. The best that RCU-sched's
``rcu_exp_handler()`` can do is to check for idle, on the off-chance
that the CPU went idle while the IPI was in flight. If the CPU is idle,
then ``rcu_exp_handler()`` reports the quiescent state.
Otherwise, the handler forces a future context switch by setting the
NEED_RESCHED flag of the current task's thread flag and the CPU preempt
counter. At the time of the context switch, the CPU reports the
quiescent state. Should the CPU go offline first, it will report the
quiescent state at that time.
Expedited Grace Period and CPU Hotplug
--------------------------------------
The expedited nature of expedited grace periods require a much tighter
interaction with CPU hotplug operations than is required for normal
grace periods. In addition, attempting to IPI offline CPUs will result
in splats, but failing to IPI online CPUs can result in too-short grace
periods. Neither option is acceptable in production kernels.
The interaction between expedited grace periods and CPU hotplug
operations is carried out at several levels:
#. The number of CPUs that have ever been online is tracked by the
``rcu_state`` structure's ``->ncpus`` field. The ``rcu_state``
structure's ``->ncpus_snap`` field tracks the number of CPUs that
have ever been online at the beginning of an RCU expedited grace
period. Note that this number never decreases, at least in the
absence of a time machine.
#. The identities of the CPUs that have ever been online is tracked by
the ``rcu_node`` structure's ``->expmaskinitnext`` field. The
``rcu_node`` structure's ``->expmaskinit`` field tracks the
identities of the CPUs that were online at least once at the
beginning of the most recent RCU expedited grace period. The
``rcu_state`` structure's ``->ncpus`` and ``->ncpus_snap`` fields are
used to detect when new CPUs have come online for the first time,
that is, when the ``rcu_node`` structure's ``->expmaskinitnext``
field has changed since the beginning of the last RCU expedited grace
period, which triggers an update of each ``rcu_node`` structure's
``->expmaskinit`` field from its ``->expmaskinitnext`` field.
#. Each ``rcu_node`` structure's ``->expmaskinit`` field is used to
initialize that structure's ``->expmask`` at the beginning of each
RCU expedited grace period. This means that only those CPUs that have
been online at least once will be considered for a given grace
period.
#. Any CPU that goes offline will clear its bit in its leaf ``rcu_node``
structure's ``->qsmaskinitnext`` field, so any CPU with that bit
clear can safely be ignored. However, it is possible for a CPU coming
online or going offline to have this bit set for some time while
``cpu_online`` returns ``false``.
#. For each non-idle CPU that RCU believes is currently online, the
grace period invokes ``smp_call_function_single()``. If this
succeeds, the CPU was fully online. Failure indicates that the CPU is
in the process of coming online or going offline, in which case it is
necessary to wait for a short time period and try again. The purpose
of this wait (or series of waits, as the case may be) is to permit a
concurrent CPU-hotplug operation to complete.
#. In the case of RCU-sched, one of the last acts of an outgoing CPU is
to invoke ``rcu_report_dead()``, which reports a quiescent state for
that CPU. However, this is likely paranoia-induced redundancy.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| Why all the dancing around with multiple counters and masks tracking |
| CPUs that were once online? Why not just have a single set of masks |
| tracking the currently online CPUs and be done with it? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| Maintaining single set of masks tracking the online CPUs *sounds* |
| easier, at least until you try working out all the race conditions |
| between grace-period initialization and CPU-hotplug operations. For |
| example, suppose initialization is progressing down the tree while a |
| CPU-offline operation is progressing up the tree. This situation can |
| result in bits set at the top of the tree that have no counterparts |
| at the bottom of the tree. Those bits will never be cleared, which |
| will result in grace-period hangs. In short, that way lies madness, |
| to say nothing of a great many bugs, hangs, and deadlocks. |
| In contrast, the current multi-mask multi-counter scheme ensures that |
| grace-period initialization will always see consistent masks up and |
| down the tree, which brings significant simplifications over the |
| single-mask method. |
| |
| This is an instance of `deferring work in order to avoid |
| synchronization <http://www.cs.columbia.edu/~library/TR-repository/re |
| ports/reports-1992/cucs-039-92.ps.gz>`__. |
| Lazily recording CPU-hotplug events at the beginning of the next |
| grace period greatly simplifies maintenance of the CPU-tracking |
| bitmasks in the ``rcu_node`` tree. |
+-----------------------------------------------------------------------+
Expedited Grace Period Refinements
----------------------------------
Idle-CPU Checks
~~~~~~~~~~~~~~~
Each expedited grace period checks for idle CPUs when initially forming
the mask of CPUs to be IPIed and again just before IPIing a CPU (both
checks are carried out by ``sync_rcu_exp_select_cpus()``). If the CPU is
idle at any time between those two times, the CPU will not be IPIed.
Instead, the task pushing the grace period forward will include the idle
CPUs in the mask passed to ``rcu_report_exp_cpu_mult()``.
For RCU-sched, there is an additional check: If the IPI has interrupted
the idle loop, then ``rcu_exp_handler()`` invokes
``rcu_report_exp_rdp()`` to report the corresponding quiescent state.
For RCU-preempt, there is no specific check for idle in the IPI handler
(``rcu_exp_handler()``), but because RCU read-side critical sections are
not permitted within the idle loop, if ``rcu_exp_handler()`` sees that
the CPU is within RCU read-side critical section, the CPU cannot
possibly be idle. Otherwise, ``rcu_exp_handler()`` invokes
``rcu_report_exp_rdp()`` to report the corresponding quiescent state,
regardless of whether or not that quiescent state was due to the CPU
being idle.
In summary, RCU expedited grace periods check for idle when building the
bitmask of CPUs that must be IPIed, just before sending each IPI, and
(either explicitly or implicitly) within the IPI handler.
Batching via Sequence Counter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If each grace-period request was carried out separately, expedited grace
periods would have abysmal scalability and problematic high-load
characteristics. Because each grace-period operation can serve an
unlimited number of updates, it is important to *batch* requests, so
that a single expedited grace-period operation will cover all requests
in the corresponding batch.
This batching is controlled by a sequence counter named
``->expedited_sequence`` in the ``rcu_state`` structure. This counter
has an odd value when there is an expedited grace period in progress and
an even value otherwise, so that dividing the counter value by two gives
the number of completed grace periods. During any given update request,
the counter must transition from even to odd and then back to even, thus
indicating that a grace period has elapsed. Therefore, if the initial
value of the counter is ``s``, the updater must wait until the counter
reaches at least the value ``(s+3)&~0x1``. This counter is managed by
the following access functions:
#. ``rcu_exp_gp_seq_start()``, which marks the start of an expedited
grace period.
#. ``rcu_exp_gp_seq_end()``, which marks the end of an expedited grace
period.
#. ``rcu_exp_gp_seq_snap()``, which obtains a snapshot of the counter.
#. ``rcu_exp_gp_seq_done()``, which returns ``true`` if a full expedited
grace period has elapsed since the corresponding call to
``rcu_exp_gp_seq_snap()``.
Again, only one request in a given batch need actually carry out a
grace-period operation, which means there must be an efficient way to
identify which of many concurrent reqeusts will initiate the grace
period, and that there be an efficient way for the remaining requests to
wait for that grace period to complete. However, that is the topic of
the next section.
Funnel Locking and Wait/Wakeup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The natural way to sort out which of a batch of updaters will initiate
the expedited grace period is to use the ``rcu_node`` combining tree, as
implemented by the ``exp_funnel_lock()`` function. The first updater
corresponding to a given grace period arriving at a given ``rcu_node``
structure records its desired grace-period sequence number in the
``->exp_seq_rq`` field and moves up to the next level in the tree.
Otherwise, if the ``->exp_seq_rq`` field already contains the sequence
number for the desired grace period or some later one, the updater
blocks on one of four wait queues in the ``->exp_wq[]`` array, using the
second-from-bottom and third-from bottom bits as an index. An
``->exp_lock`` field in the ``rcu_node`` structure synchronizes access
to these fields.
An empty ``rcu_node`` tree is shown in the following diagram, with the
white cells representing the ``->exp_seq_rq`` field and the red cells
representing the elements of the ``->exp_wq[]`` array.
.. kernel-figure:: Funnel0.svg
The next diagram shows the situation after the arrival of Task A and
Task B at the leftmost and rightmost leaf ``rcu_node`` structures,
respectively. The current value of the ``rcu_state`` structure's
``->expedited_sequence`` field is zero, so adding three and clearing the
bottom bit results in the value two, which both tasks record in the
``->exp_seq_rq`` field of their respective ``rcu_node`` structures:
.. kernel-figure:: Funnel1.svg
Each of Tasks A and B will move up to the root ``rcu_node`` structure.
Suppose that Task A wins, recording its desired grace-period sequence
number and resulting in the state shown below:
.. kernel-figure:: Funnel2.svg
Task A now advances to initiate a new grace period, while Task B moves
up to the root ``rcu_node`` structure, and, seeing that its desired
sequence number is already recorded, blocks on ``->exp_wq[1]``.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| Why ``->exp_wq[1]``? Given that the value of these tasks' desired |
| sequence number is two, so shouldn't they instead block on |
| ``->exp_wq[2]``? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| No. |
| Recall that the bottom bit of the desired sequence number indicates |
| whether or not a grace period is currently in progress. It is |
| therefore necessary to shift the sequence number right one bit |
| position to obtain the number of the grace period. This results in |
| ``->exp_wq[1]``. |
+-----------------------------------------------------------------------+
If Tasks C and D also arrive at this point, they will compute the same
desired grace-period sequence number, and see that both leaf
``rcu_node`` structures already have that value recorded. They will
therefore block on their respective ``rcu_node`` structures'
``->exp_wq[1]`` fields, as shown below:
.. kernel-figure:: Funnel3.svg
Task A now acquires the ``rcu_state`` structure's ``->exp_mutex`` and
initiates the grace period, which increments ``->expedited_sequence``.
Therefore, if Tasks E and F arrive, they will compute a desired sequence
number of 4 and will record this value as shown below:
.. kernel-figure:: Funnel4.svg
Tasks E and F will propagate up the ``rcu_node`` combining tree, with
Task F blocking on the root ``rcu_node`` structure and Task E wait for
Task A to finish so that it can start the next grace period. The
resulting state is as shown below:
.. kernel-figure:: Funnel5.svg
Once the grace period completes, Task A starts waking up the tasks
waiting for this grace period to complete, increments the
``->expedited_sequence``, acquires the ``->exp_wake_mutex`` and then
releases the ``->exp_mutex``. This results in the following state:
.. kernel-figure:: Funnel6.svg
Task E can then acquire ``->exp_mutex`` and increment
``->expedited_sequence`` to the value three. If new tasks G and H arrive
and moves up the combining tree at the same time, the state will be as
follows:
.. kernel-figure:: Funnel7.svg
Note that three of the root ``rcu_node`` structure's waitqueues are now
occupied. However, at some point, Task A will wake up the tasks blocked
on the ``->exp_wq`` waitqueues, resulting in the following state:
.. kernel-figure:: Funnel8.svg
Execution will continue with Tasks E and H completing their grace
periods and carrying out their wakeups.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| What happens if Task A takes so long to do its wakeups that Task E's |
| grace period completes? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| Then Task E will block on the ``->exp_wake_mutex``, which will also |
| prevent it from releasing ``->exp_mutex``, which in turn will prevent |
| the next grace period from starting. This last is important in |
| preventing overflow of the ``->exp_wq[]`` array. |
+-----------------------------------------------------------------------+
Use of Workqueues
~~~~~~~~~~~~~~~~~
In earlier implementations, the task requesting the expedited grace
period also drove it to completion. This straightforward approach had
the disadvantage of needing to account for POSIX signals sent to user
tasks, so more recent implemementations use the Linux kernel's
`workqueues <https://www.kernel.org/doc/Documentation/core-api/workqueue.rst>`__.
The requesting task still does counter snapshotting and funnel-lock
processing, but the task reaching the top of the funnel lock does a
``schedule_work()`` (from ``_synchronize_rcu_expedited()`` so that a
workqueue kthread does the actual grace-period processing. Because
workqueue kthreads do not accept POSIX signals, grace-period-wait
processing need not allow for POSIX signals. In addition, this approach
allows wakeups for the previous expedited grace period to be overlapped
with processing for the next expedited grace period. Because there are
only four sets of waitqueues, it is necessary to ensure that the
previous grace period's wakeups complete before the next grace period's
wakeups start. This is handled by having the ``->exp_mutex`` guard
expedited grace-period processing and the ``->exp_wake_mutex`` guard
wakeups. The key point is that the ``->exp_mutex`` is not released until
the first wakeup is complete, which means that the ``->exp_wake_mutex``
has already been acquired at that point. This approach ensures that the
previous grace period's wakeups can be carried out while the current
grace period is in process, but that these wakeups will complete before
the next grace period starts. This means that only three waitqueues are
required, guaranteeing that the four that are provided are sufficient.
Stall Warnings
~~~~~~~~~~~~~~
Expediting grace periods does nothing to speed things up when RCU
readers take too long, and therefore expedited grace periods check for
stalls just as normal grace periods do.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| But why not just let the normal grace-period machinery detect the |
| stalls, given that a given reader must block both normal and |
| expedited grace periods? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| Because it is quite possible that at a given time there is no normal |
| grace period in progress, in which case the normal grace period |
| cannot emit a stall warning. |
+-----------------------------------------------------------------------+
The ``synchronize_sched_expedited_wait()`` function loops waiting for
the expedited grace period to end, but with a timeout set to the current
RCU CPU stall-warning time. If this time is exceeded, any CPUs or
``rcu_node`` structures blocking the current grace period are printed.
Each stall warning results in another pass through the loop, but the
second and subsequent passes use longer stall times.
Mid-boot operation
~~~~~~~~~~~~~~~~~~
The use of workqueues has the advantage that the expedited grace-period
code need not worry about POSIX signals. Unfortunately, it has the
corresponding disadvantage that workqueues cannot be used until they are
initialized, which does not happen until some time after the scheduler
spawns the first task. Given that there are parts of the kernel that
really do want to execute grace periods during this mid-boot “dead
zone”, expedited grace periods must do something else during thie time.
What they do is to fall back to the old practice of requiring that the
requesting task drive the expedited grace period, as was the case before
the use of workqueues. However, the requesting task is only required to
drive the grace period during the mid-boot dead zone. Before mid-boot, a
synchronous grace period is a no-op. Some time after mid-boot,
workqueues are used.
Non-expedited non-SRCU synchronous grace periods must also operate
normally during mid-boot. This is handled by causing non-expedited grace
periods to take the expedited code path during mid-boot.
The current code assumes that there are no POSIX signals during the
mid-boot dead zone. However, if an overwhelming need for POSIX signals
somehow arises, appropriate adjustments can be made to the expedited
stall-warning code. One such adjustment would reinstate the
pre-workqueue stall-warning checks, but only during the mid-boot dead
zone.
With this refinement, synchronous grace periods can now be used from
task context pretty much any time during the life of the kernel. That
is, aside from some points in the suspend, hibernate, or shutdown code
path.
Summary
~~~~~~~
Expedited grace periods use a sequence-number approach to promote
batching, so that a single grace-period operation can serve numerous
requests. A funnel lock is used to efficiently identify the one task out
of a concurrent group that will request the grace period. All members of
the group will block on waitqueues provided in the ``rcu_node``
structure. The actual grace-period processing is carried out by a
workqueue.
CPU-hotplug operations are noted lazily in order to prevent the need for
tight synchronization between expedited grace periods and CPU-hotplug
operations. The dyntick-idle counters are used to avoid sending IPIs to
idle CPUs, at least in the common case. RCU-preempt and RCU-sched use
different IPI handlers and different code to respond to the state
changes carried out by those handlers, but otherwise use common code.
Quiescent states are tracked using the ``rcu_node`` tree, and once all
necessary quiescent states have been reported, all tasks waiting on this
expedited grace period are awakened. A pair of mutexes are used to allow
one grace period's wakeups to proceed concurrently with the next grace
period's processing.
This combination of mechanisms allows expedited grace periods to run
reasonably efficiently. However, for non-time-critical tasks, normal
grace periods should be used instead because their longer duration
permits much higher degrees of batching, and thus much lower per-request
overheads.

View File

@@ -0,0 +1,9 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title>A Diagram of TREE_RCU's Grace-Period Memory Ordering</title>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<p><img src="TreeRCU-gp.svg" alt="TreeRCU-gp.svg">
</body></html>

View File

@@ -0,0 +1,704 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title>A Tour Through TREE_RCU's Grace-Period Memory Ordering</title>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<p>August 8, 2017</p>
<p>This article was contributed by Paul E.&nbsp;McKenney</p>
<h3>Introduction</h3>
<p>This document gives a rough visual overview of how Tree RCU's
grace-period memory ordering guarantee is provided.
<ol>
<li> <a href="#What Is Tree RCU's Grace Period Memory Ordering Guarantee?">
What Is Tree RCU's Grace Period Memory Ordering Guarantee?</a>
<li> <a href="#Tree RCU Grace Period Memory Ordering Building Blocks">
Tree RCU Grace Period Memory Ordering Building Blocks</a>
<li> <a href="#Tree RCU Grace Period Memory Ordering Components">
Tree RCU Grace Period Memory Ordering Components</a>
<li> <a href="#Putting It All Together">Putting It All Together</a>
</ol>
<h3><a name="What Is Tree RCU's Grace Period Memory Ordering Guarantee?">
What Is Tree RCU's Grace Period Memory Ordering Guarantee?</a></h3>
<p>RCU grace periods provide extremely strong memory-ordering guarantees
for non-idle non-offline code.
Any code that happens after the end of a given RCU grace period is guaranteed
to see the effects of all accesses prior to the beginning of that grace
period that are within RCU read-side critical sections.
Similarly, any code that happens before the beginning of a given RCU grace
period is guaranteed to see the effects of all accesses following the end
of that grace period that are within RCU read-side critical sections.
<p>Note well that RCU-sched read-side critical sections include any region
of code for which preemption is disabled.
Given that each individual machine instruction can be thought of as
an extremely small region of preemption-disabled code, one can think of
<tt>synchronize_rcu()</tt> as <tt>smp_mb()</tt> on steroids.
<p>RCU updaters use this guarantee by splitting their updates into
two phases, one of which is executed before the grace period and
the other of which is executed after the grace period.
In the most common use case, phase one removes an element from
a linked RCU-protected data structure, and phase two frees that element.
For this to work, any readers that have witnessed state prior to the
phase-one update (in the common case, removal) must not witness state
following the phase-two update (in the common case, freeing).
<p>The RCU implementation provides this guarantee using a network
of lock-based critical sections, memory barriers, and per-CPU
processing, as is described in the following sections.
<h3><a name="Tree RCU Grace Period Memory Ordering Building Blocks">
Tree RCU Grace Period Memory Ordering Building Blocks</a></h3>
<p>The workhorse for RCU's grace-period memory ordering is the
critical section for the <tt>rcu_node</tt> structure's
<tt>-&gt;lock</tt>.
These critical sections use helper functions for lock acquisition, including
<tt>raw_spin_lock_rcu_node()</tt>,
<tt>raw_spin_lock_irq_rcu_node()</tt>, and
<tt>raw_spin_lock_irqsave_rcu_node()</tt>.
Their lock-release counterparts are
<tt>raw_spin_unlock_rcu_node()</tt>,
<tt>raw_spin_unlock_irq_rcu_node()</tt>, and
<tt>raw_spin_unlock_irqrestore_rcu_node()</tt>,
respectively.
For completeness, a
<tt>raw_spin_trylock_rcu_node()</tt>
is also provided.
The key point is that the lock-acquisition functions, including
<tt>raw_spin_trylock_rcu_node()</tt>, all invoke
<tt>smp_mb__after_unlock_lock()</tt> immediately after successful
acquisition of the lock.
<p>Therefore, for any given <tt>rcu_node</tt> structure, any access
happening before one of the above lock-release functions will be seen
by all CPUs as happening before any access happening after a later
one of the above lock-acquisition functions.
Furthermore, any access happening before one of the
above lock-release function on any given CPU will be seen by all
CPUs as happening before any access happening after a later one
of the above lock-acquisition functions executing on that same CPU,
even if the lock-release and lock-acquisition functions are operating
on different <tt>rcu_node</tt> structures.
Tree RCU uses these two ordering guarantees to form an ordering
network among all CPUs that were in any way involved in the grace
period, including any CPUs that came online or went offline during
the grace period in question.
<p>The following litmus test exhibits the ordering effects of these
lock-acquisition and lock-release functions:
<pre>
1 int x, y, z;
2
3 void task0(void)
4 {
5 raw_spin_lock_rcu_node(rnp);
6 WRITE_ONCE(x, 1);
7 r1 = READ_ONCE(y);
8 raw_spin_unlock_rcu_node(rnp);
9 }
10
11 void task1(void)
12 {
13 raw_spin_lock_rcu_node(rnp);
14 WRITE_ONCE(y, 1);
15 r2 = READ_ONCE(z);
16 raw_spin_unlock_rcu_node(rnp);
17 }
18
19 void task2(void)
20 {
21 WRITE_ONCE(z, 1);
22 smp_mb();
23 r3 = READ_ONCE(x);
24 }
25
26 WARN_ON(r1 == 0 &amp;&amp; r2 == 0 &amp;&amp; r3 == 0);
</pre>
<p>The <tt>WARN_ON()</tt> is evaluated at &ldquo;the end of time&rdquo;,
after all changes have propagated throughout the system.
Without the <tt>smp_mb__after_unlock_lock()</tt> provided by the
acquisition functions, this <tt>WARN_ON()</tt> could trigger, for example
on PowerPC.
The <tt>smp_mb__after_unlock_lock()</tt> invocations prevent this
<tt>WARN_ON()</tt> from triggering.
<p>This approach must be extended to include idle CPUs, which need
RCU's grace-period memory ordering guarantee to extend to any
RCU read-side critical sections preceding and following the current
idle sojourn.
This case is handled by calls to the strongly ordered
<tt>atomic_add_return()</tt> read-modify-write atomic operation that
is invoked within <tt>rcu_dynticks_eqs_enter()</tt> at idle-entry
time and within <tt>rcu_dynticks_eqs_exit()</tt> at idle-exit time.
The grace-period kthread invokes <tt>rcu_dynticks_snap()</tt> and
<tt>rcu_dynticks_in_eqs_since()</tt> (both of which invoke
an <tt>atomic_add_return()</tt> of zero) to detect idle CPUs.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But what about CPUs that remain offline for the entire
grace period?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
Such CPUs will be offline at the beginning of the grace period,
so the grace period won't expect quiescent states from them.
Races between grace-period start and CPU-hotplug operations
are mediated by the CPU's leaf <tt>rcu_node</tt> structure's
<tt>-&gt;lock</tt> as described above.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<p>The approach must be extended to handle one final case, that
of waking a task blocked in <tt>synchronize_rcu()</tt>.
This task might be affinitied to a CPU that is not yet aware that
the grace period has ended, and thus might not yet be subject to
the grace period's memory ordering.
Therefore, there is an <tt>smp_mb()</tt> after the return from
<tt>wait_for_completion()</tt> in the <tt>synchronize_rcu()</tt>
code path.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
What? Where???
I don't see any <tt>smp_mb()</tt> after the return from
<tt>wait_for_completion()</tt>!!!
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
That would be because I spotted the need for that
<tt>smp_mb()</tt> during the creation of this documentation,
and it is therefore unlikely to hit mainline before v4.14.
Kudos to Lance Roy, Will Deacon, Peter Zijlstra, and
Jonathan Cameron for asking questions that sensitized me
to the rather elaborate sequence of events that demonstrate
the need for this memory barrier.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<p>Tree RCU's grace--period memory-ordering guarantees rely most
heavily on the <tt>rcu_node</tt> structure's <tt>-&gt;lock</tt>
field, so much so that it is necessary to abbreviate this pattern
in the diagrams in the next section.
For example, consider the <tt>rcu_prepare_for_idle()</tt> function
shown below, which is one of several functions that enforce ordering
of newly arrived RCU callbacks against future grace periods:
<pre>
1 static void rcu_prepare_for_idle(void)
2 {
3 bool needwake;
4 struct rcu_data *rdp;
5 struct rcu_dynticks *rdtp = this_cpu_ptr(&amp;rcu_dynticks);
6 struct rcu_node *rnp;
7 struct rcu_state *rsp;
8 int tne;
9
10 if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_ALL) ||
11 rcu_is_nocb_cpu(smp_processor_id()))
12 return;
13 tne = READ_ONCE(tick_nohz_active);
14 if (tne != rdtp-&gt;tick_nohz_enabled_snap) {
15 if (rcu_cpu_has_callbacks(NULL))
16 invoke_rcu_core();
17 rdtp-&gt;tick_nohz_enabled_snap = tne;
18 return;
19 }
20 if (!tne)
21 return;
22 if (rdtp-&gt;all_lazy &amp;&amp;
23 rdtp-&gt;nonlazy_posted != rdtp-&gt;nonlazy_posted_snap) {
24 rdtp-&gt;all_lazy = false;
25 rdtp-&gt;nonlazy_posted_snap = rdtp-&gt;nonlazy_posted;
26 invoke_rcu_core();
27 return;
28 }
29 if (rdtp-&gt;last_accelerate == jiffies)
30 return;
31 rdtp-&gt;last_accelerate = jiffies;
32 for_each_rcu_flavor(rsp) {
33 rdp = this_cpu_ptr(rsp-&gt;rda);
34 if (rcu_segcblist_pend_cbs(&amp;rdp-&gt;cblist))
35 continue;
36 rnp = rdp-&gt;mynode;
37 raw_spin_lock_rcu_node(rnp);
38 needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
39 raw_spin_unlock_rcu_node(rnp);
40 if (needwake)
41 rcu_gp_kthread_wake(rsp);
42 }
43 }
</pre>
<p>But the only part of <tt>rcu_prepare_for_idle()</tt> that really
matters for this discussion are lines&nbsp;37&ndash;39.
We will therefore abbreviate this function as follows:
</p><p><img src="rcu_node-lock.svg" alt="rcu_node-lock.svg">
<p>The box represents the <tt>rcu_node</tt> structure's <tt>-&gt;lock</tt>
critical section, with the double line on top representing the additional
<tt>smp_mb__after_unlock_lock()</tt>.
<h3><a name="Tree RCU Grace Period Memory Ordering Components">
Tree RCU Grace Period Memory Ordering Components</a></h3>
<p>Tree RCU's grace-period memory-ordering guarantee is provided by
a number of RCU components:
<ol>
<li> <a href="#Callback Registry">Callback Registry</a>
<li> <a href="#Grace-Period Initialization">Grace-Period Initialization</a>
<li> <a href="#Self-Reported Quiescent States">
Self-Reported Quiescent States</a>
<li> <a href="#Dynamic Tick Interface">Dynamic Tick Interface</a>
<li> <a href="#CPU-Hotplug Interface">CPU-Hotplug Interface</a>
<li> <a href="Forcing Quiescent States">Forcing Quiescent States</a>
<li> <a href="Grace-Period Cleanup">Grace-Period Cleanup</a>
<li> <a href="Callback Invocation">Callback Invocation</a>
</ol>
<p>Each of the following section looks at the corresponding component
in detail.
<h4><a name="Callback Registry">Callback Registry</a></h4>
<p>If RCU's grace-period guarantee is to mean anything at all, any
access that happens before a given invocation of <tt>call_rcu()</tt>
must also happen before the corresponding grace period.
The implementation of this portion of RCU's grace period guarantee
is shown in the following figure:
</p><p><img src="TreeRCU-callback-registry.svg" alt="TreeRCU-callback-registry.svg">
<p>Because <tt>call_rcu()</tt> normally acts only on CPU-local state,
it provides no ordering guarantees, either for itself or for
phase one of the update (which again will usually be removal of
an element from an RCU-protected data structure).
It simply enqueues the <tt>rcu_head</tt> structure on a per-CPU list,
which cannot become associated with a grace period until a later
call to <tt>rcu_accelerate_cbs()</tt>, as shown in the diagram above.
<p>One set of code paths shown on the left invokes
<tt>rcu_accelerate_cbs()</tt> via
<tt>note_gp_changes()</tt>, either directly from <tt>call_rcu()</tt> (if
the current CPU is inundated with queued <tt>rcu_head</tt> structures)
or more likely from an <tt>RCU_SOFTIRQ</tt> handler.
Another code path in the middle is taken only in kernels built with
<tt>CONFIG_RCU_FAST_NO_HZ=y</tt>, which invokes
<tt>rcu_accelerate_cbs()</tt> via <tt>rcu_prepare_for_idle()</tt>.
The final code path on the right is taken only in kernels built with
<tt>CONFIG_HOTPLUG_CPU=y</tt>, which invokes
<tt>rcu_accelerate_cbs()</tt> via
<tt>rcu_advance_cbs()</tt>, <tt>rcu_migrate_callbacks</tt>,
<tt>rcutree_migrate_callbacks()</tt>, and <tt>takedown_cpu()</tt>,
which in turn is invoked on a surviving CPU after the outgoing
CPU has been completely offlined.
<p>There are a few other code paths within grace-period processing
that opportunistically invoke <tt>rcu_accelerate_cbs()</tt>.
However, either way, all of the CPU's recently queued <tt>rcu_head</tt>
structures are associated with a future grace-period number under
the protection of the CPU's lead <tt>rcu_node</tt> structure's
<tt>-&gt;lock</tt>.
In all cases, there is full ordering against any prior critical section
for that same <tt>rcu_node</tt> structure's <tt>-&gt;lock</tt>, and
also full ordering against any of the current task's or CPU's prior critical
sections for any <tt>rcu_node</tt> structure's <tt>-&gt;lock</tt>.
<p>The next section will show how this ordering ensures that any
accesses prior to the <tt>call_rcu()</tt> (particularly including phase
one of the update)
happen before the start of the corresponding grace period.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But what about <tt>synchronize_rcu()</tt>?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
The <tt>synchronize_rcu()</tt> passes <tt>call_rcu()</tt>
to <tt>wait_rcu_gp()</tt>, which invokes it.
So either way, it eventually comes down to <tt>call_rcu()</tt>.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<h4><a name="Grace-Period Initialization">Grace-Period Initialization</a></h4>
<p>Grace-period initialization is carried out by
the grace-period kernel thread, which makes several passes over the
<tt>rcu_node</tt> tree within the <tt>rcu_gp_init()</tt> function.
This means that showing the full flow of ordering through the
grace-period computation will require duplicating this tree.
If you find this confusing, please note that the state of the
<tt>rcu_node</tt> changes over time, just like Heraclitus's river.
However, to keep the <tt>rcu_node</tt> river tractable, the
grace-period kernel thread's traversals are presented in multiple
parts, starting in this section with the various phases of
grace-period initialization.
<p>The first ordering-related grace-period initialization action is to
advance the <tt>rcu_state</tt> structure's <tt>-&gt;gp_seq</tt>
grace-period-number counter, as shown below:
</p><p><img src="TreeRCU-gp-init-1.svg" alt="TreeRCU-gp-init-1.svg" width="75%">
<p>The actual increment is carried out using <tt>smp_store_release()</tt>,
which helps reject false-positive RCU CPU stall detection.
Note that only the root <tt>rcu_node</tt> structure is touched.
<p>The first pass through the <tt>rcu_node</tt> tree updates bitmasks
based on CPUs having come online or gone offline since the start of
the previous grace period.
In the common case where the number of online CPUs for this <tt>rcu_node</tt>
structure has not transitioned to or from zero,
this pass will scan only the leaf <tt>rcu_node</tt> structures.
However, if the number of online CPUs for a given leaf <tt>rcu_node</tt>
structure has transitioned from zero,
<tt>rcu_init_new_rnp()</tt> will be invoked for the first incoming CPU.
Similarly, if the number of online CPUs for a given leaf <tt>rcu_node</tt>
structure has transitioned to zero,
<tt>rcu_cleanup_dead_rnp()</tt> will be invoked for the last outgoing CPU.
The diagram below shows the path of ordering if the leftmost
<tt>rcu_node</tt> structure onlines its first CPU and if the next
<tt>rcu_node</tt> structure has no online CPUs
(or, alternatively if the leftmost <tt>rcu_node</tt> structure offlines
its last CPU and if the next <tt>rcu_node</tt> structure has no online CPUs).
</p><p><img src="TreeRCU-gp-init-2.svg" alt="TreeRCU-gp-init-1.svg" width="75%">
<p>The final <tt>rcu_gp_init()</tt> pass through the <tt>rcu_node</tt>
tree traverses breadth-first, setting each <tt>rcu_node</tt> structure's
<tt>-&gt;gp_seq</tt> field to the newly advanced value from the
<tt>rcu_state</tt> structure, as shown in the following diagram.
</p><p><img src="TreeRCU-gp-init-3.svg" alt="TreeRCU-gp-init-1.svg" width="75%">
<p>This change will also cause each CPU's next call to
<tt>__note_gp_changes()</tt>
to notice that a new grace period has started, as described in the next
section.
But because the grace-period kthread started the grace period at the
root (with the advancing of the <tt>rcu_state</tt> structure's
<tt>-&gt;gp_seq</tt> field) before setting each leaf <tt>rcu_node</tt>
structure's <tt>-&gt;gp_seq</tt> field, each CPU's observation of
the start of the grace period will happen after the actual start
of the grace period.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But what about the CPU that started the grace period?
Why wouldn't it see the start of the grace period right when
it started that grace period?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
In some deep philosophical and overly anthromorphized
sense, yes, the CPU starting the grace period is immediately
aware of having done so.
However, if we instead assume that RCU is not self-aware,
then even the CPU starting the grace period does not really
become aware of the start of this grace period until its
first call to <tt>__note_gp_changes()</tt>.
On the other hand, this CPU potentially gets early notification
because it invokes <tt>__note_gp_changes()</tt> during its
last <tt>rcu_gp_init()</tt> pass through its leaf
<tt>rcu_node</tt> structure.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<h4><a name="Self-Reported Quiescent States">
Self-Reported Quiescent States</a></h4>
<p>When all entities that might block the grace period have reported
quiescent states (or as described in a later section, had quiescent
states reported on their behalf), the grace period can end.
Online non-idle CPUs report their own quiescent states, as shown
in the following diagram:
</p><p><img src="TreeRCU-qs.svg" alt="TreeRCU-qs.svg" width="75%">
<p>This is for the last CPU to report a quiescent state, which signals
the end of the grace period.
Earlier quiescent states would push up the <tt>rcu_node</tt> tree
only until they encountered an <tt>rcu_node</tt> structure that
is waiting for additional quiescent states.
However, ordering is nevertheless preserved because some later quiescent
state will acquire that <tt>rcu_node</tt> structure's <tt>-&gt;lock</tt>.
<p>Any number of events can lead up to a CPU invoking
<tt>note_gp_changes</tt> (or alternatively, directly invoking
<tt>__note_gp_changes()</tt>), at which point that CPU will notice
the start of a new grace period while holding its leaf
<tt>rcu_node</tt> lock.
Therefore, all execution shown in this diagram happens after the
start of the grace period.
In addition, this CPU will consider any RCU read-side critical
section that started before the invocation of <tt>__note_gp_changes()</tt>
to have started before the grace period, and thus a critical
section that the grace period must wait on.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But a RCU read-side critical section might have started
after the beginning of the grace period
(the advancing of <tt>-&gt;gp_seq</tt> from earlier), so why should
the grace period wait on such a critical section?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
It is indeed not necessary for the grace period to wait on such
a critical section.
However, it is permissible to wait on it.
And it is furthermore important to wait on it, as this
lazy approach is far more scalable than a &ldquo;big bang&rdquo;
all-at-once grace-period start could possibly be.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<p>If the CPU does a context switch, a quiescent state will be
noted by <tt>rcu_node_context_switch()</tt> on the left.
On the other hand, if the CPU takes a scheduler-clock interrupt
while executing in usermode, a quiescent state will be noted by
<tt>rcu_sched_clock_irq()</tt> on the right.
Either way, the passage through a quiescent state will be noted
in a per-CPU variable.
<p>The next time an <tt>RCU_SOFTIRQ</tt> handler executes on
this CPU (for example, after the next scheduler-clock
interrupt), <tt>rcu_core()</tt> will invoke
<tt>rcu_check_quiescent_state()</tt>, which will notice the
recorded quiescent state, and invoke
<tt>rcu_report_qs_rdp()</tt>.
If <tt>rcu_report_qs_rdp()</tt> verifies that the quiescent state
really does apply to the current grace period, it invokes
<tt>rcu_report_rnp()</tt> which traverses up the <tt>rcu_node</tt>
tree as shown at the bottom of the diagram, clearing bits from
each <tt>rcu_node</tt> structure's <tt>-&gt;qsmask</tt> field,
and propagating up the tree when the result is zero.
<p>Note that traversal passes upwards out of a given <tt>rcu_node</tt>
structure only if the current CPU is reporting the last quiescent
state for the subtree headed by that <tt>rcu_node</tt> structure.
A key point is that if a CPU's traversal stops at a given <tt>rcu_node</tt>
structure, then there will be a later traversal by another CPU
(or perhaps the same one) that proceeds upwards
from that point, and the <tt>rcu_node</tt> <tt>-&gt;lock</tt>
guarantees that the first CPU's quiescent state happens before the
remainder of the second CPU's traversal.
Applying this line of thought repeatedly shows that all CPUs'
quiescent states happen before the last CPU traverses through
the root <tt>rcu_node</tt> structure, the &ldquo;last CPU&rdquo;
being the one that clears the last bit in the root <tt>rcu_node</tt>
structure's <tt>-&gt;qsmask</tt> field.
<h4><a name="Dynamic Tick Interface">Dynamic Tick Interface</a></h4>
<p>Due to energy-efficiency considerations, RCU is forbidden from
disturbing idle CPUs.
CPUs are therefore required to notify RCU when entering or leaving idle
state, which they do via fully ordered value-returning atomic operations
on a per-CPU variable.
The ordering effects are as shown below:
</p><p><img src="TreeRCU-dyntick.svg" alt="TreeRCU-dyntick.svg" width="50%">
<p>The RCU grace-period kernel thread samples the per-CPU idleness
variable while holding the corresponding CPU's leaf <tt>rcu_node</tt>
structure's <tt>-&gt;lock</tt>.
This means that any RCU read-side critical sections that precede the
idle period (the oval near the top of the diagram above) will happen
before the end of the current grace period.
Similarly, the beginning of the current grace period will happen before
any RCU read-side critical sections that follow the
idle period (the oval near the bottom of the diagram above).
<p>Plumbing this into the full grace-period execution is described
<a href="#Forcing Quiescent States">below</a>.
<h4><a name="CPU-Hotplug Interface">CPU-Hotplug Interface</a></h4>
<p>RCU is also forbidden from disturbing offline CPUs, which might well
be powered off and removed from the system completely.
CPUs are therefore required to notify RCU of their comings and goings
as part of the corresponding CPU hotplug operations.
The ordering effects are shown below:
</p><p><img src="TreeRCU-hotplug.svg" alt="TreeRCU-hotplug.svg" width="50%">
<p>Because CPU hotplug operations are much less frequent than idle transitions,
they are heavier weight, and thus acquire the CPU's leaf <tt>rcu_node</tt>
structure's <tt>-&gt;lock</tt> and update this structure's
<tt>-&gt;qsmaskinitnext</tt>.
The RCU grace-period kernel thread samples this mask to detect CPUs
having gone offline since the beginning of this grace period.
<p>Plumbing this into the full grace-period execution is described
<a href="#Forcing Quiescent States">below</a>.
<h4><a name="Forcing Quiescent States">Forcing Quiescent States</a></h4>
<p>As noted above, idle and offline CPUs cannot report their own
quiescent states, and therefore the grace-period kernel thread
must do the reporting on their behalf.
This process is called &ldquo;forcing quiescent states&rdquo;, it is
repeated every few jiffies, and its ordering effects are shown below:
</p><p><img src="TreeRCU-gp-fqs.svg" alt="TreeRCU-gp-fqs.svg" width="100%">
<p>Each pass of quiescent state forcing is guaranteed to traverse the
leaf <tt>rcu_node</tt> structures, and if there are no new quiescent
states due to recently idled and/or offlined CPUs, then only the
leaves are traversed.
However, if there is a newly offlined CPU as illustrated on the left
or a newly idled CPU as illustrated on the right, the corresponding
quiescent state will be driven up towards the root.
As with self-reported quiescent states, the upwards driving stops
once it reaches an <tt>rcu_node</tt> structure that has quiescent
states outstanding from other CPUs.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
The leftmost drive to root stopped before it reached
the root <tt>rcu_node</tt> structure, which means that
there are still CPUs subordinate to that structure on
which the current grace period is waiting.
Given that, how is it possible that the rightmost drive
to root ended the grace period?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
Good analysis!
It is in fact impossible in the absence of bugs in RCU.
But this diagram is complex enough as it is, so simplicity
overrode accuracy.
You can think of it as poetic license, or you can think of
it as misdirection that is resolved in the
<a href="#Putting It All Together">stitched-together diagram</a>.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<h4><a name="Grace-Period Cleanup">Grace-Period Cleanup</a></h4>
<p>Grace-period cleanup first scans the <tt>rcu_node</tt> tree
breadth-first advancing all the <tt>-&gt;gp_seq</tt> fields, then it
advances the <tt>rcu_state</tt> structure's <tt>-&gt;gp_seq</tt> field.
The ordering effects are shown below:
</p><p><img src="TreeRCU-gp-cleanup.svg" alt="TreeRCU-gp-cleanup.svg" width="75%">
<p>As indicated by the oval at the bottom of the diagram, once
grace-period cleanup is complete, the next grace period can begin.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But when precisely does the grace period end?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
There is no useful single point at which the grace period
can be said to end.
The earliest reasonable candidate is as soon as the last
CPU has reported its quiescent state, but it may be some
milliseconds before RCU becomes aware of this.
The latest reasonable candidate is once the <tt>rcu_state</tt>
structure's <tt>-&gt;gp_seq</tt> field has been updated,
but it is quite possible that some CPUs have already completed
phase two of their updates by that time.
In short, if you are going to work with RCU, you need to
learn to embrace uncertainty.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<h4><a name="Callback Invocation">Callback Invocation</a></h4>
<p>Once a given CPU's leaf <tt>rcu_node</tt> structure's
<tt>-&gt;gp_seq</tt> field has been updated, that CPU can begin
invoking its RCU callbacks that were waiting for this grace period
to end.
These callbacks are identified by <tt>rcu_advance_cbs()</tt>,
which is usually invoked by <tt>__note_gp_changes()</tt>.
As shown in the diagram below, this invocation can be triggered by
the scheduling-clock interrupt (<tt>rcu_sched_clock_irq()</tt> on
the left) or by idle entry (<tt>rcu_cleanup_after_idle()</tt> on
the right, but only for kernels build with
<tt>CONFIG_RCU_FAST_NO_HZ=y</tt>).
Either way, <tt>RCU_SOFTIRQ</tt> is raised, which results in
<tt>rcu_do_batch()</tt> invoking the callbacks, which in turn
allows those callbacks to carry out (either directly or indirectly
via wakeup) the needed phase-two processing for each update.
</p><p><img src="TreeRCU-callback-invocation.svg" alt="TreeRCU-callback-invocation.svg" width="60%">
<p>Please note that callback invocation can also be prompted by any
number of corner-case code paths, for example, when a CPU notes that
it has excessive numbers of callbacks queued.
In all cases, the CPU acquires its leaf <tt>rcu_node</tt> structure's
<tt>-&gt;lock</tt> before invoking callbacks, which preserves the
required ordering against the newly completed grace period.
<p>However, if the callback function communicates to other CPUs,
for example, doing a wakeup, then it is that function's responsibility
to maintain ordering.
For example, if the callback function wakes up a task that runs on
some other CPU, proper ordering must in place in both the callback
function and the task being awakened.
To see why this is important, consider the top half of the
<a href="#Grace-Period Cleanup">grace-period cleanup</a> diagram.
The callback might be running on a CPU corresponding to the leftmost
leaf <tt>rcu_node</tt> structure, and awaken a task that is to run on
a CPU corresponding to the rightmost leaf <tt>rcu_node</tt> structure,
and the grace-period kernel thread might not yet have reached the
rightmost leaf.
In this case, the grace period's memory ordering might not yet have
reached that CPU, so again the callback function and the awakened
task must supply proper ordering.
<h3><a name="Putting It All Together">Putting It All Together</a></h3>
<p>A stitched-together diagram is
<a href="Tree-RCU-Diagram.html">here</a>.
<h3><a name="Legal Statement">
Legal Statement</a></h3>
<p>This work represents the view of the author and does not necessarily
represent the view of IBM.
</p><p>Linux is a registered trademark of Linus Torvalds.
</p><p>Other company, product, and service names may be trademarks or
service marks of others.
</body></html>

View File

@@ -1,624 +0,0 @@
======================================================
A Tour Through TREE_RCU's Grace-Period Memory Ordering
======================================================
August 8, 2017
This article was contributed by Paul E.&nbsp;McKenney
Introduction
============
This document gives a rough visual overview of how Tree RCU's
grace-period memory ordering guarantee is provided.
What Is Tree RCU's Grace Period Memory Ordering Guarantee?
==========================================================
RCU grace periods provide extremely strong memory-ordering guarantees
for non-idle non-offline code.
Any code that happens after the end of a given RCU grace period is guaranteed
to see the effects of all accesses prior to the beginning of that grace
period that are within RCU read-side critical sections.
Similarly, any code that happens before the beginning of a given RCU grace
period is guaranteed to see the effects of all accesses following the end
of that grace period that are within RCU read-side critical sections.
Note well that RCU-sched read-side critical sections include any region
of code for which preemption is disabled.
Given that each individual machine instruction can be thought of as
an extremely small region of preemption-disabled code, one can think of
``synchronize_rcu()`` as ``smp_mb()`` on steroids.
RCU updaters use this guarantee by splitting their updates into
two phases, one of which is executed before the grace period and
the other of which is executed after the grace period.
In the most common use case, phase one removes an element from
a linked RCU-protected data structure, and phase two frees that element.
For this to work, any readers that have witnessed state prior to the
phase-one update (in the common case, removal) must not witness state
following the phase-two update (in the common case, freeing).
The RCU implementation provides this guarantee using a network
of lock-based critical sections, memory barriers, and per-CPU
processing, as is described in the following sections.
Tree RCU Grace Period Memory Ordering Building Blocks
=====================================================
The workhorse for RCU's grace-period memory ordering is the
critical section for the ``rcu_node`` structure's
``-&gt;lock``. These critical sections use helper functions for lock
acquisition, including ``raw_spin_lock_rcu_node()``,
``raw_spin_lock_irq_rcu_node()``, and ``raw_spin_lock_irqsave_rcu_node()``.
Their lock-release counterparts are ``raw_spin_unlock_rcu_node()``,
``raw_spin_unlock_irq_rcu_node()``, and
``raw_spin_unlock_irqrestore_rcu_node()``, respectively.
For completeness, a ``raw_spin_trylock_rcu_node()`` is also provided.
The key point is that the lock-acquisition functions, including
``raw_spin_trylock_rcu_node()``, all invoke ``smp_mb__after_unlock_lock()``
immediately after successful acquisition of the lock.
Therefore, for any given ``rcu_node`` structure, any access
happening before one of the above lock-release functions will be seen
by all CPUs as happening before any access happening after a later
one of the above lock-acquisition functions.
Furthermore, any access happening before one of the
above lock-release function on any given CPU will be seen by all
CPUs as happening before any access happening after a later one
of the above lock-acquisition functions executing on that same CPU,
even if the lock-release and lock-acquisition functions are operating
on different ``rcu_node`` structures.
Tree RCU uses these two ordering guarantees to form an ordering
network among all CPUs that were in any way involved in the grace
period, including any CPUs that came online or went offline during
the grace period in question.
The following litmus test exhibits the ordering effects of these
lock-acquisition and lock-release functions::
1 int x, y, z;
2
3 void task0(void)
4 {
5 raw_spin_lock_rcu_node(rnp);
6 WRITE_ONCE(x, 1);
7 r1 = READ_ONCE(y);
8 raw_spin_unlock_rcu_node(rnp);
9 }
10
11 void task1(void)
12 {
13 raw_spin_lock_rcu_node(rnp);
14 WRITE_ONCE(y, 1);
15 r2 = READ_ONCE(z);
16 raw_spin_unlock_rcu_node(rnp);
17 }
18
19 void task2(void)
20 {
21 WRITE_ONCE(z, 1);
22 smp_mb();
23 r3 = READ_ONCE(x);
24 }
25
26 WARN_ON(r1 == 0 &amp;&amp; r2 == 0 &amp;&amp; r3 == 0);
The ``WARN_ON()`` is evaluated at &ldquo;the end of time&rdquo;,
after all changes have propagated throughout the system.
Without the ``smp_mb__after_unlock_lock()`` provided by the
acquisition functions, this ``WARN_ON()`` could trigger, for example
on PowerPC.
The ``smp_mb__after_unlock_lock()`` invocations prevent this
``WARN_ON()`` from triggering.
This approach must be extended to include idle CPUs, which need
RCU's grace-period memory ordering guarantee to extend to any
RCU read-side critical sections preceding and following the current
idle sojourn.
This case is handled by calls to the strongly ordered
``atomic_add_return()`` read-modify-write atomic operation that
is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry
time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time.
The grace-period kthread invokes ``rcu_dynticks_snap()`` and
``rcu_dynticks_in_eqs_since()`` (both of which invoke
an ``atomic_add_return()`` of zero) to detect idle CPUs.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| But what about CPUs that remain offline for the entire grace period? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| Such CPUs will be offline at the beginning of the grace period, so |
| the grace period won't expect quiescent states from them. Races |
| between grace-period start and CPU-hotplug operations are mediated |
| by the CPU's leaf ``rcu_node`` structure's ``->lock`` as described |
| above. |
+-----------------------------------------------------------------------+
The approach must be extended to handle one final case, that of waking a
task blocked in ``synchronize_rcu()``. This task might be affinitied to
a CPU that is not yet aware that the grace period has ended, and thus
might not yet be subject to the grace period's memory ordering.
Therefore, there is an ``smp_mb()`` after the return from
``wait_for_completion()`` in the ``synchronize_rcu()`` code path.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| What? Where??? I don't see any ``smp_mb()`` after the return from |
| ``wait_for_completion()``!!! |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| That would be because I spotted the need for that ``smp_mb()`` during |
| the creation of this documentation, and it is therefore unlikely to |
| hit mainline before v4.14. Kudos to Lance Roy, Will Deacon, Peter |
| Zijlstra, and Jonathan Cameron for asking questions that sensitized |
| me to the rather elaborate sequence of events that demonstrate the |
| need for this memory barrier. |
+-----------------------------------------------------------------------+
Tree RCU's grace--period memory-ordering guarantees rely most heavily on
the ``rcu_node`` structure's ``->lock`` field, so much so that it is
necessary to abbreviate this pattern in the diagrams in the next
section. For example, consider the ``rcu_prepare_for_idle()`` function
shown below, which is one of several functions that enforce ordering of
newly arrived RCU callbacks against future grace periods:
::
1 static void rcu_prepare_for_idle(void)
2 {
3 bool needwake;
4 struct rcu_data *rdp;
5 struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
6 struct rcu_node *rnp;
7 struct rcu_state *rsp;
8 int tne;
9
10 if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_ALL) ||
11 rcu_is_nocb_cpu(smp_processor_id()))
12 return;
13 tne = READ_ONCE(tick_nohz_active);
14 if (tne != rdtp->tick_nohz_enabled_snap) {
15 if (rcu_cpu_has_callbacks(NULL))
16 invoke_rcu_core();
17 rdtp->tick_nohz_enabled_snap = tne;
18 return;
19 }
20 if (!tne)
21 return;
22 if (rdtp->all_lazy &&
23 rdtp->nonlazy_posted != rdtp->nonlazy_posted_snap) {
24 rdtp->all_lazy = false;
25 rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted;
26 invoke_rcu_core();
27 return;
28 }
29 if (rdtp->last_accelerate == jiffies)
30 return;
31 rdtp->last_accelerate = jiffies;
32 for_each_rcu_flavor(rsp) {
33 rdp = this_cpu_ptr(rsp->rda);
34 if (rcu_segcblist_pend_cbs(&rdp->cblist))
35 continue;
36 rnp = rdp->mynode;
37 raw_spin_lock_rcu_node(rnp);
38 needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
39 raw_spin_unlock_rcu_node(rnp);
40 if (needwake)
41 rcu_gp_kthread_wake(rsp);
42 }
43 }
But the only part of ``rcu_prepare_for_idle()`` that really matters for
this discussion are lines 3739. We will therefore abbreviate this
function as follows:
.. kernel-figure:: rcu_node-lock.svg
The box represents the ``rcu_node`` structure's ``->lock`` critical
section, with the double line on top representing the additional
``smp_mb__after_unlock_lock()``.
Tree RCU Grace Period Memory Ordering Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tree RCU's grace-period memory-ordering guarantee is provided by a
number of RCU components:
#. `Callback Registry`_
#. `Grace-Period Initialization`_
#. `Self-Reported Quiescent States`_
#. `Dynamic Tick Interface`_
#. `CPU-Hotplug Interface`_
#. `Forcing Quiescent States`_
#. `Grace-Period Cleanup`_
#. `Callback Invocation`_
Each of the following section looks at the corresponding component in
detail.
Callback Registry
^^^^^^^^^^^^^^^^^
If RCU's grace-period guarantee is to mean anything at all, any access
that happens before a given invocation of ``call_rcu()`` must also
happen before the corresponding grace period. The implementation of this
portion of RCU's grace period guarantee is shown in the following
figure:
.. kernel-figure:: TreeRCU-callback-registry.svg
Because ``call_rcu()`` normally acts only on CPU-local state, it
provides no ordering guarantees, either for itself or for phase one of
the update (which again will usually be removal of an element from an
RCU-protected data structure). It simply enqueues the ``rcu_head``
structure on a per-CPU list, which cannot become associated with a grace
period until a later call to ``rcu_accelerate_cbs()``, as shown in the
diagram above.
One set of code paths shown on the left invokes ``rcu_accelerate_cbs()``
via ``note_gp_changes()``, either directly from ``call_rcu()`` (if the
current CPU is inundated with queued ``rcu_head`` structures) or more
likely from an ``RCU_SOFTIRQ`` handler. Another code path in the middle
is taken only in kernels built with ``CONFIG_RCU_FAST_NO_HZ=y``, which
invokes ``rcu_accelerate_cbs()`` via ``rcu_prepare_for_idle()``. The
final code path on the right is taken only in kernels built with
``CONFIG_HOTPLUG_CPU=y``, which invokes ``rcu_accelerate_cbs()`` via
``rcu_advance_cbs()``, ``rcu_migrate_callbacks``,
``rcutree_migrate_callbacks()``, and ``takedown_cpu()``, which in turn
is invoked on a surviving CPU after the outgoing CPU has been completely
offlined.
There are a few other code paths within grace-period processing that
opportunistically invoke ``rcu_accelerate_cbs()``. However, either way,
all of the CPU's recently queued ``rcu_head`` structures are associated
with a future grace-period number under the protection of the CPU's lead
``rcu_node`` structure's ``->lock``. In all cases, there is full
ordering against any prior critical section for that same ``rcu_node``
structure's ``->lock``, and also full ordering against any of the
current task's or CPU's prior critical sections for any ``rcu_node``
structure's ``->lock``.
The next section will show how this ordering ensures that any accesses
prior to the ``call_rcu()`` (particularly including phase one of the
update) happen before the start of the corresponding grace period.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| But what about ``synchronize_rcu()``? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| The ``synchronize_rcu()`` passes ``call_rcu()`` to ``wait_rcu_gp()``, |
| which invokes it. So either way, it eventually comes down to |
| ``call_rcu()``. |
+-----------------------------------------------------------------------+
Grace-Period Initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Grace-period initialization is carried out by the grace-period kernel
thread, which makes several passes over the ``rcu_node`` tree within the
``rcu_gp_init()`` function. This means that showing the full flow of
ordering through the grace-period computation will require duplicating
this tree. If you find this confusing, please note that the state of the
``rcu_node`` changes over time, just like Heraclitus's river. However,
to keep the ``rcu_node`` river tractable, the grace-period kernel
thread's traversals are presented in multiple parts, starting in this
section with the various phases of grace-period initialization.
The first ordering-related grace-period initialization action is to
advance the ``rcu_state`` structure's ``->gp_seq`` grace-period-number
counter, as shown below:
.. kernel-figure:: TreeRCU-gp-init-1.svg
The actual increment is carried out using ``smp_store_release()``, which
helps reject false-positive RCU CPU stall detection. Note that only the
root ``rcu_node`` structure is touched.
The first pass through the ``rcu_node`` tree updates bitmasks based on
CPUs having come online or gone offline since the start of the previous
grace period. In the common case where the number of online CPUs for
this ``rcu_node`` structure has not transitioned to or from zero, this
pass will scan only the leaf ``rcu_node`` structures. However, if the
number of online CPUs for a given leaf ``rcu_node`` structure has
transitioned from zero, ``rcu_init_new_rnp()`` will be invoked for the
first incoming CPU. Similarly, if the number of online CPUs for a given
leaf ``rcu_node`` structure has transitioned to zero,
``rcu_cleanup_dead_rnp()`` will be invoked for the last outgoing CPU.
The diagram below shows the path of ordering if the leftmost
``rcu_node`` structure onlines its first CPU and if the next
``rcu_node`` structure has no online CPUs (or, alternatively if the
leftmost ``rcu_node`` structure offlines its last CPU and if the next
``rcu_node`` structure has no online CPUs).
.. kernel-figure:: TreeRCU-gp-init-1.svg
The final ``rcu_gp_init()`` pass through the ``rcu_node`` tree traverses
breadth-first, setting each ``rcu_node`` structure's ``->gp_seq`` field
to the newly advanced value from the ``rcu_state`` structure, as shown
in the following diagram.
.. kernel-figure:: TreeRCU-gp-init-1.svg
This change will also cause each CPU's next call to
``__note_gp_changes()`` to notice that a new grace period has started,
as described in the next section. But because the grace-period kthread
started the grace period at the root (with the advancing of the
``rcu_state`` structure's ``->gp_seq`` field) before setting each leaf
``rcu_node`` structure's ``->gp_seq`` field, each CPU's observation of
the start of the grace period will happen after the actual start of the
grace period.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| But what about the CPU that started the grace period? Why wouldn't it |
| see the start of the grace period right when it started that grace |
| period? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| In some deep philosophical and overly anthromorphized sense, yes, the |
| CPU starting the grace period is immediately aware of having done so. |
| However, if we instead assume that RCU is not self-aware, then even |
| the CPU starting the grace period does not really become aware of the |
| start of this grace period until its first call to |
| ``__note_gp_changes()``. On the other hand, this CPU potentially gets |
| early notification because it invokes ``__note_gp_changes()`` during |
| its last ``rcu_gp_init()`` pass through its leaf ``rcu_node`` |
| structure. |
+-----------------------------------------------------------------------+
Self-Reported Quiescent States
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When all entities that might block the grace period have reported
quiescent states (or as described in a later section, had quiescent
states reported on their behalf), the grace period can end. Online
non-idle CPUs report their own quiescent states, as shown in the
following diagram:
.. kernel-figure:: TreeRCU-qs.svg
This is for the last CPU to report a quiescent state, which signals the
end of the grace period. Earlier quiescent states would push up the
``rcu_node`` tree only until they encountered an ``rcu_node`` structure
that is waiting for additional quiescent states. However, ordering is
nevertheless preserved because some later quiescent state will acquire
that ``rcu_node`` structure's ``->lock``.
Any number of events can lead up to a CPU invoking ``note_gp_changes``
(or alternatively, directly invoking ``__note_gp_changes()``), at which
point that CPU will notice the start of a new grace period while holding
its leaf ``rcu_node`` lock. Therefore, all execution shown in this
diagram happens after the start of the grace period. In addition, this
CPU will consider any RCU read-side critical section that started before
the invocation of ``__note_gp_changes()`` to have started before the
grace period, and thus a critical section that the grace period must
wait on.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| But a RCU read-side critical section might have started after the |
| beginning of the grace period (the advancing of ``->gp_seq`` from |
| earlier), so why should the grace period wait on such a critical |
| section? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| It is indeed not necessary for the grace period to wait on such a |
| critical section. However, it is permissible to wait on it. And it is |
| furthermore important to wait on it, as this lazy approach is far |
| more scalable than a “big bang” all-at-once grace-period start could |
| possibly be. |
+-----------------------------------------------------------------------+
If the CPU does a context switch, a quiescent state will be noted by
``rcu_note_context_switch()`` on the left. On the other hand, if the CPU
takes a scheduler-clock interrupt while executing in usermode, a
quiescent state will be noted by ``rcu_sched_clock_irq()`` on the right.
Either way, the passage through a quiescent state will be noted in a
per-CPU variable.
The next time an ``RCU_SOFTIRQ`` handler executes on this CPU (for
example, after the next scheduler-clock interrupt), ``rcu_core()`` will
invoke ``rcu_check_quiescent_state()``, which will notice the recorded
quiescent state, and invoke ``rcu_report_qs_rdp()``. If
``rcu_report_qs_rdp()`` verifies that the quiescent state really does
apply to the current grace period, it invokes ``rcu_report_rnp()`` which
traverses up the ``rcu_node`` tree as shown at the bottom of the
diagram, clearing bits from each ``rcu_node`` structure's ``->qsmask``
field, and propagating up the tree when the result is zero.
Note that traversal passes upwards out of a given ``rcu_node`` structure
only if the current CPU is reporting the last quiescent state for the
subtree headed by that ``rcu_node`` structure. A key point is that if a
CPU's traversal stops at a given ``rcu_node`` structure, then there will
be a later traversal by another CPU (or perhaps the same one) that
proceeds upwards from that point, and the ``rcu_node`` ``->lock``
guarantees that the first CPU's quiescent state happens before the
remainder of the second CPU's traversal. Applying this line of thought
repeatedly shows that all CPUs' quiescent states happen before the last
CPU traverses through the root ``rcu_node`` structure, the “last CPU”
being the one that clears the last bit in the root ``rcu_node``
structure's ``->qsmask`` field.
Dynamic Tick Interface
^^^^^^^^^^^^^^^^^^^^^^
Due to energy-efficiency considerations, RCU is forbidden from
disturbing idle CPUs. CPUs are therefore required to notify RCU when
entering or leaving idle state, which they do via fully ordered
value-returning atomic operations on a per-CPU variable. The ordering
effects are as shown below:
.. kernel-figure:: TreeRCU-dyntick.svg
The RCU grace-period kernel thread samples the per-CPU idleness variable
while holding the corresponding CPU's leaf ``rcu_node`` structure's
``->lock``. This means that any RCU read-side critical sections that
precede the idle period (the oval near the top of the diagram above)
will happen before the end of the current grace period. Similarly, the
beginning of the current grace period will happen before any RCU
read-side critical sections that follow the idle period (the oval near
the bottom of the diagram above).
Plumbing this into the full grace-period execution is described
`below <#Forcing%20Quiescent%20States>`__.
CPU-Hotplug Interface
^^^^^^^^^^^^^^^^^^^^^
RCU is also forbidden from disturbing offline CPUs, which might well be
powered off and removed from the system completely. CPUs are therefore
required to notify RCU of their comings and goings as part of the
corresponding CPU hotplug operations. The ordering effects are shown
below:
.. kernel-figure:: TreeRCU-hotplug.svg
Because CPU hotplug operations are much less frequent than idle
transitions, they are heavier weight, and thus acquire the CPU's leaf
``rcu_node`` structure's ``->lock`` and update this structure's
``->qsmaskinitnext``. The RCU grace-period kernel thread samples this
mask to detect CPUs having gone offline since the beginning of this
grace period.
Plumbing this into the full grace-period execution is described
`below <#Forcing%20Quiescent%20States>`__.
Forcing Quiescent States
^^^^^^^^^^^^^^^^^^^^^^^^
As noted above, idle and offline CPUs cannot report their own quiescent
states, and therefore the grace-period kernel thread must do the
reporting on their behalf. This process is called “forcing quiescent
states”, it is repeated every few jiffies, and its ordering effects are
shown below:
.. kernel-figure:: TreeRCU-gp-fqs.svg
Each pass of quiescent state forcing is guaranteed to traverse the leaf
``rcu_node`` structures, and if there are no new quiescent states due to
recently idled and/or offlined CPUs, then only the leaves are traversed.
However, if there is a newly offlined CPU as illustrated on the left or
a newly idled CPU as illustrated on the right, the corresponding
quiescent state will be driven up towards the root. As with
self-reported quiescent states, the upwards driving stops once it
reaches an ``rcu_node`` structure that has quiescent states outstanding
from other CPUs.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| The leftmost drive to root stopped before it reached the root |
| ``rcu_node`` structure, which means that there are still CPUs |
| subordinate to that structure on which the current grace period is |
| waiting. Given that, how is it possible that the rightmost drive to |
| root ended the grace period? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| Good analysis! It is in fact impossible in the absence of bugs in |
| RCU. But this diagram is complex enough as it is, so simplicity |
| overrode accuracy. You can think of it as poetic license, or you can |
| think of it as misdirection that is resolved in the |
| `stitched-together diagram <#Putting%20It%20All%20Together>`__. |
+-----------------------------------------------------------------------+
Grace-Period Cleanup
^^^^^^^^^^^^^^^^^^^^
Grace-period cleanup first scans the ``rcu_node`` tree breadth-first
advancing all the ``->gp_seq`` fields, then it advances the
``rcu_state`` structure's ``->gp_seq`` field. The ordering effects are
shown below:
.. kernel-figure:: TreeRCU-gp-cleanup.svg
As indicated by the oval at the bottom of the diagram, once grace-period
cleanup is complete, the next grace period can begin.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
| But when precisely does the grace period end? |
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| There is no useful single point at which the grace period can be said |
| to end. The earliest reasonable candidate is as soon as the last CPU |
| has reported its quiescent state, but it may be some milliseconds |
| before RCU becomes aware of this. The latest reasonable candidate is |
| once the ``rcu_state`` structure's ``->gp_seq`` field has been |
| updated, but it is quite possible that some CPUs have already |
| completed phase two of their updates by that time. In short, if you |
| are going to work with RCU, you need to learn to embrace uncertainty. |
+-----------------------------------------------------------------------+
Callback Invocation
^^^^^^^^^^^^^^^^^^^
Once a given CPU's leaf ``rcu_node`` structure's ``->gp_seq`` field has
been updated, that CPU can begin invoking its RCU callbacks that were
waiting for this grace period to end. These callbacks are identified by
``rcu_advance_cbs()``, which is usually invoked by
``__note_gp_changes()``. As shown in the diagram below, this invocation
can be triggered by the scheduling-clock interrupt
(``rcu_sched_clock_irq()`` on the left) or by idle entry
(``rcu_cleanup_after_idle()`` on the right, but only for kernels build
with ``CONFIG_RCU_FAST_NO_HZ=y``). Either way, ``RCU_SOFTIRQ`` is
raised, which results in ``rcu_do_batch()`` invoking the callbacks,
which in turn allows those callbacks to carry out (either directly or
indirectly via wakeup) the needed phase-two processing for each update.
.. kernel-figure:: TreeRCU-callback-invocation.svg
Please note that callback invocation can also be prompted by any number
of corner-case code paths, for example, when a CPU notes that it has
excessive numbers of callbacks queued. In all cases, the CPU acquires
its leaf ``rcu_node`` structure's ``->lock`` before invoking callbacks,
which preserves the required ordering against the newly completed grace
period.
However, if the callback function communicates to other CPUs, for
example, doing a wakeup, then it is that function's responsibility to
maintain ordering. For example, if the callback function wakes up a task
that runs on some other CPU, proper ordering must in place in both the
callback function and the task being awakened. To see why this is
important, consider the top half of the `grace-period
cleanup <#Grace-Period%20Cleanup>`__ diagram. The callback might be
running on a CPU corresponding to the leftmost leaf ``rcu_node``
structure, and awaken a task that is to run on a CPU corresponding to
the rightmost leaf ``rcu_node`` structure, and the grace-period kernel
thread might not yet have reached the rightmost leaf. In this case, the
grace period's memory ordering might not yet have reached that CPU, so
again the callback function and the awakened task must supply proper
ordering.
Putting It All Together
~~~~~~~~~~~~~~~~~~~~~~~
A stitched-together diagram is here:
.. kernel-figure:: TreeRCU-gp.svg
Legal Statement
~~~~~~~~~~~~~~~
This work represents the view of the author and does not necessarily
represent the view of IBM.
Linux is a registered trademark of Linus Torvalds.
Other company, product, and service names may be trademarks or service
marks of others.

View File

@@ -3880,7 +3880,7 @@
font-style="normal"
y="-4418.6582"
x="3745.7725"
xml:space="preserve">rcu_note_context_switch()</text>
xml:space="preserve">rcu_node_context_switch()</text>
</g>
<g
transform="translate(1881.1886,54048.57)"

Before

Width:  |  Height:  |  Size: 209 KiB

After

Width:  |  Height:  |  Size: 209 KiB

View File

@@ -753,7 +753,7 @@
font-style="normal"
y="-4418.6582"
x="3745.7725"
xml:space="preserve">rcu_note_context_switch()</text>
xml:space="preserve">rcu_node_context_switch()</text>
</g>
<g
transform="translate(3131.2648,-585.6713)"

Before

Width:  |  Height:  |  Size: 43 KiB

After

Width:  |  Height:  |  Size: 43 KiB

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,124 +0,0 @@
.. _NMI_rcu_doc:
Using RCU to Protect Dynamic NMI Handlers
=========================================
Although RCU is usually used to protect read-mostly data structures,
it is possible to use RCU to provide dynamic non-maskable interrupt
handlers, as well as dynamic irq handlers. This document describes
how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
work in "arch/x86/oprofile/nmi_timer_int.c" and in
"arch/x86/kernel/traps.c".
The relevant pieces of code are listed below, each followed by a
brief explanation::
static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
{
return 0;
}
The dummy_nmi_callback() function is a "dummy" NMI handler that does
nothing, but returns zero, thus saying that it did nothing, allowing
the NMI handler to take the default machine-specific action::
static nmi_callback_t nmi_callback = dummy_nmi_callback;
This nmi_callback variable is a global function pointer to the current
NMI handler::
void do_nmi(struct pt_regs * regs, long error_code)
{
int cpu;
nmi_enter();
cpu = smp_processor_id();
++nmi_count(cpu);
if (!rcu_dereference_sched(nmi_callback)(regs, cpu))
default_do_nmi(regs);
nmi_exit();
}
The do_nmi() function processes each NMI. It first disables preemption
in the same way that a hardware irq would, then increments the per-CPU
count of NMIs. It then invokes the NMI handler stored in the nmi_callback
function pointer. If this handler returns zero, do_nmi() invokes the
default_do_nmi() function to handle a machine-specific NMI. Finally,
preemption is restored.
In theory, rcu_dereference_sched() is not needed, since this code runs
only on i386, which in theory does not need rcu_dereference_sched()
anyway. However, in practice it is a good documentation aid, particularly
for anyone attempting to do something similar on Alpha or on systems
with aggressive optimizing compilers.
Quick Quiz:
Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
:ref:`Answer to Quick Quiz <answer_quick_quiz_NMI>`
Back to the discussion of NMI and RCU::
void set_nmi_callback(nmi_callback_t callback)
{
rcu_assign_pointer(nmi_callback, callback);
}
The set_nmi_callback() function registers an NMI handler. Note that any
data that is to be used by the callback must be initialized up -before-
the call to set_nmi_callback(). On architectures that do not order
writes, the rcu_assign_pointer() ensures that the NMI handler sees the
initialized values::
void unset_nmi_callback(void)
{
rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
}
This function unregisters an NMI handler, restoring the original
dummy_nmi_handler(). However, there may well be an NMI handler
currently executing on some other CPU. We therefore cannot free
up any data structures used by the old NMI handler until execution
of it completes on all other CPUs.
One way to accomplish this is via synchronize_rcu(), perhaps as
follows::
unset_nmi_callback();
synchronize_rcu();
kfree(my_nmi_data);
This works because (as of v4.20) synchronize_rcu() blocks until all
CPUs complete any preemption-disabled segments of code that they were
executing.
Since NMI handlers disable preemption, synchronize_rcu() is guaranteed
not to return until all ongoing NMI handlers exit. It is therefore safe
to free up the handler's data as soon as synchronize_rcu() returns.
Important note: for this to work, the architecture in question must
invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.
.. _answer_quick_quiz_NMI:
Answer to Quick Quiz:
Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
The caller to set_nmi_callback() might well have
initialized some data that is to be used by the new NMI
handler. In this case, the rcu_dereference_sched() would
be needed, because otherwise a CPU that received an NMI
just after the new handler was set might see the pointer
to the new NMI handler, but the old pre-initialized
version of the handler's data.
This same sad story can happen on other CPUs when using
a compiler with aggressive pointer-value speculation
optimizations.
More important, the rcu_dereference_sched() makes it
clear to someone reading the code that the pointer is
being protected by RCU-sched.

View File

@@ -0,0 +1,121 @@
Using RCU to Protect Dynamic NMI Handlers
Although RCU is usually used to protect read-mostly data structures,
it is possible to use RCU to provide dynamic non-maskable interrupt
handlers, as well as dynamic irq handlers. This document describes
how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
work in "arch/x86/oprofile/nmi_timer_int.c" and in
"arch/x86/kernel/traps.c".
The relevant pieces of code are listed below, each followed by a
brief explanation.
static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
{
return 0;
}
The dummy_nmi_callback() function is a "dummy" NMI handler that does
nothing, but returns zero, thus saying that it did nothing, allowing
the NMI handler to take the default machine-specific action.
static nmi_callback_t nmi_callback = dummy_nmi_callback;
This nmi_callback variable is a global function pointer to the current
NMI handler.
void do_nmi(struct pt_regs * regs, long error_code)
{
int cpu;
nmi_enter();
cpu = smp_processor_id();
++nmi_count(cpu);
if (!rcu_dereference_sched(nmi_callback)(regs, cpu))
default_do_nmi(regs);
nmi_exit();
}
The do_nmi() function processes each NMI. It first disables preemption
in the same way that a hardware irq would, then increments the per-CPU
count of NMIs. It then invokes the NMI handler stored in the nmi_callback
function pointer. If this handler returns zero, do_nmi() invokes the
default_do_nmi() function to handle a machine-specific NMI. Finally,
preemption is restored.
In theory, rcu_dereference_sched() is not needed, since this code runs
only on i386, which in theory does not need rcu_dereference_sched()
anyway. However, in practice it is a good documentation aid, particularly
for anyone attempting to do something similar on Alpha or on systems
with aggressive optimizing compilers.
Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha,
given that the code referenced by the pointer is read-only?
Back to the discussion of NMI and RCU...
void set_nmi_callback(nmi_callback_t callback)
{
rcu_assign_pointer(nmi_callback, callback);
}
The set_nmi_callback() function registers an NMI handler. Note that any
data that is to be used by the callback must be initialized up -before-
the call to set_nmi_callback(). On architectures that do not order
writes, the rcu_assign_pointer() ensures that the NMI handler sees the
initialized values.
void unset_nmi_callback(void)
{
rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
}
This function unregisters an NMI handler, restoring the original
dummy_nmi_handler(). However, there may well be an NMI handler
currently executing on some other CPU. We therefore cannot free
up any data structures used by the old NMI handler until execution
of it completes on all other CPUs.
One way to accomplish this is via synchronize_rcu(), perhaps as
follows:
unset_nmi_callback();
synchronize_rcu();
kfree(my_nmi_data);
This works because (as of v4.20) synchronize_rcu() blocks until all
CPUs complete any preemption-disabled segments of code that they were
executing.
Since NMI handlers disable preemption, synchronize_rcu() is guaranteed
not to return until all ongoing NMI handlers exit. It is therefore safe
to free up the handler's data as soon as synchronize_rcu() returns.
Important note: for this to work, the architecture in question must
invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.
Answer to Quick Quiz
Why might the rcu_dereference_sched() be necessary on Alpha, given
that the code referenced by the pointer is read-only?
Answer: The caller to set_nmi_callback() might well have
initialized some data that is to be used by the new NMI
handler. In this case, the rcu_dereference_sched() would
be needed, because otherwise a CPU that received an NMI
just after the new handler was set might see the pointer
to the new NMI handler, but the old pre-initialized
version of the handler's data.
This same sad story can happen on other CPUs when using
a compiler with aggressive pointer-value speculation
optimizations.
More important, the rcu_dereference_sched() makes it
clear to someone reading the code that the pointer is
being protected by RCU-sched.

View File

@@ -1,165 +0,0 @@
.. _array_rcu_doc:
Using RCU to Protect Read-Mostly Arrays
=======================================
Although RCU is more commonly used to protect linked lists, it can
also be used to protect arrays. Three situations are as follows:
1. :ref:`Hash Tables <hash_tables>`
2. :ref:`Static Arrays <static_arrays>`
3. :ref:`Resizable Arrays <resizable_arrays>`
Each of these three situations involves an RCU-protected pointer to an
array that is separately indexed. It might be tempting to consider use
of RCU to instead protect the index into an array, however, this use
case is **not** supported. The problem with RCU-protected indexes into
arrays is that compilers can play way too many optimization games with
integers, which means that the rules governing handling of these indexes
are far more trouble than they are worth. If RCU-protected indexes into
arrays prove to be particularly valuable (which they have not thus far),
explicit cooperation from the compiler will be required to permit them
to be safely used.
That aside, each of the three RCU-protected pointer situations are
described in the following sections.
.. _hash_tables:
Situation 1: Hash Tables
------------------------
Hash tables are often implemented as an array, where each array entry
has a linked-list hash chain. Each hash chain can be protected by RCU
as described in the listRCU.txt document. This approach also applies
to other array-of-list situations, such as radix trees.
.. _static_arrays:
Situation 2: Static Arrays
--------------------------
Static arrays, where the data (rather than a pointer to the data) is
located in each array element, and where the array is never resized,
have not been used with RCU. Rik van Riel recommends using seqlock in
this situation, which would also have minimal read-side overhead as long
as updates are rare.
Quick Quiz:
Why is it so important that updates be rare when using seqlock?
:ref:`Answer to Quick Quiz <answer_quick_quiz_seqlock>`
.. _resizable_arrays:
Situation 3: Resizable Arrays
------------------------------
Use of RCU for resizable arrays is demonstrated by the grow_ary()
function formerly used by the System V IPC code. The array is used
to map from semaphore, message-queue, and shared-memory IDs to the data
structure that represents the corresponding IPC construct. The grow_ary()
function does not acquire any locks; instead its caller must hold the
ids->sem semaphore.
The grow_ary() function, shown below, does some limit checks, allocates a
new ipc_id_ary, copies the old to the new portion of the new, initializes
the remainder of the new, updates the ids->entries pointer to point to
the new array, and invokes ipc_rcu_putref() to free up the old array.
Note that rcu_assign_pointer() is used to update the ids->entries pointer,
which includes any memory barriers required on whatever architecture
you are running on::
static int grow_ary(struct ipc_ids* ids, int newsize)
{
struct ipc_id_ary* new;
struct ipc_id_ary* old;
int i;
int size = ids->entries->size;
if(newsize > IPCMNI)
newsize = IPCMNI;
if(newsize <= size)
return newsize;
new = ipc_rcu_alloc(sizeof(struct kern_ipc_perm *)*newsize +
sizeof(struct ipc_id_ary));
if(new == NULL)
return size;
new->size = newsize;
memcpy(new->p, ids->entries->p,
sizeof(struct kern_ipc_perm *)*size +
sizeof(struct ipc_id_ary));
for(i=size;i<newsize;i++) {
new->p[i] = NULL;
}
old = ids->entries;
/*
* Use rcu_assign_pointer() to make sure the memcpyed
* contents of the new array are visible before the new
* array becomes visible.
*/
rcu_assign_pointer(ids->entries, new);
ipc_rcu_putref(old);
return newsize;
}
The ipc_rcu_putref() function decrements the array's reference count
and then, if the reference count has dropped to zero, uses call_rcu()
to free the array after a grace period has elapsed.
The array is traversed by the ipc_lock() function. This function
indexes into the array under the protection of rcu_read_lock(),
using rcu_dereference() to pick up the pointer to the array so
that it may later safely be dereferenced -- memory barriers are
required on the Alpha CPU. Since the size of the array is stored
with the array itself, there can be no array-size mismatches, so
a simple check suffices. The pointer to the structure corresponding
to the desired IPC object is placed in "out", with NULL indicating
a non-existent entry. After acquiring "out->lock", the "out->deleted"
flag indicates whether the IPC object is in the process of being
deleted, and, if not, the pointer is returned::
struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
{
struct kern_ipc_perm* out;
int lid = id % SEQ_MULTIPLIER;
struct ipc_id_ary* entries;
rcu_read_lock();
entries = rcu_dereference(ids->entries);
if(lid >= entries->size) {
rcu_read_unlock();
return NULL;
}
out = entries->p[lid];
if(out == NULL) {
rcu_read_unlock();
return NULL;
}
spin_lock(&out->lock);
/* ipc_rmid() may have already freed the ID while ipc_lock
* was spinning: here verify that the structure is still valid
*/
if (out->deleted) {
spin_unlock(&out->lock);
rcu_read_unlock();
return NULL;
}
return out;
}
.. _answer_quick_quiz_seqlock:
Answer to Quick Quiz:
Why is it so important that updates be rare when using seqlock?
The reason that it is important that updates be rare when
using seqlock is that frequent updates can livelock readers.
One way to avoid this problem is to assign a seqlock for
each array entry rather than to the entire array.

View File

@@ -0,0 +1,153 @@
Using RCU to Protect Read-Mostly Arrays
Although RCU is more commonly used to protect linked lists, it can
also be used to protect arrays. Three situations are as follows:
1. Hash Tables
2. Static Arrays
3. Resizeable Arrays
Each of these three situations involves an RCU-protected pointer to an
array that is separately indexed. It might be tempting to consider use
of RCU to instead protect the index into an array, however, this use
case is -not- supported. The problem with RCU-protected indexes into
arrays is that compilers can play way too many optimization games with
integers, which means that the rules governing handling of these indexes
are far more trouble than they are worth. If RCU-protected indexes into
arrays prove to be particularly valuable (which they have not thus far),
explicit cooperation from the compiler will be required to permit them
to be safely used.
That aside, each of the three RCU-protected pointer situations are
described in the following sections.
Situation 1: Hash Tables
Hash tables are often implemented as an array, where each array entry
has a linked-list hash chain. Each hash chain can be protected by RCU
as described in the listRCU.txt document. This approach also applies
to other array-of-list situations, such as radix trees.
Situation 2: Static Arrays
Static arrays, where the data (rather than a pointer to the data) is
located in each array element, and where the array is never resized,
have not been used with RCU. Rik van Riel recommends using seqlock in
this situation, which would also have minimal read-side overhead as long
as updates are rare.
Quick Quiz: Why is it so important that updates be rare when
using seqlock?
Situation 3: Resizeable Arrays
Use of RCU for resizeable arrays is demonstrated by the grow_ary()
function formerly used by the System V IPC code. The array is used
to map from semaphore, message-queue, and shared-memory IDs to the data
structure that represents the corresponding IPC construct. The grow_ary()
function does not acquire any locks; instead its caller must hold the
ids->sem semaphore.
The grow_ary() function, shown below, does some limit checks, allocates a
new ipc_id_ary, copies the old to the new portion of the new, initializes
the remainder of the new, updates the ids->entries pointer to point to
the new array, and invokes ipc_rcu_putref() to free up the old array.
Note that rcu_assign_pointer() is used to update the ids->entries pointer,
which includes any memory barriers required on whatever architecture
you are running on.
static int grow_ary(struct ipc_ids* ids, int newsize)
{
struct ipc_id_ary* new;
struct ipc_id_ary* old;
int i;
int size = ids->entries->size;
if(newsize > IPCMNI)
newsize = IPCMNI;
if(newsize <= size)
return newsize;
new = ipc_rcu_alloc(sizeof(struct kern_ipc_perm *)*newsize +
sizeof(struct ipc_id_ary));
if(new == NULL)
return size;
new->size = newsize;
memcpy(new->p, ids->entries->p,
sizeof(struct kern_ipc_perm *)*size +
sizeof(struct ipc_id_ary));
for(i=size;i<newsize;i++) {
new->p[i] = NULL;
}
old = ids->entries;
/*
* Use rcu_assign_pointer() to make sure the memcpyed
* contents of the new array are visible before the new
* array becomes visible.
*/
rcu_assign_pointer(ids->entries, new);
ipc_rcu_putref(old);
return newsize;
}
The ipc_rcu_putref() function decrements the array's reference count
and then, if the reference count has dropped to zero, uses call_rcu()
to free the array after a grace period has elapsed.
The array is traversed by the ipc_lock() function. This function
indexes into the array under the protection of rcu_read_lock(),
using rcu_dereference() to pick up the pointer to the array so
that it may later safely be dereferenced -- memory barriers are
required on the Alpha CPU. Since the size of the array is stored
with the array itself, there can be no array-size mismatches, so
a simple check suffices. The pointer to the structure corresponding
to the desired IPC object is placed in "out", with NULL indicating
a non-existent entry. After acquiring "out->lock", the "out->deleted"
flag indicates whether the IPC object is in the process of being
deleted, and, if not, the pointer is returned.
struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
{
struct kern_ipc_perm* out;
int lid = id % SEQ_MULTIPLIER;
struct ipc_id_ary* entries;
rcu_read_lock();
entries = rcu_dereference(ids->entries);
if(lid >= entries->size) {
rcu_read_unlock();
return NULL;
}
out = entries->p[lid];
if(out == NULL) {
rcu_read_unlock();
return NULL;
}
spin_lock(&out->lock);
/* ipc_rmid() may have already freed the ID while ipc_lock
* was spinning: here verify that the structure is still valid
*/
if (out->deleted) {
spin_unlock(&out->lock);
rcu_read_unlock();
return NULL;
}
return out;
}
Answer to Quick Quiz:
The reason that it is important that updates be rare when
using seqlock is that frequent updates can livelock readers.
One way to avoid this problem is to assign a seqlock for
each array entry rather than to the entire array.

View File

@@ -5,22 +5,12 @@ RCU concepts
============
.. toctree::
:maxdepth: 3
:maxdepth: 1
arrayRCU
rcubarrier
rcu_dereference
whatisRCU
rcu
listRCU
NMI-RCU
UP
Design/Memory-Ordering/Tree-RCU-Memory-Ordering
Design/Expedited-Grace-Periods/Expedited-Grace-Periods
Design/Requirements/Requirements
Design/Data-Structures/Data-Structures
.. only:: subproject and html
Indices

View File

@@ -99,7 +99,7 @@ With this change, the rcu_dereference() is always within an RCU
read-side critical section, which again would have suppressed the
above lockdep-RCU splat.
But in this particular case, we don't actually dereference the pointer
But in this particular case, we don't actually deference the pointer
returned from rcu_dereference(). Instead, that pointer is just compared
to the cic pointer, which means that the rcu_dereference() can be replaced
by rcu_access_pointer() as follows:

View File

@@ -96,17 +96,7 @@ other flavors of rcu_dereference(). On the other hand, it is illegal
to use rcu_dereference_protected() if either the RCU-protected pointer
or the RCU-protected data that it points to can change concurrently.
Like rcu_dereference(), when lockdep is enabled, RCU list and hlist
traversal primitives check for being called from within an RCU read-side
critical section. However, a lockdep expression can be passed to them
as a additional optional argument. With this lockdep expression, these
traversal primitives will complain only if the lockdep expression is
false and they are called from outside any RCU read-side critical section.
For example, the workqueue for_each_pwq() macro is intended to be used
either within an RCU read-side critical section or with wq->mutex held.
It is thus implemented as follows:
#define for_each_pwq(pwq, wq)
list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,
lock_is_held(&(wq->mutex).dep_map))
There are currently only "universal" versions of the rcu_assign_pointer()
and RCU list-/tree-traversal primitives, which do not (yet) check for
being in an RCU read-side critical section. In the future, separate
versions of these primitives might be created.

View File

@@ -1,463 +0,0 @@
.. _rcu_dereference_doc:
PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
===============================================================
Most of the time, you can use values from rcu_dereference() or one of
the similar primitives without worries. Dereferencing (prefix "*"),
field selection ("->"), assignment ("="), address-of ("&"), addition and
subtraction of constants, and casts all work quite naturally and safely.
It is nevertheless possible to get into trouble with other operations.
Follow these rules to keep your RCU code working properly:
- You must use one of the rcu_dereference() family of primitives
to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
will complain. Worse yet, your code can see random memory-corruption
bugs due to games that compilers and DEC Alpha can play.
Without one of the rcu_dereference() primitives, compilers
can reload the value, and won't your code have fun with two
different values for a single pointer! Without rcu_dereference(),
DEC Alpha can load a pointer, dereference that pointer, and
return data preceding initialization that preceded the store of
the pointer.
In addition, the volatile cast in rcu_dereference() prevents the
compiler from deducing the resulting pointer value. Please see
the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH"
for an example where the compiler can in fact deduce the exact
value of the pointer, and thus cause misordering.
- You are only permitted to use rcu_dereference on pointer values.
The compiler simply knows too much about integral values to
trust it to carry dependencies through integer operations.
There are a very few exceptions, namely that you can temporarily
cast the pointer to uintptr_t in order to:
- Set bits and clear bits down in the must-be-zero low-order
bits of that pointer. This clearly means that the pointer
must have alignment constraints, for example, this does
-not- work in general for char* pointers.
- XOR bits to translate pointers, as is done in some
classic buddy-allocator algorithms.
It is important to cast the value back to pointer before
doing much of anything else with it.
- Avoid cancellation when using the "+" and "-" infix arithmetic
operators. For example, for a given variable "x", avoid
"(x-(uintptr_t)x)" for char* pointers. The compiler is within its
rights to substitute zero for this sort of expression, so that
subsequent accesses no longer depend on the rcu_dereference(),
again possibly resulting in bugs due to misordering.
Of course, if "p" is a pointer from rcu_dereference(), and "a"
and "b" are integers that happen to be equal, the expression
"p+a-b" is safe because its value still necessarily depends on
the rcu_dereference(), thus maintaining proper ordering.
- If you are using RCU to protect JITed functions, so that the
"()" function-invocation operator is applied to a value obtained
(directly or indirectly) from rcu_dereference(), you may need to
interact directly with the hardware to flush instruction caches.
This issue arises on some systems when a newly JITed function is
using the same memory that was used by an earlier JITed function.
- Do not use the results from relational operators ("==", "!=",
">", ">=", "<", or "<=") when dereferencing. For example,
the following (quite strange) code is buggy::
int *p;
int *q;
...
p = rcu_dereference(gp)
q = &global_q;
q += p > &oom_p;
r1 = *q; /* BUGGY!!! */
As before, the reason this is buggy is that relational operators
are often compiled using branches. And as before, although
weak-memory machines such as ARM or PowerPC do order stores
after such branches, but can speculate loads, which can again
result in misordering bugs.
- Be very careful about comparing pointers obtained from
rcu_dereference() against non-NULL values. As Linus Torvalds
explained, if the two pointers are equal, the compiler could
substitute the pointer you are comparing against for the pointer
obtained from rcu_dereference(). For example::
p = rcu_dereference(gp);
if (p == &default_struct)
do_default(p->a);
Because the compiler now knows that the value of "p" is exactly
the address of the variable "default_struct", it is free to
transform this code into the following::
p = rcu_dereference(gp);
if (p == &default_struct)
do_default(default_struct.a);
On ARM and Power hardware, the load from "default_struct.a"
can now be speculated, such that it might happen before the
rcu_dereference(). This could result in bugs due to misordering.
However, comparisons are OK in the following cases:
- The comparison was against the NULL pointer. If the
compiler knows that the pointer is NULL, you had better
not be dereferencing it anyway. If the comparison is
non-equal, the compiler is none the wiser. Therefore,
it is safe to compare pointers from rcu_dereference()
against NULL pointers.
- The pointer is never dereferenced after being compared.
Since there are no subsequent dereferences, the compiler
cannot use anything it learned from the comparison
to reorder the non-existent subsequent dereferences.
This sort of comparison occurs frequently when scanning
RCU-protected circular linked lists.
Note that if checks for being within an RCU read-side
critical section are not required and the pointer is never
dereferenced, rcu_access_pointer() should be used in place
of rcu_dereference().
- The comparison is against a pointer that references memory
that was initialized "a long time ago." The reason
this is safe is that even if misordering occurs, the
misordering will not affect the accesses that follow
the comparison. So exactly how long ago is "a long
time ago"? Here are some possibilities:
- Compile time.
- Boot time.
- Module-init time for module code.
- Prior to kthread creation for kthread code.
- During some prior acquisition of the lock that
we now hold.
- Before mod_timer() time for a timer handler.
There are many other possibilities involving the Linux
kernel's wide array of primitives that cause code to
be invoked at a later time.
- The pointer being compared against also came from
rcu_dereference(). In this case, both pointers depend
on one rcu_dereference() or another, so you get proper
ordering either way.
That said, this situation can make certain RCU usage
bugs more likely to happen. Which can be a good thing,
at least if they happen during testing. An example
of such an RCU usage bug is shown in the section titled
"EXAMPLE OF AMPLIFIED RCU-USAGE BUG".
- All of the accesses following the comparison are stores,
so that a control dependency preserves the needed ordering.
That said, it is easy to get control dependencies wrong.
Please see the "CONTROL DEPENDENCIES" section of
Documentation/memory-barriers.txt for more details.
- The pointers are not equal -and- the compiler does
not have enough information to deduce the value of the
pointer. Note that the volatile cast in rcu_dereference()
will normally prevent the compiler from knowing too much.
However, please note that if the compiler knows that the
pointer takes on only one of two values, a not-equal
comparison will provide exactly the information that the
compiler needs to deduce the value of the pointer.
- Disable any value-speculation optimizations that your compiler
might provide, especially if you are making use of feedback-based
optimizations that take data collected from prior runs. Such
value-speculation optimizations reorder operations by design.
There is one exception to this rule: Value-speculation
optimizations that leverage the branch-prediction hardware are
safe on strongly ordered systems (such as x86), but not on weakly
ordered systems (such as ARM or Power). Choose your compiler
command-line options wisely!
EXAMPLE OF AMPLIFIED RCU-USAGE BUG
----------------------------------
Because updaters can run concurrently with RCU readers, RCU readers can
see stale and/or inconsistent values. If RCU readers need fresh or
consistent values, which they sometimes do, they need to take proper
precautions. To see this, consider the following code fragment::
struct foo {
int a;
int b;
int c;
};
struct foo *gp1;
struct foo *gp2;
void updater(void)
{
struct foo *p;
p = kmalloc(...);
if (p == NULL)
deal_with_it();
p->a = 42; /* Each field in its own cache line. */
p->b = 43;
p->c = 44;
rcu_assign_pointer(gp1, p);
p->b = 143;
p->c = 144;
rcu_assign_pointer(gp2, p);
}
void reader(void)
{
struct foo *p;
struct foo *q;
int r1, r2;
p = rcu_dereference(gp2);
if (p == NULL)
return;
r1 = p->b; /* Guaranteed to get 143. */
q = rcu_dereference(gp1); /* Guaranteed non-NULL. */
if (p == q) {
/* The compiler decides that q->c is same as p->c. */
r2 = p->c; /* Could get 44 on weakly order system. */
}
do_something_with(r1, r2);
}
You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible,
but you should not be. After all, the updater might have been invoked
a second time between the time reader() loaded into "r1" and the time
that it loaded into "r2". The fact that this same result can occur due
to some reordering from the compiler and CPUs is beside the point.
But suppose that the reader needs a consistent view?
Then one approach is to use locking, for example, as follows::
struct foo {
int a;
int b;
int c;
spinlock_t lock;
};
struct foo *gp1;
struct foo *gp2;
void updater(void)
{
struct foo *p;
p = kmalloc(...);
if (p == NULL)
deal_with_it();
spin_lock(&p->lock);
p->a = 42; /* Each field in its own cache line. */
p->b = 43;
p->c = 44;
spin_unlock(&p->lock);
rcu_assign_pointer(gp1, p);
spin_lock(&p->lock);
p->b = 143;
p->c = 144;
spin_unlock(&p->lock);
rcu_assign_pointer(gp2, p);
}
void reader(void)
{
struct foo *p;
struct foo *q;
int r1, r2;
p = rcu_dereference(gp2);
if (p == NULL)
return;
spin_lock(&p->lock);
r1 = p->b; /* Guaranteed to get 143. */
q = rcu_dereference(gp1); /* Guaranteed non-NULL. */
if (p == q) {
/* The compiler decides that q->c is same as p->c. */
r2 = p->c; /* Locking guarantees r2 == 144. */
}
spin_unlock(&p->lock);
do_something_with(r1, r2);
}
As always, use the right tool for the job!
EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
-----------------------------------------
If a pointer obtained from rcu_dereference() compares not-equal to some
other pointer, the compiler normally has no clue what the value of the
first pointer might be. This lack of knowledge prevents the compiler
from carrying out optimizations that otherwise might destroy the ordering
guarantees that RCU depends on. And the volatile cast in rcu_dereference()
should prevent the compiler from guessing the value.
But without rcu_dereference(), the compiler knows more than you might
expect. Consider the following code fragment::
struct foo {
int a;
int b;
};
static struct foo variable1;
static struct foo variable2;
static struct foo *gp = &variable1;
void updater(void)
{
initialize_foo(&variable2);
rcu_assign_pointer(gp, &variable2);
/*
* The above is the only store to gp in this translation unit,
* and the address of gp is not exported in any way.
*/
}
int reader(void)
{
struct foo *p;
p = gp;
barrier();
if (p == &variable1)
return p->a; /* Must be variable1.a. */
else
return p->b; /* Must be variable2.b. */
}
Because the compiler can see all stores to "gp", it knows that the only
possible values of "gp" are "variable1" on the one hand and "variable2"
on the other. The comparison in reader() therefore tells the compiler
the exact value of "p" even in the not-equals case. This allows the
compiler to make the return values independent of the load from "gp",
in turn destroying the ordering between this load and the loads of the
return values. This can result in "p->b" returning pre-initialization
garbage values.
In short, rcu_dereference() is -not- optional when you are going to
dereference the resulting pointer.
WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
------------------------------------------------------------
First, please avoid using rcu_dereference_raw() and also please avoid
using rcu_dereference_check() and rcu_dereference_protected() with a
second argument with a constant value of 1 (or true, for that matter).
With that caution out of the way, here is some guidance for which
member of the rcu_dereference() to use in various situations:
1. If the access needs to be within an RCU read-side critical
section, use rcu_dereference(). With the new consolidated
RCU flavors, an RCU read-side critical section is entered
using rcu_read_lock(), anything that disables bottom halves,
anything that disables interrupts, or anything that disables
preemption.
2. If the access might be within an RCU read-side critical section
on the one hand, or protected by (say) my_lock on the other,
use rcu_dereference_check(), for example::
p1 = rcu_dereference_check(p->rcu_protected_pointer,
lockdep_is_held(&my_lock));
3. If the access might be within an RCU read-side critical section
on the one hand, or protected by either my_lock or your_lock on
the other, again use rcu_dereference_check(), for example::
p1 = rcu_dereference_check(p->rcu_protected_pointer,
lockdep_is_held(&my_lock) ||
lockdep_is_held(&your_lock));
4. If the access is on the update side, so that it is always protected
by my_lock, use rcu_dereference_protected()::
p1 = rcu_dereference_protected(p->rcu_protected_pointer,
lockdep_is_held(&my_lock));
This can be extended to handle multiple locks as in #3 above,
and both can be extended to check other conditions as well.
5. If the protection is supplied by the caller, and is thus unknown
to this code, that is the rare case when rcu_dereference_raw()
is appropriate. In addition, rcu_dereference_raw() might be
appropriate when the lockdep expression would be excessively
complex, except that a better approach in that case might be to
take a long hard look at your synchronization design. Still,
there are data-locking cases where any one of a very large number
of locks or reference counters suffices to protect the pointer,
so rcu_dereference_raw() does have its place.
However, its place is probably quite a bit smaller than one
might expect given the number of uses in the current kernel.
Ditto for its synonym, rcu_dereference_check( ... , 1), and
its close relative, rcu_dereference_protected(... , 1).
SPARSE CHECKING OF RCU-PROTECTED POINTERS
-----------------------------------------
The sparse static-analysis tool checks for direct access to RCU-protected
pointers, which can result in "interesting" bugs due to compiler
optimizations involving invented loads and perhaps also load tearing.
For example, suppose someone mistakenly does something like this::
p = q->rcu_protected_pointer;
do_something_with(p->a);
do_something_else_with(p->b);
If register pressure is high, the compiler might optimize "p" out
of existence, transforming the code to something like this::
do_something_with(q->rcu_protected_pointer->a);
do_something_else_with(q->rcu_protected_pointer->b);
This could fatally disappoint your code if q->rcu_protected_pointer
changed in the meantime. Nor is this a theoretical problem: Exactly
this sort of bug cost Paul E. McKenney (and several of his innocent
colleagues) a three-day weekend back in the early 1990s.
Load tearing could of course result in dereferencing a mashup of a pair
of pointers, which also might fatally disappoint your code.
These problems could have been avoided simply by making the code instead
read as follows::
p = rcu_dereference(q->rcu_protected_pointer);
do_something_with(p->a);
do_something_else_with(p->b);
Unfortunately, these sorts of bugs can be extremely hard to spot during
review. This is where the sparse tool comes into play, along with the
"__rcu" marker. If you mark a pointer declaration, whether in a structure
or as a formal parameter, with "__rcu", which tells sparse to complain if
this pointer is accessed directly. It will also cause sparse to complain
if a pointer not marked with "__rcu" is accessed using rcu_dereference()
and friends. For example, ->rcu_protected_pointer might be declared as
follows::
struct foo __rcu *rcu_protected_pointer;
Use of "__rcu" is opt-in. If you choose not to use it, then you should
ignore the sparse warnings.

View File

@@ -0,0 +1,456 @@
PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
Most of the time, you can use values from rcu_dereference() or one of
the similar primitives without worries. Dereferencing (prefix "*"),
field selection ("->"), assignment ("="), address-of ("&"), addition and
subtraction of constants, and casts all work quite naturally and safely.
It is nevertheless possible to get into trouble with other operations.
Follow these rules to keep your RCU code working properly:
o You must use one of the rcu_dereference() family of primitives
to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
will complain. Worse yet, your code can see random memory-corruption
bugs due to games that compilers and DEC Alpha can play.
Without one of the rcu_dereference() primitives, compilers
can reload the value, and won't your code have fun with two
different values for a single pointer! Without rcu_dereference(),
DEC Alpha can load a pointer, dereference that pointer, and
return data preceding initialization that preceded the store of
the pointer.
In addition, the volatile cast in rcu_dereference() prevents the
compiler from deducing the resulting pointer value. Please see
the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH"
for an example where the compiler can in fact deduce the exact
value of the pointer, and thus cause misordering.
o You are only permitted to use rcu_dereference on pointer values.
The compiler simply knows too much about integral values to
trust it to carry dependencies through integer operations.
There are a very few exceptions, namely that you can temporarily
cast the pointer to uintptr_t in order to:
o Set bits and clear bits down in the must-be-zero low-order
bits of that pointer. This clearly means that the pointer
must have alignment constraints, for example, this does
-not- work in general for char* pointers.
o XOR bits to translate pointers, as is done in some
classic buddy-allocator algorithms.
It is important to cast the value back to pointer before
doing much of anything else with it.
o Avoid cancellation when using the "+" and "-" infix arithmetic
operators. For example, for a given variable "x", avoid
"(x-(uintptr_t)x)" for char* pointers. The compiler is within its
rights to substitute zero for this sort of expression, so that
subsequent accesses no longer depend on the rcu_dereference(),
again possibly resulting in bugs due to misordering.
Of course, if "p" is a pointer from rcu_dereference(), and "a"
and "b" are integers that happen to be equal, the expression
"p+a-b" is safe because its value still necessarily depends on
the rcu_dereference(), thus maintaining proper ordering.
o If you are using RCU to protect JITed functions, so that the
"()" function-invocation operator is applied to a value obtained
(directly or indirectly) from rcu_dereference(), you may need to
interact directly with the hardware to flush instruction caches.
This issue arises on some systems when a newly JITed function is
using the same memory that was used by an earlier JITed function.
o Do not use the results from relational operators ("==", "!=",
">", ">=", "<", or "<=") when dereferencing. For example,
the following (quite strange) code is buggy:
int *p;
int *q;
...
p = rcu_dereference(gp)
q = &global_q;
q += p > &oom_p;
r1 = *q; /* BUGGY!!! */
As before, the reason this is buggy is that relational operators
are often compiled using branches. And as before, although
weak-memory machines such as ARM or PowerPC do order stores
after such branches, but can speculate loads, which can again
result in misordering bugs.
o Be very careful about comparing pointers obtained from
rcu_dereference() against non-NULL values. As Linus Torvalds
explained, if the two pointers are equal, the compiler could
substitute the pointer you are comparing against for the pointer
obtained from rcu_dereference(). For example:
p = rcu_dereference(gp);
if (p == &default_struct)
do_default(p->a);
Because the compiler now knows that the value of "p" is exactly
the address of the variable "default_struct", it is free to
transform this code into the following:
p = rcu_dereference(gp);
if (p == &default_struct)
do_default(default_struct.a);
On ARM and Power hardware, the load from "default_struct.a"
can now be speculated, such that it might happen before the
rcu_dereference(). This could result in bugs due to misordering.
However, comparisons are OK in the following cases:
o The comparison was against the NULL pointer. If the
compiler knows that the pointer is NULL, you had better
not be dereferencing it anyway. If the comparison is
non-equal, the compiler is none the wiser. Therefore,
it is safe to compare pointers from rcu_dereference()
against NULL pointers.
o The pointer is never dereferenced after being compared.
Since there are no subsequent dereferences, the compiler
cannot use anything it learned from the comparison
to reorder the non-existent subsequent dereferences.
This sort of comparison occurs frequently when scanning
RCU-protected circular linked lists.
Note that if checks for being within an RCU read-side
critical section are not required and the pointer is never
dereferenced, rcu_access_pointer() should be used in place
of rcu_dereference().
o The comparison is against a pointer that references memory
that was initialized "a long time ago." The reason
this is safe is that even if misordering occurs, the
misordering will not affect the accesses that follow
the comparison. So exactly how long ago is "a long
time ago"? Here are some possibilities:
o Compile time.
o Boot time.
o Module-init time for module code.
o Prior to kthread creation for kthread code.
o During some prior acquisition of the lock that
we now hold.
o Before mod_timer() time for a timer handler.
There are many other possibilities involving the Linux
kernel's wide array of primitives that cause code to
be invoked at a later time.
o The pointer being compared against also came from
rcu_dereference(). In this case, both pointers depend
on one rcu_dereference() or another, so you get proper
ordering either way.
That said, this situation can make certain RCU usage
bugs more likely to happen. Which can be a good thing,
at least if they happen during testing. An example
of such an RCU usage bug is shown in the section titled
"EXAMPLE OF AMPLIFIED RCU-USAGE BUG".
o All of the accesses following the comparison are stores,
so that a control dependency preserves the needed ordering.
That said, it is easy to get control dependencies wrong.
Please see the "CONTROL DEPENDENCIES" section of
Documentation/memory-barriers.txt for more details.
o The pointers are not equal -and- the compiler does
not have enough information to deduce the value of the
pointer. Note that the volatile cast in rcu_dereference()
will normally prevent the compiler from knowing too much.
However, please note that if the compiler knows that the
pointer takes on only one of two values, a not-equal
comparison will provide exactly the information that the
compiler needs to deduce the value of the pointer.
o Disable any value-speculation optimizations that your compiler
might provide, especially if you are making use of feedback-based
optimizations that take data collected from prior runs. Such
value-speculation optimizations reorder operations by design.
There is one exception to this rule: Value-speculation
optimizations that leverage the branch-prediction hardware are
safe on strongly ordered systems (such as x86), but not on weakly
ordered systems (such as ARM or Power). Choose your compiler
command-line options wisely!
EXAMPLE OF AMPLIFIED RCU-USAGE BUG
Because updaters can run concurrently with RCU readers, RCU readers can
see stale and/or inconsistent values. If RCU readers need fresh or
consistent values, which they sometimes do, they need to take proper
precautions. To see this, consider the following code fragment:
struct foo {
int a;
int b;
int c;
};
struct foo *gp1;
struct foo *gp2;
void updater(void)
{
struct foo *p;
p = kmalloc(...);
if (p == NULL)
deal_with_it();
p->a = 42; /* Each field in its own cache line. */
p->b = 43;
p->c = 44;
rcu_assign_pointer(gp1, p);
p->b = 143;
p->c = 144;
rcu_assign_pointer(gp2, p);
}
void reader(void)
{
struct foo *p;
struct foo *q;
int r1, r2;
p = rcu_dereference(gp2);
if (p == NULL)
return;
r1 = p->b; /* Guaranteed to get 143. */
q = rcu_dereference(gp1); /* Guaranteed non-NULL. */
if (p == q) {
/* The compiler decides that q->c is same as p->c. */
r2 = p->c; /* Could get 44 on weakly order system. */
}
do_something_with(r1, r2);
}
You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible,
but you should not be. After all, the updater might have been invoked
a second time between the time reader() loaded into "r1" and the time
that it loaded into "r2". The fact that this same result can occur due
to some reordering from the compiler and CPUs is beside the point.
But suppose that the reader needs a consistent view?
Then one approach is to use locking, for example, as follows:
struct foo {
int a;
int b;
int c;
spinlock_t lock;
};
struct foo *gp1;
struct foo *gp2;
void updater(void)
{
struct foo *p;
p = kmalloc(...);
if (p == NULL)
deal_with_it();
spin_lock(&p->lock);
p->a = 42; /* Each field in its own cache line. */
p->b = 43;
p->c = 44;
spin_unlock(&p->lock);
rcu_assign_pointer(gp1, p);
spin_lock(&p->lock);
p->b = 143;
p->c = 144;
spin_unlock(&p->lock);
rcu_assign_pointer(gp2, p);
}
void reader(void)
{
struct foo *p;
struct foo *q;
int r1, r2;
p = rcu_dereference(gp2);
if (p == NULL)
return;
spin_lock(&p->lock);
r1 = p->b; /* Guaranteed to get 143. */
q = rcu_dereference(gp1); /* Guaranteed non-NULL. */
if (p == q) {
/* The compiler decides that q->c is same as p->c. */
r2 = p->c; /* Locking guarantees r2 == 144. */
}
spin_unlock(&p->lock);
do_something_with(r1, r2);
}
As always, use the right tool for the job!
EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
If a pointer obtained from rcu_dereference() compares not-equal to some
other pointer, the compiler normally has no clue what the value of the
first pointer might be. This lack of knowledge prevents the compiler
from carrying out optimizations that otherwise might destroy the ordering
guarantees that RCU depends on. And the volatile cast in rcu_dereference()
should prevent the compiler from guessing the value.
But without rcu_dereference(), the compiler knows more than you might
expect. Consider the following code fragment:
struct foo {
int a;
int b;
};
static struct foo variable1;
static struct foo variable2;
static struct foo *gp = &variable1;
void updater(void)
{
initialize_foo(&variable2);
rcu_assign_pointer(gp, &variable2);
/*
* The above is the only store to gp in this translation unit,
* and the address of gp is not exported in any way.
*/
}
int reader(void)
{
struct foo *p;
p = gp;
barrier();
if (p == &variable1)
return p->a; /* Must be variable1.a. */
else
return p->b; /* Must be variable2.b. */
}
Because the compiler can see all stores to "gp", it knows that the only
possible values of "gp" are "variable1" on the one hand and "variable2"
on the other. The comparison in reader() therefore tells the compiler
the exact value of "p" even in the not-equals case. This allows the
compiler to make the return values independent of the load from "gp",
in turn destroying the ordering between this load and the loads of the
return values. This can result in "p->b" returning pre-initialization
garbage values.
In short, rcu_dereference() is -not- optional when you are going to
dereference the resulting pointer.
WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
First, please avoid using rcu_dereference_raw() and also please avoid
using rcu_dereference_check() and rcu_dereference_protected() with a
second argument with a constant value of 1 (or true, for that matter).
With that caution out of the way, here is some guidance for which
member of the rcu_dereference() to use in various situations:
1. If the access needs to be within an RCU read-side critical
section, use rcu_dereference(). With the new consolidated
RCU flavors, an RCU read-side critical section is entered
using rcu_read_lock(), anything that disables bottom halves,
anything that disables interrupts, or anything that disables
preemption.
2. If the access might be within an RCU read-side critical section
on the one hand, or protected by (say) my_lock on the other,
use rcu_dereference_check(), for example:
p1 = rcu_dereference_check(p->rcu_protected_pointer,
lockdep_is_held(&my_lock));
3. If the access might be within an RCU read-side critical section
on the one hand, or protected by either my_lock or your_lock on
the other, again use rcu_dereference_check(), for example:
p1 = rcu_dereference_check(p->rcu_protected_pointer,
lockdep_is_held(&my_lock) ||
lockdep_is_held(&your_lock));
4. If the access is on the update side, so that it is always protected
by my_lock, use rcu_dereference_protected():
p1 = rcu_dereference_protected(p->rcu_protected_pointer,
lockdep_is_held(&my_lock));
This can be extended to handle multiple locks as in #3 above,
and both can be extended to check other conditions as well.
5. If the protection is supplied by the caller, and is thus unknown
to this code, that is the rare case when rcu_dereference_raw()
is appropriate. In addition, rcu_dereference_raw() might be
appropriate when the lockdep expression would be excessively
complex, except that a better approach in that case might be to
take a long hard look at your synchronization design. Still,
there are data-locking cases where any one of a very large number
of locks or reference counters suffices to protect the pointer,
so rcu_dereference_raw() does have its place.
However, its place is probably quite a bit smaller than one
might expect given the number of uses in the current kernel.
Ditto for its synonym, rcu_dereference_check( ... , 1), and
its close relative, rcu_dereference_protected(... , 1).
SPARSE CHECKING OF RCU-PROTECTED POINTERS
The sparse static-analysis tool checks for direct access to RCU-protected
pointers, which can result in "interesting" bugs due to compiler
optimizations involving invented loads and perhaps also load tearing.
For example, suppose someone mistakenly does something like this:
p = q->rcu_protected_pointer;
do_something_with(p->a);
do_something_else_with(p->b);
If register pressure is high, the compiler might optimize "p" out
of existence, transforming the code to something like this:
do_something_with(q->rcu_protected_pointer->a);
do_something_else_with(q->rcu_protected_pointer->b);
This could fatally disappoint your code if q->rcu_protected_pointer
changed in the meantime. Nor is this a theoretical problem: Exactly
this sort of bug cost Paul E. McKenney (and several of his innocent
colleagues) a three-day weekend back in the early 1990s.
Load tearing could of course result in dereferencing a mashup of a pair
of pointers, which also might fatally disappoint your code.
These problems could have been avoided simply by making the code instead
read as follows:
p = rcu_dereference(q->rcu_protected_pointer);
do_something_with(p->a);
do_something_else_with(p->b);
Unfortunately, these sorts of bugs can be extremely hard to spot during
review. This is where the sparse tool comes into play, along with the
"__rcu" marker. If you mark a pointer declaration, whether in a structure
or as a formal parameter, with "__rcu", which tells sparse to complain if
this pointer is accessed directly. It will also cause sparse to complain
if a pointer not marked with "__rcu" is accessed using rcu_dereference()
and friends. For example, ->rcu_protected_pointer might be declared as
follows:
struct foo __rcu *rcu_protected_pointer;
Use of "__rcu" is opt-in. If you choose not to use it, then you should
ignore the sparse warnings.

View File

@@ -1,353 +0,0 @@
.. _rcu_barrier:
RCU and Unloadable Modules
==========================
[Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/]
RCU (read-copy update) is a synchronization mechanism that can be thought
of as a replacement for read-writer locking (among other things), but with
very low-overhead readers that are immune to deadlock, priority inversion,
and unbounded latency. RCU read-side critical sections are delimited
by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPT
kernels, generate no code whatsoever.
This means that RCU writers are unaware of the presence of concurrent
readers, so that RCU updates to shared data must be undertaken quite
carefully, leaving an old version of the data structure in place until all
pre-existing readers have finished. These old versions are needed because
such readers might hold a reference to them. RCU updates can therefore be
rather expensive, and RCU is thus best suited for read-mostly situations.
How can an RCU writer possibly determine when all readers are finished,
given that readers might well leave absolutely no trace of their
presence? There is a synchronize_rcu() primitive that blocks until all
pre-existing readers have completed. An updater wishing to delete an
element p from a linked list might do the following, while holding an
appropriate lock, of course::
list_del_rcu(p);
synchronize_rcu();
kfree(p);
But the above code cannot be used in IRQ context -- the call_rcu()
primitive must be used instead. This primitive takes a pointer to an
rcu_head struct placed within the RCU-protected data structure and
another pointer to a function that may be invoked later to free that
structure. Code to delete an element p from the linked list from IRQ
context might then be as follows::
list_del_rcu(p);
call_rcu(&p->rcu, p_callback);
Since call_rcu() never blocks, this code can safely be used from within
IRQ context. The function p_callback() might be defined as follows::
static void p_callback(struct rcu_head *rp)
{
struct pstruct *p = container_of(rp, struct pstruct, rcu);
kfree(p);
}
Unloading Modules That Use call_rcu()
-------------------------------------
But what if p_callback is defined in an unloadable module?
If we unload the module while some RCU callbacks are pending,
the CPUs executing these callbacks are going to be severely
disappointed when they are later invoked, as fancifully depicted at
http://lwn.net/images/ns/kernel/rcu-drop.jpg.
We could try placing a synchronize_rcu() in the module-exit code path,
but this is not sufficient. Although synchronize_rcu() does wait for a
grace period to elapse, it does not wait for the callbacks to complete.
One might be tempted to try several back-to-back synchronize_rcu()
calls, but this is still not guaranteed to work. If there is a very
heavy RCU-callback load, then some of the callbacks might be deferred
in order to allow other processing to proceed. Such deferral is required
in realtime kernels in order to avoid excessive scheduling latencies.
rcu_barrier()
-------------
We instead need the rcu_barrier() primitive. Rather than waiting for
a grace period to elapse, rcu_barrier() waits for all outstanding RCU
callbacks to complete. Please note that rcu_barrier() does **not** imply
synchronize_rcu(), in particular, if there are no RCU callbacks queued
anywhere, rcu_barrier() is within its rights to return immediately,
without waiting for a grace period to elapse.
Pseudo-code using rcu_barrier() is as follows:
1. Prevent any new RCU callbacks from being posted.
2. Execute rcu_barrier().
3. Allow the module to be unloaded.
There is also an srcu_barrier() function for SRCU, and you of course
must match the flavor of rcu_barrier() with that of call_rcu(). If your
module uses multiple flavors of call_rcu(), then it must also use multiple
flavors of rcu_barrier() when unloading that module. For example, if
it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on
srcu_struct_2, then the following three lines of code will be required
when unloading::
1 rcu_barrier();
2 srcu_barrier(&srcu_struct_1);
3 srcu_barrier(&srcu_struct_2);
The rcutorture module makes use of rcu_barrier() in its exit function
as follows::
1 static void
2 rcu_torture_cleanup(void)
3 {
4 int i;
5
6 fullstop = 1;
7 if (shuffler_task != NULL) {
8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
9 kthread_stop(shuffler_task);
10 }
11 shuffler_task = NULL;
12
13 if (writer_task != NULL) {
14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
15 kthread_stop(writer_task);
16 }
17 writer_task = NULL;
18
19 if (reader_tasks != NULL) {
20 for (i = 0; i < nrealreaders; i++) {
21 if (reader_tasks[i] != NULL) {
22 VERBOSE_PRINTK_STRING(
23 "Stopping rcu_torture_reader task");
24 kthread_stop(reader_tasks[i]);
25 }
26 reader_tasks[i] = NULL;
27 }
28 kfree(reader_tasks);
29 reader_tasks = NULL;
30 }
31 rcu_torture_current = NULL;
32
33 if (fakewriter_tasks != NULL) {
34 for (i = 0; i < nfakewriters; i++) {
35 if (fakewriter_tasks[i] != NULL) {
36 VERBOSE_PRINTK_STRING(
37 "Stopping rcu_torture_fakewriter task");
38 kthread_stop(fakewriter_tasks[i]);
39 }
40 fakewriter_tasks[i] = NULL;
41 }
42 kfree(fakewriter_tasks);
43 fakewriter_tasks = NULL;
44 }
45
46 if (stats_task != NULL) {
47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
48 kthread_stop(stats_task);
49 }
50 stats_task = NULL;
51
52 /* Wait for all RCU callbacks to fire. */
53 rcu_barrier();
54
55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
56
57 if (cur_ops->cleanup != NULL)
58 cur_ops->cleanup();
59 if (atomic_read(&n_rcu_torture_error))
60 rcu_torture_print_module_parms("End of test: FAILURE");
61 else
62 rcu_torture_print_module_parms("End of test: SUCCESS");
63 }
Line 6 sets a global variable that prevents any RCU callbacks from
re-posting themselves. This will not be necessary in most cases, since
RCU callbacks rarely include calls to call_rcu(). However, the rcutorture
module is an exception to this rule, and therefore needs to set this
global variable.
Lines 7-50 stop all the kernel tasks associated with the rcutorture
module. Therefore, once execution reaches line 53, no more rcutorture
RCU callbacks will be posted. The rcu_barrier() call on line 53 waits
for any pre-existing callbacks to complete.
Then lines 55-62 print status and do operation-specific cleanup, and
then return, permitting the module-unload operation to be completed.
.. _rcubarrier_quiz_1:
Quick Quiz #1:
Is there any other situation where rcu_barrier() might
be required?
:ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>`
Your module might have additional complications. For example, if your
module invokes call_rcu() from timers, you will need to first cancel all
the timers, and only then invoke rcu_barrier() to wait for any remaining
RCU callbacks to complete.
Of course, if you module uses call_rcu(), you will need to invoke
rcu_barrier() before unloading. Similarly, if your module uses
call_srcu(), you will need to invoke srcu_barrier() before unloading,
and on the same srcu_struct structure. If your module uses call_rcu()
**and** call_srcu(), then you will need to invoke rcu_barrier() **and**
srcu_barrier().
Implementing rcu_barrier()
--------------------------
Dipankar Sarma's implementation of rcu_barrier() makes use of the fact
that RCU callbacks are never reordered once queued on one of the per-CPU
queues. His implementation queues an RCU callback on each of the per-CPU
callback queues, and then waits until they have all started executing, at
which point, all earlier RCU callbacks are guaranteed to have completed.
The original code for rcu_barrier() was as follows::
1 void rcu_barrier(void)
2 {
3 BUG_ON(in_interrupt());
4 /* Take cpucontrol mutex to protect against CPU hotplug */
5 mutex_lock(&rcu_barrier_mutex);
6 init_completion(&rcu_barrier_completion);
7 atomic_set(&rcu_barrier_cpu_count, 0);
8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
9 wait_for_completion(&rcu_barrier_completion);
10 mutex_unlock(&rcu_barrier_mutex);
11 }
Line 3 verifies that the caller is in process context, and lines 5 and 10
use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
global completion and counters at a time, which are initialized on lines
6 and 7. Line 8 causes each CPU to invoke rcu_barrier_func(), which is
shown below. Note that the final "1" in on_each_cpu()'s argument list
ensures that all the calls to rcu_barrier_func() will have completed
before on_each_cpu() returns. Line 9 then waits for the completion.
This code was rewritten in 2008 and several times thereafter, but this
still gives the general idea.
The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
to post an RCU callback, as follows::
1 static void rcu_barrier_func(void *notused)
2 {
3 int cpu = smp_processor_id();
4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
5 struct rcu_head *head;
6
7 head = &rdp->barrier;
8 atomic_inc(&rcu_barrier_cpu_count);
9 call_rcu(head, rcu_barrier_callback);
10 }
Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure,
which contains the struct rcu_head that needed for the later call to
call_rcu(). Line 7 picks up a pointer to this struct rcu_head, and line
8 increments a global counter. This counter will later be decremented
by the callback. Line 9 then registers the rcu_barrier_callback() on
the current CPU's queue.
The rcu_barrier_callback() function simply atomically decrements the
rcu_barrier_cpu_count variable and finalizes the completion when it
reaches zero, as follows::
1 static void rcu_barrier_callback(struct rcu_head *notused)
2 {
3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
4 complete(&rcu_barrier_completion);
5 }
.. _rcubarrier_quiz_2:
Quick Quiz #2:
What happens if CPU 0's rcu_barrier_func() executes
immediately (thus incrementing rcu_barrier_cpu_count to the
value one), but the other CPU's rcu_barrier_func() invocations
are delayed for a full grace period? Couldn't this result in
rcu_barrier() returning prematurely?
:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`
The current rcu_barrier() implementation is more complex, due to the need
to avoid disturbing idle CPUs (especially on battery-powered systems)
and the need to minimally disturb non-idle CPUs in real-time systems.
However, the code above illustrates the concepts.
rcu_barrier() Summary
---------------------
The rcu_barrier() primitive has seen relatively little use, since most
code using RCU is in the core kernel rather than in modules. However, if
you are using RCU from an unloadable module, you need to use rcu_barrier()
so that your module may be safely unloaded.
Answers to Quick Quizzes
------------------------
.. _answer_rcubarrier_quiz_1:
Quick Quiz #1:
Is there any other situation where rcu_barrier() might
be required?
Answer: Interestingly enough, rcu_barrier() was not originally
implemented for module unloading. Nikita Danilov was using
RCU in a filesystem, which resulted in a similar situation at
filesystem-unmount time. Dipankar Sarma coded up rcu_barrier()
in response, so that Nikita could invoke it during the
filesystem-unmount process.
Much later, yours truly hit the RCU module-unload problem when
implementing rcutorture, and found that rcu_barrier() solves
this problem as well.
:ref:`Back to Quick Quiz #1 <rcubarrier_quiz_1>`
.. _answer_rcubarrier_quiz_2:
Quick Quiz #2:
What happens if CPU 0's rcu_barrier_func() executes
immediately (thus incrementing rcu_barrier_cpu_count to the
value one), but the other CPU's rcu_barrier_func() invocations
are delayed for a full grace period? Couldn't this result in
rcu_barrier() returning prematurely?
Answer: This cannot happen. The reason is that on_each_cpu() has its last
argument, the wait flag, set to "1". This flag is passed through
to smp_call_function() and further to smp_call_function_on_cpu(),
causing this latter to spin until the cross-CPU invocation of
rcu_barrier_func() has completed. This by itself would prevent
a grace period from completing on non-CONFIG_PREEMPT kernels,
since each CPU must undergo a context switch (or other quiescent
state) before the grace period can complete. However, this is
of no use in CONFIG_PREEMPT kernels.
Therefore, on_each_cpu() disables preemption across its call
to smp_call_function() and also across the local call to
rcu_barrier_func(). This prevents the local CPU from context
switching, again preventing grace periods from completing. This
means that all CPUs have executed rcu_barrier_func() before
the first rcu_barrier_callback() can possibly execute, in turn
preventing rcu_barrier_cpu_count from prematurely reaching zero.
Currently, -rt implementations of RCU keep but a single global
queue for RCU callbacks, and thus do not suffer from this
problem. However, when the -rt RCU eventually does have per-CPU
callback queues, things will have to change. One simple change
is to add an rcu_read_lock() before line 8 of rcu_barrier()
and an rcu_read_unlock() after line 8 of this same function. If
you can think of a better change, please let me know!
:ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`

View File

@@ -0,0 +1,325 @@
RCU and Unloadable Modules
[Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/]
RCU (read-copy update) is a synchronization mechanism that can be thought
of as a replacement for read-writer locking (among other things), but with
very low-overhead readers that are immune to deadlock, priority inversion,
and unbounded latency. RCU read-side critical sections are delimited
by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPT
kernels, generate no code whatsoever.
This means that RCU writers are unaware of the presence of concurrent
readers, so that RCU updates to shared data must be undertaken quite
carefully, leaving an old version of the data structure in place until all
pre-existing readers have finished. These old versions are needed because
such readers might hold a reference to them. RCU updates can therefore be
rather expensive, and RCU is thus best suited for read-mostly situations.
How can an RCU writer possibly determine when all readers are finished,
given that readers might well leave absolutely no trace of their
presence? There is a synchronize_rcu() primitive that blocks until all
pre-existing readers have completed. An updater wishing to delete an
element p from a linked list might do the following, while holding an
appropriate lock, of course:
list_del_rcu(p);
synchronize_rcu();
kfree(p);
But the above code cannot be used in IRQ context -- the call_rcu()
primitive must be used instead. This primitive takes a pointer to an
rcu_head struct placed within the RCU-protected data structure and
another pointer to a function that may be invoked later to free that
structure. Code to delete an element p from the linked list from IRQ
context might then be as follows:
list_del_rcu(p);
call_rcu(&p->rcu, p_callback);
Since call_rcu() never blocks, this code can safely be used from within
IRQ context. The function p_callback() might be defined as follows:
static void p_callback(struct rcu_head *rp)
{
struct pstruct *p = container_of(rp, struct pstruct, rcu);
kfree(p);
}
Unloading Modules That Use call_rcu()
But what if p_callback is defined in an unloadable module?
If we unload the module while some RCU callbacks are pending,
the CPUs executing these callbacks are going to be severely
disappointed when they are later invoked, as fancifully depicted at
http://lwn.net/images/ns/kernel/rcu-drop.jpg.
We could try placing a synchronize_rcu() in the module-exit code path,
but this is not sufficient. Although synchronize_rcu() does wait for a
grace period to elapse, it does not wait for the callbacks to complete.
One might be tempted to try several back-to-back synchronize_rcu()
calls, but this is still not guaranteed to work. If there is a very
heavy RCU-callback load, then some of the callbacks might be deferred
in order to allow other processing to proceed. Such deferral is required
in realtime kernels in order to avoid excessive scheduling latencies.
rcu_barrier()
We instead need the rcu_barrier() primitive. Rather than waiting for
a grace period to elapse, rcu_barrier() waits for all outstanding RCU
callbacks to complete. Please note that rcu_barrier() does -not- imply
synchronize_rcu(), in particular, if there are no RCU callbacks queued
anywhere, rcu_barrier() is within its rights to return immediately,
without waiting for a grace period to elapse.
Pseudo-code using rcu_barrier() is as follows:
1. Prevent any new RCU callbacks from being posted.
2. Execute rcu_barrier().
3. Allow the module to be unloaded.
There is also an srcu_barrier() function for SRCU, and you of course
must match the flavor of rcu_barrier() with that of call_rcu(). If your
module uses multiple flavors of call_rcu(), then it must also use multiple
flavors of rcu_barrier() when unloading that module. For example, if
it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on
srcu_struct_2(), then the following three lines of code will be required
when unloading:
1 rcu_barrier();
2 srcu_barrier(&srcu_struct_1);
3 srcu_barrier(&srcu_struct_2);
The rcutorture module makes use of rcu_barrier() in its exit function
as follows:
1 static void
2 rcu_torture_cleanup(void)
3 {
4 int i;
5
6 fullstop = 1;
7 if (shuffler_task != NULL) {
8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
9 kthread_stop(shuffler_task);
10 }
11 shuffler_task = NULL;
12
13 if (writer_task != NULL) {
14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
15 kthread_stop(writer_task);
16 }
17 writer_task = NULL;
18
19 if (reader_tasks != NULL) {
20 for (i = 0; i < nrealreaders; i++) {
21 if (reader_tasks[i] != NULL) {
22 VERBOSE_PRINTK_STRING(
23 "Stopping rcu_torture_reader task");
24 kthread_stop(reader_tasks[i]);
25 }
26 reader_tasks[i] = NULL;
27 }
28 kfree(reader_tasks);
29 reader_tasks = NULL;
30 }
31 rcu_torture_current = NULL;
32
33 if (fakewriter_tasks != NULL) {
34 for (i = 0; i < nfakewriters; i++) {
35 if (fakewriter_tasks[i] != NULL) {
36 VERBOSE_PRINTK_STRING(
37 "Stopping rcu_torture_fakewriter task");
38 kthread_stop(fakewriter_tasks[i]);
39 }
40 fakewriter_tasks[i] = NULL;
41 }
42 kfree(fakewriter_tasks);
43 fakewriter_tasks = NULL;
44 }
45
46 if (stats_task != NULL) {
47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
48 kthread_stop(stats_task);
49 }
50 stats_task = NULL;
51
52 /* Wait for all RCU callbacks to fire. */
53 rcu_barrier();
54
55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
56
57 if (cur_ops->cleanup != NULL)
58 cur_ops->cleanup();
59 if (atomic_read(&n_rcu_torture_error))
60 rcu_torture_print_module_parms("End of test: FAILURE");
61 else
62 rcu_torture_print_module_parms("End of test: SUCCESS");
63 }
Line 6 sets a global variable that prevents any RCU callbacks from
re-posting themselves. This will not be necessary in most cases, since
RCU callbacks rarely include calls to call_rcu(). However, the rcutorture
module is an exception to this rule, and therefore needs to set this
global variable.
Lines 7-50 stop all the kernel tasks associated with the rcutorture
module. Therefore, once execution reaches line 53, no more rcutorture
RCU callbacks will be posted. The rcu_barrier() call on line 53 waits
for any pre-existing callbacks to complete.
Then lines 55-62 print status and do operation-specific cleanup, and
then return, permitting the module-unload operation to be completed.
Quick Quiz #1: Is there any other situation where rcu_barrier() might
be required?
Your module might have additional complications. For example, if your
module invokes call_rcu() from timers, you will need to first cancel all
the timers, and only then invoke rcu_barrier() to wait for any remaining
RCU callbacks to complete.
Of course, if you module uses call_rcu(), you will need to invoke
rcu_barrier() before unloading. Similarly, if your module uses
call_srcu(), you will need to invoke srcu_barrier() before unloading,
and on the same srcu_struct structure. If your module uses call_rcu()
-and- call_srcu(), then you will need to invoke rcu_barrier() -and-
srcu_barrier().
Implementing rcu_barrier()
Dipankar Sarma's implementation of rcu_barrier() makes use of the fact
that RCU callbacks are never reordered once queued on one of the per-CPU
queues. His implementation queues an RCU callback on each of the per-CPU
callback queues, and then waits until they have all started executing, at
which point, all earlier RCU callbacks are guaranteed to have completed.
The original code for rcu_barrier() was as follows:
1 void rcu_barrier(void)
2 {
3 BUG_ON(in_interrupt());
4 /* Take cpucontrol mutex to protect against CPU hotplug */
5 mutex_lock(&rcu_barrier_mutex);
6 init_completion(&rcu_barrier_completion);
7 atomic_set(&rcu_barrier_cpu_count, 0);
8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
9 wait_for_completion(&rcu_barrier_completion);
10 mutex_unlock(&rcu_barrier_mutex);
11 }
Line 3 verifies that the caller is in process context, and lines 5 and 10
use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
global completion and counters at a time, which are initialized on lines
6 and 7. Line 8 causes each CPU to invoke rcu_barrier_func(), which is
shown below. Note that the final "1" in on_each_cpu()'s argument list
ensures that all the calls to rcu_barrier_func() will have completed
before on_each_cpu() returns. Line 9 then waits for the completion.
This code was rewritten in 2008 and several times thereafter, but this
still gives the general idea.
The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
to post an RCU callback, as follows:
1 static void rcu_barrier_func(void *notused)
2 {
3 int cpu = smp_processor_id();
4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
5 struct rcu_head *head;
6
7 head = &rdp->barrier;
8 atomic_inc(&rcu_barrier_cpu_count);
9 call_rcu(head, rcu_barrier_callback);
10 }
Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure,
which contains the struct rcu_head that needed for the later call to
call_rcu(). Line 7 picks up a pointer to this struct rcu_head, and line
8 increments a global counter. This counter will later be decremented
by the callback. Line 9 then registers the rcu_barrier_callback() on
the current CPU's queue.
The rcu_barrier_callback() function simply atomically decrements the
rcu_barrier_cpu_count variable and finalizes the completion when it
reaches zero, as follows:
1 static void rcu_barrier_callback(struct rcu_head *notused)
2 {
3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
4 complete(&rcu_barrier_completion);
5 }
Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
immediately (thus incrementing rcu_barrier_cpu_count to the
value one), but the other CPU's rcu_barrier_func() invocations
are delayed for a full grace period? Couldn't this result in
rcu_barrier() returning prematurely?
The current rcu_barrier() implementation is more complex, due to the need
to avoid disturbing idle CPUs (especially on battery-powered systems)
and the need to minimally disturb non-idle CPUs in real-time systems.
However, the code above illustrates the concepts.
rcu_barrier() Summary
The rcu_barrier() primitive has seen relatively little use, since most
code using RCU is in the core kernel rather than in modules. However, if
you are using RCU from an unloadable module, you need to use rcu_barrier()
so that your module may be safely unloaded.
Answers to Quick Quizzes
Quick Quiz #1: Is there any other situation where rcu_barrier() might
be required?
Answer: Interestingly enough, rcu_barrier() was not originally
implemented for module unloading. Nikita Danilov was using
RCU in a filesystem, which resulted in a similar situation at
filesystem-unmount time. Dipankar Sarma coded up rcu_barrier()
in response, so that Nikita could invoke it during the
filesystem-unmount process.
Much later, yours truly hit the RCU module-unload problem when
implementing rcutorture, and found that rcu_barrier() solves
this problem as well.
Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
immediately (thus incrementing rcu_barrier_cpu_count to the
value one), but the other CPU's rcu_barrier_func() invocations
are delayed for a full grace period? Couldn't this result in
rcu_barrier() returning prematurely?
Answer: This cannot happen. The reason is that on_each_cpu() has its last
argument, the wait flag, set to "1". This flag is passed through
to smp_call_function() and further to smp_call_function_on_cpu(),
causing this latter to spin until the cross-CPU invocation of
rcu_barrier_func() has completed. This by itself would prevent
a grace period from completing on non-CONFIG_PREEMPT kernels,
since each CPU must undergo a context switch (or other quiescent
state) before the grace period can complete. However, this is
of no use in CONFIG_PREEMPT kernels.
Therefore, on_each_cpu() disables preemption across its call
to smp_call_function() and also across the local call to
rcu_barrier_func(). This prevents the local CPU from context
switching, again preventing grace periods from completing. This
means that all CPUs have executed rcu_barrier_func() before
the first rcu_barrier_callback() can possibly execute, in turn
preventing rcu_barrier_cpu_count from prematurely reaching zero.
Currently, -rt implementations of RCU keep but a single global
queue for RCU callbacks, and thus do not suffer from this
problem. However, when the -rt RCU eventually does have per-CPU
callback queues, things will have to change. One simple change
is to add an rcu_read_lock() before line 8 of rcu_barrier()
and an rcu_read_unlock() after line 8 of this same function. If
you can think of a better change, please let me know!

View File

@@ -57,12 +57,6 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
CONFIG_PREEMPT_RCU case, you might see stall-warning
messages.
You can use the rcutree.kthread_prio kernel boot parameter to
increase the scheduling priority of RCU's kthreads, which can
help avoid this problem. However, please note that doing this
can increase your system's context-switch rate and thus degrade
performance.
o A periodic interrupt whose handler takes longer than the time
interval between successive pairs of interrupts. This can
prevent RCU's kthreads and softirq handlers from running.
@@ -225,13 +219,18 @@ an estimate of the total number of RCU callbacks queued across all CPUs
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
for each CPU:
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
The "last_accelerate:" prints the low-order 16 bits (in hex) of the
jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from
rcu_prepare_for_idle(). "dyntick_enabled: 1" indicates that dyntick-idle
processing is enabled.
rcu_prepare_for_idle(). The "Nonlazy posted:" indicates lazy-callback
status, so that an "l" indicates that all callbacks were lazy at the start
of the last idle period and an "L" indicates that there are currently
no non-lazy callbacks (in both cases, "." is printed otherwise, as
shown above) and "D" indicates that dyntick-idle processing is enabled
("." is printed otherwise, for example, if disabled via the "nohz="
kernel boot parameter).
If the grace period ends just as the stall warning starts printing,
there will be a spurious stall-warning message, which will include

File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff Show More