Commit 42baefac63 upstream.
gnttab_end_foreign_access() is used to free a grant reference and
optionally to free the associated page. In case the grant is still in
use by the other side processing is being deferred. This leads to a
problem in case no page to be freed is specified by the caller: the
caller doesn't know that the page is still mapped by the other side
and thus should not be used for other purposes.
The correct way to handle this situation is to take an additional
reference to the granted page in case handling is being deferred and
to drop that reference when the grant reference could be freed
finally.
This requires that there are no users of gnttab_end_foreign_access()
left directly repurposing the granted page after the call, as this
might result in clobbered data or information leaks via the not yet
freed grant reference.
This is part of CVE-2022-23041 / XSA-396.
Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b0576cc9c6 upstream.
Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.
This is part of CVE-2022-23041 / XSA-396.
Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1dbd11ca75 upstream.
Remove gnttab_query_foreign_access(), as it is unused and unsafe to
use.
All previous use cases assumed a grant would not be in use after
gnttab_query_foreign_access() returned 0. This information is useless
in best case, as it only refers to a situation in the past, which could
have changed already.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit d3b6372c58 upstream.
Using gnttab_query_foreign_access() is unsafe, as it is racy by design.
The use case in the gntalloc driver is not needed at all. While at it
replace the call of gnttab_end_foreign_access_ref() with a call of
gnttab_end_foreign_access(), which is what is really wanted there. In
case the grant wasn't used due to an allocation failure, just free the
grant via gnttab_free_grant_reference().
This is CVE-2022-23039 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6b1775f26a upstream.
Add a new grant table function gnttab_try_end_foreign_access(), which
will remove and free a grant if it is not in use.
Its main use case is to either free a grant if it is no longer in use,
or to take some other action if it is still in use. This other action
can be an error exit, or (e.g. in the case of blkfront persistent grant
feature) some special handling.
This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3777ea7bac upstream.
Letting xenbus_grant_ring() tear down grants in the error case is
problematic, as the other side could already have used these grants.
Calling gnttab_end_foreign_access_ref() without checking success is
resulting in an unclear situation for any caller of xenbus_grant_ring()
as in the error case the memory pages of the ring page might be
partially mapped. Freeing them would risk unwanted foreign access to
them, while not freeing them would leak memory.
In order to remove the need to undo any gnttab_grant_foreign_access()
calls, use gnttab_alloc_grant_references() to make sure no further
error can occur in the loop granting access to the ring pages.
It should be noted that this way of handling removes leaking of
grant entries in the error case, too.
This is CVE-2022-23040 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce2f46f353 upstream.
While working with Xen's libxenvchan library I have faced an issue with
unmap notifications sent in wrong order if both UNMAP_NOTIFY_SEND_EVENT
and UNMAP_NOTIFY_CLEAR_BYTE were requested: first we send an event channel
notification and then clear the notification byte which renders in the below
inconsistency (cli_live is the byte which was requested to be cleared on unmap):
[ 444.514243] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6
libxenvchan_is_open cli_live 1
[ 444.515239] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14
Thus it is not possible to reliably implement the checks like
- wait for the notification (UNMAP_NOTIFY_SEND_EVENT)
- check the variable (UNMAP_NOTIFY_CLEAR_BYTE)
because it is possible that the variable gets checked before it is cleared
by the kernel.
To fix that we need to re-order the notifications, so the variable is first
gets cleared and then the event channel notification is sent.
With this fix I can see the correct order of execution:
[ 54.522611] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14
[ 54.537966] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6
libxenvchan_is_open cli_live 0
Cc: stable@vger.kernel.org
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211210092817.580718-1-andr2000@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe415186b4 upstream.
The Xen console driver is still vulnerable for an attack via excessive
number of events sent by the backend. Fix that by using a lateeoi event
channel.
For the normal domU initial console this requires the introduction of
bind_evtchn_to_irq_lateeoi() as there is no xenbus device available
at the time the event channel is bound to the irq.
As the decision whether an interrupt was spurious or not requires to
test for bytes having been read from the backend, move sending the
event into the if statement, as sending an event without having found
any bytes to be read is making no sense at all.
This is part of XSA-391
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36e8f60f08 upstream.
If the xenstore page hasn't been allocated properly, reading the value
of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
error. Instead, it will succeed and return zero. Instead of attempting
to xen_remap a bad guest physical address, detect this condition and
return early.
Note that although a guest physical address of zero for
HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
and zero has never been validly used in that capacity.
Also recognize all bits set as an invalid value.
For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
above ULONG_MAX should never be passed by the Xen tools to HVM guests
anyway, so check for this condition and return early.
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20211123210748.1910236-1-sstabellini@kernel.org
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08f6c2b09e upstream.
In case of errors in xenbus_init (e.g. missing xen_store_gfn parameter),
we goto out_error but we forget to reset xen_store_domain_type to
XS_UNKNOWN. As a consequence xenbus_probe_initcall and other initcalls
will still try to initialize xenstore resulting into a crash at boot.
[ 2.479830] Call trace:
[ 2.482314] xb_init_comms+0x18/0x150
[ 2.486354] xs_init+0x34/0x138
[ 2.489786] xenbus_probe+0x4c/0x70
[ 2.498432] xenbus_probe_initcall+0x2c/0x7c
[ 2.503944] do_one_initcall+0x54/0x1b8
[ 2.507358] kernel_init_freeable+0x1ac/0x210
[ 2.511617] kernel_init+0x28/0x130
[ 2.516112] ret_from_fork+0x10/0x20
Cc: <Stable@vger.kernel.org>
Cc: jbeulich@suse.com
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Link: https://lore.kernel.org/r/20211115222719.2558207-1-sstabellini@kernel.org
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40fdea0284 upstream.
When running as PVH or HVM guest with actual memory < max memory the
hypervisor is using "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size as otherwise the PoD cache will be exhausted and the guest
is crashed as a result of that.
In extreme cases ballooning down might not be finished today before
the init process is started, which can consume lots of memory.
In order to avoid random boot crashes in such cases, add a late init
call to wait for ballooning down having finished for PVH/HVM guests.
Warn on console if initial ballooning fails, panic() after stalling
for more than 3 minutes per default. Add a module parameter for
changing this timeout.
[boris: replaced pr_info() with pr_notice()]
Cc: <stable@vger.kernel.org>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211102091944.17487-1-jgross@suse.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull xen fixes from Juergen Gross:
- fix two minor issues in the Xen privcmd driver plus a cleanup patch
for that driver
- fix multiple issues related to running as PVH guest and some related
earlyprintk fixes for other Xen guest types
- fix an issue introduced in 5.15 the Xen balloon driver
* tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: fix cancelled balloon action
xen/x86: adjust data placement
x86/PVH: adjust function/data placement
xen/x86: hook up xen_banner() also for PVH
xen/x86: generalize preferred console model from PV to PVH Dom0
xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
xen/x86: allow "earlyprintk=xen" to work for PV Dom0
xen/x86: make "earlyprintk=xen" work better for PVH Dom0
xen/x86: allow PVH Dom0 without XEN_PV=y
xen/x86: prevent PVH type from getting clobbered
xen/privcmd: drop "pages" parameter from xen_remap_pfn()
xen/privcmd: fix error handling in mmap-resource processing
xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages
Decouple XEN_DOM0 from XEN_PV, converting some existing uses of XEN_DOM0
to a new XEN_PV_DOM0. (I'm not convinced all are really / should really
be PV-specific, but for starters I've tried to be conservative.)
For PVH Dom0 the hypervisor populates MADT with only x2APIC entries, so
without x2APIC support enabled in the kernel things aren't going to work
very well. (As opposed, DomU-s would only ever see LAPIC entries in MADT
as of now.) Note that this then requires PVH Dom0 to be 64-bit, as
X86_X2APIC depends on X86_64.
In the course of this xen_running_on_version_or_later() needs to be
available more broadly. Move it from a PV-specific to a generic file,
considering that what it does isn't really PV-specific at all anyway.
Note that xen/interface/version.h cannot be included on its own; in
enlighten.c, which uses SCHEDOP_* anyway, include xen/interface/sched.h
first to resolve the apparently sole missing type (xen_ulong_t).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/983bb72f-53df-b6af-14bd-5e088bd06a08@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
The function doesn't use it and all of its callers say in a comment that
their respective arguments are to be non-NULL only in auto-translated
mode. Since xen_remap_domain_mfn_array() isn't supposed to be used by
non-PV, drop the parameter there as well. It was bogusly passed as non-
NULL (PRIV_VMA_LOCKED) by its only caller anyway. For
xen_remap_domain_gfn_range(), otoh, it's not clear at all why this
wouldn't want / might not need to gain auto-translated support down the
road, so the parameter is retained there despite now remaining unused
(and the only caller passing NULL); correct a respective comment as
well.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/036ad8a2-46f9-ac3d-6219-bdc93ab9e10b@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull xen fixes from Juergen Gross:
"Some minor cleanups and fixes of some theoretical bugs, as well as a
fix of a bug introduced in 5.15-rc1"
* tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/x86: fix PV trap handling on secondary processors
xen/balloon: fix balloon kthread freezing
swiotlb-xen: this is PV-only on x86
xen/pci-swiotlb: reduce visibility of symbols
PCI: only build xen-pcifront in PV-enabled environments
swiotlb-xen: ensure to issue well-formed XENMEM_exchange requests
Xen/gntdev: don't ignore kernel unmapping error
xen/x86: drop redundant zeroing from cpu_initialize_context()
Commit 8480ed9c2b ("xen/balloon: use a kernel thread instead a
workqueue") switched the Xen balloon driver to use a kernel thread.
Unfortunately the patch omitted to call try_to_freeze() or to use
wait_event_freezable_timeout(), causing a system suspend to fail.
Fixes: 8480ed9c2b ("xen/balloon: use a kernel thread instead a workqueue")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210920100345.21939-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
The code is unreachable for HVM or PVH, and it also makes little sense
in auto-translated environments. On Arm, with
xen_{create,destroy}_contiguous_region() both being stubs, I have a hard
time seeing what good the Xen specific variant does - the generic one
ought to be fine for all purposes there. Still Arm code explicitly
references symbols here, so the code will continue to be included there.
Instead of making PCI_XEN's "select" conditional, simply drop it -
SWIOTLB_XEN will be available unconditionally in the PV case anyway, and
is - as explained above - dead code in non-PV environments.
This in turn allows dropping the stubs for
xen_{create,destroy}_contiguous_region(), the former of which was broken
anyway - it failed to set the DMA handle output.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/5947b8ae-fdc7-225c-4838-84712265fc1e@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
While the hypervisor hasn't been enforcing this, we would still better
avoid issuing requests with GFNs not aligned to the requested order.
Instead of altering the value also in the call to panic(), drop it
there for being static and hence easy to determine without being part
of the panic message.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/7b3998e3-1233-4e5a-89ec-d740e77eb166@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull xen fixes from Juergen Gross:
- The first hunk of a Xen swiotlb fixup series fixing multiple minor
issues and doing some small cleanups
- Some further Xen related fixes avoiding WARN() splats when running as
Xen guests or dom0
- A Kconfig fix allowing the pvcalls frontend to be built as a module
* tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
swiotlb-xen: drop DEFAULT_NSLABS
swiotlb-xen: arrange to have buffer info logged
swiotlb-xen: drop leftover __ref
swiotlb-xen: limit init retries
swiotlb-xen: suppress certain init retries
swiotlb-xen: maintain slab count properly
swiotlb-xen: fix late init retry
swiotlb-xen: avoid double free
xen/pvcalls: backend can be a module
xen: fix usage of pmd_populate in mremap for pv guests
xen: reset legacy rtc flag for PV domU
PM: base: power: don't try to use non-existing RTC for storing data
xen/balloon: use a kernel thread instead a workqueue
Generic swiotlb code makes sure to keep the slab count a multiple of the
number of slabs per segment. Yet even without checking whether any such
assumption is made elsewhere, it is easy to see that xen_swiotlb_fixup()
might alter unrelated memory when calling xen_create_contiguous_region()
for the last segment, when that's not a full one - the function acts on
full order-N regions, not individual pages.
Align the slab count suitably when halving it for a retry. Add a build
time check and a runtime one. Replace the no longer useful local
variable "slabs" by an "order" one calculated just once, outside of the
loop. Re-use "order" for calculating "dma_bits", and change the type of
the latter as well as the one of "i" while touching this anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/dc054cb0-bec4-4db0-fc06-c9fc957b6e66@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Today the Xen ballooning is done via delayed work in a workqueue. This
might result in workqueue hangups being reported in case of large
amounts of memory are being ballooned in one go (here 16GB):
BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 64s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
in-flight: 229:balloon_process
pending: cache_reap
workqueue events_freezable_power_: flags=0x84
pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: disk_events_workfn
workqueue mm_percpu_wq: flags=0x8
pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_update
pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=64s workers=3 idle: 2222 43
This can easily be avoided by using a dedicated kernel thread for doing
the ballooning work.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210827123206.15429-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull Kbuild updates from Masahiro Yamada:
- Add -s option (strict mode) to merge_config.sh to make it fail when
any symbol is redefined.
- Show a warning if a different compiler is used for building external
modules.
- Infer --target from ARCH for CC=clang to let you cross-compile the
kernel without CROSS_COMPILE.
- Make the integrated assembler default (LLVM_IAS=1) for CC=clang.
- Add <linux/stdarg.h> to the kernel source instead of borrowing
<stdarg.h> from the compiler.
- Add Nick Desaulniers as a Kbuild reviewer.
- Drop stale cc-option tests.
- Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
to handle symbols in inline assembly.
- Show a warning if 'FORCE' is missing for if_changed rules.
- Various cleanups
* tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
kbuild: redo fake deps at include/ksym/*.h
kbuild: clean up objtool_args slightly
modpost: get the *.mod file path more simply
checkkconfigsymbols.py: Fix the '--ignore' option
kbuild: merge vmlinux_link() between ARCH=um and other architectures
kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
kbuild: remove stale *.symversions
kbuild: remove unused quiet_cmd_update_lto_symversions
gen_compile_commands: extract compiler command from a series of commands
x86: remove cc-option-yn test for -mtune=
arc: replace cc-option-yn uses with cc-option
s390: replace cc-option-yn uses with cc-option
ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild
sparc: move the install rule to arch/sparc/Makefile
security: remove unneeded subdir-$(CONFIG_...)
kbuild: sh: remove unused install script
kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
kbuild: Switch to 'f' variants of integrated assembler flag
kbuild: Shuffle blank line to improve comment meaning
...
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"A new feature called restricted DMA pools. It allows SWIOTLB to
utilize per-device (or per-platform) allocated memory pools instead of
using the global one.
The first big user of this is ARM Confidential Computing where the
memory for DMA operations can be set per platform"
* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits)
swiotlb: use depends on for DMA_RESTRICTED_POOL
of: restricted dma: Don't fail device probe on rmem init failure
of: Move of_dma_set_restricted_buffer() into device.c
powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
s390/pv: fix the forcing of the swiotlb
swiotlb: Free tbl memory in swiotlb_exit()
swiotlb: Emit diagnostic in swiotlb_exit()
swiotlb: Convert io_default_tlb_mem to static allocation
of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
swiotlb: add overflow checks to swiotlb_bounce
swiotlb: fix implicit debugfs declarations
of: Add plumbing for restricted DMA pool
dt-bindings: of: Add restricted DMA pool
swiotlb: Add restricted DMA pool initialization
swiotlb: Add restricted DMA alloc/free support
swiotlb: Refactor swiotlb_tbl_unmap_single
swiotlb: Move alloc_size to swiotlb_find_slots
swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
swiotlb: Update is_swiotlb_active to add a struct device argument
swiotlb: Update is_swiotlb_buffer to add a struct device argument
...
Pull xen updates from Juergen Gross:
- some small cleanups
- a fix for a bug when running as Xen PV guest which could result in
not all memory being transferred in case of a migration of the guest
- a small series for getting rid of code for supporting very old Xen
hypervisor versions nobody should be using since many years now
- a series for hardening the Xen block frontend driver
- a fix for Xen PV boot code issuing warning messages due to a stray
preempt_disable() on the non-boot processors
* tag 'for-linus-5.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: remove stray preempt_disable() from PV AP startup code
xen/pcifront: Removed unnecessary __ref annotation
x86: xen: platform-pci-unplug: use pr_err() and pr_warn() instead of raw printk()
drivers/xen/xenbus/xenbus_client.c: fix bugon.cocci warnings
xen/blkfront: don't trust the backend response data blindly
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: read response from backend only once
xen: assume XENFEAT_gnttab_map_avail_bits being set for pv guests
xen: assume XENFEAT_mmu_pt_update_preserve_ad being set for pv guests
xen: check required Xen features
xen: fix setting of max_pfn in shared_info
Pull dma-mapping updates from Christoph Hellwig:
- fix debugfs initialization order (Anthony Iliopoulos)
- use memory_intersects() directly (Kefeng Wang)
- allow to return specific errors from ->map_sg (Logan Gunthorpe,
Martin Oliveira)
- turn the dma_map_sg return value into an unsigned int (me)
- provide a common global coherent pool іmplementation (me)
* tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping: (31 commits)
hexagon: use the generic global coherent pool
dma-mapping: make the global coherent pool conditional
dma-mapping: add a dma_init_global_coherent helper
dma-mapping: simplify dma_init_coherent_memory
dma-mapping: allow using the global coherent pool for !ARM
ARM/nommu: use the generic dma-direct code for non-coherent devices
dma-direct: add support for dma_coherent_default_memory
dma-mapping: return an unsigned int from dma_map_sg{,_attrs}
dma-mapping: disallow .map_sg operations from returning zero on error
dma-mapping: return error code from dma_dummy_map_sg()
x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR
x86/amd_gart: return error code from gart_map_sg()
xen: swiotlb: return error code from xen_swiotlb_map_sg()
parisc: return error code from .map_sg() ops
sparc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
sparc/iommu: return error codes from .map_sg() ops
s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR
s390/pci: return error code from s390_dma_map_sg()
powerpc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
powerpc/iommu: return error code from .map_sg() ops
...
Pull driver core updates from Greg KH:
"Here is the big set of driver core patches for 5.15-rc1.
These do change a number of different things across different
subsystems, and because of that, there were 2 stable tags created that
might have already come into your tree from different pulls that did
the following
- changed the bus remove callback to return void
- sysfs iomem_get_mapping rework
Other than those two things, there's only a few small things in here:
- kernfs performance improvements for huge numbers of sysfs users at
once
- tiny api cleanups
- other minor changes
All of these have been in linux-next for a while with no reported
problems, other than the before-mentioned merge issue"
* tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits)
MAINTAINERS: Add dri-devel for component.[hc]
driver core: platform: Remove platform_device_add_properties()
ARM: tegra: paz00: Handle device properties with software node API
bitmap: extend comment to bitmap_print_bitmask/list_to_buf
drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
topology: use bin_attribute to break the size limitation of cpumap ABI
lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases
cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list
sysfs: Rename struct bin_attribute member to f_mapping
sysfs: Invoke iomem_get_mapping() from the sysfs open callback
debugfs: Return error during {full/open}_proxy_open() on rmmod
zorro: Drop useless (and hardly used) .driver member in struct zorro_dev
zorro: Simplify remove callback
sh: superhyway: Simplify check in remove callback
nubus: Simplify check in remove callback
nubus: Make struct nubus_driver::remove return void
kernfs: dont call d_splice_alias() under kernfs node lock
kernfs: use i_lock to protect concurrent inode updates
kernfs: switch kernfs to use an rwsem
kernfs: use VFS negative dentry caching
...
Delete/fixup few includes in anticipation of global -isystem compile
option removal.
Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition
of uintptr_t error (one definition comes from <stddef.h>, another from
<linux/types.h>).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull xen fixes from Juergen Gross:
"A small cleanup patch and a fix of a rare race in the Xen evtchn
driver"
* tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: Fix race in set_evtchn_to_irq
xen/events: remove redundant initialization of variable irq
There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
mapping are lazily allocated in this function. The check whether the row
is already present and the row initialization is not synchronized. Two
threads can at the same time allocate a new row for evtchn_to_irq and
add the irq mapping to the their newly allocated row. One thread will
overwrite what the other has set for evtchn_to_irq[row] and therefore
the irq mapping is lost. This will trigger a BUG_ON later in
bind_evtchn_to_cpu:
INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
INFO: nvme nvme77: 1/0/0 default/read/poll queues
CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
WARN: invalid opcode: 0000 [#1] SMP NOPTI
WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
WARN: Call Trace:
WARN: set_affinity_irq+0x121/0x150
WARN: irq_do_set_affinity+0x37/0xe0
WARN: irq_setup_affinity+0xf6/0x170
WARN: irq_startup+0x64/0xe0
WARN: __setup_irq+0x69e/0x740
WARN: ? request_threaded_irq+0xad/0x160
WARN: request_threaded_irq+0xf5/0x160
WARN: ? nvme_timeout+0x2f0/0x2f0 [nvme]
WARN: pci_request_irq+0xa9/0xf0
WARN: ? pci_alloc_irq_vectors_affinity+0xbb/0x130
WARN: queue_request_irq+0x4c/0x70 [nvme]
WARN: nvme_reset_work+0x82d/0x1550 [nvme]
WARN: ? check_preempt_wakeup+0x14f/0x230
WARN: ? check_preempt_curr+0x29/0x80
WARN: ? nvme_irq_check+0x30/0x30 [nvme]
WARN: process_one_work+0x18e/0x3c0
WARN: worker_thread+0x30/0x3a0
WARN: ? process_one_work+0x3c0/0x3c0
WARN: kthread+0x113/0x130
WARN: ? kthread_park+0x90/0x90
WARN: ret_from_fork+0x3a/0x50
This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
will be set only once. The row is now cleared before writing it to
evtchn_to_irq in order to not create a race once the row is visible for
other threads.
While at it, do not require the page to be zeroed, because it will be
overwritten with -1's in clear_evtchn_to_irq_row anyway.
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Fixes: d0b075ffee ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The .map_sg() op now expects an error code instead of zero on failure.
xen_swiotlb_map_sg() may only fail if xen_swiotlb_map_page() fails, but
xen_swiotlb_map_page() only supports returning errors as
DMA_MAPPING_ERROR. So coalesce all errors into EIO per the documentation
for dma_map_sgtable().
Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Since commit 69031f5008 ("swiotlb: Set dev->dma_io_tlb_mem to the
swiotlb pool used"), 'struct device' may hold a copy of the global
'io_default_tlb_mem' pointer if the device is using swiotlb for DMA. A
subsequent call to swiotlb_exit() will therefore leave dangling pointers
behind in these device structures, resulting in KASAN splats such as:
| BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0
| Read of size 8 at addr ffff8881d7830000 by task swapper/0/0
|
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1
| Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020
| Call Trace:
| <IRQ>
| dump_stack+0x9c/0xcf
| print_address_description.constprop.0+0x18/0x130
| kasan_report.cold+0x7f/0x111
| __iommu_dma_unmap_swiotlb+0x64/0xb0
| nvme_pci_complete_rq+0x73/0x130
| blk_complete_reqs+0x6f/0x80
| __do_softirq+0xfc/0x3be
Convert 'io_default_tlb_mem' to a static structure, so that the
per-device pointers remain valid after swiotlb_exit() has been invoked.
All users are updated to reference the static structure directly, using
the 'nslabs' field to determine whether swiotlb has been initialised.
The 'slots' array is still allocated dynamically and referenced via a
pointer rather than a flexible array member.
Cc: Claire Chang <tientzu@chromium.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fixes: 69031f5008 ("swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Claire Chang <tientzu@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.
Signed-off-by: Claire Chang <tientzu@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Includes Will's fix]