linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-22 17:52:09 +00:00

Author	SHA1	Message	Date
Raju Rangoju	f97fc7ef41	amd-xgbe: Yellow carp devices do not need rrc Link stability issues are noticed on Yellow carp platforms when Receiver Reset Cycle is issued. Since the CDR workaround is disabled on these platforms, the Receiver Reset Cycle is not needed. So, avoid issuing rrc on Yellow carp platforms. Fixes: `dbb6c58b5a` ("net: amd-xgbe: Add Support for Yellow Carp Ethernet device") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-21 22:40:21 -07:00
Alexei Starovoitov	bed54aeb6a	Merge branch 'Wait for busy refill_work when destroying bpf memory allocator' Hou Tao says: ==================== From: Hou Tao <houtao1@huawei.com> Hi, The patchset aims to fix one problem of bpf memory allocator destruction when there is PREEMPT_RT kernel or kernel with arch_irq_work_has_interrupt() being false (e.g. 1-cpu arm32 host or mips). The root cause is that there may be busy refill_work when the allocator is destroying and it may incur oops or other problems as shown in patch #1. Patch #1 fixes the problem by waiting for the completion of irq work during destroying and patch #2 is just a clean-up patch based on patch #1. Please see individual patches for more details. Comments are always welcome. Change Log: v2: * patch 1: fix typos and add notes about the overhead of irq_work_sync() * patch 1 & 2: add Acked-by tags from sdf@google.com v1: https://lore.kernel.org/bpf/20221019115539.983394-1-houtao@huaweicloud.com/T/#t ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-10-21 19:17:38 -07:00
Hou Tao	fa4447cb73	bpf: Use __llist_del_all() whenever possbile during memory draining Except for waiting_for_gp list, there are no concurrent operations on free_by_rcu, free_llist and free_llist_extra lists, so use __llist_del_all() instead of llist_del_all(). waiting_for_gp list can be deleted by RCU callback concurrently, so still use llist_del_all(). Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20221021114913.60508-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-10-21 19:17:38 -07:00
Hou Tao	3d05818707	bpf: Wait for busy refill_work when destroying bpf memory allocator A busy irq work is an unfinished irq work and it can be either in the pending state or in the running state. When destroying bpf memory allocator, refill_work may be busy for PREEMPT_RT kernel in which irq work is invoked in a per-CPU RT-kthread. It is also possible for kernel with arch_irq_work_has_interrupt() being false (e.g. 1-cpu arm32 host or mips) and irq work is inovked in timer interrupt. The busy refill_work leads to various issues. The obvious one is that there will be concurrent operations on free_by_rcu and free_list between irq work and memory draining. Another one is call_rcu_in_progress will not be reliable for the checking of pending RCU callback because do_call_rcu() may have not been invoked by irq work yet. The other is there will be use-after-free if irq work is freed before the callback of irq work is invoked as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD 12ab94067 P4D 12ab94067 PUD 1796b4067 PMD 0 Oops: 0010 [#1] PREEMPT_RT SMP CPU: 5 PID: 64 Comm: irq_work/5 Not tainted 6.0.0-rt11+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffffadc080293e78 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffcdc07fb6a388 RCX: ffffa05000a2e000 RDX: ffffa05000a2e000 RSI: ffffffff96cc9827 RDI: ffffcdc07fb6a388 ...... Call Trace: <TASK> irq_work_single+0x24/0x60 irq_work_run_list+0x24/0x30 run_irq_workd+0x23/0x30 smpboot_thread_fn+0x203/0x300 kthread+0x126/0x150 ret_from_fork+0x1f/0x30 </TASK> Considering the ease of concurrency handling, no overhead for irq_work_sync() under non-PREEMPT_RT kernel and has-irq-work-interrupt kernel and the short wait time used for irq_work_sync() under PREEMPT_RT (When running two test_maps on PREEMPT_RT kernel and 72-cpus host, the max wait time is about 8ms and the 99th percentile is 10us), just using irq_work_sync() to wait for busy refill_work to complete before memory draining and memory freeing. Fixes: `7c8199e24f` ("bpf: Introduce any context BPF specific memory allocator.") Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20221021114913.60508-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-10-21 19:17:38 -07:00
Linus Torvalds	4da34b7d17	Merge tag 'thermal-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "This fixes the control CPU selection in the intel_powerclamp thermal driver" * tag 'thermal-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel_powerclamp: Use first online CPU as control_cpu	2022-10-21 18:26:00 -07:00
Linus Torvalds	20df096147	Merge tag 'pm-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix some issues and clean up code in ARM cpufreq drivers. Specifics: - Fix module loading in the Tegra124 cpufreq driver (Jon Hunter) - Fix memory leak and update to read-only region in the qcom cpufreq driver (Fabien Parent) - Miscellaneous minor cleanups to cpufreq drivers (Fabien Parent, Yang Yingliang)" * tag 'pm-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: sun50i: Switch to use dev_err_probe() helper cpufreq: qcom-nvmem: Switch to use dev_err_probe() helper cpufreq: imx6q: Switch to use dev_err_probe() helper cpufreq: dt: Switch to use dev_err_probe() helper cpufreq: qcom: remove unused parameter in function definition cpufreq: qcom: fix writes in read-only memory region cpufreq: qcom: fix memory leak in error path cpufreq: tegra194: Fix module loading	2022-10-21 18:19:42 -07:00
Linus Torvalds	9d6e681d33	Merge tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix issues introduced during this merge window (ACPI/PCI, device enumeration and documentation) and some other ones found recently. Specifics: - Add missing device reference counting to acpi_get_pci_dev() after changing it recently (Rafael Wysocki) - Fix resource list walk in acpi_dma_get_range() (Robin Murphy) - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ override warning message (Jiri Slaby) - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra) - Fix multiple error records handling in one of the ACPI extlog driver code paths (Tony Luck) - Prune DSDT override documentation from index after dropping it (Bagas Sanjaya)" * tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: scan: Fix DMA range assignment ACPI: PCI: Fix device reference counting in acpi_get_pci_dev() ACPI: resource: note more about IRQ override ACPI: resource: do IRQ override on LENOVO IdeaPad ACPI: extlog: Handle multiple records ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init() Documentation: ACPI: Prune DSDT override documentation from index	2022-10-21 18:08:30 -07:00
Linus Torvalds	ec4cf5dbb1	Merge tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - fixes for the EFI variable store refactor that landed in v6.0 - fixes for issues that were introduced during the merge window - back out some changes related to EFI zboot signing - we'll add a better solution for this during the next cycle * tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0 efi: libstub: Fix incorrect payload size in zboot header efi: libstub: Give efi_main() asmlinkage qualification efi: efivars: Fix variable writes without query_variable_store() efi: ssdt: Don't free memory if ACPI table was loaded successfully efi: libstub: Remove zboot signing from build options	2022-10-21 18:02:36 -07:00
Linus Torvalds	e97eace635	Merge tag 'iommu-fixes-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: "Intel VT-d fixes: - Fix a lockdep splat issue in intel_iommu_init() - Allow NVS regions to pass RMRR check - Domain cleanup in error path" * tag 'iommu-fixes-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Clean up si_domain in the init_dmars() error path iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() iommu/vt-d: Use rcu_lock in get_resv_regions iommu: Add gfp parameter to iommu_alloc_resv_region	2022-10-21 17:47:39 -07:00
Linus Torvalds	334fe5d3a9	Merge tag 'for-linus-2022102101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - a 12 year old bug fix for the Apple Magic Trackpad v1 (José Expósito) - a fix for a potential crash on removal of the Playstation controllers (Roderick Colenbrander) - a few new device IDs and device-specific quirks, most notably support of the new Playstation DualSense Edge controller * tag 'for-linus-2022102101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: lenovo: Make array tp10ubkbd_led static const HID: saitek: add madcatz variant of MMO7 mouse device ID HID: playstation: support updated DualSense rumble mode. HID: playstation: add initial DualSense Edge controller support HID: playstation: stop DualSense output work on remove. HID: magicmouse: Do not set BTN_MOUSE on double report	2022-10-21 17:41:57 -07:00
Linus Torvalds	bd8e963412	Merge tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: - memory leak fixes - fixes for directory leases, including an important one which fixes a problem noticed by git functional tests - fixes relating to missing free_xid calls (helpful for tracing/debugging of entry/exit into cifs.ko) - a multichannel fix - a small cleanup fix (use of list_move instead of list_del/list_add) * tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module number cifs: fix memory leaks in session setup cifs: drop the lease for cached directories on rmdir or rename smb3: interface count displayed incorrectly cifs: Fix memory leak when build ntlmssp negotiate blob failed cifs: set rc to -ENOENT if we can not get a dentry for the cached dir cifs: use LIST_HEAD() and list_move() to simplify code cifs: Fix xid leak in cifs_get_file_info_unix() cifs: Fix xid leak in cifs_ses_add_channel() cifs: Fix xid leak in cifs_flock() cifs: Fix xid leak in cifs_copy_file_range() cifs: Fix xid leak in cifs_create()	2022-10-21 16:01:53 -07:00
Linus Torvalds	022c028f4c	Merge tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Fixes for patches merged in v6.1" * tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: ensure we always call fh_verify_error tracepoint NFSD: unregister shrinker when nfsd_init_net() fails	2022-10-21 15:51:30 -07:00
Chang S. Bae	471f0aa7fa	x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly When an extended state component is not present in fpstate, but in init state, the function copies from init_fpstate via copy_feature(). But, dynamic states are not present in init_fpstate because of all-zeros init states. Then retrieving them from init_fpstate will explode like this: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... RIP: 0010:memcpy_erms+0x6/0x10 ? __copy_xstate_to_uabi_buf+0x381/0x870 fpu_copy_guest_fpstate_to_uabi+0x28/0x80 kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm] ? __this_cpu_preempt_check+0x13/0x20 ? vmx_vcpu_put+0x2e/0x260 [kvm_intel] kvm_vcpu_ioctl+0xea/0x6b0 [kvm] ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm] ? __fget_light+0xd4/0x130 __x64_sys_ioctl+0xe3/0x910 ? debug_smp_processor_id+0x17/0x20 ? fpregs_assert_state_consistent+0x27/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Adjust the 'mask' to zero out the userspace buffer for the features that are not available both from fpstate and from init_fpstate. The dynamic features depend on the compacted XSAVE format. Ensure it is enabled before reading XCOMP_BV in init_fpstate. Fixes: `2308ee57d9` ("x86/fpu/amx: Enable the AMX feature in 64-bit mode") Reported-by: Yuan Yao <yuan.yao@intel.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/ Link: https://lkml.kernel.org/r/20221021185844.13472-1-chang.seok.bae@intel.com	2022-10-21 15:22:09 -07:00
Linus Torvalds	ed5377958c	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small changes, one in the lpfc driver and the other in the core. The core change is an additional footgun guard which prevents users from writing the wrong state to sysfs and causing a hang" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: lpfc: Fix memory leak in lpfc_create_port() scsi: core: Restrict legal sdev_state transitions via sysfs	2022-10-21 15:19:43 -07:00
Linus Torvalds	d4b7332eef	Merge tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - fix nvme-hwmon for DMA non-cohehrent architectures (Serge Semin) - add a nvme-hwmong maintainer (Christoph Hellwig) - fix error pointer dereference in error handling (Dan Carpenter) - fix invalid memory reference in nvmet_subsys_attr_qid_max_show (Daniel Wagner) - don't limit the DMA segment size in nvme-apple (Russell King) - fix workqueue MEM_RECLAIM flushing dependency (Sagi Grimberg) - disable write zeroes on various Kingston SSDs (Xander Li) - fix a memory leak with block device tracing (Ye) - flexible-array fix for ublk (Yushan) - document the ublk recovery feature from this merge window (ZiyangZhang) - remove dead bfq variable in struct (Yuwei) - error handling rq clearing fix (Yu) - add an IRQ safety check for the cached bio freeing (Pavel) - drbd bio cloning fix (Christoph) * tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux: blktrace: remove unnessary stop block trace in 'blk_trace_shutdown' blktrace: fix possible memleak in '__blk_trace_remove' blktrace: introduce 'blk_trace_{start,stop}' helper bio: safeguard REQ_ALLOC_CACHE bio put block, bfq: remove unused variable for bfq_queue drbd: only clone bio if we have a backing device ublk_drv: use flexible-array member instead of zero-length array nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show nvmet: fix workqueue MEM_RECLAIM flushing dependency nvme-hwmon: kmalloc the NVME SMART log buffer nvme-hwmon: consistently ignore errors from nvme_hwmon_init nvme: add Guenther as nvme-hwmon maintainer nvme-apple: don't limit DMA segement size nvme-pci: disable write zeroes on various Kingston SSD nvme: fix error pointer dereference in error handling Documentation: document ublk user recovery feature blk-mq: fix null pointer dereference in blk_mq_clear_rq_mapping()	2022-10-21 15:14:14 -07:00
Linus Torvalds	294e73ffb0	Merge tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux Pull io_uring fixes from Jens Axboe: - Fix a potential memory leak in the error handling path of io-wq setup (Rafael) - Kill an errant debug statement that got added in this release (me) - Fix an oops with an invalid direct descriptor with IORING_OP_MSG_RING (Harshit) - Remove unneeded FFS_SCM flagging (Pavel) - Remove polling off the exit path (Pavel) - Move out direct descriptor debug check to the cleanup path (Pavel) - Use the proper helper rather than open-coding cached request get (Pavel) * tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux: io-wq: Fix memory leak in worker creation io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd() io_uring/rw: remove leftover debug statement io_uring: don't iopoll from io_ring_ctx_wait_and_kill() io_uring: reuse io_alloc_req() io_uring: kill hot path fixed file bitmap debug checks io_uring: remove FFS_SCM	2022-10-21 15:09:10 -07:00
Linus Torvalds	1d61754caa	Merge tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Just two fixes for the new 'virtio with grants' feature" * tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/virtio: Convert PAGE_SIZE/PAGE_SHIFT/PFN_UP to Xen counterparts xen/virtio: Handle cases when page offset > PAGE_SIZE properly	2022-10-21 14:43:09 -07:00
Linus Torvalds	0de0b76837	Merge tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "A small SELinux fix for a GFP_KERNEL allocation while a spinlock is held. The patch, while still fairly small, is a bit larger than one might expect from a simple s/GFP_KERNEL/GFP_ATOMIC/ conversion because we added support for the function to be called with different gfp flags depending on the context, preserving GFP_KERNEL for those cases that can safely sleep" * tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()	2022-10-21 14:33:36 -07:00
Linus Torvalds	440b7895c9	Merge tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morron: "Seventeen hotfixes, mainly for MM. Five are cc:stable and the remainder address post-6.0 issues" * tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nouveau: fix migrate_to_ram() for faulting page mm/huge_memory: do not clobber swp_entry_t during THP split hugetlb: fix memory leak associated with vma_lock structure mm/page_alloc: reduce potential fragmentation in make_alloc_exact() mm: /proc/pid/smaps_rollup: fix maple tree search mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages mm/mmap: fix MAP_FIXED address return on VMA merge mm/mmap.c: __vma_adjust(): suppress uninitialized var warning mm/mmap: undo ->mmap() when mas_preallocate() fails init: Kconfig: fix spelling mistake "satify" -> "satisfy" ocfs2: clear dinode links count in case of error ocfs2: fix BUG when iput after ocfs2_mknod fails gcov: support GCC 12.1 and newer compilers zsmalloc: zs_destroy_pool: add size_class NULL check mm/mempolicy: fix mbind_range() arguments to vma_merge() mailmap: update email for Qais Yousef mailmap: update Dan Carpenter's email address	2022-10-21 12:33:03 -07:00
Linus Torvalds	ce3d90a877	Merge tag 'trace-tools-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tool update from Steven Rostedt: - Make dot2c generate monitor's automata definition static * tag 'trace-tools-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv/dot2c: Make automaton definition static	2022-10-21 12:29:52 -07:00
Linus Torvalds	4f1e0c18bc	Merge tag 'linux-watchdog-6.1-rc2' of git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - Add tracing events for the most common watchdog events * tag 'linux-watchdog-6.1-rc2' of git://www.linux-watchdog.org/linux-watchdog: watchdog: Add tracing events for the most usual watchdog events	2022-10-21 12:25:39 -07:00
Rafael J. Wysocki	3f8deab61e	Merge branches 'acpi-scan', 'acpi-resource', 'acpi-apei', 'acpi-extlog' and 'acpi-docs' Merge assorted ACPI fixes for 6.1-rc2: - Fix resource list walk in acpi_dma_get_range() (Robin Murphy). - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ override warning message (Jiri Slaby). - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra). - Fix multiple error records handling in one of the ACPI extlog driver code paths (Tony Luck). - Prune DSDT override documentation from index after dropping it (Bagas Sanjaya). * acpi-scan: ACPI: scan: Fix DMA range assignment * acpi-resource: ACPI: resource: note more about IRQ override ACPI: resource: do IRQ override on LENOVO IdeaPad * acpi-apei: ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init() * acpi-extlog: ACPI: extlog: Handle multiple records * acpi-docs: Documentation: ACPI: Prune DSDT override documentation from index	2022-10-21 20:07:41 +02:00
Chen Zhongjin	230db82413	x86/unwind/orc: Fix unreliable stack dump with gcov When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable: # echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK> This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame. Fix it by reverting the following commit: `f1d9a2abff` ("x86/unwind/orc: Don't skip the first frame for inactive tasks") The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit: `f2ac57a4c4` ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels") Fixes: `f1d9a2abff` ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org>	2022-10-21 14:56:42 +02:00
Colin Ian King	80e5acb6dd	wifi: rtl8xxxu: Fix reads of uninitialized variables hw_ctrl_s1, sw_ctrl_s1 Variables hw_ctrl_s1 and sw_ctrl_s1 are not being initialized and potentially can contain any garbage value. Currently there is an if statement that sets one or the other of these variables, followed by an if statement that checks if any of these variables have been set to a non-zero value. In the case where they may contain uninitialized non-zero values, the latter if statement may be taken as true when it was not expected to. Fix this by ensuring hw_ctrl_s1 and sw_ctrl_s1 are initialized. Cleans up clang warning: drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8188f.c:432:7: warning: variable 'hw_ctrl_s1' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (hw_ctrl) { ^~~~~~~ drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8188f.c:440:7: note: uninitialized use occurs here if (hw_ctrl_s1 \|\| sw_ctrl_s1) { ^~~~~~~~~~ drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8188f.c:432:3: note: remove the 'if' if its condition is always true if (hw_ctrl) { ^~~~~~~~~~~~~ Fixes: `c888183b21` ("wifi: rtl8xxxu: Support new chip RTL8188FU") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221020135709.1549086-1-colin.i.king@gmail.com	2022-10-21 15:54:06 +03:00
Jakub Kicinski	4d814b329a	MAINTAINERS: add keyword match on PTP Most of PTP drivers live under ethernet and we have to keep telling people to CC the PTP maintainers. Let's try a keyword match, we can refine as we go if it causes false positives. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-21 13:23:48 +01:00
Jakub Kicinski	46cdedf2a0	ethtool: pse-pd: fix null-deref on genl_info in dump ethnl_default_dump_one() passes NULL as info. It's correct not to set extack during dump, as we should just silently skip interfaces which can't provide the information. Reported-by: syzbot+81c4b4bbba6eea2cfcae@syzkaller.appspotmail.com Fixes: `18ff0bcda6` ("ethtool: add interface to interact with Ethernet Power Equipment") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-21 13:18:05 +01:00
Alexandru Tachici	3bd5549bd4	dt-bindings: net: adin1110: Document reset Document GPIO for HW reset. Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-21 13:11:33 +01:00
Alexandru Tachici	36934cac7a	net: ethernet: adi: adin1110: add reset GPIO Add an optional GPIO to be used for a hardware reset of the IC. Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-21 13:11:33 +01:00
Jeff Johnson	c0facc045a	net: ipa: Make QMI message rules const Commit `ff6d365898` ("soc: qcom: qmi: use const for struct qmi_elem_info") allows QMI message encoding/decoding rules to be const, so do that for IPA. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Sibi Sankar <quic_sibis@quicinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-21 12:39:16 +01:00
Ard Biesheuvel	37926f9630	efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0 The generic EFI stub can be instructed to avoid SetVirtualAddressMap(), and simply run with the firmware's 1:1 mapping. In this case, it populates the virtual address fields of the runtime regions in the memory map with the physical address of each region, so that the mapping code has to be none the wiser. Only if SetVirtualAddressMap() fails, the virtual addresses are wiped and the kernel code knows that the regions cannot be mapped. However, wiping amounts to setting it to zero, and if a runtime region happens to live at physical address 0, its valid 1:1 mapped virtual address could be mistaken for a wiped field, resulting on loss of access to the EFI services at runtime. So let's only assume that VA == 0 means 'no runtime services' if the region in question does not live at PA 0x0. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-10-21 11:09:41 +02:00
Ard Biesheuvel	53a7ea284d	efi: libstub: Fix incorrect payload size in zboot header The linker script symbol definition that captures the size of the compressed payload inside the zboot decompressor (which is exposed via the image header) refers to '.' for the end of the region, which does not give the correct result as the expression is not placed at the end of the payload. So use the symbol name explicitly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-10-21 11:09:41 +02:00
Ard Biesheuvel	db14655ad7	efi: libstub: Give efi_main() asmlinkage qualification To stop the bots from sending sparse warnings to me and the list about efi_main() not having a prototype, decorate it with asmlinkage so that it is clear that it is called from assembly, and therefore needs to remain external, even if it is never declared in a header file. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-10-21 11:09:40 +02:00
Ard Biesheuvel	8a254d90a7	efi: efivars: Fix variable writes without query_variable_store() Commit `bbc6d2c6ef` ("efi: vars: Switch to new wrapper layer") refactored the efivars layer so that the 'business logic' related to which UEFI variables affect the boot flow in which way could be moved out of it, and into the efivarfs driver. This inadvertently broke setting variables on firmware implementations that lack the QueryVariableInfo() boot service, because we no longer tolerate a EFI_UNSUPPORTED result from check_var_size() when calling efivar_entry_set_get_size(), which now ends up calling check_var_size() a second time inadvertently. If QueryVariableInfo() is missing, we support writes of up to 64k - let's move that logic into check_var_size(), and drop the redundant call. Cc: <stable@vger.kernel.org> # v6.0 Fixes: `bbc6d2c6ef` ("efi: vars: Switch to new wrapper layer") Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-10-21 11:09:40 +02:00
Ard Biesheuvel	4b017e59f0	efi: ssdt: Don't free memory if ACPI table was loaded successfully Amadeusz reports KASAN use-after-free errors introduced by commit `3881ee0b1e` ("efi: avoid efivars layer when loading SSDTs from variables"). The problem appears to be that the memory that holds the new ACPI table is now freed unconditionally, instead of only when the ACPI core reported a failure to load the table. So let's fix this, by omitting the kfree() on success. Cc: <stable@vger.kernel.org> # v6.0 Link: https://lore.kernel.org/all/a101a10a-4fbb-5fae-2e3c-76cf96ed8fbd@linux.intel.com/ Fixes: `3881ee0b1e` ("efi: avoid efivars layer when loading SSDTs from variables") Reported-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-10-21 11:09:40 +02:00
Ard Biesheuvel	f57fb375a2	efi: libstub: Remove zboot signing from build options The zboot decompressor series introduced a feature to sign the PE/COFF kernel image for secure boot as part of the kernel build. This was necessary because there are actually two images that need to be signed: the kernel with the EFI stub attached, and the decompressor application. This is a bit of a burden, because it means that the images must be signed on the the same system that performs the build, and this is not realistic for distros. During the next cycle, we will introduce changes to the zboot code so that the inner image no longer needs to be signed. This means that the outer PE/COFF image can be handled as usual, and be signed later in the release process. Let's remove the associated Kconfig options now so that they don't end up in a LTS release while already being deprecated. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-10-21 11:09:40 +02:00
Jerry Snitselaar	620bf9f981	iommu/vt-d: Clean up si_domain in the init_dmars() error path A splat from kmem_cache_destroy() was seen with a kernel prior to commit `ee2653bbe8` ("iommu/vt-d: Remove domain and devinfo mempool") when there was a failure in init_dmars(), because the iommu_domain cache still had objects. While the mempool code is now gone, there still is a leak of the si_domain memory if init_dmars() fails. So clean up si_domain in the init_dmars() error path. Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Fixes: `86080ccc22` ("iommu/vt-d: Allocate si_domain in init_dmars()") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-10-21 10:49:35 +02:00
Charlotte Tan	5566e68d82	iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI Reserved region, but it seems like it should accept an NVS region as well. The ACPI spec https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html uses similar wording for "Reserved" and "NVS" region types; for NVS regions it says "This range of addresses is in use or reserved by the system and must not be used by the operating system." There is an old comment on this mailing list that also suggests NVS regions should pass the arch_rmrr_sanity_check() test: The warnings come from arch_rmrr_sanity_check() since it checks whether the region is E820_TYPE_RESERVED. However, if the purpose of the check is to detect RMRR has regions that may be used by OS as free memory, isn't E820_TYPE_NVS safe, too? This patch overlaps with another proposed patch that would add the region type to the log since sometimes the bug reporter sees this log on the console but doesn't know to include the kernel log: https://lore.kernel.org/lkml/20220611204859.234975-3-atomlin@redhat.com/ Here's an example of the "Firmware Bug" apparent false positive (wrapped for line length): DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for fixes DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x000000006f760000-0x000000006f762fff] This is the snippet from the e820 table: BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data Fixes: `f036c7fa0a` ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") Cc: Will Mortensen <will@extrahop.com> Link: https://lore.kernel.org/linux-iommu/64a5843d-850d-e58c-4fc2-0a0eeeb656dc@nec.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443 Signed-off-by: Charlotte Tan <charlotte@extrahop.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Link: https://lore.kernel.org/r/20220929044449.32515-1-charlotte@extrahop.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-10-21 10:49:35 +02:00
Lu Baolu	bf638a6513	iommu/vt-d: Use rcu_lock in get_resv_regions Commit `5f64ce5411` ("iommu/vt-d: Duplicate iommu_resv_region objects per device list") converted rcu_lock in get_resv_regions to dmar_global_lock to allow sleeping in iommu_alloc_resv_region(). This introduced possible recursive locking if get_resv_regions is called from within a section where intel_iommu_init() already holds dmar_global_lock. Especially, after commit `57365a04c9` ("iommu: Move bus setup to IOMMU device registration"), below lockdep splats could always be seen. ============================================ WARNING: possible recursive locking detected 6.0.0-rc4+ #325 Tainted: G I -------------------------------------------- swapper/0/1 is trying to acquire lock: ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_get_resv_regions+0x25/0x270 but task is already holding lock: ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_init+0x36d/0x6ea ... Call Trace: <TASK> dump_stack_lvl+0x48/0x5f __lock_acquire.cold.73+0xad/0x2bb lock_acquire+0xc2/0x2e0 ? intel_iommu_get_resv_regions+0x25/0x270 ? lock_is_held_type+0x9d/0x110 down_read+0x42/0x150 ? intel_iommu_get_resv_regions+0x25/0x270 intel_iommu_get_resv_regions+0x25/0x270 iommu_create_device_direct_mappings.isra.28+0x8d/0x1c0 ? iommu_get_dma_cookie+0x6d/0x90 bus_iommu_probe+0x19f/0x2e0 iommu_device_register+0xd4/0x130 intel_iommu_init+0x3e1/0x6ea ? iommu_setup+0x289/0x289 ? rdinit_setup+0x34/0x34 pci_iommu_init+0x12/0x3a do_one_initcall+0x65/0x320 ? rdinit_setup+0x34/0x34 ? rcu_read_lock_sched_held+0x5a/0x80 kernel_init_freeable+0x28a/0x2f3 ? rest_init+0x1b0/0x1b0 kernel_init+0x1a/0x130 ret_from_fork+0x1f/0x30 </TASK> This rolls back dmar_global_lock to rcu_lock in get_resv_regions to avoid the lockdep splat. Fixes: `57365a04c9` ("iommu: Move bus setup to IOMMU device registration") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220927053109.4053662-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-10-21 10:49:34 +02:00
Lu Baolu	0251d0107c	iommu: Add gfp parameter to iommu_alloc_resv_region Add gfp parameter to iommu_alloc_resv_region() for the callers to specify the memory allocation behavior. Thus iommu_alloc_resv_region() could also be available in critical contexts. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-10-21 10:49:32 +02:00
Anup Patel	cea8896bd9	RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc The kvm_riscv_vcpu_timer_pending() checks per-VCPU next_cycles and per-VCPU software injected VS timer interrupt. This function returns incorrect value when Sstc is available because the per-VCPU next_cycles are only updated by kvm_riscv_vcpu_timer_save() called from kvm_arch_vcpu_put(). As a result, when Sstc is available the VCPU does not block properly upon WFI traps. To fix the above issue, we introduce kvm_riscv_vcpu_timer_sync() which will update per-VCPU next_cycles upon every VM exit instead of kvm_riscv_vcpu_timer_save(). Fixes: `8f5cb44b1b` ("RISC-V: KVM: Support sstc extension") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>	2022-10-21 11:52:45 +05:30
Andrew Jones	5c20a3a9df	RISC-V: Fix compilation without RISCV_ISA_ZICBOM riscv_cbom_block_size and riscv_init_cbom_blocksize() should always be available and riscv_init_cbom_blocksize() should always be invoked, even when compiling without RISCV_ISA_ZICBOM enabled. This is because disabling RISCV_ISA_ZICBOM means "don't use zicbom instructions in the kernel" not "pretend there isn't zicbom, even when there is". When zicbom is available, whether the kernel enables its use with RISCV_ISA_ZICBOM or not, KVM will offer it to guests. Ensure we can build KVM and that the block size is initialized even when compiling without RISCV_ISA_ZICBOM. Fixes: `8f7e001e03` ("RISC-V: Clean up the Zicbom block size probing") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Anup Patel <anup@brainfault.org>	2022-10-21 11:52:39 +05:30
Adam Borowski	65d78b8d04	i2c: mlxbf: depend on ACPI; clean away ifdeffage This fixes maybe_unused warnings/errors. According to a comment during device tree removal, only ACPI is supported, thus let's actually require it. Fixes: `be18c5ede2` ("i2c: mlxbf: remove device tree support") Signed-off-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Wolfram Sang <wsa@kernel.org>	2022-10-21 07:59:35 +02:00
Doug Berger	070f822d07	net: bcmgenet: add RX_CLS_LOC_ANY support If a matching flow spec exists its current location is as good as ANY. If not add the new flow spec at the first available location. Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221019215123.316997-1-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-20 21:43:57 -07:00
Alexey Kodanev	377eb9aab0	sctp: remove unnecessary NULL checks in sctp_enqueue_event() After commit `178ca044aa` ("sctp: Make sctp_enqueue_event tak an skb list."), skb_list cannot be NULL. Detected using the static analysis tool - Svace. Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20221019180735.161388-3-aleksei.kodanev@bell-sw.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-20 21:43:10 -07:00
Alexey Kodanev	b66aeddbe3	sctp: remove unnecessary NULL check in sctp_ulpq_tail_event() After commit `013b96ec64` ("sctp: Pass sk_buff_head explicitly to sctp_ulpq_tail_event().") there is one more unneeded check of skb_list for NULL. Detected using the static analysis tool - Svace. Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20221019180735.161388-2-aleksei.kodanev@bell-sw.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-20 21:43:10 -07:00
Alexey Kodanev	6fdfdef7fd	sctp: remove unnecessary NULL check in sctp_association_init() '&asoc->ulpq' passed to sctp_ulpq_init() as the first argument, then sctp_qlpq_init() initializes it and eventually returns the address of the struct member back. Therefore, in this case, the return pointer cannot be NULL. Moreover, it seems sctp_ulpq_init() has always been used only in sctp_association_init(), so there's really no need to return ulpq anymore. Detected using the static analysis tool - Svace. Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20221019180735.161388-1-aleksei.kodanev@bell-sw.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-20 21:43:10 -07:00
Alistair Popple	97061d4411	nouveau: fix migrate_to_ram() for faulting page Commit `16ce101db8` ("mm/memory.c: fix race when faulting a device private page") changed the migrate_to_ram() callback to take a reference on the device page to ensure it can't be freed while handling the fault. Unfortunately the corresponding update to Nouveau to accommodate this change was inadvertently dropped from that patch causing GPU to CPU migration to fail so add it here. Link: https://lkml.kernel.org/r/20221019122934.866205-1-apopple@nvidia.com Fixes: `16ce101db8` ("mm/memory.c: fix race when faulting a device private page") Signed-off-by: Alistair Popple <apopple@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-10-20 21:27:24 -07:00
Mel Gorman	71e2d666ef	mm/huge_memory: do not clobber swp_entry_t during THP split The following has been observed when running stressng mmap since commit `b653db7735` ("mm: Clear page->private when splitting or migrating a page") watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546] CPU: 75 PID: 9546 Comm: stress-ng Tainted: G E 6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374 Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016 RIP: 0010:xas_descend+0x28/0x80 Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2 RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246 RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002 RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0 RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0 R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000 R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88 FS: 00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> xas_load+0x3a/0x50 __filemap_get_folio+0x80/0x370 ? put_swap_page+0x163/0x360 pagecache_get_page+0x13/0x90 __try_to_reclaim_swap+0x50/0x190 scan_swap_map_slots+0x31e/0x670 get_swap_pages+0x226/0x3c0 folio_alloc_swap+0x1cc/0x240 add_to_swap+0x14/0x70 shrink_page_list+0x968/0xbc0 reclaim_page_list+0x70/0xf0 reclaim_pages+0xdd/0x120 madvise_cold_or_pageout_pte_range+0x814/0xf30 walk_pgd_range+0x637/0xa30 __walk_page_range+0x142/0x170 walk_page_range+0x146/0x170 madvise_pageout+0xb7/0x280 ? asm_common_interrupt+0x22/0x40 madvise_vma_behavior+0x3b7/0xac0 ? find_vma+0x4a/0x70 ? find_vma+0x64/0x70 ? madvise_vma_anon_name+0x40/0x40 madvise_walk_vmas+0xa6/0x130 do_madvise+0x2f4/0x360 __x64_sys_madvise+0x26/0x30 do_syscall_64+0x5b/0x80 ? do_syscall_64+0x67/0x80 ? syscall_exit_to_user_mode+0x17/0x40 ? do_syscall_64+0x67/0x80 ? syscall_exit_to_user_mode+0x17/0x40 ? do_syscall_64+0x67/0x80 ? do_syscall_64+0x67/0x80 ? common_interrupt+0x8b/0xa0 entry_SYSCALL_64_after_hwframe+0x63/0xcd The problem can be reproduced with the mmtests config config-workload-stressng-mmap. It does not always happen and when it triggers is variable but it has happened on multiple machines. The intent of commit `b653db7735` patch was to avoid the case where PG_private is clear but folio->private is not-NULL. However, THP tail pages uses page->private for "swp_entry_t if folio_test_swapcache()" as stated in the documentation for struct folio. This patch only clobbers page->private for tail pages if the head page was not in swapcache and warns once if page->private had an unexpected value. Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net Fixes: `b653db7735` ("mm: Clear page->private when splitting or migrating a page") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yang Shi <shy828301@gmail.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-10-20 21:27:24 -07:00
Mike Kravetz	612b8a3170	hugetlb: fix memory leak associated with vma_lock structure The hugetlb vma_lock structure hangs off the vm_private_data pointer of sharable hugetlb vmas. The structure is vma specific and can not be shared between vmas. At fork and various other times, vmas are duplicated via vm_area_dup(). When this happens, the pointer in the newly created vma must be cleared and the structure reallocated. Two hugetlb specific routines deal with this hugetlb_dup_vma_private and hugetlb_vm_op_open. Both routines are called for newly created vmas. hugetlb_dup_vma_private would always clear the pointer and hugetlb_vm_op_open would allocate the new vms_lock structure. This did not work in the case of this calling sequence pointed out in [1]. move_vma copy_vma new_vma = vm_area_dup(vma); new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock. is_vm_hugetlb_page(vma) clear_vma_resv_huge_pages hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL When clearing hugetlb_dup_vma_private we actually leak the associated vma_lock structure. The vma_lock structure contains a pointer to the associated vma. This information can be used in hugetlb_dup_vma_private and hugetlb_vm_op_open to ensure we only clear the vm_private_data of newly created (copied) vmas. In such cases, the vma->vma_lock->vma field will not point to the vma. Update hugetlb_dup_vma_private and hugetlb_vm_op_open to not clear vm_private_data if vma->vma_lock->vma == vma. Also, log a warning if hugetlb_vm_op_open ever encounters the case where vma_lock has already been correctly allocated for the vma. [1] https://lore.kernel.org/linux-mm/5154292a-4c55-28cd-0935-82441e512fc3@huawei.com/ Link: https://lkml.kernel.org/r/20221019201957.34607-1-mike.kravetz@oracle.com Fixes: `131a79b474` ("hugetlb: fix vma lock handling during split vma and range unmapping") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: James Houghton <jthoughton@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Prakash Sangappa <prakash.sangappa@oracle.com> Cc: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-10-20 21:27:23 -07:00
Liam R. Howlett	df48a5f7a3	mm/page_alloc: reduce potential fragmentation in make_alloc_exact() Try to avoid using the left over split page on the next request for a page by calling __free_pages_ok() with FPI_TO_TAIL. This increases the potential of defragmenting memory when it's used for a short period of time. Link: https://lkml.kernel.org/r/20220531185626.yvlmymbxyoe5vags@revolver Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-10-20 21:27:23 -07:00

... 7 8 9 10 11 ...

1137092 Commits