linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-24 02:52:38 +00:00

Author	SHA1	Message	Date
Arnd Bergmann	0dd0e9437f	um: don't generate asm/bpf_perf_event.h If we start validating the existence of the asm-generic side of generated headers, this one causes a warning: make[3]: *** No rule to make target 'arch/um/include/generated/asm/bpf_perf_event.h', needed by 'all'. Stop. The problem is that the asm-generic header only exists for the uapi variant, but arch/um has no uapi headers and instead uses the x86 userspace API. Add a custom file with an explicit redirect to avoid this. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-10 14:23:30 +02:00
Johannes Berg	98ff534ec2	um: vector: always reset vp->opened If open fails, we have already set vp->opened, but it's not reset so that any further attempts will just return -ENXIO. Reset vp->opened even if close has nothing to do. This helps e.g. with slirp4netns handling only a single connection, you can then restart it and open the device again. Link: https://patch.msgid.link/20240703184622.df40c5c38461.Id4e20b48938c6019d99e6133227a34ac059db466@changeid Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-04 12:03:26 +02:00
Johannes Berg	a0470a9f69	um: vector: remove vp->lock This lock is useless, all the places that are using it for some locking will already hold the RTNL. Just remove it. Link: https://patch.msgid.link/20240703184606.19aa35b14959.I9cf5f2c4e35abd06cc89bf2e990fa755eb8e5f0f@changeid Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-04 12:03:23 +02:00
Johannes Berg	86abcd6eeb	um: register power-off handler Otherwise we always get reboot: Power off not available: System halted instead which is really quite pointless. Link: https://patch.msgid.link/20240703173839.fcbb538c6686.I3d333f4773cff93c4337c4d128ee0b1b501b3dfa@changeid Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-04 12:03:20 +02:00
Johannes Berg	824ac4a5ed	um: line: always fill *error_out in setup_one_line() The pointer isn't initialized by callers, but I have encountered cases where it's still printed; initialize it in all possible cases in setup_one_line(). Link: https://patch.msgid.link/20240703172235.ad863568b55f.Iaa1eba4db8265d7715ba71d5f6bb8c7ff63d27e9@changeid Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-04 12:03:14 +02:00
Anton Ivanov	cd01672d64	um: Enable preemption in UML Since userspace state is saved in the MM process, kernel using FPU still doesn't really need to do anything, so this really is as simple as enabling preemption. The irq critical section in sigio_handler() needs preempt_disable()/preempt_enable(). Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20240702102549.d2fcea450854.I12f5a53d80ec1e425e66ef272b1e95cb523b608e@changeid [rebase, remove FPU save/restore, fix x86/um Makefile, rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:10:43 +02:00
Benjamin Berg	bcf3d957c6	um: refactor TLB update handling Conceptually, we want the memory mappings to always be up to date and represent whatever is in the TLB. To ensure that, we need to sync them over in the userspace case and for the kernel we need to process the mappings. The kernel will call flush_tlb_* if page table entries that were valid before become invalid. Unfortunately, this is not the case if entries are added. As such, change both flush_tlb_* and set_ptes to track the memory range that has to be synchronized. For the kernel, we need to execute a flush_tlb_kern_* immediately but we can wait for the first page fault in case of set_ptes. For userspace in contrast we only store that a range of memory needs to be synced and do so whenever we switch to that process. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:50 +02:00
Benjamin Berg	573a446fc8	um: simplify and consolidate TLB updates The HVC update was mostly used to compress consecutive calls into one. This is mostly relevant for userspace where it is already handled by the syscall stub code. Simplify the whole logic and consolidate it for both kernel and userspace. This does remove the sequential syscall compression for the kernel, however that shouldn't be the main factor in most runs. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-12-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:50 +02:00
Benjamin Berg	ef714f1502	um: remove force_flush_all from fork_handler There should be no need for this. It may be that this used to work around another issue where after a clone the MM was in a bad state. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-11-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:50 +02:00
Benjamin Berg	5168f6b4a4	um: Do not flush MM in flush_thread There should be no need to flush the memory in flush_thread. Doing this likely worked around some issue where memory was still incorrectly mapped when creating or cloning an MM. With the removal of the special clone path, that isn't relevant anymore. However, add the flush into MM initialization so that any new userspace MM is guaranteed to be clean. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-10-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:49 +02:00
Benjamin Berg	3c83170d7c	um: Delay flushing syscalls until the thread is restarted As running the syscalls is expensive due to context switches, we should do so as late as possible in case more syscalls need to be queued later on. This will also benefit a later move to a SECCOMP enabled userspace as in that case the need for extra context switches is removed entirely. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-9-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:49 +02:00
Benjamin Berg	a5d2cfe749	um: remove copy_context_skas0 The kernel flushes the memory ranges anyway for CoW and does not assume that the userspace process has anything set up already. So, start with a fresh process for the new mm context. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-8-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:49 +02:00
Benjamin Berg	7911b650a0	um: remove LDT support The current LDT code has a few issues that mean it should be redone in a different way once we always start with a fresh MM even when cloning. In a new and better world, the kernel would just ensure its own LDT is clear at startup. At that point, all that is needed is a simple function to populate the LDT from another MM in arch_dup_mmap combined with some tracking of the installed LDT entries for each MM. Note that the old implementation was even incorrect with regard to reading, as it copied out the LDT entries in the internal format rather than converting them to the userspace structure. Removal should be fine as the LDT is not used for thread-local storage anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-7-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:49 +02:00
Benjamin Berg	6d8992e49e	um: compress memory related stub syscalls while adding them To keep the number of syscalls that the stub has to do lower, compress two consecutive syscalls of the same type if the second is just a continuation of the first. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-6-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:49 +02:00
Benjamin Berg	76ed9158e1	um: Rework syscall handling Rework syscall handling to be platform independent. Also create a clean split between queueing of syscalls and flushing them out, removing the need to keep state in the code that triggers the syscalls. The code adds syscall_data_len to the global mm_id structure. This will be used later to allow surrounding code to track whether syscalls still need to run and if errors occurred. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-5-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:49 +02:00
Benjamin Berg	dc26184a9d	um: Create signal stack memory assignment in stub_data When we switch to use seccomp, we need both the signal stack and other data (i.e. syscall information) to co-exist in the stub data. To facilitate this, start by defining separate memory areas for the stack and syscall data. This moves the signal stack onto a new page as the memory area is not sufficient to hold both signal stack and syscall information. Only change the signal stack setup for now, as the syscall code will be reworked later. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:48 +02:00
Benjamin Berg	d1d3a2e69b	um: Remove stub-data.h include from common-offsets.h Further commits will require values from common-offsets.h inside stub-data.h. Resolve the possible circular dependency and simply use offsetof() inside stub_32.h and stub_64.h. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-2-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:48 +02:00
Johannes Berg	2cf3a3c4b8	um: time-travel: fix signal blocking race/hang When signals are hard-blocked in order to do time-travel socket processing, we set signals_blocked and then handle SIGIO signals by setting the SIGIO bit in signals_pending. When unblocking, we first set signals_blocked to 0, and then handle all pending signals. We have to set it first, so that we can again properly block/unblock inside the unblock, if the time-travel handlers need to be processed. Unfortunately, this is racy. We can get into this situation: // signals_pending = SIGIO_MASK unblock_signals_hard() signals_blocked = 0; if (signals_pending && signals_enabled) { block_signals(); unblock_signals() ... sig_handler_common(SIGIO, NULL, NULL); sigio_handler() ... sigio_reg_handler() irq_do_timetravel_handler() reg->timetravel_handler() == vu_req_interrupt_comm_handler() vu_req_read_message() vhost_user_recv_req() vhost_user_recv() vhost_user_recv_header() // reads 12 bytes header of // 20 bytes message <-- receive SIGIO here <-- sig_handler() int enabled = signals_enabled; // 1 if ((signals_blocked \|\| !enabled) && (sig == SIGIO)) { if (!signals_blocked && time_travel_mode == TT_MODE_EXTERNAL) sigio_run_timetravel_handlers() _sigio_handler() sigio_reg_handler() ... as above ... vhost_user_recv_header() // reads 8 bytes that were message payload // as if it were header - but aborts since // it then gets -EAGAIN ... --> end signal handler --> // continue in vhost_user_recv() // full_read() for 8 bytes payload busy loops // entire process hangs here Conceptually, to fix this, we need to ensure that the signal handler cannot run while we hard-unblock signals. The thing that makes this more complex is that we can be doing hard-block/unblock while unblocking. Introduce a new signals_blocked_pending variable that we can keep at non-zero as long as pending signals are being processed, then we only need to ensure it's decremented safely and the signal handler will only increment it if it's already non-zero (or signals_blocked is set, of course.) Note also that only the outermost call to hard-unblock is allowed to decrement signals_blocked_pending, since it could otherwise reach zero in an inner call, and leave the same race happening if the timetravel_handler loops, but that's basically required of it. Fixes: `d6b399a0e0` ("um: time-travel/signals: fix ndelay() in interrupt") Link: https://patch.msgid.link/20240703110144.28034-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:20 +02:00
Johannes Berg	4561022588	um: time-travel: remove time_exit() This function is unused and unneeded, remove it. Link: https://patch.msgid.link/20240703130105.02b3a974acb7.I7264821f7cfa17ea713b7a3e4787aa41a3107d01@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 17:09:08 +02:00
Jeff Johnson	36c5005f11	um: harddog: add missing MODULE_DESCRIPTION() macro With ARCH=um, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/um/drivers/harddog.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240702-md-um-arch-um-drivers-v1-1-79e4f50b5bab@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:25:07 +02:00
Johannes Berg	bfb80d8bc9	um: add shared memory optimisation for time-travel=ext With external time travel, a LOT of message can end up being exchanged on the socket, taking a significant amount of time just to do that. Add a new shared memory optimisation to that, where a number of changes are made: - the controller sends a client ID and a shared memory FD (and a logging FD we don't use) in the ACK message to the initial START - the shared memory holds the current time and the free_until value, so that there's no need to exchange messages for that - if the client that's running has shared memory support, any client (the running one included) can request the next time it wants to run inside the shared memory, rather than sending a message, by also updating the free_until value - when shared memory is enabled, RUN/WAIT messages no longer have an ACK, further cutting down on messages Together, this can reduce the number of messages very significantly, and reduce overall test/simulation run time. Co-developed-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Link: https://patch.msgid.link/20240702192118.6ad0a083f574.Ie41206c8ce4507fe26b991937f47e86c24ca7a31@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:24:54 +02:00
Johannes Berg	e20f9b3c59	um: add mmap/mremap OS calls For the upcoming shared-memory time-travel external optimisations, we need to be able to mmap/mremap. Add the necessary OS calls. Link: https://patch.msgid.link/20240702192118.ca4472963638.Ic2da1d3a983fe57340c1b693badfa9c5bd2d8c61@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:24:48 +02:00
Johannes Berg	5cde6096a4	um: generalize os_rcv_fd Change os_rcv_fd() to os_rcv_fd_msg() that can more generally receive any number of FDs in any kind of message. Link: https://patch.msgid.link/20240702192118.40b78b2bfe4e.Ic6ec12d72630e5bcae1e597d6bd5c6f29f441563@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:24:25 +02:00
Mordechay Goodstein	6555acdefc	um: time-travel: support time-travel protocol broadcast messages Add a message type to the time-travel protocol to broadcast a small (64-bit) value to all participants in a simulation. The main use case is to have an identical message come to all participants in a simulation, e.g. to separate out logs for different tests running in a single simulation. Down in the guts of time_travel_handle_message() we can't use printk() and not even printk_deferred(), so just store the message and print it at the start of the userspace() function. Unfortunately this means that other prints in the kernel can actually bypass the message, but in most cases where this is used, for example to separate test logs, userspace will be involved. Also, even if we could use printk_deferred(), we'd still need to flush it out in the userspace() function since otherwise userspace messages might cross it. As a result, this is a reasonable compromise, there's no need to have any core changes and it solves the main use case we have for it. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Link: https://patch.msgid.link/20240702192118.c4093bc5b15e.I2ca8d006b67feeb866ac2017af7b741c9e06445a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:24:22 +02:00
Johannes Berg	53585f9ea4	um: enable UBSAN We can select ARCH_HAS_UBSAN, it works just fine. It had been enabled and we even used it, but then commit `890a64810d` ("ubsan: Restore dependency on ARCH_HAS_UBSAN") (correctly) disabled it again, enable ARCH_HAS_UBSAN to get it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20240701220034.995eb04d656d.Ia29fe091b207fe66b5e26298c1e427ebcf131642@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:23:40 +02:00
Wei Yang	5cd93c7532	um/mm: remove redundant assignment of max_low_pfn Current calculation of max_low_pfn is introduced in commit `af84eab208` ("[PATCH] uml: fix LVM crash"). It is intended to set max_low_pfn to the same value as max_pfn. But I am not sure why the max_pfn is set to totalram_pages, which represents the number of usable pages in system instead of an absolute page frame number. (The change history stops there.) While we have already calculate it in setup_physmem(), so not necessary to do it again. Also this would help changing totalram_pages accounting, since we plan to move the accounting into __free_pages_core(). With this change, totalram_pages may not represent the total usable pages at this point, since some pages would be deferred initialized. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> CC: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Alasdair G Kergon <agk@redhat.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Mike Rapoport (IBM) <rppt@kernel.org> CC: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Link: https://patch.msgid.link/20240615034150.2958-1-richard.weiyang@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:22:39 +02:00
David Gow	ab0f4cedc3	arch: um: rust: Add i386 support for Rust At present, Rust in the kernel only supports 64-bit x86, so UML has followed suit. However, it's significantly easier to support 32-bit i386 on UML than on bare metal, as UML does not use the -mregparm option (which alters the ABI), which is not yet supported by rustc[1]. Add support for CONFIG_RUST on um/i386, by adding a new target config to generate_rust_target, and replacing various checks on CONFIG_X86_64 to also support CONFIG_X86_32. We still use generate_rust_target, rather than a built-in rustc target, in order to match x86_64, provide a future place for -mregparm, and more easily disable floating point instructions. With these changes, the KUnit tests pass with: kunit.py run --make_options LLVM=1 --kconfig_add CONFIG_RUST=y --kconfig_add CONFIG_64BIT=n --kconfig_add CONFIG_FORTIFY_SOURCE=n An earlier version of these changes was proposed on the Rust-for-Linux github[2]. [1]: https://github.com/rust-lang/rust/issues/116972 [2]: https://github.com/Rust-for-Linux/linux/pull/966 Signed-off-by: David Gow <davidgow@google.com> Link: https://patch.msgid.link/20240604224052.3138504-1-davidgow@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:22:22 +02:00
Tiwei Bie	cb2759431a	um: Remove /proc/sysemu support code Currently /proc/sysemu will never be registered, as sysemu_supported is initialized to zero implicitly and no code updates it. And there is also nothing to configure via sysemu in UML anymore. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240527134024.1539848-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:21:57 +02:00
Tiwei Bie	6fdae1da76	um: Remove unused ncpus variable It's no longer used. And uml_ncpus_setup doesn't exist anymore. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240527134024.1539848-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:21:57 +02:00
Dr. David Alan Gilbert	1cf855ded6	ubd: Remove unused mutex 'ubd_mutex' Commit `fb5d1d389c` ("ubd: open the backing files in ubd_add") removed the last use of ubd_mutex. Remove it. Build and kernel startup test only. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20240505001508.255096-1-linux@treblig.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:21:29 +02:00
Johannes Berg	7d0a8a490a	um: time-travel: fix time-travel-start option We need to have the = as part of the option so that the value can be parsed properly. Also document that it must be given in nanoseconds, not seconds. Fixes: `065038706f` ("um: Support time travel mode") Link: https://patch.msgid.link/20240417102744.14b9a9d4eba0.Ib22e9136513126b2099d932650f55f193120cd97@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:21:16 +02:00
Niklas Schnelle	ddd268c428	um: Select HAS_IOREMAP for UML_IOMEM_EMULATION In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at compile time. UML supports these via its UML_IOMEM_EMULATION so let that select HAS_IOPORT and also reflect this in NO_IOPORT_MAP. Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://patch.msgid.link/20240403124300.65379-2-schnelle@linux.ibm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:21:01 +02:00
Anton Ivanov	12b8e7e69a	um: Remove obsolete pcap driver Remove the pcap driver in UML. It is obsolete. It does not build on recent systems due to changes in libpcap and its dependencies. The vector driver's raw transport in UML provides identical functionality. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20240328132424.376456-1-anton.ivanov@cambridgegreys.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:19:25 +02:00
Benjamin Berg	b2f9b77c7f	um: chan: use blocking IO for console output for time-travel When in time-travel mode (infinite-cpu or external) time should not pass for writing to the console. As such, it makes sense to put the FD for the output side into blocking mode and simply let any write to it hang. If we did not do this, then time could pass waiting for the console to become writable again. This is not desirable as it has random effects on the clock between runs. Implement this by duplicating the FD if output is active in a relevant mode and setting the duplicate to be blocking. This avoids changing the input channel to be blocking should it exists. After this, use the blocking FD for all write operations and do not allocate an IRQ it is set. Without time-travel mode fd_out will always match fd_in and IRQs are registered. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-4-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:18:02 +02:00
Benjamin Berg	4cfb44df8d	um: chan_user: retry partial writes In the next commit, we are going to set the output FD to be blocking. Once that is done, the write() may be short if an interrupt happens while it is writing out data. As such, to properly catch an EINTR error, we need to retry the write. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:18:02 +02:00
Benjamin Berg	c6c4cbaa01	um: chan_user: catch EINTR when reading and writing If the read/write function returns an error then we expect to see an event/IRQ later on. However, this will only happen after an EAGAIN as we are using edge based event triggering. As such, EINTR needs to be caught should it happen. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-2-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:18:01 +02:00
Benjamin Berg	c140a5bd5d	um: irqs: process outstanding IRQs when unblocking signals When in time-travel mode, the eventfd events are read even when signals are blocked as SIGIO still needs to be processed. In this case, the event is cleared on the eventfd but the IRQ still needs to be fired later. We did already ensure that the SIGIO handler is run again. However, the FDs are configured to be level triggered, so that eventfd will not notify again. As such, add some logic to mark the IRQ as pending and process it at the next opportunity. To avoid duplication, reuse the logic used for the suspend/resume case. This does not really change anything except for delaying running the IRQs with timetravel_handler at a slightly later point in time (and possibly running non-timetravel IRQs that shouldn't happen earlier). While at it, move marking as pending into irq_event_handler as that is the more logical place for it to happen. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-03 12:18:01 +02:00
Christoph Hellwig	bd4a633b6f	block: move the nonrot flag to queue_limits Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Christoph Hellwig	1122c0c1cc	block: move cache control settings out of queue->flags Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Herve Codina	a701f8e93b	_PATCH_19_23_um_virt_pci_Use_irq_domain_instantiate_ um_pci_init() uses __irq_domain_add(). With the introduction of irq_domain_instantiate(), __irq_domain_add() becomes obsolete. In order to fully remove __irq_domain_add(), use directly irq_domain_instantiate(). [ tglx: Fixup struct initializer ] Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-20-herve.codina@bootlin.com	2024-06-17 15:48:15 +02:00
Christoph Hellwig	73e3715ed1	block: add special APIs for run-time disabling of discard and friends A few drivers optimistically try to support discard, write zeroes and secure erase and disable the features from the I/O completion handler if the hardware can't support them. This disable can't be done using the atomic queue limits API because the I/O completion handlers can't take sleeping locks or freeze the queue. Keep the existing clearing of the relevant field to zero, but replace the old blk_queue_max_* APIs with new disable APIs that force the value to 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240531074837.1648501-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-14 10:19:44 -06:00
Christoph Hellwig	31ade7d4fd	ubd: untagle discard vs write zeroes not support handling Discard and Write Zeroes are different operation and implemented by different fallocate opcodes for ubd. If one fails the other one can work and vice versa. Split the code to disable the operations in ubd_handler to only disable the operation that actually failed. Fixes: `50109b5a03` ("um: Add support for DISCARD in the UBD Driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20240531074837.1648501-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-14 10:19:44 -06:00
Christoph Hellwig	5db755fbb1	ubd: refactor the interrupt handler Instead of a separate handler function that leaves no work in the interrupt hanler itself, split out a per-request end I/O helper and clean up the coding style and variable naming while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20240531074837.1648501-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-14 10:19:44 -06:00
Linus Torvalds	2313022ec5	Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Fixes for -Wmissing-prototypes warnings and further cleanup - Remove callback returning void from rtc and virtio drivers - Fix bash location * tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits) um: virtio_uml: Convert to platform remove callback returning void um: rtc: Convert to platform remove callback returning void um: Remove unused do_get_thread_area function um: Fix -Wmissing-prototypes warnings for __vdso_* um: Add an internal header shared among the user code um: Fix the declaration of kasan_map_memory um: Fix the -Wmissing-prototypes warning for get_thread_reg um: Fix the -Wmissing-prototypes warning for __switch_mm um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn um: Stop tracking host PID in cpu_tasks um: process: remove unused 'n' variable um: vector: remove unused len variable/calculation um: vector: fix bpfflash parameter evaluation um: slirp: remove set but unused variable 'pid' um: signal: move pid variable where needed um: Makefile: use bash from the environment um: Add winch to winch_handlers before registering winch IRQ um: Fix -Wmissing-prototypes warnings for __warp_* and foo um: Fix -Wmissing-prototypes warnings for text_poke* um: Move declarations to proper headers ...	2024-05-25 13:17:48 -07:00
Linus Torvalds	2ef32ad224	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...	2024-05-23 12:04:36 -07:00
Krzysztof Kozlowski	9731be4bc8	um: virt-pci: drop owner assignment virtio core already sets the .owner, so driver does not need to. Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-5-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-05-22 08:31:16 -04:00
Linus Torvalds	3eb3c33c1d	Merge tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "These are a few cross-architecture cleanup patches: - separate out fbdev support from the asm/video.h contents that may be used by either the old fbdev drivers or the newer drm display code (Thomas Zimmermann) - cleanups for the generic bitops code and asm-generic/bug.h (Thorsten Blum) - remove the orphaned include/asm-generic/page.h header that used to be included by long-removed mmu-less architectures (me)" * tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: Fix name collision with ACPI's video.o bug: Improve comment asm-generic: remove unused asm-generic/page.h arch: Rename fbdev header and source files arch: Remove struct fb_info from video helpers arch: Select fbdev helpers with CONFIG_VIDEO bitops: Change function return types from long to int	2024-05-20 15:18:34 -07:00
Linus Torvalds	61307b7be4	Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...	2024-05-19 09:21:03 -07:00
Linus Torvalds	ff9a79307f	Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig * tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits) kconfig: use sym_get_choice_menu() in sym_check_prop() rapidio: remove choice for enumeration kconfig: lxdialog: remove initialization with A_NORMAL kconfig: m/nconf: merge two item_add_str() calls kconfig: m/nconf: remove dead code to display value of bool choice kconfig: m/nconf: remove dead code to display children of choice members kconfig: gconf: show checkbox for choice correctly kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal Makefile: remove redundant tool coverage variables kbuild: provide reasonable defaults for tool coverage modules: Drop the .export_symbol section from the final modules kconfig: use menu_list_for_each_sym() in sym_check_choice_deps() kconfig: use sym_get_choice_menu() in conf_write_defconfig() kconfig: add sym_get_choice_menu() helper kconfig: turn defaults and additional prompt for choice members into error kconfig: turn missing prompt for choice members into error kconfig: turn conf_choice() into void function kconfig: use linked list in sym_set_changed() kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED kconfig: gconf: remove debug code ...	2024-05-18 12:39:20 -07:00
Masahiro Yamada	b1992c3772	kbuild: use $(src) instead of $(srctree)/$(src) for source directory Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>	2024-05-10 04:34:52 +09:00

... 2 3 4 5 6 ...

3001 Commits