Commit Graph

441 Commits

Author SHA1 Message Date
Heiko Carstens
4f00d4ef66 s390: adjust indentation of RELOCS command build step out
Common pattern in non-verbose build output for quiet commands is that the
shorthand of a command including whitespace contains at least eight
characters. Adjust this for the RELOCS command, which comes only with seven
characters.

Before:
  SORTTAB vmlinux
  CC      arch/s390/boot/version.o
  RELOCS arch/s390/boot/relocs.S
  OBJCOPY arch/s390/boot/info.bin

After:
  SORTTAB vmlinux
  CC      arch/s390/boot/version.o
  RELOCS  arch/s390/boot/relocs.S
  OBJCOPY arch/s390/boot/info.bin

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-09 17:29:56 +02:00
Linus Torvalds
902861e34c Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings"). Kernel build times on arm64 improved nicely. Ryan's
   series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on archiecctures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This a step along the path to implementaton of the merging
   of large anonymous folios. The series is named "Enable >0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
2024-03-14 17:43:30 -07:00
Peter Xu
0a845e0f63 mm/treewide: replace pud_large() with pud_leaf()
pud_large() is always defined as pud_leaf().  Merge their usages.  Chose
pud_leaf() because pud_leaf() is a global API, while pud_large() is not.

Link: https://lkml.kernel.org/r/20240305043750.93762-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06 13:04:19 -08:00
Peter Xu
2f709f7bfd mm/treewide: replace pmd_large() with pmd_leaf()
pmd_large() is always defined as pmd_leaf().  Merge their usages.  Chose
pmd_leaf() because pmd_leaf() is a global API, while pmd_large() is not.

Link: https://lkml.kernel.org/r/20240305043750.93762-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06 13:04:19 -08:00
Alexander Gordeev
13ff094d32 s390/boot: fix minor comment style damages
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 10:25:09 +01:00
Alexander Gordeev
923d48e480 s390/boot: do not check for zero-termination relocation entry
The relocation table is not expected to contain a zero-termination
entry. The existing check is likely a left-over from similar x86
code that uses zero-entries as delimiters. s390 does not have ones
and therefore the check could be avoided.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 10:25:09 +01:00
Alexander Gordeev
4394a50792 s390/boot: make type of __vmlinux_relocs_64_start|end consistent
Make the type of __vmlinux_relocs_64_start|end symbols as
char array, just like it is done for all other sections.
Function rescue_relocs() is simplified as result.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 10:25:09 +01:00
Alexander Gordeev
8495fd4dfe s390/boot: sanitize kaslr_adjust_relocs() function prototype
Do not use vmlinux.image_size within kaslr_adjust_relocs() function
to calculate the upper relocation table boundary. Instead, make both
lower and upper boundaries the function input parameters.

Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 10:25:09 +01:00
Alexander Gordeev
3334fda639 s390/boot: simplify GOT handling
The end of GOT is calculated dynamically on boot. The size of GOT
is calculated on build from the start and end of GOT. Avoid both
calculations and use the end of GOT directly.

Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 10:25:09 +01:00
Heiko Carstens
a795e5d234 s390: vmlinux.lds.S: fix .got.plt assertion
Naresh reported this build error on linux-next:

s390x-linux-gnu-ld: Unexpected GOT/PLT entries detected!
make[3]: *** [/builds/linux/arch/s390/boot/Makefile:87:
arch/s390/boot/vmlinux.syms] Error 1
make[3]: Target 'arch/s390/boot/bzImage' not remade because of errors.

The reason for the build error is an incorrect/incomplete assertion which
checks the size of the .got.plt section. Similar to x86 the size is either
zero or 24 bytes (three entries).

See commit 262b5cae67 ("x86/boot/compressed: Move .got.plt entries out of
the .got section") for more details. The three reserved/additional entries
for s390 are described in chapter 3.2.2 of the s390x ABI [1] (thanks to
Andreas Krebbel for pointing this out!).

[1] https://github.com/IBM/s390x-abi/releases/download/v1.6.1/lzsabi_s390x.pdf

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYvWp8TY-fMEvc3UhoVtoR_eM5VsfHj3+n+kexcfJJ+Cvw@mail.gmail.com
Fixes: 30226853d6 ("s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-25 18:01:09 +01:00
Nathan Chancellor
7f115ff4fc s390/boot: workaround current 'llvm-objdump -t -j ...' behavior
When building with OBJDUMP=llvm-objdump, there are a series of warnings
from the section comparisons that arch/s390/boot/Makefile performs
between vmlinux and arch/s390/boot/vmlinux:

  llvm-objdump: warning: section '.boot.preserved.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.preserved.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.data' mentioned in a -j/--section option, but not found in any input file

The warning is a little misleading, as these sections do exist in the
input files. It is really pointing out that llvm-objdump does not match
GNU objdump's behavior of respecting '-j' / '--section' in combination
with '-t' / '--syms':

  $ s390x-linux-gnu-objdump -t -j .boot.data vmlinux.full

  vmlinux.full:     file format elf64-s390

  SYMBOL TABLE:
  0000000001951000 l     O .boot.data     0000000000003000 sclp_info_sccb
  00000000019550e0 l     O .boot.data     0000000000000001 sclp_info_sccb_valid
  00000000019550e2 g     O .boot.data     0000000000001000 early_command_line
  ...

  $ llvm-objdump -t -j .boot.data vmlinux.full

  vmlinux.full:   file format elf64-s390

  SYMBOL TABLE:
  0000000000100040 l     O .text  0000000000000010 dw_psw
  0000000000000000 l    df *ABS*  0000000000000000 main.c
  00000000001001b0 l     F .text  00000000000000c6 trace_event_raw_event_initcall_level
  0000000000100280 l     F .text  0000000000000100 perf_trace_initcall_level
  ...

It may be possible to change llvm-objdump's behavior to match GNU
objdump's behavior but the difficulty of that task has not yet been
explored. The combination of '$(OBJDUMP) -t -j' is not common in the
kernel tree on a whole, so workaround this tool difference by grepping
for the sections in the full symbol table output in a similar manner to
the sed invocation. This results in no visible change for GNU objdump
users while fixing the warnings for OBJDUMP=llvm-objdump, further
enabling use of LLVM=1 for ARCH=s390 with versions of LLVM that have
support for s390 in ld.lld and llvm-objcopy.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Closes: https://lore.kernel.org/20240219113248.16287-C-hca@linux.ibm.com/
Link: https://github.com/ClangBuiltLinux/linux/issues/859
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240220-s390-work-around-llvm-objdump-t-j-v1-1-47bb0366a831@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-22 16:06:56 +01:00
Josh Poimboeuf
778666df60 s390: compile relocatable kernel without -fPIE
On s390, currently kernel uses the '-fPIE' compiler flag for compiling
vmlinux.  This has a few problems:

  - It uses dynamic symbols (.dynsym), for which the linker refuses to
    allow more than 64k sections.  This can break features which use
    '-ffunction-sections' and '-fdata-sections', including kpatch-build
    [1] and Function Granular KASLR.

  - It unnecessarily uses GOT relocations, adding an extra layer of
    indirection for many memory accesses.

Instead of using '-fPIE', resolve all the relocations at link time and
then manually adjust any absolute relocations (R_390_64) during boot.

This is done by first telling the linker to preserve all relocations
during the vmlinux link.  (Note this is harmless: they are later
stripped in the vmlinux.bin link.)

Then use the 'relocs' tool to find all absolute relocations (R_390_64)
which apply to allocatable sections.  The offsets of those relocations
are saved in a special section which is then used to adjust the
relocations during boot.

(Note: For some reason, Clang occasionally creates a GOT reference, even
without '-fPIE'.  So Clang-compiled kernels have a GOT, which needs to
be adjusted.)

On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.

[1] https://github.com/dynup/kpatch/issues/1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html

Compiler consideration:

Gcc recently implemented an optimization [2] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [3] in recent gcc versions. This option has to be used with
future gcc versions.

Older Clang lacks support for handling unaligned symbols generated
by kernel linker scripts when the kernel is built without -fPIE. However,
future versions of Clang will include support for the -munaligned-symbols
option. When the support is unavailable, compile the kernel with -fPIE
to maintain the existing behavior.

In addition to it:
move vmlinux.relocs to safe relocation

When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire
uncompressed vmlinux.bin is positioned in the bzImage decompressor
image at the default kernel LMA of 0x100000, enabling it to be executed
in-place. However, the size of .vmlinux.relocs could be large enough to
cause an overlap with the uncompressed kernel at the address 0x100000.
To address this issue, .vmlinux.relocs is positioned after the
.rodata.compressed in the bzImage. Nevertheless, in this configuration,
vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To
overcome that, move vmlinux.relocs to a safe location before clearing
.bss and handling relocs.

Compile warning fix from Sumanth Korikkar:

When kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are
several warnings:

ld: warning: orphan section `.rela.iplt' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.head.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.init.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.rodata.cst8' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'

Orphan sections are sections that exist in an object file but don't have
a corresponding output section in the final executable. ld raises a
warning when it identifies such sections.

Eliminate the warning by placing all .rela orphan sections in .rela.dyn
and raise an error when size of .rela.dyn is greater than zero. i.e.
Dont just neglect orphan sections.

This is similar to adjustment performed in x86, where kernel is built
with -fno-PIE.
commit 5354e84598 ("x86/build: Add asserts for unwanted sections")

[sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move
 vmlinux.relocs to safe location]
[hca@linux.ibm.com: merged compile warning fix from Sumanth]
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Nathan Chancellor
9ea30fd166 s390/boot: add 'alloc' to info.bin .vmlinux.info section flags
When attempting to boot a kernel compiled with OBJCOPY=llvm-objcopy,
there is a crash right at boot:

  Out of memory allocating 6d7800 bytes 8 aligned in range 0:20000000
  Reserved memory ranges:
  0000000000000000 a394c3c30d90cdaf DECOMPRESSOR
  Usable online memory ranges (info source: sclp read info [3]):
  0000000000000000 0000000020000000
  Usable online memory total: 20000000 Reserved: a394c3c30d90cdaf Free: 0
  Call Trace:
  (sp:0000000000033e90 [<0000000000012fbc>] physmem_alloc_top_down+0x5c/0x104)
   sp:0000000000033f00 [<0000000000011d56>] startup_kernel+0x3a6/0x77c
   sp:0000000000033f60 [<00000000000100f4>] startup_normal+0xd4/0xd4

GNU objcopy does not have any issues. Looking at differences between the
object files in each build reveals info.bin does not get properly
populated with llvm-objcopy, which results in an empty .vmlinux.info
section.

  $ file {gnu,llvm}-objcopy/arch/s390/boot/info.bin
  gnu-objcopy/arch/s390/boot/info.bin:  data
  llvm-objcopy/arch/s390/boot/info.bin: empty

  $ llvm-readelf --section-headers {gnu,llvm}-objcopy/arch/s390/boot/vmlinux | rg 'File:|\.vmlinux\.info|\.decompressor\.syms'
  File: gnu-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000078 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034078 035078 000b00 00  WA  0   0  1
  File: llvm-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000000 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034000 035000 000b00 00  WA  0   0  1

Ulrich points out that llvm-objcopy only copies sections marked as alloc
with a binary output target, whereas the .vmlinux.info section is only
marked as load. Add 'alloc' in addition to 'load', so that both objcopy
implementations work properly:

  $ file {gnu,llvm}-objcopy/arch/s390/boot/info.bin
  gnu-objcopy/arch/s390/boot/info.bin:  data
  llvm-objcopy/arch/s390/boot/info.bin: data

  $ llvm-readelf --section-headers {gnu,llvm}-objcopy/arch/s390/boot/vmlinux | rg 'File:|\.vmlinux\.info|\.decompressor\.syms'
  File: gnu-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000078 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034078 035078 000b00 00  WA  0   0  1
  File: llvm-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000078 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034078 035078 000b00 00  WA  0   0  1

Closes: https://github.com/ClangBuiltLinux/linux/issues/1996
Link: 3c02cb7492
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240216-s390-fix-boot-with-llvm-objcopy-v1-1-0ac623daf42b@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Nathan Chancellor
c0f98ea0e7 s390/boot: vmlinux.lds.S: handle commonly discarded sections
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are several series of warnings
from the various discardable sections that the kernel adds for build
purposes that are not needed at runtime:

  s390-linux-ld: warning: orphan section `.export_symbol' from `arch/s390/boot/decompressor.o' being placed in section `.export_symbol'
  s390-linux-ld: warning: orphan section `.discard.addressable' from `arch/s390/boot/decompressor.o' being placed in section `.discard.addressable'
  s390-linux-ld: warning: orphan section `.modinfo' from `arch/s390/boot/decompressor.o' being placed in section `.modinfo'

include/asm-generic/vmlinux.lds.h has a macro for easily discarding
these sections across the kernel named COMMON_DISCARDS, use it to clear
up the warnings.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-9-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:53 +01:00
Nathan Chancellor
6a4d37c886 s390/boot: vmlinux.lds.S: handle ELF required sections
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there is a warning around the '.comment'
section for each file in arch/s390/boot

  s390-linux-ld: warning: orphan section `.comment' from `arch/s390/boot/als.o' being placed in section `.comment'
  s390-linux-ld: warning: orphan section `.comment' from `arch/s390/boot/startup.o' being placed in section `.comment'
  s390-linux-ld: warning: orphan section `.comment' from `arch/s390/boot/physmem_info.o' being placed in section `.comment'

include/asm-generic/vmlinux.lds.h has a macro for required ELF sections
not related to debugging named ELF_DETAILS, use it to clear up the
warnings.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-8-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:53 +01:00
Nathan Chancellor
ba6c26af1e s390/boot: vmlinux.lds.S: handle DWARF debug sections
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are several series of warnings for
each file in arch/s390/boot due to the boot linker script not handling
the DWARF debug sections:

  s390-linux-ld: warning: orphan section `.debug_line' from `arch/s390/boot/head.o' being placed in section `.debug_line'
  s390-linux-ld: warning: orphan section `.debug_info' from `arch/s390/boot/head.o' being placed in section `.debug_info'
  s390-linux-ld: warning: orphan section `.debug_abbrev' from `arch/s390/boot/head.o' being placed in section `.debug_abbrev'
  s390-linux-ld: warning: orphan section `.debug_aranges' from `arch/s390/boot/head.o' being placed in section `.debug_aranges'
  s390-linux-ld: warning: orphan section `.debug_str' from `arch/s390/boot/head.o' being placed in section `.debug_str'

include/asm-generic/vmlinux.lds.h has a macro for DWARF debug sections
named DWARF_DEBUG, use it to clear up the warnings.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-7-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Nathan Chancellor
64d590a24f s390/boot: vmlinux.lds.S: handle '.rela' sections
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are several warnings from
arch/s390/boot/head.o due to the unhandled presence of '.rela' sections:

  s390-linux-ld: warning: orphan section `.rela.iplt' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.head.text' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.got' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.data' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.data.rel.ro' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.iplt' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.head.text' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.got' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.data' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'
  s390-linux-ld: warning: orphan section `.rela.data.rel.ro' from `arch/s390/boot/head.o' being placed in section `.rela.dyn'

These sections are unneeded for the decompressor and they are not
emitted in the binary currently. In a manner similar to other
architectures, coalesce the sections into '.rela.dyn' and ensure it is
zero sized, which is a safe/tested approach versus full discard.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-6-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Nathan Chancellor
b23ab303dd s390/boot: vmlinux.lds.S: handle '.init.text'
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there is a warning about the presence of
an '.init.text' section in arch/s390/boot:

  s390-linux-ld: warning: orphan section `.init.text' from `arch/s390/boot/sclp_early_core.o' being placed in section `.init.text'

arch/s390/boot/sclp_early_core.c includes a file from the main kernel
build, which picks up a usage of '__init' somewhere. For the
decompressed image, this section can just be coalesced into '.text'.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-5-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Nathan Chancellor
30226853d6 s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are a lot of warnings around the
GOT and PLT sections:

  s390-linux-ld: warning: orphan section `.plt' from `arch/s390/kernel/head64.o' being placed in section `.plt'
  s390-linux-ld: warning: orphan section `.got' from `arch/s390/kernel/head64.o' being placed in section `.got'
  s390-linux-ld: warning: orphan section `.got.plt' from `arch/s390/kernel/head64.o' being placed in section `.got.plt'
  s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/kernel/head64.o' being placed in section `.iplt'
  s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/kernel/head64.o' being placed in section `.igot.plt'

  s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/boot/head.o' being placed in section `.iplt'
  s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/boot/head.o' being placed in section `.igot.plt'
  s390-linux-ld: warning: orphan section `.got' from `arch/s390/boot/head.o' being placed in section `.got'

Currently, only the '.got' section is actually emitted in the final
binary. In a manner similar to other architectures, put the '.got'
section near the '.data' section and coalesce the PLT sections,
checking that the final section is zero sized, which is a safe/tested
approach versus full discard.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-3-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Nathan Chancellor
2151fd9a6d s390/boot: add support for CONFIG_LD_ORPHAN_WARN
arch/s390/boot/vmlinux uses a different linker script and build rules
than the main vmlinux, so the '--orphan-handling' flag is not applied to
it. Add support for '--orphan-handling' so that all sections are
properly described in the linker script, which helps eliminate bugs
between linker implementations having different orphan section
heuristics.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-1-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Alexander Gordeev
65f8780e2d s390/boot: always align vmalloc area on segment boundary
The size of vmalloc area depends from various factors
on boot and could be set to:

1. Default size as determined by VMALLOC_DEFAULT_SIZE macro;
2. One half of the virtual address space not occupied by
   modules and fixed mappings;
3. The size provided by user with vmalloc= kernel command
   line parameter;

In cases [1] and [2] the vmalloc area base address is aligned
on Region3 table type boundary, while in case [3] in might get
aligned on page boundary.

Limit the waste of page tables and always align vmalloc area
size and base address on segment boundary.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-22 14:07:28 +01:00
Heiko Carstens
a51324c430 s390/cmma: rework no-dat handling
Rework the way physical pages are set no-dat / dat:

The old way is:

- Rely on that all pages are initially marked "dat"
- Allocate page tables for the kernel mapping
- Enable dat
- Walk the whole kernel mapping and set PG_arch_1 bit in all struct pages
  that belong to pages of kernel page tables
- Walk all struct pages and test and clear the PG_arch_1 bit. If the bit is
  not set, set the page state to no-dat
- For all subsequent page table allocations, set the page state to dat
  (remove the no-dat state) on allocation time

Change this rather complex logic to a simpler approach:

- Set the whole physical memory (all pages) to "no-dat"
- Explicitly set those page table pages to "dat" which are part of the
  kernel image (e.g. swapper_pg_dir)
- For all subsequent page table allocations, set the page state to dat
  (remove the no-dat state) on allocation time

In result the code is simpler, and this also allows to get rid of one
odd usage of the PG_arch_1 bit.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-11-05 22:34:58 +01:00
Heiko Carstens
468a3bc2b7 s390/cmma: move parsing of cmma kernel parameter to early boot code
The "cmma=" kernel command line parameter needs to be parsed early for
upcoming changes. Therefore move the parsing code.

Note that EX_TABLE handling of cmma_test_essa() needs to be open-coded,
since the early boot code doesn't have infrastructure for handling expected
exceptions.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-11-05 22:34:57 +01:00
Linus Torvalds
e392ea4d4d Merge tag 's390-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:

 - Get rid of private VM_FAULT flags

 - Add word-at-a-time implementation

 - Add DCACHE_WORD_ACCESS support

 - Cleanup control register handling

 - Disallow CPU hotplug of CPU 0 to simplify its handling complexity,
   following a similar restriction in x86

 - Optimize pai crypto map allocation

 - Update the list of crypto express EP11 coprocessor operation modes

 - Fixes and improvements for secure guests AP pass-through

 - Several fixes to address incorrect page marking for address
   translation with the "cmma no-dat" feature, preventing potential
   incorrect guest TLB flushes

 - Fix early IPI handling

 - Several virtual vs physical address confusion fixes

 - Various small fixes and improvements all over the code

* tag 's390-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (74 commits)
  s390/cio: replace deprecated strncpy with strscpy
  s390/sclp: replace deprecated strncpy with strtomem
  s390/cio: fix virtual vs physical address confusion
  s390/cio: export CMG value as decimal
  s390: delete the unused store_prefix() function
  s390/cmma: fix handling of swapper_pg_dir and invalid_pg_dir
  s390/cmma: fix detection of DAT pages
  s390/sclp: handle default case in sclp memory notifier
  s390/pai_crypto: remove per-cpu variable assignement in event initialization
  s390/pai: initialize event count once at initialization
  s390/pai_crypto: use PERF_ATTACH_TASK define for per task detection
  s390/mm: add missing arch_set_page_dat() call to gmap allocations
  s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc()
  s390/cmma: fix initial kernel address space page table walk
  s390/diag: add missing virt_to_phys() translation to diag224()
  s390/mm,fault: move VM_FAULT_ERROR handling to do_exception()
  s390/mm,fault: remove VM_FAULT_BADMAP and VM_FAULT_BADACCESS
  s390/mm,fault: remove VM_FAULT_SIGNAL
  s390/mm,fault: remove VM_FAULT_BADCONTEXT
  s390/mm,fault: simplify kfence fault handling
  ...
2023-11-03 10:17:22 -10:00
Vasily Gorbik
19ba9ead8a s390/vmem: remove unused variable
Fix the follow warning reported by sparse:

arch/s390/boot/vmem.c:170:15: warning: unused variable ‘entry’ [-Wunused-variable]
  170 |         pte_t entry;
      |               ^~~~~

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-16 13:04:09 +02:00
Vasily Gorbik
327899674e s390/kasan: handle DCSS mapping in memory holes
When physical memory is defined under z/VM using DEF STOR CONFIG, there
may be memory holes that are not hotpluggable memory. In such cases,
DCSS mapping could be placed in one of these memory holes. Subsequently,
attempting memory access to such DCSS mapping would result in a kasan
failure because there is no shadow memory mapping for it.

To maintain consistency with cases where DCSS mapping is positioned after
the kernel identity mapping, which is then covered by kasan zero shadow
mapping, handle the scenario above by populating zero shadow mapping
for memory holes where DCSS mapping could potentially be placed.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-16 13:03:05 +02:00
Heiko Carstens
99441a38c3 s390: use control register bit defines
Use control register bit defines instead of plain numbers where
possible.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:57 +02:00
Heiko Carstens
527618abb9 s390/ctlreg: add struct ctlreg
Add struct ctlreg to enforce strict type checking / usage for control
register functions.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
2372d39142 s390/ctlreg: use local_ctl_load() and local_ctl_store() where possible
Convert all single control register usages of __local_ctl_load() and
__local_ctl_store() to local_ctl_load() and local_ctl_store().

This also requires to change the type of some struct lowcore members
from __u64 to unsigned long.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
8d5e98f8d6 s390/ctlreg: add local and system prefix to some functions
Add local and system prefix to some functions to clarify they change
control register contents on either the local CPU or the on all CPUs.

This results in the following API:

Two defines which load and save multiple control registers.
The defines correlate with the following C prototypes:

void __local_ctl_load(unsigned long *, unsigned int cr_low, unsigned int cr_high);
void __local_ctl_store(unsigned long *, unsigned int cr_low, unsigned int cr_high);

Two functions which locally set or clear one bit for a specified
control register:

void local_ctl_set_bit(unsigned int cr, unsigned int bit);
void local_ctl_clear_bit(unsigned int cr, unsigned int bit);

Two functions which set or clear one bit for a specified control
register on all CPUs:

void system_ctl_set_bit(unsigned int cr, unsigned int bit);
void system_ctl_clear_bit(unsigend int cr, unsigned int bit);

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
ebe1cd530f s390/ctlreg: rename ctl_reg.h to ctlreg.h
Rename ctl_reg.h to ctlreg.h so it matches not only ctlreg.c but also
other control register related function, union, and structure names,
which all come with a ctlreg prefix.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
0c4d01f395 s390/ctlreg: move control register code to separate file
Control register handling has nothing to do with low level SMP code.
Move it to a separate file.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
c0f1d47812 s390/mm: simplify kernel mapping setup
The kernel mapping is setup in two stages: in the decompressor map all
pages with RWX permissions, and within the kernel change all mappings to
their final permissions, where most of the mappings are changed from RWX to
RWNX.

Change this and map all pages RWNX from the beginning, however without
enabling noexec via control register modification. This means that
effectively all pages are used with RWX permissions like before. When the
final permissions have been applied to the kernel mapping enable noexec via
control register modification.

This allows to remove quite a bit of non-obvious code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Heiko Carstens
b6f10e2f66 s390: remove "noexec" option
Do the same like x86 with commit 76ea0025a2 ("x86/cpu: Remove "noexec"")
and remove the "noexec" kernel command line option.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Alexander Gordeev
5cfdff02e9 s390/boot: fix multi-line comments style
Make multi-line comment style consistent across the source.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-16 15:13:03 +02:00
Alexander Gordeev
09cd4ffafb s390/boot: account Real Memory Copy and Lowcore areas
Real Memory Copy and (absolute) Lowcore areas are
not accounted when virtual memory layout is set up.

Fixes: 4df29d2b90 ("s390/smp: rework absolute lowcore access")
Fixes: 2f0e8aae26 ("s390/mm: rework memcpy_real() to avoid DAT-off mode")
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-16 15:13:02 +02:00
Alexander Gordeev
a984f27ec2 s390/mm: define Real Memory Copy size and mask macros
Make Real Memory Copy area size and mask explicit.
This does not bring any functional change and only
needed for clarity.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-16 15:13:02 +02:00
Alexander Gordeev
8ddccc8a7d s390/boot: cleanup number of page table levels setup
The separate vmalloc area size check against _REGION2_SIZE
is needed in case user provided insanely large value using
vmalloc= kernel command line parameter. That could lead to
overflow and selecting 3 page table levels instead of 4.

Use size_add() for the overflow check and get rid of the
extra vmalloc area check.

With the current values of CONFIG_MAX_PHYSMEM_BITS and
PAGES_PER_SECTION the sum of maximal possible size of
identity mapping and vmemmap area (derived from these
macros) plus modules area size MODULES_LEN can not
overflow. Thus, that sum is used as first addend while
vmalloc area size is second addend for size_add().

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-16 15:13:02 +02:00
Alexander Gordeev
e7e828ebeb s390/mm: get rid of VMEM_MAX_PHYS macro
There are no users of VMEM_MAX_PHYS macro left, remove it.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24 12:12:24 +02:00
Alexander Gordeev
94fd522069 s390/mm: rework arch_get_mappable_range() callback
As per description in mm/memory_hotplug.c platforms should define
arch_get_mappable_range() that provides maximum possible addressable
physical memory range for which the linear mapping could be created.

The current implementation uses VMEM_MAX_PHYS macro as the maximum
mappable physical address and it is simply a cast to vmemmap. Since
the address is in physical address space the natural upper limit of
MAX_PHYSMEM_BITS is honoured:

	vmemmap_start = min(vmemmap_start, 1UL << MAX_PHYSMEM_BITS);

Further, to make sure the identity mapping would not overlay with
vmemmap, the size of identity mapping could be stripped like this:

	ident_map_size = min(ident_map_size, vmemmap_start);

Similarily, any other memory that could be added (e.g DCSS segment)
should not overlay with vmemmap as well and that is prevented by
using vmemmap (VMEM_MAX_PHYS macro) as the upper limit.

However, while the use of VMEM_MAX_PHYS brings the desired result
it actually poses two issues:

1. As described, vmemmap is handled as a physical address, although
   it is actually a pointer to struct page in virtual address space.

2. As vmemmap is a virtual address it could have been located
   anywhere in the virtual address space. However, the desired
   necessity to honour MAX_PHYSMEM_BITS limit prevents that.

Rework arch_get_mappable_range() callback in a way it does not
use VMEM_MAX_PHYS macro and does not confuse the notion of virtual
vs physical address spacees as result. That paves the way for moving
vmemmap elsewhere and optimizing the virtual address space layout.

Introduce max_mappable preserved boot variable and let function
setup_kernel_memory_layout() set it up. As result, the rest of the
code is does not need to know the virtual memory layout specifics.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24 12:12:23 +02:00
Linus Torvalds
a452483508 Merge tag 's390-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:

 - Fix virtual vs physical address confusion in vmem_add_range() and
   vmem_remove_range() functions

 - Include <linux/io.h> instead of <asm/io.h> and <asm-generic/io.h>
   throughout s390 code

 - Make all PSW related defines also available for assembler files.
   Remove PSW_DEFAULT_KEY define from uapi for that

 - When adding an undefined symbol the build still succeeds, but
   userspace crashes trying to execute VDSO, because the symbol is not
   resolved. Add undefined symbols check to prevent that

 - Use kvmalloc_array() instead of kzalloc() for allocaton of 256k
   memory when executing s390 crypto adapter IOCTL

 - Add -fPIE flag to prevent decompressor misaligned symbol build error
   with clang

 - Use .balign instead of .align everywhere. This is a no-op for s390,
   but with this there no mix in using .align and .balign anymore

 - Filter out -mno-pic-data-is-text-relative flag when compiling kernel
   to prevent VDSO build error

 - Rework entering of DAT-on mode on CPU restart to use PSW_KERNEL_BITS
   mask directly

 - Do not retry administrative requests to some s390 crypto cards, since
   the firmware assumes replay attacks

 - Remove most of the debug code, which is build in when kernel config
   option CONFIG_ZCRYPT_DEBUG is enabled

 - Remove CONFIG_ZCRYPT_MULTIDEVNODES kernel config option and switch
   off the multiple devices support for the s390 zcrypt device driver

 - With the conversion to generic entry machine checks are accounted to
   the current context instead of irq time. As result, the STCKF
   instruction at the beginning of the machine check handler and the
   lowcore member are no longer required, therefore remove it

 - Fix various typos found with codespell

 - Minor cleanups to CPU-measurement Counter and Sampling Facilities
   code

 - Revert patch that removes VMEM_MAX_PHYS macro, since it causes a
   regression

* tag 's390-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (25 commits)
  Revert "s390/mm: get rid of VMEM_MAX_PHYS macro"
  s390/cpum_sf: remove check on CPU being online
  s390/cpum_sf: handle casts consistently
  s390/cpum_sf: remove unnecessary debug statement
  s390/cpum_sf: remove parameter in call to pr_err
  s390/cpum_sf: simplify function setup_pmu_cpu
  s390/cpum_cf: remove unneeded debug statements
  s390/entry: remove mcck clock
  s390: fix various typos
  s390/zcrypt: remove ZCRYPT_MULTIDEVNODES kernel config option
  s390/zcrypt: do not retry administrative requests
  s390/zcrypt: cleanup some debug code
  s390/entry: rework entering DAT-on mode on CPU restart
  s390/mm: fence off VM macros from asm and linker
  s390: include linux/io.h instead of asm/io.h
  s390/ptrace: make all psw related defines also available for asm
  s390/ptrace: remove PSW_DEFAULT_KEY from uapi
  s390/vdso: filter out mno-pic-data-is-text-relative cflag
  s390: consistently use .balign instead of .align
  s390/decompressor: fix misaligned symbol build error
  ...
2023-07-06 13:18:30 -07:00
Alexander Gordeev
54372cf043 Revert "s390/mm: get rid of VMEM_MAX_PHYS macro"
This reverts commit 456be42aa7.

The assumption VMEM_MAX_PHYS should match ident_map_size
is wrong. At least discontiguous saved segments (DCSS)
could be loaded at addresses beyond ident_map_size and
dcssblk device driver might fail as result.

Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-04 07:46:26 +02:00
Linus Torvalds
e8069f5a8e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Eager page splitting optimization for dirty logging, optionally
     allowing for a VM to avoid the cost of hugepage splitting in the
     stage-2 fault path.

   - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
     with services that live in the Secure world. pKVM intervenes on
     FF-A calls to guarantee the host doesn't misuse memory donated to
     the hyp or a pKVM guest.

   - Support for running the split hypervisor with VHE enabled, known as
     'hVHE' mode. This is extremely useful for testing the split
     hypervisor on VHE-only systems, and paves the way for new use cases
     that depend on having two TTBRs available at EL2.

   - Generalized framework for configurable ID registers from userspace.
     KVM/arm64 currently prevents arbitrary CPU feature set
     configuration from userspace, but the intent is to relax this
     limitation and allow userspace to select a feature set consistent
     with the CPU.

   - Enable the use of Branch Target Identification (FEAT_BTI) in the
     hypervisor.

   - Use a separate set of pointer authentication keys for the
     hypervisor when running in protected mode, as the host is untrusted
     at runtime.

   - Ensure timer IRQs are consistently released in the init failure
     paths.

   - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
     Traps (FEAT_EVT), as it is a register commonly read from userspace.

   - Erratum workaround for the upcoming AmpereOne part, which has
     broken hardware A/D state management.

  RISC-V:

   - Redirect AMO load/store misaligned traps to KVM guest

   - Trap-n-emulate AIA in-kernel irqchip for KVM guest

   - Svnapot support for KVM Guest

  s390:

   - New uvdevice secret API

   - CMM selftest and fixes

   - fix racy access to target CPU for diag 9c

  x86:

   - Fix missing/incorrect #GP checks on ENCLS

   - Use standard mmu_notifier hooks for handling APIC access page

   - Drop now unnecessary TR/TSS load after VM-Exit on AMD

   - Print more descriptive information about the status of SEV and
     SEV-ES during module load

   - Add a test for splitting and reconstituting hugepages during and
     after dirty logging

   - Add support for CPU pinning in demand paging test

   - Add support for AMD PerfMonV2, with a variety of cleanups and minor
     fixes included along the way

   - Add a "nx_huge_pages=never" option to effectively avoid creating NX
     hugepage recovery threads (because nx_huge_pages=off can be toggled
     at runtime)

   - Move handling of PAT out of MTRR code and dedup SVM+VMX code

   - Fix output of PIC poll command emulation when there's an interrupt

   - Add a maintainer's handbook to document KVM x86 processes,
     preferred coding style, testing expectations, etc.

   - Misc cleanups, fixes and comments

  Generic:

   - Miscellaneous bugfixes and cleanups

  Selftests:

   - Generate dependency files so that partial rebuilds work as
     expected"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
  Documentation/process: Add a maintainer handbook for KVM x86
  Documentation/process: Add a label for the tip tree handbook's coding style
  KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
  RISC-V: KVM: Remove unneeded semicolon
  RISC-V: KVM: Allow Svnapot extension for Guest/VM
  riscv: kvm: define vcpu_sbi_ext_pmu in header
  RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
  RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel emulation of AIA APLIC
  RISC-V: KVM: Implement device interface for AIA irqchip
  RISC-V: KVM: Skeletal in-kernel AIA irqchip support
  RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
  RISC-V: KVM: Add APLIC related defines
  RISC-V: KVM: Add IMSIC related defines
  RISC-V: KVM: Implement guest external interrupt line management
  KVM: x86: Remove PRIx* definitions as they are solely for user space
  s390/uv: Update query for secret-UVCs
  s390/uv: replace scnprintf with sysfs_emit
  s390/uvdevice: Add 'Lock Secret Store' UVC
  ...
2023-07-03 15:32:22 -07:00
Heiko Carstens
cada938a01 s390: fix various typos
Fix various typos found with codespell.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03 11:19:42 +02:00
Heiko Carstens
27d45655fa s390: consistently use .balign instead of .align
The .align directive has inconsistent behavior across architectures. Use
.balign instead everywhere. This is a no-op for s390, but with this there
is no mix in using .align and .balign anymore.

Future code is supposed to use only .balign.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-28 13:57:09 +02:00
Alexander Gordeev
456be42aa7 s390/mm: get rid of VMEM_MAX_PHYS macro
VMEM_MAX_PHYS is supposed to be the highest physical
address that can be added to the identity mapping.
It should match ident_map_size, which has the same
meaning. However, unlike ident_map_size it is not
adjusted against various limiting factors (see the
comment to setup_ident_map_size() function). That
renders all checks against VMEM_MAX_PHYS invalid.

Further, VMEM_MAX_PHYS is currently set to vmemmap,
which is an address in virtual memory space. However,
it gets compared against physical addresses in various
locations. That works, because both address spaces
are the same on s390, but otherwise it is wrong.

Instead of fixing VMEM_MAX_PHYS misuse and semantics
just remove it.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-28 13:57:08 +02:00
Alexander Gordeev
3e8261003b s390/kasan: avoid short by one page shadow memory
Kernel Address Sanitizer uses 3 bits per byte to
encode memory. That is the number of bits the start
and end address of a memory range is shifted right
when the corresponding shadow memory is created for
that memory range.

The used memory mapping routine expects page-aligned
addresses, while the above described 3-bit shift might
turn the shadow memory range start and end boundaries
into non-page-aligned in case the size of the original
memory range is less than (PAGE_SIZE << 3). As result,
the resulting shadow memory range could be short on one
page.

Align on page boundary the start and end addresses when
mapping a shadow memory range and avoid the described
issue in the future.

Note, that does not fix a real problem, since currently
no virtual regions of size less than (PAGE_SIZE << 3)
exist.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20 19:52:13 +02:00
Steffen Eiden
db54dfc9f7 s390/uv: Update query for secret-UVCs
Update the query struct such that secret-UVC related
information can be parsed.
Add sysfs files for these new values.

'supp_add_secret_req_ver' notes the supported versions for the
Add Secret UVC. Bit 0 indicates that version 0x100 is supported,
bit 1 indicates 0x200, and so on.

'supp_add_secret_pcf' notes the supported plaintext flags for
the Add Secret UVC.

'supp_secret_types' notes the supported types of secrets.
Bit 0 indicates secret type 1, bit 1 indicates type 2, and so on.

'max_secrets' notes the maximum amount of secrets the secret store can
store per pv guest.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615100533.3996107-8-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20230615100533.3996107-8-seiden@linux.ibm.com>
2023-06-16 11:08:09 +02:00
Heiko Carstens
81e8479649 s390/mm: fix direct map accounting
Commit bb1520d581 ("s390/mm: start kernel with DAT enabled") did not
implement direct map accounting in the early page table setup code. In
result the reported values are bogus now:

$cat /proc/meminfo
...
DirectMap4k:        5120 kB
DirectMap1M:    18446744073709546496 kB
DirectMap2G:           0 kB

Fix this by adding the missing accounting. The result looks sane again:

$cat /proc/meminfo
...
DirectMap4k:        6156 kB
DirectMap1M:     2091008 kB
DirectMap2G:     6291456 kB

Fixes: bb1520d581 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Heiko Carstens
07fdd6627f s390/mm: rename POPULATE_ONE2ONE to POPULATE_DIRECT
Architectures generally use the "direct map" wording for mapping the whole
physical memory. Use that wording as well in arch/s390/boot/vmem.c, instead
of "one to one" in order to avoid confusion.

This also matches what is already done in arch/s390/mm/vmem.c.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00