Commit Graph

1919 Commits

Author SHA1 Message Date
Uwe Kleine-König
d27cb32e00 EDAC/dmc520: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-8-u.kleine-koenig@pengutronix.de
2023-11-20 22:01:16 +01:00
Uwe Kleine-König
0576ded05b EDAC/cpc925: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-7-u.kleine-koenig@pengutronix.de
2023-11-20 21:56:49 +01:00
Uwe Kleine-König
d8d9f99fd0 EDAC/cell: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-6-u.kleine-koenig@pengutronix.de
2023-11-20 21:39:13 +01:00
Uwe Kleine-König
a5347591eb EDAC/bluefield: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-5-u.kleine-koenig@pengutronix.de
2023-11-20 21:28:08 +01:00
Uwe Kleine-König
2546fffd91 EDAC/aspeed: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-4-u.kleine-koenig@pengutronix.de
2023-11-20 21:26:01 +01:00
Uwe Kleine-König
5aafd02da7 EDAC/armada_xp: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-3-u.kleine-koenig@pengutronix.de
2023-11-20 20:07:53 +01:00
Uwe Kleine-König
b73e11c873 EDAC/altera: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-2-u.kleine-koenig@pengutronix.de
2023-11-20 19:43:18 +01:00
Rob Herring
1b09892962 EDAC/altera: Use device_get_match_data()
Use preferred device_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115210201.3743564-1-robh@kernel.org
2023-11-20 09:59:16 +01:00
Linus Torvalds
befaa609f4 Merge tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
 "One of the more voluminous set of changes is for adding the new
  __counted_by annotation[1] to gain run-time bounds checking of
  dynamically sized arrays with UBSan.

   - Add LKDTM test for stuck CPUs (Mark Rutland)

   - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)

   - Refactor more 1-element arrays into flexible arrays (Gustavo A. R.
     Silva)

   - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem
     Shaikh)

   - Convert group_info.usage to refcount_t (Elena Reshetova)

   - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)

   - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas
     Bulwahn)

   - Fix randstruct GCC plugin performance mode to stay in groups (Kees
     Cook)

   - Fix strtomem() compile-time check for small sources (Kees Cook)"

* tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits)
  hwmon: (acpi_power_meter) replace open-coded kmemdup_nul
  reset: Annotate struct reset_control_array with __counted_by
  kexec: Annotate struct crash_mem with __counted_by
  virtio_console: Annotate struct port_buffer with __counted_by
  ima: Add __counted_by for struct modsig and use struct_size()
  MAINTAINERS: Include stackleak paths in hardening entry
  string: Adjust strtomem() logic to allow for smaller sources
  hardening: x86: drop reference to removed config AMD_IOMMU_V2
  randstruct: Fix gcc-plugin performance mode to stay in group
  mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by
  drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by
  irqchip/imx-intmux: Annotate struct intmux_data with __counted_by
  KVM: Annotate struct kvm_irq_routing_table with __counted_by
  virt: acrn: Annotate struct vm_memory_region_batch with __counted_by
  hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by
  sparc: Annotate struct cpuinfo_tree with __counted_by
  isdn: kcapi: replace deprecated strncpy with strscpy_pad
  isdn: replace deprecated strncpy with strscpy
  NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by
  nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by
  ...
2023-10-30 19:09:55 -10:00
Shubhrajyoti Datta
6f15b178cd EDAC/versal: Add a Xilinx Versal memory controller driver
Add a EDAC driver for the RAS capabilities on the Xilinx integrated DDR
Memory Controllers (DDRMCs) which support both DDR4 and LPDDR4/4X memory
interfaces. It has four programmable Network-on-Chip (NoC) interface
ports and is designed to handle multiple streams of traffic. The driver
reports correctable and uncorrectable errors, and also creates debugfs
entries for testing through error injection.

  [ bp:
   - Add a pointer to the documentation about the register unlock code.
   - Squash in a fix for a Smatch static checker issue as reported by
     Dan Carpenter:
     https://lore.kernel.org/r/a4db6f93-8e5f-4d55-a7b8-b5a987d48a58@moroto.mountain
  ]

Co-developed-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com>
Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com>
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231005101242.14621-3-shubhrajyoti.datta@amd.com
2023-10-23 19:41:27 +02:00
Justin Stitt
6b343a4642 EDAC/mc_sysfs: Replace deprecated strncpy() with memcpy()
`strncpy` is deprecated for use on NUL-terminated destination strings [1].

We've already calculated bounds, possible truncation with '\0' or '\n'
and manually NUL-terminated. The situation is now just a literal byte
copy from one buffer to another, let's treat it as such and use a less
ambiguous interface in memcpy.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230918-strncpy-drivers-edac-edac_mc_sysfs-c-v4-1-38a23d2fcdd8@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-09-29 14:48:32 -07:00
Linus Torvalds
bb511d4b25 Merge tag 'edac_updates_for_v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull intel EDAC fixes from Tony Luck:

 - Old igen6 driver could lose pending events during initialization

 - Sapphire Rapids workstations have fewer memory controllers than their
   bigger siblings. This confused the driver.

* tag 'edac_updates_for_v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/igen6: Fix the issue of no error events
  EDAC/i10nm: Skip the absent memory controllers
2023-08-30 19:23:00 -07:00
Linus Torvalds
ef2a0b7cdb Merge tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree include cleanups from Rob Herring:
 "These are the remaining few clean-ups of DT related includes which
  didn't get applied to subsystem trees"

* tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  ipmi: Explicitly include correct DT includes
  tpm: Explicitly include correct DT includes
  lib/genalloc: Explicitly include correct DT includes
  parport: Explicitly include correct DT includes
  sbus: Explicitly include correct DT includes
  mux: Explicitly include correct DT includes
  macintosh: Explicitly include correct DT includes
  hte: Explicitly include correct DT includes
  EDAC: Explicitly include correct DT includes
  clocksource: Explicitly include correct DT includes
  sparc: Explicitly include correct DT includes
  riscv: Explicitly include correct DT includes
2023-08-30 17:04:28 -07:00
Linus Torvalds
1a7c611546 Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Rob Herring
408d808893 EDAC: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Link: https://lore.kernel.org/r/20230714174434.4054728-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2023-08-28 13:31:01 -05:00
Avadhut Naik
c4d07c3712 EDAC/amd64: Add support for AMD family 1Ah models 00h-1Fh and 40h-4Fh
Add support for family 1Ah-based models 00h-1Fh and 40h-4Fh.

  [ bp: Simplify. ]

Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230809035244.2722455-4-avadhut.naik@amd.com
2023-08-10 14:25:21 +02:00
Peter Zijlstra
0cfd8fbadd x86/cpu: Fix Crestmont uarch
Sierra Forest and Grand Ridge are both E-core only using Crestmont
micro-architecture, They fit the pre-existing naming scheme prefectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.757666627@infradead.org
2023-08-09 21:51:06 +02:00
Qiuxu Zhuo
ce53ad81ed EDAC/igen6: Fix the issue of no error events
Current igen6_edac checks for pending errors before the registration
of the error handler. However, there is a possibility that the error
occurs during the registration process, leading to unhandled pending
errors and no future error events. This issue can be reproduced by
repeatedly injecting errors during the loading of the igen6_edac.

Fix this issue by moving the pending error handler after the registration
of the error handler, ensuring that no pending errors are left unhandled.

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: Ee Wey Lim <ee.wey.lim@intel.com>
Tested-by: Ee Wey Lim <ee.wey.lim@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com
2023-08-02 13:09:56 -07:00
Qiuxu Zhuo
c545f5e412 EDAC/i10nm: Skip the absent memory controllers
Some Sapphire Rapids workstations' absent memory controllers
still appear as PCIe devices that fool the i10nm_edac driver
and result in "shift exponent -66 is negative" call traces
from skx_get_dimm_info().

Skip the absent memory controllers to avoid the call traces.

Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Closes: https://lore.kernel.org/linux-edac/CAAd53p41Ku1m1rapeqb1xtD+kKuk+BaUW=dumuoF0ZO3GhFjFA@mail.gmail.com/T/#m5de16dce60a8c836ec235868c7c16e3fefad0cc2
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Koba Ko <koba.ko@canonical.com>
Closes: https://lore.kernel.org/linux-edac/SA1PR11MB71305B71CCCC3D9305835202892AA@SA1PR11MB7130.namprd11.prod.outlook.com/T/#t
Tested-by: Koba Ko <koba.ko@canonical.com>
Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230710013232.59712-1-qiuxu.zhuo@intel.com
2023-07-24 08:57:26 -07:00
Linus Torvalds
aa35a4835e Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:

 - Add initial support for RAS hardware found on AMD server GPUs (MI200).

   Those GPUs and CPUs are connected together through the coherent
   fabric and the GPU memory controllers report errors through x86's MCA
   so EDAC needs to support them. The amd64_edac driver supports now HBM
   (High Bandwidth Memory) and thus such heterogeneous memory controller
   systems

 - Other small cleanups and improvements

* tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/amd64: Cache and use GPU node map
  EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
  EDAC/amd64: Document heterogeneous system enumeration
  x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
  x86/amd_nb: Re-sort and re-indent PCI defines
  x86/amd_nb: Add MI200 PCI IDs
  ras/debugfs: Fix error checking for debugfs_create_dir()
  x86/MCE: Check a hw error's address to determine proper recovery action
2023-06-26 15:09:18 -07:00
Linus Torvalds
e5ce2f196f Merge tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:

 - amd64_edac: Add support for Zen4 client hardware

 - amd64_edac: Remove the version string as it is useless and actively
   confusing when looking at backported versions of the driver

 - Add a driver for the Nuvoton NPCM memory controller

 - A debugfs error checking cleanup

* tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/npcm: Add NPCM memory controller driver
  dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller
  EDAC/thunderx: Check debugfs file creation retval properly
  EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
  EDAC/amd64: Remove module version string
2023-06-26 15:06:42 -07:00
Yazen Ghannam
4251566ebc EDAC/amd64: Cache and use GPU node map
AMD systems have historically provided an "AMD Node ID" that is a unique
identifier for each die in a multi-die package. This was associated with
a unique instance of the AMD Northbridge on a legacy system. And now it
is associated with a unique instance of the AMD Data Fabric on modern
systems. Each instance is referred to as a "Node"; this is an
AMD-specific term not to be confused with NUMA nodes.

The data fabric provides a number of interfaces accessible through a set
of functions in a single PCI device. There is one PCI device per Data
Fabric (AMD Node), and multi-die systems will see multiple such PCI
devices. The AMD Node ID matches a Node's position in the PCI hierarchy.
For example, the Node 0 is accessed using the first PCI device, Node 1
is accessed using the second, and so on. A logical CPU can find its AMD
Node ID using CPUID. Furthermore, the AMD Node ID is used within the
hardware fabric, so it is not purely a logical value.

Heterogeneous AMD systems, with a CPU Data Fabric connected to GPU data
fabrics, follow a similar convention. Each CPU and GPU die has a unique
AMD Node ID value, and each Node ID corresponds to PCI devices in
sequential order.

However, there are two caveats:
1) GPUs are not x86, and they don't have CPUID to read their AMD Node ID
like on CPUs. This means the value is more implicit and based on PCI
enumeration and hardware-specifics.
2) There is a gap in the hardware values for AMD Node IDs. Values 0-7
are for CPUs and values 8-15 are for GPUs.

For example, a system with one CPU die and two GPUs dies will have the
following values:
  CPU0 -> AMD Node 0
  GPU0 -> AMD Node 8
  GPU1 -> AMD Node 9

EDAC is the only subsystem where this has a practical effect. Memory
errors on AMD systems are commonly reported through MCA to a CPU on the
local AMD Node. The error information is passed along to EDAC where the
AMD EDAC modules use the AMD Node ID of reporting logical CPU to access
AMD Node information.

However, memory errors from a GPU die will be reported to the CPU die.
Therefore, the logical CPU's AMD Node ID can't be used since it won't
match the AMD Node ID of the GPU die. The AMD Node ID of the GPU die is
provided as part of the MCA information, and the value will match the
hardware enumeration (e.g. 8-15).

Handle this situation by discovering GPU dies the same way as CPU dies
in the AMD NB code. But do a "node id" fixup in AMD64 EDAC where it's
needed.

The GPU data fabrics provide a register with the base AMD Node ID for
their local "type", i.e. GPU data fabric. This value is the same for all
fabrics of the same type in a system.

Read and cache the base AMD Node ID from one of the GPU devices during
module initialization. Use this to fixup the "node id" when reporting
memory errors at runtime.

  [ bp: Squash a fix making gpu_node_map static as reported by
        Tom Rix <trix@redhat.com>.
    Link: https://lore.kernel.org/r/20230610210930.174074-1-trix@redhat.com ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-6-muralimk@amd.com
2023-06-19 13:01:44 +02:00
Borislav Petkov (AMD)
852667c317 Merge ras/edac-drivers into for-next
* ras/edac-drivers:
  EDAC/npcm: Add NPCM memory controller driver
  dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-06-12 15:15:36 +02:00
Marvin Lin
d244c610f1 EDAC/npcm: Add NPCM memory controller driver
Add driver for memory controller present on Nuvoton NPCM SoCs. The
memory controller supports single bit error correction and double bit
error detection.

Signed-off-by: Marvin Lin <milkfafa@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230111093245.318745-4-milkfafa@gmail.com
2023-06-12 15:14:10 +02:00
Borislav Petkov (AMD)
0a81fa5d74 Merge ras/edac-misc into for-next
* ras/edac-misc:
  EDAC/thunderx: Check debugfs file creation retval properly

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-06-07 10:50:07 +02:00
Yeqi Fu
bf5c04ddd3 EDAC/thunderx: Check debugfs file creation retval properly
edac_debugfs_create_file() returns ERR_PTR by way of the respective
debugfs function it calls, if an error occurs.

The appropriate way to verify for errors is to use IS_ERR(). Do so.

  [ bp: Rewrite all text. ]

Signed-off-by: Yeqi Fu <asuk4.q@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230517173111.365787-1-asuk4.q@gmail.com
2023-06-06 23:04:56 +02:00
Muralidhara M K
9c42edd571 EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
AMD Family 19h Model 30h-3Fh systems can be connected to AMD MI200
accelerator/GPU devices such that the CPU and GPU data fabrics are
connected together. In this configuration, the CPU manages error logging
and reporting for MCA banks located on the GPUs. This includes HBM memory
errors reported from Unified Memory Controllers (UMCs) on the GPUs.
The GPU memory errors are handled like CPU memory errors.

AMD CPU UMC support in EDAC can be re-used for GPU UMC support. However,
keeping them separate means drastic changes in one path (e.g. to support
newer products) should have less impact on the other path.

Also, simplify the "gpu_" helper functions where possible. GPU product
configuration, like memory type and channel count, is fixed compared to
CPU products.

GPU UMCs each have four physical connections (phys) connected to eight
channels. There is a single "chip select". This differs from CPUs where
each UMC has one physical connection connected to one channel, and each
channel has up to four "chip selects".

Enumerate each UMC "phy" as an EDAC CSROW, since there is only a single
chip select for each physical connection. This is similar to how a CPU
UMC "phy" is enumerated as an EDAC CHANNEL, since there is only a single
channel for each physical connection.

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-5-muralimk@amd.com
2023-06-05 12:27:18 +02:00
Yazen Ghannam
c35977b00f x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.

Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com
2023-06-05 12:27:11 +02:00
Manivannan Sadhasivam
cbd77119b6 EDAC/qcom: Get rid of hardcoded register offsets
The LLCC EDAC register offsets varies between each SoC. Hardcoding the
register offsets won't work and will often result in crash due to
accessing the wrong locations.

Hence, get the register offsets from the LLCC driver matching the
individual SoCs.

Cc: <stable@vger.kernel.org> # 6.0: 5365cea199 ("soc: qcom: llcc: Rename reg_offset structs to reflect LLCC version")
Cc: <stable@vger.kernel.org> # 6.0: c13d7d261e ("soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driver")
Cc: <stable@vger.kernel.org> # 6.0
Fixes: a6e9d7ef25 ("soc: qcom: llcc: Add configuration data for SM8450 SoC")
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230517114635.76358-3-manivannan.sadhasivam@linaro.org
2023-05-26 20:56:55 -07:00
Manivannan Sadhasivam
3d49f7406b EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
"ret" variable will be assigned on both success and failure cases. So there
is no need to initialize it during start of qcom_llcc_core_setup().

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230517114635.76358-2-manivannan.sadhasivam@linaro.org
2023-05-26 20:56:54 -07:00
Hristo Venev
6c79e42169 EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
instead of 12.

With two 32GB dual-rank DIMMs the sizes appear to be reported correctly:

  EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
  EDAC amd64: F19h_M60h detected (node 0).
  EDAC MC: UMC0 chip selects:
  EDAC amd64: MC: 0:     0MB 1:     0MB
  EDAC amd64: MC: 2: 16384MB 3: 16384MB
  EDAC MC: UMC1 chip selects:
  EDAC amd64: MC: 0:     0MB 1:     0MB
  EDAC amd64: MC: 2: 16384MB 3: 16384MB
  AMD64 EDAC driver v3.5.0

ECC errors can also be detected:

  mce: [Hardware Error]: Machine check events logged
  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
  [Hardware Error]: Error Addr: 0x00000007ff7e93c0
  [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
  [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
  EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
  [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

According to Mario Limonciello, the same code should also work for
models 70h-7Fh (follow thread in Link).

  [ bp: Massage, the translation logic updates are pending. ]

Signed-off-by: Hristo Venev <hristo@venev.name>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230425201239.324476-1-hristo@venev.name
Link: https://lore.kernel.org/r/20230511174506.875153-2-hristo@venev.name
2023-05-15 16:32:47 +02:00
Yazen Ghannam
b34348a0d7 EDAC/amd64: Remove module version string
The AMD64 EDAC module version information is not exposed through ABI
like MODULE_VERSION(). Instead it is printed during module init.

Version numbers can be confusing in cases where module updates are
partly backported resulting in a difference between upstream and
backported module versions.

Remove the AMD64 EDAC module version information to avoid user
confusion.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230410190959.3367528-1-yazen.ghannam@amd.com
2023-05-10 15:49:52 +02:00
Linus Torvalds
556eb8b791 Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds
a907047732 Merge tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
 "The most notable updates this time are for Qualcomm Snapdragon
  platforms. The Inline-Crypto-Engine gets a new DT binding and driver,
  and a number of drivers now support additional Snapdragon variants, in
  particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc
  (edac) and rpm drivers get notable functionality updates.

  Updates on other platforms include:

   - Various updates to the Mediatek mutex and mmsys drivers, including
     support for the Helio X10 SoC

   - Support for unidirectional mailbox channels in Arm SCMI firmware

   - Support for per cpu asynchronous notification in OP-TEE firmware

   - Minor updates for memory controller drivers.

   - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
     Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
     SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
     obsolete DT driver interfaces"

* tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
  soc: ti: smartreflex: Simplify getting the opam_sr pointer
  bus: vexpress-config: Add explicit of_platform.h include
  soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS
  memory: mtk-smi: mt8365: Add SMI Support
  dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365
  dt-bindings: memory-controllers: mediatek,smi-common: add mt8365
  memory: tegra: read values from correct device
  dt-bindings: crypto: Add Qualcomm Inline Crypto Engine
  soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver
  dt-bindings: firmware: document Qualcomm QCM2290 SCM
  soc: qcom: rpmh-rsc: Support RSC v3 minor versions
  soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
  soc/tegra: fuse: Remove nvmem root only access
  soc/tegra: cbb: tegra194: Use of_address_count() helper
  soc/tegra: cbb: Remove MODULE_LICENSE in non-modules
  ARM: tegra: Remove MODULE_LICENSE in non-modules
  soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource()
  soc: tegra: cbb: Drop empty platform remove function
  firmware: arm_scmi: Add support for unidirectional mailbox channels
  dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels
  ...
2023-04-25 12:02:16 -07:00
Borislav Petkov (AMD)
ce8ac91130 Merge branches 'edac-drivers', 'edac-amd64' and 'edac-misc' into edac-updates
Combine all queued EDAC changes for submission into v6.4:

* ras/edac-drivers:
  EDAC/i10nm: Add Intel Sierra Forest server support
  EDAC/skx: Fix overflows on the DRAM row address mapping arrays

* ras/edac-amd64: (27 commits)
  EDAC/amd64: Fix indentation in umc_determine_edac_cap()
  EDAC/amd64: Add get_err_info() to pvt->ops
  EDAC/amd64: Split dump_misc_regs() into dct/umc functions
  EDAC/amd64: Split init_csrows() into dct/umc functions
  EDAC/amd64: Split determine_edac_cap() into dct/umc functions
  EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
  EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
  EDAC/amd64: Split ecc_enabled() into dct/umc functions
  EDAC/amd64: Split read_mc_regs() into dct/umc functions
  EDAC/amd64: Split determine_memory_type() into dct/umc functions
  EDAC/amd64: Split read_base_mask() into dct/umc functions
  EDAC/amd64: Split prep_chip_selects() into dct/umc functions
  EDAC/amd64: Rework hw_info_{get,put}
  EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
  EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
  EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
  EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
  EDAC/amd64: Rename debug_display_dimm_sizes()

* ras/edac-misc:
  EDAC/altera: Remove MODULE_LICENSE in non-module
  EDAC: Sanitize MODULE_AUTHOR strings
  EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
  EDAC/i5100: Fix typo in comment
  EDAC/altera: Remove redundant error logging

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-04-24 09:14:30 +02:00
Qiuxu Zhuo
96ae3995c6 EDAC/i10nm: Add Intel Sierra Forest server support
The Sierra Forest CPU model uses similar memory controller registers as
Granite Rapids server. Add Sierra Forest CPU model ID for EDAC support.

Tested-by: Li Zhang <li4.zhang@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230410131531.11914-1-qiuxu.zhuo@intel.com
2023-04-10 09:33:51 -07:00
Yang Li
49aba1c589 EDAC/amd64: Fix indentation in umc_determine_edac_cap()
Use consistent indentation to improve the readability and fix:

  drivers/edac/amd64_edac.c:1279 umc_determine_edac_cap() warn: inconsistent indenting

Fixes: f6a4b4a1aa ("EDAC/amd64: Split determine_edac_cap() into dct/umc functions")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230404022557.46409-1-yang.lee@linux.alibaba.com
2023-04-04 17:22:55 +02:00
Nick Alcock
e088d80e2a EDAC/altera: Remove MODULE_LICENSE in non-module
Since

  8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"),

MODULE_LICENSE declarations are used to identify modules. As
a consequence, uses of the macro in non-modules will cause modprobe to
misidentify their containing object file as a module when it is not
(false positives), and modprobe might succeed rather than failing with
a suitable error message.

altera_edac is not a module for a while now, remove the macro call.

Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230217141059.392471-24-nick.alcock@oracle.com
2023-04-01 13:18:50 +02:00
Borislav Petkov (AMD)
371b27f2f3 EDAC: Sanitize MODULE_AUTHOR strings
Fixup the remaining MODULE_AUTHOR strings to not contain newlines.
Shorten and unbreak others.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230328134309.23159-1-bp@alien8.de
2023-03-28 15:43:30 +02:00
Jonathan Neuschäfer
01db1030f1 EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
MODULE_AUTHOR strings don't usually include a newline character.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230129165054.1675554-1-j.neuschaefer@gmx.net
2023-03-28 15:26:52 +02:00
Muralidhara M K
b3ece3a6a2 EDAC/amd64: Add get_err_info() to pvt->ops
GPU Nodes will use a different method to determine the chip select
and channel of an error. A function pointer should be used rather than
introduce another branching condition.

Prepare for this by adding get_err_info() to pvt->ops. This function is
only called from the modern code path, so a legacy function is not
defined.

Make sure to call this after MCA_STATUS[SyndV] is checked, since the
csrow value is found in MCA_SYND.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K
f6f36382d6 EDAC/amd64: Split dump_misc_regs() into dct/umc functions
Add a function pointer to pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-22-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K
6fb8b5fb9e EDAC/amd64: Split init_csrows() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

Also, drop the check for an "empty" device, i.e. one without memory.
This is redundant and already done in instance_has_memory() earlier in
the init path.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-21-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K
f6a4b4a1aa EDAC/amd64: Split determine_edac_cap() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-20-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Yazen Ghannam
9369239e8d EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
...to match the "umc_" prefix convention.

No functional change is intended.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-19-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
0a42a37f65 EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
The init_one_instance() path is shared between legacy and modern
systems. So add the new functions to a function pointer in pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-18-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
eb2bcdfc37 EDAC/amd64: Split ecc_enabled() into dct/umc functions
Call them using a function pointer in pvt->ops. The "ECC enabled"
check is done outside of the hardware information gathering done in
hw_info_get(). So a high-level function pointer is needed to separate
the legacy and modern paths.

No functional change is intended.

  [Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-17-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
32ecdf8688 EDAC/amd64: Split read_mc_regs() into dct/umc functions
Call them from their respective hw_info_get() paths.

ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz()
is left out of the UMC path. Do not save TOP_MEM* values on modern
controllers because they're not needed there (read: they were used only
for debugging, if anything).

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-16-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
78ec161a91 EDAC/amd64: Split determine_memory_type() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call them after all other hardware registers have been saved, since the
memory type for a device will be determined based on the saved
information.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-15-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
b29dad9bf3 EDAC/amd64: Split read_base_mask() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call the new functions after the setting the chip select base and mask
counts, since those are need to read the correct number of chip select
base and mask registers. And call the new functions before the remaining
set up, because the base and mask register values will be needed later.

  [Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00