The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-8-u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-7-u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-6-u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-5-u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-4-u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-3-u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-2-u.kleine-koenig@pengutronix.de
Pull hardening updates from Kees Cook:
"One of the more voluminous set of changes is for adding the new
__counted_by annotation[1] to gain run-time bounds checking of
dynamically sized arrays with UBSan.
- Add LKDTM test for stuck CPUs (Mark Rutland)
- Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)
- Refactor more 1-element arrays into flexible arrays (Gustavo A. R.
Silva)
- Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem
Shaikh)
- Convert group_info.usage to refcount_t (Elena Reshetova)
- Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)
- Add Kconfig fragment for basic hardening options (Kees Cook, Lukas
Bulwahn)
- Fix randstruct GCC plugin performance mode to stay in groups (Kees
Cook)
- Fix strtomem() compile-time check for small sources (Kees Cook)"
* tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits)
hwmon: (acpi_power_meter) replace open-coded kmemdup_nul
reset: Annotate struct reset_control_array with __counted_by
kexec: Annotate struct crash_mem with __counted_by
virtio_console: Annotate struct port_buffer with __counted_by
ima: Add __counted_by for struct modsig and use struct_size()
MAINTAINERS: Include stackleak paths in hardening entry
string: Adjust strtomem() logic to allow for smaller sources
hardening: x86: drop reference to removed config AMD_IOMMU_V2
randstruct: Fix gcc-plugin performance mode to stay in group
mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by
drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by
irqchip/imx-intmux: Annotate struct intmux_data with __counted_by
KVM: Annotate struct kvm_irq_routing_table with __counted_by
virt: acrn: Annotate struct vm_memory_region_batch with __counted_by
hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by
sparc: Annotate struct cpuinfo_tree with __counted_by
isdn: kcapi: replace deprecated strncpy with strscpy_pad
isdn: replace deprecated strncpy with strscpy
NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by
nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by
...
Pull intel EDAC fixes from Tony Luck:
- Old igen6 driver could lose pending events during initialization
- Sapphire Rapids workstations have fewer memory controllers than their
bigger siblings. This confused the driver.
* tag 'edac_updates_for_v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/igen6: Fix the issue of no error events
EDAC/i10nm: Skip the absent memory controllers
Pull devicetree include cleanups from Rob Herring:
"These are the remaining few clean-ups of DT related includes which
didn't get applied to subsystem trees"
* tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
ipmi: Explicitly include correct DT includes
tpm: Explicitly include correct DT includes
lib/genalloc: Explicitly include correct DT includes
parport: Explicitly include correct DT includes
sbus: Explicitly include correct DT includes
mux: Explicitly include correct DT includes
macintosh: Explicitly include correct DT includes
hte: Explicitly include correct DT includes
EDAC: Explicitly include correct DT includes
clocksource: Explicitly include correct DT includes
sparc: Explicitly include correct DT includes
riscv: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Link: https://lore.kernel.org/r/20230714174434.4054728-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Current igen6_edac checks for pending errors before the registration
of the error handler. However, there is a possibility that the error
occurs during the registration process, leading to unhandled pending
errors and no future error events. This issue can be reproduced by
repeatedly injecting errors during the loading of the igen6_edac.
Fix this issue by moving the pending error handler after the registration
of the error handler, ensuring that no pending errors are left unhandled.
Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: Ee Wey Lim <ee.wey.lim@intel.com>
Tested-by: Ee Wey Lim <ee.wey.lim@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com
Pull RAS updates from Borislav Petkov:
- Add initial support for RAS hardware found on AMD server GPUs (MI200).
Those GPUs and CPUs are connected together through the coherent
fabric and the GPU memory controllers report errors through x86's MCA
so EDAC needs to support them. The amd64_edac driver supports now HBM
(High Bandwidth Memory) and thus such heterogeneous memory controller
systems
- Other small cleanups and improvements
* tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
EDAC/amd64: Cache and use GPU node map
EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
EDAC/amd64: Document heterogeneous system enumeration
x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
x86/amd_nb: Re-sort and re-indent PCI defines
x86/amd_nb: Add MI200 PCI IDs
ras/debugfs: Fix error checking for debugfs_create_dir()
x86/MCE: Check a hw error's address to determine proper recovery action
Pull EDAC updates from Borislav Petkov:
- amd64_edac: Add support for Zen4 client hardware
- amd64_edac: Remove the version string as it is useless and actively
confusing when looking at backported versions of the driver
- Add a driver for the Nuvoton NPCM memory controller
- A debugfs error checking cleanup
* tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/npcm: Add NPCM memory controller driver
dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller
EDAC/thunderx: Check debugfs file creation retval properly
EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
EDAC/amd64: Remove module version string
AMD systems have historically provided an "AMD Node ID" that is a unique
identifier for each die in a multi-die package. This was associated with
a unique instance of the AMD Northbridge on a legacy system. And now it
is associated with a unique instance of the AMD Data Fabric on modern
systems. Each instance is referred to as a "Node"; this is an
AMD-specific term not to be confused with NUMA nodes.
The data fabric provides a number of interfaces accessible through a set
of functions in a single PCI device. There is one PCI device per Data
Fabric (AMD Node), and multi-die systems will see multiple such PCI
devices. The AMD Node ID matches a Node's position in the PCI hierarchy.
For example, the Node 0 is accessed using the first PCI device, Node 1
is accessed using the second, and so on. A logical CPU can find its AMD
Node ID using CPUID. Furthermore, the AMD Node ID is used within the
hardware fabric, so it is not purely a logical value.
Heterogeneous AMD systems, with a CPU Data Fabric connected to GPU data
fabrics, follow a similar convention. Each CPU and GPU die has a unique
AMD Node ID value, and each Node ID corresponds to PCI devices in
sequential order.
However, there are two caveats:
1) GPUs are not x86, and they don't have CPUID to read their AMD Node ID
like on CPUs. This means the value is more implicit and based on PCI
enumeration and hardware-specifics.
2) There is a gap in the hardware values for AMD Node IDs. Values 0-7
are for CPUs and values 8-15 are for GPUs.
For example, a system with one CPU die and two GPUs dies will have the
following values:
CPU0 -> AMD Node 0
GPU0 -> AMD Node 8
GPU1 -> AMD Node 9
EDAC is the only subsystem where this has a practical effect. Memory
errors on AMD systems are commonly reported through MCA to a CPU on the
local AMD Node. The error information is passed along to EDAC where the
AMD EDAC modules use the AMD Node ID of reporting logical CPU to access
AMD Node information.
However, memory errors from a GPU die will be reported to the CPU die.
Therefore, the logical CPU's AMD Node ID can't be used since it won't
match the AMD Node ID of the GPU die. The AMD Node ID of the GPU die is
provided as part of the MCA information, and the value will match the
hardware enumeration (e.g. 8-15).
Handle this situation by discovering GPU dies the same way as CPU dies
in the AMD NB code. But do a "node id" fixup in AMD64 EDAC where it's
needed.
The GPU data fabrics provide a register with the base AMD Node ID for
their local "type", i.e. GPU data fabric. This value is the same for all
fabrics of the same type in a system.
Read and cache the base AMD Node ID from one of the GPU devices during
module initialization. Use this to fixup the "node id" when reporting
memory errors at runtime.
[ bp: Squash a fix making gpu_node_map static as reported by
Tom Rix <trix@redhat.com>.
Link: https://lore.kernel.org/r/20230610210930.174074-1-trix@redhat.com ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-6-muralimk@amd.com
AMD Family 19h Model 30h-3Fh systems can be connected to AMD MI200
accelerator/GPU devices such that the CPU and GPU data fabrics are
connected together. In this configuration, the CPU manages error logging
and reporting for MCA banks located on the GPUs. This includes HBM memory
errors reported from Unified Memory Controllers (UMCs) on the GPUs.
The GPU memory errors are handled like CPU memory errors.
AMD CPU UMC support in EDAC can be re-used for GPU UMC support. However,
keeping them separate means drastic changes in one path (e.g. to support
newer products) should have less impact on the other path.
Also, simplify the "gpu_" helper functions where possible. GPU product
configuration, like memory type and channel count, is fixed compared to
CPU products.
GPU UMCs each have four physical connections (phys) connected to eight
channels. There is a single "chip select". This differs from CPUs where
each UMC has one physical connection connected to one channel, and each
channel has up to four "chip selects".
Enumerate each UMC "phy" as an EDAC CSROW, since there is only a single
chip select for each physical connection. This is similar to how a CPU
UMC "phy" is enumerated as an EDAC CHANNEL, since there is only a single
channel for each physical connection.
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-5-muralimk@amd.com
The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.
Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com
The AMD64 EDAC module version information is not exposed through ABI
like MODULE_VERSION(). Instead it is printed during module init.
Version numbers can be confusing in cases where module updates are
partly backported resulting in a difference between upstream and
backported module versions.
Remove the AMD64 EDAC module version information to avoid user
confusion.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230410190959.3367528-1-yazen.ghannam@amd.com
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
Pull ARM SoC driver updates from Arnd Bergmann:
"The most notable updates this time are for Qualcomm Snapdragon
platforms. The Inline-Crypto-Engine gets a new DT binding and driver,
and a number of drivers now support additional Snapdragon variants, in
particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc
(edac) and rpm drivers get notable functionality updates.
Updates on other platforms include:
- Various updates to the Mediatek mutex and mmsys drivers, including
support for the Helio X10 SoC
- Support for unidirectional mailbox channels in Arm SCMI firmware
- Support for per cpu asynchronous notification in OP-TEE firmware
- Minor updates for memory controller drivers.
- Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
obsolete DT driver interfaces"
* tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
soc: ti: smartreflex: Simplify getting the opam_sr pointer
bus: vexpress-config: Add explicit of_platform.h include
soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS
memory: mtk-smi: mt8365: Add SMI Support
dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365
dt-bindings: memory-controllers: mediatek,smi-common: add mt8365
memory: tegra: read values from correct device
dt-bindings: crypto: Add Qualcomm Inline Crypto Engine
soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver
dt-bindings: firmware: document Qualcomm QCM2290 SCM
soc: qcom: rpmh-rsc: Support RSC v3 minor versions
soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
soc/tegra: fuse: Remove nvmem root only access
soc/tegra: cbb: tegra194: Use of_address_count() helper
soc/tegra: cbb: Remove MODULE_LICENSE in non-modules
ARM: tegra: Remove MODULE_LICENSE in non-modules
soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource()
soc: tegra: cbb: Drop empty platform remove function
firmware: arm_scmi: Add support for unidirectional mailbox channels
dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels
...
Combine all queued EDAC changes for submission into v6.4:
* ras/edac-drivers:
EDAC/i10nm: Add Intel Sierra Forest server support
EDAC/skx: Fix overflows on the DRAM row address mapping arrays
* ras/edac-amd64: (27 commits)
EDAC/amd64: Fix indentation in umc_determine_edac_cap()
EDAC/amd64: Add get_err_info() to pvt->ops
EDAC/amd64: Split dump_misc_regs() into dct/umc functions
EDAC/amd64: Split init_csrows() into dct/umc functions
EDAC/amd64: Split determine_edac_cap() into dct/umc functions
EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
EDAC/amd64: Split ecc_enabled() into dct/umc functions
EDAC/amd64: Split read_mc_regs() into dct/umc functions
EDAC/amd64: Split determine_memory_type() into dct/umc functions
EDAC/amd64: Split read_base_mask() into dct/umc functions
EDAC/amd64: Split prep_chip_selects() into dct/umc functions
EDAC/amd64: Rework hw_info_{get,put}
EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
EDAC/amd64: Rename debug_display_dimm_sizes()
* ras/edac-misc:
EDAC/altera: Remove MODULE_LICENSE in non-module
EDAC: Sanitize MODULE_AUTHOR strings
EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
EDAC/i5100: Fix typo in comment
EDAC/altera: Remove redundant error logging
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Since
8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"),
MODULE_LICENSE declarations are used to identify modules. As
a consequence, uses of the macro in non-modules will cause modprobe to
misidentify their containing object file as a module when it is not
(false positives), and modprobe might succeed rather than failing with
a suitable error message.
altera_edac is not a module for a while now, remove the macro call.
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230217141059.392471-24-nick.alcock@oracle.com
GPU Nodes will use a different method to determine the chip select
and channel of an error. A function pointer should be used rather than
introduce another branching condition.
Prepare for this by adding get_err_info() to pvt->ops. This function is
only called from the modern code path, so a legacy function is not
defined.
Make sure to call this after MCA_STATUS[SyndV] is checked, since the
csrow value is found in MCA_SYND.
[ Yazen: rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com
Call them from their respective hw_info_get() paths.
Call the new functions after the setting the chip select base and mask
counts, since those are need to read the correct number of chip select
base and mask registers. And call the new functions before the remaining
set up, because the base and mask register values will be needed later.
[Yazen: Rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com