Master indexing is problematic if a hub is rescanned while the
root master is being rescanned. Always allocate an index for the
FSI master, and set the device name if it hasn't already been set.
Move the call to ida_free to the bottom of master unregistration
and set the number of links to 0 in case another call to scan
comes in before the device is removed.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20230809180814.151984-2-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20230718205508.1790932-1-robh@kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.
Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If allocation fails, the ida_simple_get() will return error number.
So master->idx could be error number and be used in dev_set_name().
Therefore, it should be better to check it and return error if fails,
like the ida_simple_get() in __fsi_get_new_minor().
Fixes: 09aecfab93 ("drivers/fsi: Add fsi master definition")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20220111073411.614138-1-jiasheng@iscas.ac.cn
Signed-off-by: Joel Stanley <joel@jms.id.au>
There is now a need for reading devicetree properties in the OCC
hwmon driver, which isn't current supported as the FSI driver just
instantiates a basic platform device. Add support for this use case
by checking for an "occ-hwmon" node and if present, creating an
OF device from it.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20220809200701.218059-3-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Smatch reports these issues
fsi-core.c:395:12: warning: function 'fsi_slave_claim_range'
with external linkage has definition
fsi-core.c:409:13: warning: function 'fsi_slave_release_range'
with external linkage has definition
The storage-class-specifier extern is not needed in a
definition, so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20220403140937.3833578-1-trix@redhat.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Joel writes:
FSI changes for v5.18
* Improvements in SCOM and OCC drivers for error handling and retries
* Addition of tracepoints for initialisation path
* API for setting long running SBE FIFO operations
* tag 'fsi-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi:
fsi: Add trace events in initialization path
fsi: sbefifo: Implement FSI_SBEFIFO_READ_TIMEOUT_SECONDS ioctl
fsi: sbefifo: Use specified value of start of response timeout
fsi: occ: Improve response status checking
fsi: scom: Remove retries in indirect scoms
fsi: scom: Fix error handling
FSI_SBEFIFO_READ_TIMEOUT_SECONDS ioctl sets the read timeout (in
seconds) for the response received by sbefifo device from sbe. The
timeout affects only the read operation on current sbefifo device fd.
Certain SBE operations can take long time to complete and the default
timeout of 10 seconds might not be sufficient to start receiving
response from SBE. In such cases, allow the timeout to be set to the
maximum of 120 seconds.
The kernel does not contain the definition of the various SBE
operations, so we must expose an interface to userspace to set the
timeout for the given operation.
Signed-off-by: Amitay Isaacs <amitay@ozlabs.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20220121053816.82253-3-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
For some of the chip-ops where sbe needs to collect trace information,
sbe can take a long time (>30s) to respond. Currently these chip-ops
will timeout as the start of response timeout defaults to 10s.
Instead of default value, use specified value. The require timeout
value will be set using ioctl.
Signed-off-by: Amitay Isaacs <amitay@ozlabs.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20220121053816.82253-2-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
If the driver sequence number coincidentally equals the previous
command response sequence number, the driver may proceed with
fetching the entire buffer before the OCC has processed the current
command. To be sure the correct response is obtained, check the
command type and also retry if any of the response parameters have
changed when the rest of the buffer is fetched. Also initialize the
driver with a random sequence number in order to reduce the chances
of this happening.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20220208152235.19686-1-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
A struct device can never be devm_alloc()'ed.
Here, it is embedded in "struct fsi_master", and "struct fsi_master" is
embedded in "struct fsi_master_aspeed".
Since "struct device" is embedded, the data structure embedding it must be
released with the release function, as is already done here.
So use kzalloc() instead of devm_kzalloc() when allocating "aspeed" and
update all error handling branches accordingly.
This prevent a potential double free().
This also fix another issue if opb_readl() fails. Instead of a direct
return, it now jumps in the error handling path.
Fixes: 606397d67f ("fsi: Add ast2600 master driver")
Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/2c123f8b0a40dc1a061fae982169fe030b4f47e6.1641765339.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit f72ddbe1d7 ("fsi: scom: Remove retries") the retries were
removed from get and put scoms. That patch missed the retires in get and
put indirect scom.
For the same reason, remove them from the scom driver to allow the
caller to decide to retry.
This removes the following special case which would have caused the
retry code to return early:
- if ((ind_data & XSCOM_DATA_IND_COMPLETE) || (err != SCOM_PIB_BLOCKED))
- return 0;
I believe this case is handled.
Fixes: f72ddbe1d7 ("fsi: scom: Remove retries")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20211207033811.518981-3-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
SCOM error handling is made complex by trying to pass around two bits of
information: the function return code, and a status parameter that
represents the CFAM error status register.
The commit f72ddbe1d7 ("fsi: scom: Remove retries") removed the
"hidden" retries in the SCOM driver, in preference of allowing the
calling code (userspace or driver) to decide how to handle a failed
SCOM. However it introduced a bug by attempting to be smart about the
return codes that were "errors" and which were ok to fall through to the
status register parsing.
We get the following errors:
- EINVAL or ENXIO, for indirect scoms where the value is invalid
- EINVAL, where the size or address is incorrect
- EIO or ETIMEOUT, where FSI write failed (aspeed master)
- EAGAIN, where the master detected a crc error (GPIO master only)
- EBUSY, where the bus is disabled (GPIO master in external mode)
In all of these cases we should fail the SCOM read/write and return the
error.
Thanks to Dan Carpenter for the detailed bug report.
Fixes: f72ddbe1d7 ("fsi: scom: Remove retries")
Link: https://lists.ozlabs.org/pipermail/linux-fsi/2021-November/000235.html
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20211207033811.518981-2-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
Some SBE operations have extremely large responses and can require
several minutes to process the response. During this time, the device
lock must be held. If another process attempts an operation, it will
wait for the mutex for longer than the kernel hung task watchdog
allows. Therefore, use the interruptible function to lock the mutex.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20210803213016.44739-1-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
If the SBEFIFO response indicates an error, store the response in the
user buffer and return an error. Previously, the user had no way of
obtaining the SBEFIFO FFDC.
The user's buffer now contains data in the event of a failure. No change
in the event of a successful transfer.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20211019205307.36946-3-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Allocate a large buffer for each OCC to handle response data. This
removes memory allocation during an operation, and also allows for
the maximum amount of SBE FFDC.
Previously for the putsram and attn commands, only 32 words would have
been available, and for getsram, only up to the size of the transfer.
SBE FFDC might be up to 8Kb.
The SBE interface expects data to be specified in units of words (4
bytes), defined as OCC_MAX_RESP_WORDS.
This change allows the full FFDC capture to be implemented, where before
it was not available.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20211019205307.36946-2-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Set and increment the sequence number during the submit operation.
This prevents sequence number conflicts between different users of
the interface. A sequence number conflict may result in a user
getting an OCC response meant for a different command. Since the
sequence number is now modified, the checksum must be calculated and
set before submitting the command.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20210721190231.117185-2-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
On BMCs with lower timer resolution than 1ms, msleep(1) will take
way longer than 1ms, so looping 10k times won't wait for 10s but
significantly longer.
Fix this by using jiffies like the rest of the code.
Fixes: 9f4a8a2d7f ("fsi/sbefifo: Add driver for the SBE FIFO")
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Link: https://lore.kernel.org/r/20200724071518.430515-3-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
The lengthy timeout previously used sometimes resulted in
scheduling problems, detailed below. Therefore reduce the timeout
to 500us. This timeout selection is supported by the benchmarks
collected below with various clock dividers. This is purely the time
spent polling (reported by ktime_get()).
div 1: max:150us avg: 2us
div 2: max:155us avg: 3us
div 4: max:149us avg: 7us
div 8: max:153us avg: 13us
div 16: max:197us avg: 21us
div 32: max:181us avg: 50us
div 64: max:262us avg:100us
Jan 22 01:27:21 rain27bmc kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Jan 22 01:27:21 rain27bmc kernel: rcu: 0-....: (2099 ticks this GP) idle=0ca/1/0x40000002 softirq=349573/349573 fqs=1048
Jan 22 01:27:21 rain27bmc kernel: (t=2100 jiffies g=841149 q=7163)
Jan 22 01:27:21 rain27bmc kernel: NMI backtrace for cpu 0
Jan 22 01:27:21 rain27bmc kernel: CPU: 0 PID: 5959 Comm: ibm-read-vpd Not tainted 5.8.17-a9b4ea8 #1
Jan 22 01:27:21 rain27bmc kernel: Hardware name: Generic DT based system
Jan 22 01:27:21 rain27bmc kernel: Backtrace:
Jan 22 01:27:25 rain27bmc kernel: [<8010d92c>] (dump_backtrace) from [<8010db80>] (show_stack+0x20/0x24)
...
Jan 22 01:27:25 rain27bmc kernel: [<8010130c>] (gic_handle_irq) from [<80100b0c>] (__irq_svc+0x6c/0x90)
Jan 22 01:27:25 rain27bmc kernel: Exception stack(0xb79159b0 to 0xb79159f8)
Jan 22 01:27:25 rain27bmc kernel: 59a0: 9e88e5d5 00000559 00000559 00000018
Jan 22 01:27:25 rain27bmc kernel: 59c0: 00000000 9f217c55 00000003 00000559 a0201c00 bfa4d048 bfa4d000 b7915a44
Jan 22 01:27:25 rain27bmc kernel: 59e0: 40e88f8a b7915a00 3254e553 80734924 80030113 ffffffff
Jan 22 01:27:25 rain27bmc kernel: r9:b7914000 r8:a0201c00 r7:b79159e4 r6:ffffffff r5:80030113 r4:80734924
Jan 22 01:27:25 rain27bmc kernel: [<807348b4>] (__opb_read) from [<80734d98>] (aspeed_master_read+0xbc/0xcc)
Jan 22 01:27:25 rain27bmc kernel: r10:00000004 r9:00000002 r8:80734cdc r7:bd33fa40 r6:00000004 r5:bd33f840
Jan 22 01:27:25 rain27bmc kernel: r4:00201c00
Jan 22 01:27:25 rain27bmc kernel: [<80734cdc>] (aspeed_master_read) from [<807320f0>] (fsi_master_read+0x6c/0x1bc)
...
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210211194846.35475-1-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>