Floating-point operations in the Linux kernel must run inside
special kernel protection (`kernel_fpu_{begin,end}()`); otherwise
the kernel can clobber userspace FPU register state. To detect
these issues we use a tool named objtool (with the -Ffa flags),
which highlights the FPU problems. All warnings can be
summed up as follows:
./tools/objtool/objtool check -Ffa
drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.o
[..] dc/dsc/rc_calc.o: warning: objtool: get_qp_set()+0x2f8:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: dsc_roundf()+0x5:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: dsc_ceil()+0x5:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: get_ofs_set()+0x3eb:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: calc_rc_params()+0x3c:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/dc_dsc.o: warning: objtool:
get_dsc_bandwidth_range.isra.0()+0x8d:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/dc_dsc.o: warning: objtool: setup_dsc_config()+0x2ef:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc_dpi.o: warning: objtool:copy_pps_fields()+0xbb:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc_dpi.o: warning: objtool:
dscc_compute_dsc_parameters()+0x7b:
FPU instruction outside of kernel_fpu_{begin,end}()
This commit fixes the above issues by reworking DSC as follows:
1. Isolate all FPU operations in a single file;
2. Use FPU flags only in the file that handles FPU operations;
3. Isolate all functions that require floating-point operations in static
functions;
4. Add a mid-layer function that does not use any floating-point operation
and that can be safely invoked from other parts of the code;
5. Keep floating-point operations under the DC_FP_{START/END} macros.
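A minimal userspace sketch of the layering described in the list above. The DC_FP_START()/DC_FP_END() definitions here are stand-ins that only track nesting so the example builds outside the kernel; the function names dsc_scale() and dsc_scale_fp() and the counter fp_section_depth are hypothetical and not taken from the driver.

```c
/* Stand-ins for the driver macros (assumption: in the kernel these enter
 * and leave the protected FPU section). Here they only count nesting. */
static int fp_section_depth;
#define DC_FP_START() (fp_section_depth++)
#define DC_FP_END()   (fp_section_depth--)

/* Static helper: the only function that touches floating point. */
static int dsc_scale_fp(int value, int num, int den)
{
	double scaled = (double)value * num / den;

	return (int)(scaled + 0.5); /* round to nearest, non-negative input */
}

/* Mid-layer: integer-only interface, callable from non-FPU code. */
int dsc_scale(int value, int num, int den)
{
	int result;

	DC_FP_START();
	result = dsc_scale_fp(value, num, den);
	DC_FP_END();

	return result;
}
```

Callers only ever see dsc_scale(), so no floating-point value crosses the boundary of the protected section.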
CC: Christian König <christian.koenig@amd.com>
CC: Alexander Deucher <Alexander.Deucher@amd.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Tony Cheng <tony.cheng@amd.com>
CC: Harry Wentland <hwentlan@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use kfree() instead of kvfree() to free rgb_user in
calculate_user_regamma_ramp() because the memory is allocated with
kcalloc().
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use kvfree() instead of kfree() to free coeff in build_regamma()
because the memory is allocated with kvzalloc().
Fixes: e752058b86 ("drm/amd/display: Optimize gamma calculations")
Cc: stable@vger.kernel.org
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull drm fixes from Dave Airlie:
"These are the fixes from last week for the stuff merged in the merge
window. It got a bunch of nouveau fixes for HDA audio on some new
GPUs, some i915 and some amdgpu fixes.
i915:
- gvt: Fix one clang warning on debug only function
- Use ARRAY_SIZE for coccicheck warning
- Use after free fix for display global state.
- Whitelisting context-local timestamp on Gen9 and two scheduler
fixes with deps (Cc: stable)
- Removal of write flag from sysfs files where ineffective
nouveau:
- HDMI/DP audio HDA fixes
- display hang fix for Volta/Turing
- GK20A regression fix.
amdgpu:
- Prevent hwmon accesses while GPU is in reset
- CTF interrupt fix
- Backlight fix for renoir
- Fix for display sync groups
- Display bandwidth validation workaround"
* tag 'drm-next-2020-06-08' of git://anongit.freedesktop.org/drm/drm: (28 commits)
drm/nouveau/kms/nv50-: clear SW state of disabled windows harder
drm/nouveau: gr/gk20a: Use firmware version 0
drm/nouveau/disp/gm200-: detect and potentially disable HDA support on some SORs
drm/nouveau/disp/gp100: split SOR implementation from gm200
drm/nouveau/disp: modify OR allocation policy to account for HDA requirements
drm/nouveau/disp: split part of OR allocation logic into a function
drm/nouveau/disp: provide hint to OR allocation about HDA requirements
drm/amd/display: Revalidate bandwidth before commiting DC updates
drm/amdgpu/display: use blanked rather than plane state for sync groups
drm/i915/params: fix i915.fake_lmem_start module param sysfs permissions
drm/i915/params: don't expose inject_probe_failure in debugfs
drm/i915: Whitelist context-local timestamp in the gen9 cmdparser
drm/i915: Fix global state use-after-frees with a refcount
drm/i915: Check for awaits on still currently executing requests
drm/i915/gt: Do not schedule normal requests immediately along virtual
drm/i915: Reorder await_execution before await_request
drm/nouveau/kms/gt215-: fix race with audio driver runpm
drm/nouveau/disp/gm200-: fix NV_PDISP_SOR_HDMI2_CTRL(n) selection
Revert "drm/amd/display: disable dcn20 abm feature for bring up"
drm/amd/powerplay: ack the SMUToHost interrupt on receive V2
...
[Why]
Whenever we switch between tiled formats without also switching pixel
formats or doing anything else that recreates the DC plane state we
can run into underflow or hangs since we're not updating the
DML parameters before committing to the hardware.
[How]
If the update type is FULL then call validate_bandwidth again to update
the DML parameters before committing the state.
This is basically just a workaround and protective measure against
update types being added to DC where we could run into this issue in
the future.
We can only fully validate the state in advance before applying it to
the hardware if we recreate all the plane and stream states since
we can't modify what's currently in use.
The next step is to update DM to ensure that we're creating the plane
and stream states for whatever could potentially be a full update in
DC to pre-emptively recreate the state for DC global validation.
The workaround can stay until this has been fixed in DM.
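A simplified sketch of the revalidation step, under stated assumptions: struct dc_state_stub, its fields, and commit_updates() are hypothetical stand-ins for the real DC state and commit path, and bailing out when validation fails is an assumption of this sketch.

```c
#include <stdbool.h>

enum update_type { UPDATE_TYPE_FAST, UPDATE_TYPE_MED, UPDATE_TYPE_FULL };

/* Hypothetical stand-in for the DC state being committed. */
struct dc_state_stub {
	bool bandwidth_ok; /* result the validation would produce */
	int validations;   /* how many times validation ran */
	bool committed;    /* whether hardware programming happened */
};

static bool validate_bandwidth(struct dc_state_stub *s)
{
	s->validations++;
	return s->bandwidth_ok;
}

bool commit_updates(struct dc_state_stub *s, enum update_type type)
{
	/* Only a FULL update refreshes the derived parameters. */
	if (type == UPDATE_TYPE_FULL && !validate_bandwidth(s))
		return false; /* do not program hardware with stale values */

	s->committed = true;
	return true;
}
```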
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When determining synchronized vblank we don't need to compare the stream
with itself.
[How]
If comparing the same stream, continue to the next iteration.
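The loop shape can be sketched as follows. struct stream, its fields, and both functions are hypothetical; the real timing comparison in DC checks more than these two values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a stream's timing. */
struct stream {
	int vtotal;
	int pix_clk_khz;
};

static bool timing_matches(const struct stream *a, const struct stream *b)
{
	return a->vtotal == b->vtotal && a->pix_clk_khz == b->pix_clk_khz;
}

/* Count streams whose timing matches streams[idx], skipping idx itself. */
size_t count_sync_partners(const struct stream *streams, size_t n, size_t idx)
{
	size_t i, matches = 0;

	for (i = 0; i < n; i++) {
		if (i == idx)
			continue; /* same stream: nothing to compare */
		if (timing_matches(&streams[idx], &streams[i]))
			matches++;
	}
	return matches;
}
```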
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Header Changes]
- Combine all interface dependencies between driver and fw into a
single header file
- Add FW Versioning to the dmub_cmd.h file
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
We hit an issue where the driver reallocates a pipe from the desktop
bottom pipe to the video bottom pipe. In this case, the driver needs to
re-enable the plane.
[how]
Enable plane if container of plane status changed.
Signed-off-by: Hugo Hu <hugo.hu@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
We want to better encapsulate all driver-fw dependencies into a single
file.
[How]
Combine all the headers under the inc folder into a single header.
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The Diagnostics DIO test is required to run even when eDP is not connected.
[How]
Allow the Diagnostics test with eDP not connected to skip link detection but
still execute the DIO test.
Signed-off-by: Eric Bernstein <eric.bernstein@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
There are scenarios where no OPP is assigned to an OTG so its value is
0xF which is outside the size of the OPP array causing a potential
driver crash.
[How]
Change the assert to an early return to guard against access. If
there's no OPP assigned already, then OTG will be blank anyways so no
functionality should be lost.
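A minimal sketch of the guard, assuming a hypothetical pool layout: MAX_OPP, struct opp, struct pool, and set_blank_color() are illustrative names, while 0xF is the unassigned value mentioned above.

```c
#include <stdbool.h>

#define MAX_OPP 6            /* hypothetical size of the OPP array */
#define OPP_ID_INVALID 0xF   /* value seen when no OPP is assigned */

struct opp {
	bool blank_color_set;
};

struct pool {
	struct opp opps[MAX_OPP];
};

/* Returns true if the OPP was programmed, false if the call was skipped. */
bool set_blank_color(struct pool *pool, unsigned int opp_id)
{
	if (opp_id >= MAX_OPP)
		return false; /* early return instead of an out-of-bounds access */

	pool->opps[opp_id].blank_color_set = true;
	return true;
}
```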
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Zhan Liu <Zhan.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
To facilitate DM removing the dependency between dc and the firmware
binary.
[HOW]
Set the default values to match VBIOS: 64 KB. These values are only
used if meta is absent.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Link loss currently only retrains and re-enables the stream. This can
cause issues for some sinks.
[How]
When link loss occurs, the link and stream(s) should be completely
disabled and then reenabled.
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The current implementation is slightly inaccurate and will often
result in a truncation/floor operation decrementing an exact
integer output by 1.
Only rounded-down output is ever expected, so just extract the fp
exponent; this increases performance and avoids any
truncation issues.
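One way to read the exponent directly, shown as a sketch rather than the driver's actual code. It assumes a positive, normal IEEE-754 single-precision input; log2_floor() is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* floor(log2(x)) for a positive, normal IEEE-754 single-precision value,
 * taken from the biased exponent field instead of computed numerically. */
int log2_floor(float x)
{
	uint32_t bits;

	memcpy(&bits, &x, sizeof(bits));         /* copy the raw bit pattern */
	return (int)((bits >> 23) & 0xFF) - 127; /* drop sign, remove bias */
}
```

Because the result comes from the exponent bits, an exact power of two such as 8.0 yields exactly 3 and cannot be decremented by a later truncation.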
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Eric Bernstein <Eric.Bernstein@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Previously we used the link signal type to get the caps. We should use the
sink signal type.
[How]
Use the sink signal type instead of the link signal type.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
An incorrect link_status causes the driver to power off eDP while the
backlight is on. Some eDP panels may show garbage on screen.
[How]
Correct link_status when powering off the encoder.
Signed-off-by: Paul Hsieh <paul.hsieh@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Warnings in the kernel are generally treated as errors.
The BREAK_TO_DEBUGGER macro is not a critical error or warning, but
rather intended for developer use to help investigate behavior and
sequences for other issues.
We do still make use of DC_ERROR/ASSERT(0) in various places in the
code for things that are genuine issues.
Since most developers don't actually use KGDB while debugging the kernel,
these essentially have no value on their own since the KGDB
breakpoint wouldn't trigger - ASSERT(0) was used as a shortcut to get
a stacktrace.
[How]
Turn it into a DRM_DEBUG_DRIVER print instead. We unfortunately lose
the stacktrace, but we still do retain some of the useful debug
information this offers by having at least the function and line
number loggable.
If KGDB is supported in the kernel this will still trigger a real
breakpoint as well.
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This change removes internal rounding in the dml_log2 function.
dml_log2 is expected to return a float output. In case an int is needed,
dml will floor the output on its own.
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Eric Bernstein <Eric.Bernstein@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Region 4 is non-cacheable and slower than using cache window 4.
[How]
Check the firmware version to determine how we should program the
base address and memory windows.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In order to switch over the inbox from region4 to cw4 we need to know if
the firmware is capable of properly invalidating the cache before
reading the commands.
Easiest way is to just check the firmware version, but we don't have the
helper macros or a way for the dmub_srv to know what version it is.
[How]
Add a new fw_version field to the creation parameters that the driver can
optionally pass in. A version of 0x00000000 is assumed to be invalid.
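A sketch of what such a version check could look like. The FW_VERSION() bit layout, the FW_MIN_CW4_VERSION threshold, and fw_supports_cw4_inbox() are all assumptions made for illustration; only the rule that 0x00000000 is invalid comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packing: major in bits 31..24, minor in 23..16,
 * revision in 15..0. */
#define FW_VERSION(major, minor, rev) \
	((((uint32_t)(major) & 0xFF) << 24) | \
	 (((uint32_t)(minor) & 0xFF) << 16) | \
	 ((uint32_t)(rev) & 0xFFFF))

/* Hypothetical first version that invalidates the cache before reading. */
#define FW_MIN_CW4_VERSION FW_VERSION(1, 0, 19)

bool fw_supports_cw4_inbox(uint32_t fw_version)
{
	if (fw_version == 0)
		return false; /* 0x00000000: version was not provided */

	return fw_version >= FW_MIN_CW4_VERSION;
}
```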
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Currently we're copying the entire bios image into vbios. Loading time
for FW with the entire bios (54272 bytes) is 105138us. By copying only the
sections of bios we're using (4436 bytes), loading time drops to 104326us,
which saves us 812us.
[HOW]
The ROM header, master data table, and all data tables will be packed in
a contiguous manner. The offsets for the data tables are remapped to their
newly packed locations.
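The packing and offset remapping can be sketched generically as below. struct table_ref and pack_tables() are hypothetical; the real VBIOS tables carry their offsets inside the image rather than in a separate array, so this only illustrates the idea.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor of one table inside the source image. */
struct table_ref {
	uint16_t offset; /* byte offset of the table inside the image */
	uint16_t size;   /* table length in bytes */
};

/* Copy each referenced table back to back into out and rewrite its offset
 * to the packed location. Returns bytes written, or 0 if a range does not
 * fit (refs may be partially updated in that case). Assumes the packed
 * size fits in 16 bits. */
size_t pack_tables(const uint8_t *image, size_t image_size,
		   struct table_ref *refs, size_t count,
		   uint8_t *out, size_t out_size)
{
	size_t cursor = 0, i;

	for (i = 0; i < count; i++) {
		if ((size_t)refs[i].offset + refs[i].size > image_size ||
		    refs[i].size > out_size - cursor)
			return 0;

		memcpy(out + cursor, image + refs[i].offset, refs[i].size);
		refs[i].offset = (uint16_t)cursor; /* remap to packed location */
		cursor += refs[i].size;
	}
	return cursor;
}
```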
Signed-off-by: Jake Wang <haonan.wang2@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
The DP link layer CTS spec was updated to change the test parameters in
test 4.2.1.1.
Previously it required the source to delay 400us on an aux no-reply.
With the Errata5 spec update, it requires the source to delay 3.2ms
(based on the LTTPR aux timeout).
This causes our test to fail after updating to the latest test
equipment firmware.
[how]
The change is to allow the LTTPR 3.2ms aux timeout delay by default,
and only set it to 400us if LTTPR is not present.
Previously this piece of logic was intertwined with LTTPR support.
Now we will default to the 3.2ms aux timeout even if LTTPR support is not
enabled by the driver.
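The selection rule can be sketched as follows. The enum, the constants' names, and aux_timeout_us() are hypothetical; only the 3.2ms and 400us values come from the description above.

```c
#define AUX_TIMEOUT_LTTPR_US    3200u /* 3.2ms, used by default */
#define AUX_TIMEOUT_NO_LTTPR_US  400u /* only when LTTPR is known absent */

/* Hypothetical detection result. */
enum lttpr_state {
	LTTPR_UNKNOWN, /* not probed, or LTTPR support disabled in the driver */
	LTTPR_ABSENT,
	LTTPR_PRESENT,
};

unsigned int aux_timeout_us(enum lttpr_state state)
{
	if (state == LTTPR_ABSENT)
		return AUX_TIMEOUT_NO_LTTPR_US;

	return AUX_TIMEOUT_LTTPR_US; /* default, independent of driver support */
}
```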
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Due to the packing of abm_config_table, memory addresses aren't aligned to
the 32-bit boundary dmcub prefers. Therefore, when using pointers to this
structure, it's possible that dmcub will automatically align the data
read from that address, yielding incorrect values.
[How]
Instead of packing to a 1-byte boundary, explicitly pack values to a 4-byte
boundary. Since there is a dependency on the existing iram table
structure on the driver side, we must copy to a second structure, which is
aligned correctly, before passing it to fw.
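A reduced sketch of the copy between the two layouts. Both structures and their fields are hypothetical and much smaller than the real tables; the point is the field-by-field copy from a 1-byte packed layout into one with explicit padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Driver-side layout, packed to 1 byte: threshold lands on offset 3. */
#pragma pack(push, 1)
struct legacy_table {
	uint8_t mode;
	uint16_t min_level;
	uint32_t threshold;
};
#pragma pack(pop)

/* Firmware-side layout: 32-bit member on a 4-byte boundary and explicit
 * padding so the total size is a multiple of 4. */
#pragma pack(push, 4)
struct fw_table {
	uint32_t threshold;
	uint16_t min_level;
	uint8_t mode;
	uint8_t pad;
};
#pragma pack(pop)

/* Copy field by field; a raw memcpy would carry over the bad offsets. */
void fill_fw_table(struct fw_table *dst, const struct legacy_table *src)
{
	memset(dst, 0, sizeof(*dst));
	dst->threshold = src->threshold;
	dst->min_level = src->min_level;
	dst->mode = src->mode;
}
```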
Signed-off-by: Wyatt Wood <wyatt.wood@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian Koenig pointed out a code duplication related to the byte swap
in the big-endian case. This commit adds a helper for handling
this check and reduces the need to replicate parts of
the code.
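A portable sketch of such a helper, assuming the goal is to store a 16-bit value in big-endian byte order. to_be16() is a hypothetical name; kernel code would normally rely on the existing byte-order helpers instead of probing at run time.

```c
#include <stdint.h>

/* Returns v arranged so that its bytes are big-endian in memory,
 * whatever the host byte order is. */
uint16_t to_be16(uint16_t v)
{
	const union {
		uint16_t u;
		uint8_t b[2];
	} probe = { .u = 1 };

	if (probe.b[0] == 1) /* little-endian host: swap the two bytes */
		return (uint16_t)((v << 8) | (v >> 8));

	return v; /* big-endian host: already in the right order */
}
```

Keeping the check in one place means each call site no longer needs its own endianness conditional.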
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Wyatt Wood <Wyatt.Wood@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
If bss_data_size is 0 then we shouldn't be passing down fw_bss_data into
the DMUB service since the region isn't really "valid."
[How]
Pass NULL instead if the size is 0.
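A minimal sketch of the rule, with hypothetical names (struct fw_region, make_bss_region()) standing in for the real DMUB service parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a firmware region. */
struct fw_region {
	const uint8_t *data;
	size_t size;
};

/* Describe the BSS/data region; a zero size yields a NULL pointer so the
 * consumer never sees a non-NULL pointer to an empty region. */
struct fw_region make_bss_region(const uint8_t *base, size_t bss_data_size)
{
	struct fw_region r;

	r.size = bss_data_size;
	r.data = bss_data_size ? base : NULL;
	return r;
}
```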
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Zhan Liu <Zhan.Liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
New unified firmware binary with only inst const still passes down
fw_bss_data != NULL and params->bss_data_size == 0 from DM.
This leads it into the legacy path causing firmware state allocation to
be too small.
[How]
Check bss_data_size as well.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Zhan Liu <Zhan.Liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Failing validation when building scaling parameters causes corruption to
occur due to pipe splitting with smaller pixel widths than HW supports.
This needs to fail silently for now to hide the corruption until the
corruption itself can be fixed.
[HOW]
Do not fail validation if building scaling params fails.
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Remove dm_write_persistent_data and dm_read_persistent_data as
persistence should be handled in DM.
[How]
Remove functions. Move read/write calls into DM layer while maintaining
logic.
Signed-off-by: Jaehyun Chung <jaehyun.chung@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
If VUPDATE_END is before VUPDATE_START the delay calculated can become
very large, causing a soft hang.
[How]
Take the absolute value of the difference between START and END.
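With unsigned line counters, a plain end - start wraps to a huge value when END precedes START. A sketch of the order-independent form, using the hypothetical name vupdate_width_lines():

```c
#include <stdint.h>

/* Distance between VUPDATE start and end, regardless of their order.
 * Subtracting the smaller from the larger avoids unsigned wraparound. */
uint32_t vupdate_width_lines(uint32_t start, uint32_t end)
{
	return start > end ? start - end : end - start;
}
```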
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Previously we used the s3 codepath for gpu reset. This can lead to issues in
certain cases where we end up waiting for fences which will never come (because
parts of the hw are off due to gpu reset), so we wait forever, causing
a deadlock.
[How]
Handle GPU reset separately from normal s3 case. We essentially need to redo
everything we do in s3, but avoid any drm calls.
For GPU reset case
suspend:
-Acquire DC lock
-Cache current dc_state
-Commit 0 stream/planes to dc (this puts dc into a state where it can be
powered off)
-Disable interrupts
resume:
-Edit cached state to force full update
-Commit cached state from suspend
-Build stream and plane updates from the cached state
-Commit stream/plane updates
-Enable interrupts
-Release DC lock
v2:
-Formatting
-Release dc_state
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>