Age | Commit message | Author |
|
* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
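
To illustrate the two amax fixes listed above, here is a minimal reference sketch; it is not the library's kernel code. It assumes a zero-based result and the usual BLAS convention of taking a complex magnitude as |re| + |im|. The function name `ReferenceAmax` and its parameters are hypothetical. The returned index counts elements of the logical vector, so neither the offset nor the increment leaks into the result.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference: index of the largest-magnitude element of a strided vector.
// Assumptions: zero-based result; complex magnitude taken as |re| + |im|.
std::size_t ReferenceAmax(const std::vector<std::complex<float>>& buffer,
                          std::size_t n, std::size_t offset, std::size_t inc) {
  std::size_t best_index = 0;
  float best_value = -1.0f;  // below any possible magnitude
  for (std::size_t i = 0; i < n; ++i) {
    // The offset and increment are used only to address the buffer
    const std::complex<float> x = buffer[offset + i * inc];
    // Use both the real and the imaginary part, not just the real part
    const float magnitude = std::fabs(x.real()) + std::fabs(x.imag());
    if (magnitude > best_value) {
      best_value = magnitude;
      best_index = i;  // logical index, independent of offset and increment
    }
  }
  return best_index;
}
```

With `n = 3`, `offset = 2` and `inc = 2`, the function inspects buffer positions 2, 4 and 6 and reports the winner as 0, 1 or 2.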
|
https://jira-dc.qualcomm.com/jira/browse/OSR-8731
|
In order not to have ambiguous definitions, exclude the functions for other compilers
|
Replace the looped test by a single one with the offset of the last batch.
|
Replace the looped test by a single one with the maximal found offset.
|
inline PTX to support subgroup shuffle for Nvidia GPUs
|
and standard-deviation
|
results based on kernel pre-processor
|
support for multi-kernel routines
|
the CLBlast library
|
GEMM kernel selection tuner
|
results
|