Age | Commit message | Author |
|
* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
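A minimal sketch of the two AMAX fixes listed above, written as plain host-side C++ rather than the actual OpenCL kernel. The names `Cabs1` and `IndexOfAmax` are hypothetical, and the magnitude `|re| + |im|` is assumed from the reference BLAS convention, not taken from the commit itself. The point is that the returned index is the logical position within the strided vector, with the offset and increment used only for addressing.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude used for comparison: |re| + |im| (assumed, as in reference BLAS).
// Using only the real part here would be the bug described in the list above.
inline float Cabs1(const std::complex<float>& v) {
  return std::fabs(v.real()) + std::fabs(v.imag());
}

// Hypothetical helper: returns the zero-based logical index (0..n-1) of the
// first element with the largest magnitude in a strided view of x.
std::size_t IndexOfAmax(const std::vector<std::complex<float>>& x,
                        std::size_t n, std::size_t offset, std::size_t inc) {
  // Precondition: the strided view must lie inside the buffer.
  assert(n == 0 || (inc > 0 && offset + (n - 1) * inc < x.size()));
  std::size_t best = 0;
  float best_value = -1.0f;
  for (std::size_t i = 0; i < n; ++i) {
    // Offset and increment are used for addressing only.
    const float value = Cabs1(x[offset + i * inc]);
    if (value > best_value) {
      best_value = value;
      best = i;
    }
  }
  return best;  // Not offset + best * inc.
}
```

With an offset of 1 and an increment of 2, a caller would expect an index in the range 0 to n-1, regardless of where the vector starts in the buffer.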
|
https://jira-dc.qualcomm.com/jira/browse/OSR-8731
|
Resolves https://github.com/CNugteren/CLBlast/issues/440
|
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).
This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).
Courtesy of Codeplay Software Ltd.
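The out-of-bounds write described above can be illustrated with a small host-side C++ simulation. This is only a sketch under stated assumptions, not the actual XhadFaster kernel: `kWpt`, `HadamardThread`, `RunHadamard` and `CanaryIntact` are hypothetical names, the operation is simplified to an element-wise product, and the canary value and size are arbitrary. Each simulated thread handles up to `kWpt` items, and the `id < n` guard is what prevents writes past the end of the output when there is not enough work for every thread.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kWpt = 4;  // Hypothetical work-per-thread factor.

// One simulated thread: processes up to kWpt items, strided by num_threads.
void HadamardThread(std::size_t thread_id, std::size_t num_threads,
                    std::size_t n, const float* x, const float* y, float* z) {
  for (std::size_t w = 0; w < kWpt; ++w) {
    const std::size_t id = thread_id + w * num_threads;
    if (id < n) {  // Without this guard, z would be written beyond n.
      z[id] = x[id] * y[id];
    }
  }
}

// Returns true if every element after the first n still holds the canary.
bool CanaryIntact(const std::vector<float>& buffer, std::size_t n,
                  float canary) {
  for (std::size_t i = n; i < buffer.size(); ++i) {
    if (buffer[i] != canary) { return false; }
  }
  return true;
}

// Runs all simulated threads on an output buffer padded with canary values.
// The caller must pick num_threads so that num_threads * kWpt >= x.size().
std::vector<float> RunHadamard(const std::vector<float>& x,
                               const std::vector<float>& y,
                               std::size_t num_threads,
                               std::size_t canary_size, float canary) {
  const std::size_t n = x.size();
  std::vector<float> z(n + canary_size, canary);
  for (std::size_t t = 0; t < num_threads; ++t) {
    HadamardThread(t, num_threads, n, x.data(), y.data(), z.data());
  }
  return z;
}
```

A test can then check both the first `n` results and the padded region. As the commit notes, the padding must be large enough to cover the furthest stray write, otherwise the overflow goes undetected.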
|
kernel and test
|
First im2col+GEMM implementation of convolution
|
kernels to improve performance
|
capture other parts of the kernel code
|
approach for convgemm
|
local memory support now
|
edge cases now
|
gemm kernel
|
a new kernel
|
compiler
|