Age | Commit message (Collapse) | Author |
|
Resolves https://github.com/CNugteren/CLBlast/issues/440
|
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).
This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).
Courtesy of Codeplay Software Ltd.
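
The overflow described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the actual XhadFaster kernel: the function names, the sentinel value, and the contiguous per-thread layout are all hypothetical, and the real kernel's indexing differs (which is presumably why a much larger kCanarySize is needed there). Each simulated thread processes WPT items; without a bounds check the last thread writes past the end of the output and corrupts the canary region, while the guarded version leaves it intact.

```python
CANARY = -1.0  # hypothetical sentinel value used to fill the padding regions


def pad(values, canary_size):
    """Surround the payload with canary regions on both sides."""
    return [CANARY] * canary_size + list(values) + [CANARY] * canary_size


def canaries_intact(buf, n, canary_size):
    """Return True if no write landed outside the n payload slots."""
    head = buf[:canary_size]
    tail = buf[canary_size + n:]
    return all(v == CANARY for v in head + tail)


def hadamard(x, y, z, n, wpt, canary_size, guarded):
    """Simulate an element-wise product where each thread handles wpt items."""
    num_threads = -(-n // wpt)  # ceil(n / wpt)
    for tid in range(num_threads):
        for w in range(wpt):
            i = tid * wpt + w
            if guarded and i >= n:
                continue  # the bounds check that the unguarded loop lacks
            j = canary_size + i
            z[j] = x[j] * y[j]  # unguarded: may read and write padding


def run(guarded, n=10, wpt=4, canary_size=4):
    """Run one simulated launch; return the payload and the canary status."""
    x = pad(range(1, n + 1), canary_size)
    y = pad([2.0] * n, canary_size)
    z = pad([0.0] * n, canary_size)
    hadamard(x, y, z, n, wpt, canary_size, guarded)
    return z[canary_size:canary_size + n], canaries_intact(z, n, canary_size)
```

With n = 10 and wpt = 4 in this sketch, three threads cover twelve indices, so the unguarded loop touches two slots beyond the payload. The canary region only catches the overflow if it is at least as large as the furthest stray write.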
|
kernel and test
|
First im2col+GEMM implementation of convolution
|
kernels to improve performance
|
capture other parts of the kernel code
|
approach for convgemm
|
local memory support now
|
edge cases now
|
gemm kernel
|
a new kernel
|
compiler
|
where the AMD compiler crashes"
This reverts commit 407ed52cec41445f02e85cb45d08f590960216bb.
|