CNugteren/CLBlast-334-pyclblast-half-precision-support
PyCLBlast half precision support
Convolution with single kernel
strided-batched-GEMM routine
executions
Fix single kernel version of convgemm
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG
Fixed an issue for the GEMMK == 1 kernel
Remove unnecessary qualifier of inline function
Add im2colflip and col2imflip functions
Implements col2im routine
Add col2im function
Update FindOpenCL.cmake
Add path to ROCm OpenCL as possible location in cmake script
kernel and test
Fixed a bug in the XaxpyFaster kernel for specific parameters
First im2col+GEMM implementation of convolution
Made tuning API more flexible
Fixed pre-processor warnings related to the subgroup shuffling