Age | Commit message | Author |
Fixed a bug in the XaxpyFaster kernel for specific parameters
First im2col+GEMM implementation of convolution
Made tuning API more flexible
Fixed pre-processor warnings related to the subgroup shuffling
Fixed pre-processor issues with the new GEMMK=1 kernel
Add Julia Wrapper
I've written a Julia wrapper for CLBlast, which can be found [here](https://github.com/JuliaGPU/CLBlast.jl). It is published and available through the Julia package manager.
Missing events in TRSV and TRSM
Netlib API with optional static OpenCL variables
Fixes bug in conjugate transpose not being executed
Added workaround for AMD Southern Islands GPU issue
|
|
transposing
Tuners now check for valid local thread size
invalid ones completely, saving compilation time
|
|
CNugteren/CLBlast-300-fix-staggered-indices-AMD-GEMMK1
Fix staggered indices on AMD GPUs for GEMMK == 1 kernel
|
|
kernels to improve performance