debian-clblast
Debian package for CLBlast.
gspr@nonempty.org
path: root/src/routines
| Age | Commit message | Author |
|---|---|---|
| 2018-06-03 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren |
| 2018-06-01 | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when bar... | Cedric Nugteren |
| 2018-05-31 | Added error-checking for half-empty local work group sizes; fixed a minor TRS... | Cedric Nugteren |
| 2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren |
| 2018-05-30 | Widened Apple OpenCL check, added way to debug too-large-workgroups issue | Cedric Nugteren |
| 2018-05-27 | Added a check to return 'NotImplemented' error code in case of systems with <... | Cedric Nugteren |
| 2018-05-27 | Made FillMatrix and FillVector functions take a configurable local workgroup ... | Cedric Nugteren |
| 2018-05-21 | Added method selection option to switch between im2col and single-kernel appr... | Cedric Nugteren |
| 2018-05-19 | Moved new convgemm kernel to levelx kernel folder | Cedric Nugteren |
| 2018-05-19 | Second version of direct reading from image tensor for convgemm: also with lo... | Cedric Nugteren |
| 2018-05-19 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren |
| 2018-05-17 | First version of direct reading from image tensor for convgemm: only for edge... | Cedric Nugteren |
| 2018-05-13 | Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm... | Cedric Nugteren |
| 2018-05-13 | Plugged in the code of strided-batched-gemm into convgemm in preparation of a... | Cedric Nugteren |
| 2018-05-09 | Changed temporary convgemm implementation to use batched-strided GEMM | Cedric Nugteren |
| 2018-05-09 | Implemented convolution as im2col + GEMM | Cedric Nugteren |
| 2018-05-06 | Added convgemm skeleton, test infrastructure, and first reference implementation | Cedric Nugteren |
| 2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren |
| 2018-04-18 | Expressed HER2K as two HERK calls | Cedric Nugteren |
| 2018-04-18 | Expressed SYR2K as two SYRK calls | Cedric Nugteren |
| 2018-04-17 | Updated HERK and SYRK to follow the GEMM style and functions to make it work ... | Cedric Nugteren |
| 2018-04-15 | Fixed some failing tests for GEMM and batched GEMM routines | Cedric Nugteren |
| 2018-04-13 | Made GEMM rotation expectations kernel-specific | Cedric Nugteren |
| 2018-03-15 | Fixed a failing TRSM test using a CPU with Apple OpenCL | Cedric Nugteren |
| 2018-03-15 | Fixed a failing TRSV test using a CPU with Apple OpenCL | Cedric Nugteren |
| 2018-02-02 | Implemented the XHAD Hadamard product routine | Cedric Nugteren |
| 2018-01-31 | Created the API and stubs for the HAD (hadamard-product) routines | Cedric Nugteren |
| 2018-01-26 | Fixed an event synchronisation issue in the batched gemm routines | Cedric Nugteren |
| 2018-01-18 | Made the batched routines also chose direct/indirect kernel like the main GEM... | Cedric Nugteren |
| 2018-01-08 | Implemented the in-direct version of the strided-batched GEMM kernel | Cedric Nugteren |
| 2018-01-07 | Implemented direct version of strided-batched GEMM kernel | Cedric Nugteren |
| 2018-01-07 | Added API and tests for new GemmStridedBatched routine | Cedric Nugteren |
| 2018-01-06 | Reduced duplicate code in the batched GEMM implementation | Cedric Nugteren |
| 2018-01-06 | Fixed the CUDA interface: replaced nullptr with 0 | Cedric Nugteren |
| 2017-12-30 | Added optional temp-buffer argument to C++ interface of GEMM | Cedric Nugteren |
| 2017-12-28 | Added interface to compute the required temporary buffer size for GEMM | Cedric Nugteren |
| 2017-12-28 | Factored out argument processing from the GEMM routine | Cedric Nugteren |
| 2017-12-28 | Refactored GEMM code in preparation of separate temp-buffer computation | Cedric Nugteren |
| 2017-12-23 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | Cedric Nugteren |
| 2017-12-23 | Updated the database to use the new TRSV and Invert tuners | Cedric Nugteren |
| 2017-12-23 | Added TRSV block-size tuner | Cedric Nugteren |
| 2017-12-10 | Fixed for error C1091 in MSVC 2013 | Cedric Nugteren |
| 2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren |
| 2017-11-17 | Moved compilation function to separate file; removed dependency of tuners of ... | Cedric Nugteren |
| 2017-11-11 | Factored out the creation of the OpenCL header and the program compilation | Cedric Nugteren |
| 2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning re... | Cedric Nugteren |
| 2017-10-27 | Fixed a bug when using the matrix A-offset argument for the TRSM routine | Cedric Nugteren |
| 2017-10-27 | Reduced TRSM block-size for better numerical stability | Cedric Nugteren |
| 2017-10-27 | Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM | Cedric Nugteren |
| 2017-10-25 | Fixed a bug in TRSM routine due to missing event synchronisations after GEMM ... | Cedric Nugteren |