debian-clblast
Debian package for CLBlast.
gspr@nonempty.org
path: root/src/kernels/level3
Age | Commit message | Author
2018-07-27 | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | Cedric Nugteren
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-08 | Fixed issues with the pre-processor | Cedric Nugteren
2018-04-07 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | Cedric Nugteren
2018-04-07 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-06 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-03 | Added first version of 2D register tiling kernel with A and C transposed as well | Cedric Nugteren
2018-03-23 | Removed arrays as function argument from GEMM kernels for Vivante OpenCL comp... | Cedric Nugteren
2018-03-15 | Fixed a failing TRSM test using a CPU with Apple OpenCL | Cedric Nugteren
2018-01-08 | Implemented the in-direct version of the strided-batched GEMM kernel | Cedric Nugteren
2018-01-07 | Implemented direct version of strided-batched GEMM kernel | Cedric Nugteren
2017-12-31 | Revert "Added options to disable parts of the invert kernel to find out where... | Cedric Nugteren
2017-12-31 | Changed the invert kernel slightly; added part1a/part1b disable-defines | Cedric Nugteren
2017-12-30 | Fixed ifdef's into ifndef's | Cedric Nugteren
2017-12-30 | Added options to disable parts of the invert kernel to find out where the AMD... | Cedric Nugteren
2017-12-27 | Simplified invert kernel a little | Cedric Nugteren
2017-12-23 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | Cedric Nugteren
2017-12-19 | Added skeleton for a tuner for the invert kernel | Cedric Nugteren
2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren
2017-12-09 | Completed kernel modifications for pre-processor of all other kernels | Cedric Nugteren
2017-12-09 | Modified the direct GEMM kernel to support array-to-register promotion | Cedric Nugteren
2017-12-09 | Reformatted GEMM kernel to support array-to-register promotion | Cedric Nugteren
2017-12-09 | Fixed defines parsing and substituting in pre-processor; fixed some variable ... | Cedric Nugteren
2017-12-07 | Added register promotion to the main GEMM kernel | Cedric Nugteren
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the ... | Cedric Nugteren
2017-12-03 | Reformated transpose kernels for the pre-processor; extended the amount of tests | Cedric Nugteren
2017-11-29 | Reformatted unrollable kernel loops and added the new promote_to_registers pr... | Cedric Nugteren
2017-10-14 | Fixed a kernel/attribute order bug in the direct GEMM kernels | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently cha... | Cedric Nugteren
2017-10-14 | Made transpose kernel struct init proper according to the C standard | Cedric Nugteren
2017-10-03 | Gemm in-direct implementation now uses only 1 larger instead of max 3 optiona... | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA... | Cedric Nugteren
2017-06-30 | Fixed an if-statement in the direct GEMM kernel causing a bug with specific s... | Cedric Nugteren
2017-05-14 | Fixed a missing synchronization barrier in the invert kernel; fixes TRSM tests | Cedric Nugteren
2017-03-19 | Added an (optional) non-direct implementation of the batched GEMM routine | Cedric Nugteren
2017-03-19 | Added batched versions of the pad/copy/transpose kernels | Cedric Nugteren
2017-03-11 | Added initial naive version of the batched GEMM routine based on the direct G... | Cedric Nugteren
2017-03-04 | Added a proper data-preparation function for the TRSM tests | Cedric Nugteren
2017-02-26 | Fixed an out-of-bounds memory access when filling a matrix with a constant | Cedric Nugteren
2017-02-26 | Fixes division in the kernel for inversion of complex numbers | Cedric Nugteren
2017-02-25 | Added PrepareData function for TRSM to create proper test input | Cedric Nugteren
2017-01-18 | Added first version of the TRSM routine based on the diagonal invert kernel | Cedric Nugteren
2017-01-15 | Added a first version of the diagonal block invert routine in preparation of ... | Cedric Nugteren
2016-12-18 | Fixed a bug when using offsets in the direct GEMM kernels | Cedric Nugteren
2016-10-22 | Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with speci... | Cedric Nugteren
2016-10-22 | Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with speci... | Cedric Nugteren