debian-clblast
Debian package for CLBlast.
gspr@nonempty.org
path: root/src/kernels/level3/xgemm_part1.opencl
Age | Commit message | Author
2018-10-10 | Fixed pre-processor warnings related to the subgroup shuffling | Cedric Nugteren
2018-09-15 | Fixed an MSVC compilation error due to large strings | Cedric Nugteren
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-08 | Fixed issues with the pre-processor | Cedric Nugteren
2018-04-07 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | Cedric Nugteren
2018-04-07 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-06 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-03 | Added first version of 2D register tiling kernel with A and C transposed as well | Cedric Nugteren
2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren
2017-12-09 | Reformatted GEMM kernel to support array-to-register promotion | Cedric Nugteren
2017-12-07 | Added register promotion to the main GEMM kernel | Cedric Nugteren
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the ... | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently cha... | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA... | Cedric Nugteren
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ... | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali ... | Cedric Nugteren
2016-05-15 | Added support for staggered/shuffled offsets for GEMM to improve performance ... | cnugteren
2016-02-08 | Separated the GEMM kernel in two parts to reduce string length for MSVC | Cedric Nugteren