path: root/src/kernels/level3/xgemm_part1.opencl
Age | Commit message | Author
2017-12-07 | Added register promotion to the main GEMM kernel | Cedric Nugteren
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA and ARM GPUs | Cedric Nugteren
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali GPUs | Cedric Nugteren
2016-05-15 | Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs | cnugteren
2016-02-08 | Separated the GEMM kernel in two parts to reduce string length for MSVC | Cedric Nugteren