path: root/src/kernels/level3/xgemm_direct_part2.opencl
Age | Commit message | Author
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA and ARM GPUs | Cedric Nugteren
2017-06-30 | Fixed an if-statement in the direct GEMM kernel causing a bug with specific sets of input parameters | Cedric Nugteren
2016-12-18 | Fixed a bug when using offsets in the direct GEMM kernels | Cedric Nugteren
2016-10-03 | Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel | Cedric Nugteren
2016-10-03 | Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels | Cedric Nugteren
2016-10-03 | Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles | Cedric Nugteren
2016-10-02 | Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT | Cedric Nugteren
2016-10-02 | Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256 | Cedric Nugteren