Age | Commit message | Author
---|---|---
2017-12-07 | Added register promotion to the main GEMM kernel | Cedric Nugteren
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA and ARM GPUs | Cedric Nugteren
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali GPUs | Cedric Nugteren
2016-05-15 | Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs | cnugteren
2016-02-08 | Separated the GEMM kernel in two parts to reduce string length for MSVC | Cedric Nugteren