path: root/src/kernels/level3/xgemm_part1.opencl
Age | Commit message | Author
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren
2018-10-10 | Fixed pre-processor warnings related to the subgroup shuffling | Cedric Nugteren
2018-09-15 | Fixed an MSVC compilation error due to large strings | Cedric Nugteren
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-08 | Fixed issues with the pre-processor | Cedric Nugteren
2018-04-07 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | Cedric Nugteren
2018-04-07 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-06 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-03 | Added first version of 2D register tiling kernel with A and C transposed as well | Cedric Nugteren
2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren
2017-12-09 | Reformatted GEMM kernel to support array-to-register promotion | Cedric Nugteren
2017-12-07 | Added register promotion to the main GEMM kernel | Cedric Nugteren
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the ... | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently cha... | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA... | Cedric Nugteren
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ... | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali ... | Cedric Nugteren
2016-05-15 | Added support for staggered/shuffled offsets for GEMM to improve performance ... | cnugteren
2016-02-08 | Separated the GEMM kernel in two parts to reduce string length for MSVC | Cedric Nugteren