path: root/src/kernels
Age | Commit message | Author
2023-05-07 | AMAX/AMIN integer testing and bug fixes (#457) | Cedric Nugteren
2023-01-17 | Updated according to feedback from CNugteren | Angus, Alexander
2023-01-03 | implemented changes to boost Adreno performance according to https://jira-dc.... | Angus, Alexander
2022-06-24 | Fix typo in comment | Cedric Nugteren
2022-04-22 | sum fix | Justin Graham
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren
2018-12-18 | Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel | Koichi Akabe
2018-11-19 | Remove unnecessary attribute of inline function | Koichi Akabe
2018-11-12 | Add kernel_mode option to im2col, col2im, and convgemm functions | Koichi Akabe
2018-11-07 | Changed col2im to append to the existing im-buffer | Cedric Nugteren
2018-11-01 | Added new col2im routine to the documentation | Cedric Nugteren
2018-10-30 | Fix col2im implementation | Koichi Akabe
2018-10-23 | Added groundwork for col2im algorithm plus first non-working version of kerne... | Cedric Nugteren
2018-10-17 | Fixed a bug with the pre-processing and the AXPY kernel | Cedric Nugteren
2018-10-15 | Fixed a bug in the XaxpyFaster kernel for specific parameters | Cedric Nugteren
2018-10-14 | Merge pull request #319 from CNugteren/convgemm_multi_kernel | Cedric Nugteren
2018-10-10 | Fixed pre-processor warnings related to the subgroup shuffling | Cedric Nugteren
2018-09-16 | Merge branch 'master' into convgemm_multi_kernel | Cedric Nugteren
2018-09-15 | Fixed an MSVC compilation error due to large strings | Cedric Nugteren
2018-09-15 | Fixed issues with GEMMK=1 kernel and the pre-processor | Cedric Nugteren
2018-09-07 | Added xCONVGEMM as im2col plus a batched GEMM kernel | Cedric Nugteren
2018-07-29 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-07-28 | Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kern... | Cedric Nugteren
2018-07-27 | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | Cedric Nugteren
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-06-03 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-05-21 | Further implemented single-kernel approach of convgemm; extended test to capt... | Cedric Nugteren
2018-05-21 | Added method selection option to switch between im2col and single-kernel appr... | Cedric Nugteren
2018-05-19 | Moved new convgemm kernel to levelx kernel folder | Cedric Nugteren
2018-05-19 | Second version of direct reading from image tensor for convgemm: also with lo... | Cedric Nugteren
2018-05-17 | First version of direct reading from image tensor for convgemm: only for edge... | Cedric Nugteren
2018-05-13 | Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm... | Cedric Nugteren
2018-05-13 | Plugged in the code of strided-batched-gemm into convgemm in preparation of a... | Cedric Nugteren
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-08 | Fixed issues with the pre-processor | Cedric Nugteren
2018-04-07 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | Cedric Nugteren
2018-04-07 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-06 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-03 | Added first version of 2D register tiling kernel with A and C transposed as well | Cedric Nugteren
2018-03-23 | Removed arrays as function argument from GEMM kernels for Vivante OpenCL comp... | Cedric Nugteren
2018-03-15 | Fixed a failing TRSM test using a CPU with Apple OpenCL | Cedric Nugteren
2018-03-15 | Fixed a failing TRSV test using a CPU with Apple OpenCL | Cedric Nugteren
2018-02-02 | Implemented the XHAD Hadamard product routine | Cedric Nugteren