path: root/src/kernels
Age | Commit message | Author
2023-05-07 | AMAX/AMIN integer testing and bug fixes (#457) | Cedric Nugteren
    * Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result
    * Perform proper integer-output testing in XAMAX tests
    * A few changes towards getting it ready for a PR
    * Also fix compilation for clBLAS and cuBLAS references
    * Fix a bug that would only use the real part of complex numbers in the amax/amin routines
    * A few small fixes related to the AMAX tests
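The two AMAX bugs named in the entry above (increment and offset leaking into the returned index, and only the real part of complex values being compared) can be illustrated with a small host-side sketch in plain C. This is not the CLBlast kernel code: the `cfloat` type and the `amax_index` and `abs_f` names are hypothetical. The sketch assumes the reference-BLAS convention of ranking complex values by |Re| + |Im| and returns a zero-based logical index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type, for illustration only. */
typedef struct { float re; float im; } cfloat;

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Returns the zero-based logical index (0..n-1) of the element with the
 * largest |re| + |im|. The offset and increment are used only to locate
 * elements in memory; they are never part of the returned index. */
static size_t amax_index(size_t n, const cfloat *x, size_t offset, size_t inc) {
  assert(n > 0 && inc > 0);
  size_t best = 0;
  float best_val = -1.0f;
  for (size_t i = 0; i < n; ++i) {
    const cfloat v = x[offset + i * inc];
    /* Both parts contribute, not just the real part. */
    const float mag = abs_f(v.re) + abs_f(v.im);
    /* Strict '>' keeps the first index when values tie. */
    if (mag > best_val) {
      best_val = mag;
      best = i;
    }
  }
  return best;
}
```

The strict comparison also gives the first-index behaviour on ties that a sequential loop can guarantee; a parallel reduction needs extra care to preserve it.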
2023-01-17 | Updated according to feedback from CNugteren | Angus, Alexander
2023-01-03 | implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 | Angus, Alexander
2022-06-24 | Fix typo in comment | Cedric Nugteren
    Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-04-22 | sum fix | Justin Graham
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak
    Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.
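The failure mode described above, where each thread processes WPT items without checking that the items exist, can be sketched with a host-side emulation in plain C. This is an illustration under stated assumptions rather than the actual XhadFaster fix: `had_work_item` and `had_all` are hypothetical names, and the per-item bounds check is only one possible way to enforce the limit (another is to run the fast kernel only when the size divides evenly).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of one work-item of an element-wise (Hadamard)
 * product, z[i] = x[i] * y[i], where each work-item handles 'wpt' items. */
static void had_work_item(size_t n, size_t tid, size_t wpt,
                          const float *x, const float *y, float *z) {
  assert(wpt > 0);
  for (size_t w = 0; w < wpt; ++w) {
    const size_t id = tid * wpt + w;
    /* Without this guard the last work-item reads and writes past the
     * end of the buffers whenever n is not a multiple of wpt. */
    if (id < n) {
      z[id] = x[id] * y[id];
    }
  }
}

/* Runs enough emulated work-items to cover all n elements. */
static void had_all(size_t n, size_t wpt,
                    const float *x, const float *y, float *z) {
  assert(wpt > 0);
  const size_t num_work_items = (n + wpt - 1) / wpt; /* ceiling division */
  for (size_t tid = 0; tid < num_work_items; ++tid) {
    had_work_item(n, tid, wpt, x, y, z);
  }
}
```

Padding the output with sentinel values, as the canary regions mentioned in the commit message do, is a simple way to check that nothing beyond element n - 1 is touched.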
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren
2018-12-18 | Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel | Koichi Akabe
2018-11-19 | Remove unnecessary attribute of inline function | Koichi Akabe
2018-11-12 | Add kernel_mode option to im2col, col2im, and convgemm functions | Koichi Akabe
2018-11-07 | Changed col2im to append to the existing im-buffer | Cedric Nugteren
2018-11-01 | Added new col2im routine to the documentation | Cedric Nugteren
2018-10-30 | Fix col2im implementation | Koichi Akabe
2018-10-23 | Added groundwork for col2im algorithm plus first non-working version of kernel and test | Cedric Nugteren
2018-10-17 | Fixed a bug with the pre-processing and the AXPY kernel | Cedric Nugteren
2018-10-15 | Fixed a bug in the XaxpyFaster kernel for specific parameters | Cedric Nugteren
2018-10-14 | Merge pull request #319 from CNugteren/convgemm_multi_kernel | Cedric Nugteren
    First im2col+GEMM implementation of convolution
2018-10-10 | Fixed pre-processor warnings related to the subgroup shuffling | Cedric Nugteren
2018-09-16 | Merge branch 'master' into convgemm_multi_kernel | Cedric Nugteren
2018-09-15 | Fixed an MSVC compilation error due to large strings | Cedric Nugteren
2018-09-15 | Fixed issues with GEMMK=1 kernel and the pre-processor | Cedric Nugteren
2018-09-07 | Added xCONVGEMM as im2col plus a batched GEMM kernel | Cedric Nugteren
2018-07-29 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-07-28 | Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kernels to improve performance | Cedric Nugteren
2018-07-27 | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | Cedric Nugteren
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-06-03 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-05-21 | Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code | Cedric Nugteren
2018-05-21 | Added method selection option to switch between im2col and single-kernel approach for convgemm | Cedric Nugteren
2018-05-19 | Moved new convgemm kernel to levelx kernel folder | Cedric Nugteren
2018-05-19 | Second version of direct reading from image tensor for convgemm: also with local memory support now | Cedric Nugteren
2018-05-17 | First version of direct reading from image tensor for convgemm: only for edge cases now | Cedric Nugteren
2018-05-13 | Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel | Cedric Nugteren
2018-05-13 | Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel | Cedric Nugteren
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-08 | Fixed issues with the pre-processor | Cedric Nugteren
2018-04-07 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | Cedric Nugteren
2018-04-07 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-06 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-03 | Added first version of 2D register tiling kernel with A and C transposed as well | Cedric Nugteren
2018-03-23 | Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler | Cedric Nugteren
2018-03-15 | Fixed a failing TRSM test using a CPU with Apple OpenCL | Cedric Nugteren
2018-03-15 | Fixed a failing TRSV test using a CPU with Apple OpenCL | Cedric Nugteren
2018-02-02 | Implemented the XHAD Hadamard product routine | Cedric Nugteren