path: root/src/kernels/common.opencl
Age | Commit message | Author
2023-01-03 | implemented changes to boost Adreno performance according to https://jira-dc.... | Angus, Alexander
2022-06-24 | Fix typo in comment | Cedric Nugteren
2018-07-28 | Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kern... | Cedric Nugteren
2017-12-03 | Added basic bracket parsing in defines and loop expressions | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently cha... | Cedric Nugteren
2017-10-14 | Fixed several (not all) CUDA kernel compilation issues | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA... | Cedric Nugteren
2017-04-07 | Uses float2 and double2 for base complex data-types instead of a custom struc... | Cedric Nugteren
2017-02-26 | Fixes division in the kernel for inversion of complex numbers | Cedric Nugteren
2017-02-05 | Merge branch 'development' into triangular_solvers | Cedric Nugteren
2017-02-05 | Fixed complex version of the TRSV kernel | Cedric Nugteren
2017-02-04 | Improved substition kernels a bit; added complex support | Cedric Nugteren
2017-01-15 | Added a first version of the diagonal block invert routine in preparation of ... | Cedric Nugteren
2017-01-07 | Always enables cl_khr_fp64 when running double-precision, not just for OpenCL... | Cedric Nugteren
2016-09-21 | Merge branch 'development' into gemm_direct | Cedric Nugteren
2016-09-04 | The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ... | Cedric Nugteren
2016-07-16 | Created infrastructure to support a direct GEMM kernel; added correct but slo... | Cedric Nugteren
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in ... | Cedric Nugteren
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren
2016-05-16 | Prepared GEMM and supporting kernels and tuners for half-precision support | Cedric Nugteren
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible ... | Cedric Nugteren
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren
2016-04-27 | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM an... | Cedric Nugteren
2016-04-20 | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | cnugteren
2016-04-14 | Added support for the SASUM/DASUM/ScASUM/DzASUM routines | cnugteren
2016-03-02 | Added preliminary support for xHER2 and xSYR2 routines | Cedric Nugteren
2016-02-20 | Added XGER routine, kernel, and tuner | Cedric Nugteren
2015-09-14 | Added xDOT/xDOTU/xDOTC dot-product routines | CNugteren
2015-08-22 | Added the XSWAP, XSCAL and XCOPY level-1 routines | CNugteren
2015-07-19 | The kernel source string is now a routine's member variable | CNugteren
2015-07-16 | Using mad() instruction for AMD devices like clBLAS does | CNugteren
2015-07-07 | Added option to set the imaginary part of the diagonal to zero | CNugteren
2015-07-02 | Added a set-to-one function for kernels | CNugteren
2015-06-16 | Added support for complex conjugate transpose | CNugteren
2015-05-30 | Initial commit of preview version | CNugteren