path: root/src
Age | Commit message | Author
2016-02-20 | Added support for xGERU and xGERC routines | Cedric Nugteren
2016-02-20 | Added XGER routine, kernel, and tuner | Cedric Nugteren
2016-02-08 | Separated the GEMM kernel in two parts to reduce string length for MSVC | Cedric Nugteren
2016-02-08 | Split-up the XGEMV kernel in two parts | Cedric Nugteren
2016-02-07 | Added dictionary with short and long OpenCL vendor names to fix issues with I... | Cedric Nugteren
2016-02-06 | Reduced the maximum workgroup-size for GEMV kernels further | CNugteren
2016-02-06 | Reduced unrolling factor in xgemv kernel to reduce compilation times | CNugteren
2016-01-30 | Fixes for compilation under Visual Studio | CNugteren
2016-01-30 | Added first auto-generated database headers from the Python database; only K4... | Cedric Nugteren
2015-10-28 | Now sets local memory size in xgemv tuner properly | CNugteren
2015-10-25 | Fixed an arguments-related bug in the GEMV tuner | CNugteren
2015-10-25 | Moved the tuner database script to a separate folder | CNugteren
2015-10-13 | Added guards for routine-specific level-3 pad kernels | CNugteren
2015-10-12 | Routine names are now all default arguments defined in the header | CNugteren
2015-10-12 | Moved level3 kernel files to a subfolder | CNugteren
2015-09-26 | Added TRMV/TBMV/TPMV routines | CNugteren
2015-09-19 | Added SBMV and SPMV routines | CNugteren
2015-09-19 | Added the HPMV routine | CNugteren
2015-09-19 | Added infrastructure for packed matrices | CNugteren
2015-09-19 | Added the HBMV routine | CNugteren
2015-09-18 | Improved the organization and performance of level 2 routines | CNugteren
2015-09-18 | Added first version of banded matrix-vector multiplication | CNugteren
2015-09-17 | Added interface of all level 2 routines | CNugteren
2015-09-17 | Added script to generate API interface and implementation automatically | CNugteren
2015-09-14 | Added xDOT/xDOTU/xDOTC dot-product routines | CNugteren
2015-09-14 | Added extra temporary buffer to tuners in preparation of Xdot routines | CNugteren
2015-09-14 | Added support for the dot buffer and offset argument | CNugteren
2015-08-22 | Added the XSWAP, XSCAL and XCOPY level-1 routines | CNugteren
2015-08-22 | Re-organized level1 xaxpy kernel | CNugteren
2015-08-20 | Merge pull request #23 from CNugteren/tuner_database | Cedric Nugteren
2015-08-20 | Added initial version of tuner-database Python script | CNugteren
2015-08-19 | Moved precision tester to utilities | CNugteren
2015-08-19 | Added hotfix 8eeb7f721ff8811521147cfe5ae9796164286b77 | CNugteren
2015-08-13 | Merge pull request #21 from CNugteren/c_api | Cedric Nugteren
2015-08-13 | Added all supported routines to the C API | CNugteren
2015-08-13 | Fixed a complex data-type bug in the transpose kernel | CNugteren
2015-08-13 | Added initial version of C API with just one routine | CNugteren
2015-08-09 | Refactored the tuners, added JSON output | CNugteren
2015-08-04 | Added distinguished names for GEMV inherited HEMV/SYMV | CNugteren
2015-08-03 | Abstracted loading of matrix A for GEMV kernel | CNugteren
2015-07-31 | Added HEMV routine | CNugteren
2015-07-31 | Added SYMV routine | CNugteren
2015-07-27 | Now using the new Claduc C++11 OpenCL header | CNugteren
2015-07-22 | Added workgroup shuffle option to transpose kernel for AMD GPUs | CNugteren
2015-07-21 | Transpose kernel now uses vectorized local memory loads and stores | CNugteren
2015-07-19 | Triangular GEMM kernels are only compiled when needed | CNugteren
2015-07-19 | Kernel caching is now based on a routine's name | CNugteren
2015-07-19 | The kernel source string is now a routine's member variable | CNugteren
2015-07-16 | Fixed a bug when using the Xgemm kernel without local memory | CNugteren
2015-07-16 | Using mad() instruction for AMD devices like clBLAS does | CNugteren