summaryrefslogtreecommitdiff
path: root/src/kernels
AgeCommit message (Expand)Author
2016-09-25Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, ...Cedric Nugteren
2016-09-25Separated the tuning parameters of the new direct GEMM kernel from the indire...Cedric Nugteren
2016-09-25Added a first version of the direct version of GEMM with local memoryCedric Nugteren
2016-09-21Merge branch 'development' into gemm_directCedric Nugteren
2016-09-12Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ...Cedric Nugteren
2016-09-04The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ...Cedric Nugteren
2016-08-20Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvassch...Cedric Nugteren
2016-08-18Adapt opencl files for 1.1 OpenCLD. Van Assche
2016-07-26Merge branch 'development' into gemm_directCedric Nugteren
2016-07-23Fixe a bug in the new XgemvFastRot kernel related to local memory sizeCedric Nugteren
2016-07-23Further improvements to the XgemvFastRot kernel, properly enables coalescing nowCedric Nugteren
2016-07-23Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab...Cedric Nugteren
2016-07-17Improved the GEMM direct kernel by adding register blocking. Still not fast t...Cedric Nugteren
2016-07-16Created infrastructure to support a direct GEMM kernel; added correct but slo...Cedric Nugteren
2016-07-10Now passing alpha/beta to the kernel as arguments as before fp16 support; in ...Cedric Nugteren
2016-06-16Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and...Cedric Nugteren
2016-06-14Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a...Cedric Nugteren
2016-06-08Added global memory synchronisation for better cache performance on ARM Mali ...Cedric Nugteren
2016-05-22Prepared the GER kernels and tuner for half-precision supportCedric Nugteren
2016-05-22Prepared the GEMV kernels and tuner for half-precision supportCedric Nugteren
2016-05-18Merged in latest changes from 0.7.1 releaseCedric Nugteren
2016-05-16Prepared GEMM and supporting kernels and tuners for half-precision supportCedric Nugteren
2016-05-14Set kernel arguments for AXPY as constant memory buffers, making it possible ...Cedric Nugteren
2016-05-13Initial experimental version of the half-precision HAXPY routineCedric Nugteren
2016-05-12Initial changes in preparation for half-precision fp16 supportCedric Nugteren
2016-05-08Fixed errors in xAXPY and xSCAL tests on AMD hardwarecnugteren
2016-04-30Added non-aboslute minimum counter-part IxMIN of the BLAS routine IxAMAXCedric Nugteren
2016-04-27Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM an...Cedric Nugteren
2016-04-20Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routinescnugteren
2016-04-14Added support for the SASUM/DASUM/ScASUM/DzASUM routinescnugteren
2016-03-30Fixed the nrm2 kernel for complex data-typescnugteren
2016-03-28Added preliminary support for the xNRM2 routinesCedric Nugteren
2016-03-06Added preliminary support for xHPR2 and xSPR2 routinesCedric Nugteren
2016-03-02Added preliminary support for xHER2 and xSYR2 routinesCedric Nugteren
2016-02-28Fixed a couple of correctness bugs in the Xher kernelsCedric Nugteren
2016-02-28Added support for xHER, xHPR, xSYR, and xSPR routinesCedric Nugteren
2016-02-20Added support for xGERU and xGERC routinesCedric Nugteren
2016-02-20Added XGER routine, kernel, and tunerCedric Nugteren
2016-02-08Separated the GEMM kernel in two parts to reduce string length for MSVCCedric Nugteren
2016-02-08Split-up the XGEMV kernel in two partsCedric Nugteren
2016-02-06Reduced unrolling factor in xgemv kernel to reduce compilation timesCNugteren
2015-10-13Added guards for routine-specific level-3 pad kernelsCNugteren
2015-10-12Moved level3 kernel files to a subfolderCNugteren
2015-09-26Added TRMV/TBMV/TPMV routinesCNugteren
2015-09-19Added SBMV and SPMV routinesCNugteren
2015-09-19Added the HPMV routineCNugteren
2015-09-19Added the HBMV routineCNugteren
2015-09-18Improved the organization and performance of level 2 routinesCNugteren
2015-09-18Added first version of banded matrix-vector multiplicationCNugteren
2015-09-14Added xDOT/xDOTU/xDOTC dot-product routinesCNugteren