Age | Commit message | Author |
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ... | Cedric Nugteren |
2016-09-04 | The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ... | Cedric Nugteren |
2016-08-20 | Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvassch... | Cedric Nugteren |
2016-08-18 | Adapt opencl files for 1.1 OpenCL | D. Van Assche |
2016-07-23 | Fixed a bug in the new XgemvFastRot kernel related to local memory size | Cedric Nugteren |
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren |
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab... | Cedric Nugteren |
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in ... | Cedric Nugteren |
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and... | Cedric Nugteren |
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a... | Cedric Nugteren |
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali ... | Cedric Nugteren |
2016-05-22 | Prepared the GER kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Prepared the GEMV kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren |
2016-05-16 | Prepared GEMM and supporting kernels and tuners for half-precision support | Cedric Nugteren |
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible ... | Cedric Nugteren |
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren |
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren |
2016-05-08 | Fixed errors in xAXPY and xSCAL tests on AMD hardware | cnugteren |
2016-04-30 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | Cedric Nugteren |
2016-04-27 | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM an... | Cedric Nugteren |
2016-04-20 | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | cnugteren |
2016-04-14 | Added support for the SASUM/DASUM/ScASUM/DzASUM routines | cnugteren |
2016-03-30 | Fixed the nrm2 kernel for complex data-types | cnugteren |
2016-03-28 | Added preliminary support for the xNRM2 routines | Cedric Nugteren |
2016-03-06 | Added preliminary support for xHPR2 and xSPR2 routines | Cedric Nugteren |
2016-03-02 | Added preliminary support for xHER2 and xSYR2 routines | Cedric Nugteren |
2016-02-28 | Fixed a couple of correctness bugs in the Xher kernels | Cedric Nugteren |
2016-02-28 | Added support for xHER, xHPR, xSYR, and xSPR routines | Cedric Nugteren |
2016-02-20 | Added support for xGERU and xGERC routines | Cedric Nugteren |
2016-02-20 | Added XGER routine, kernel, and tuner | Cedric Nugteren |
2016-02-08 | Separated the GEMM kernel in two parts to reduce string length for MSVC | Cedric Nugteren |
2016-02-08 | Split-up the XGEMV kernel in two parts | Cedric Nugteren |
2016-02-06 | Reduced unrolling factor in xgemv kernel to reduce compilation times | CNugteren |
2015-10-13 | Added guards for routine-specific level-3 pad kernels | CNugteren |
2015-10-12 | Moved level3 kernel files to a subfolder | CNugteren |
2015-09-26 | Added TRMV/TBMV/TPMV routines | CNugteren |
2015-09-19 | Added SBMV and SPMV routines | CNugteren |
2015-09-19 | Added the HPMV routine | CNugteren |
2015-09-19 | Added the HBMV routine | CNugteren |
2015-09-18 | Improved the organization and performance of level 2 routines | CNugteren |
2015-09-18 | Added first version of banded matrix-vector multiplication | CNugteren |
2015-09-14 | Added xDOT/xDOTU/xDOTC dot-product routines | CNugteren |
2015-08-22 | Added the XSWAP, XSCAL and XCOPY level-1 routines | CNugteren |
2015-08-22 | Re-organized level1 xaxpy kernel | CNugteren |
2015-08-13 | Fixed a complex data-type bug in the transpose kernel | CNugteren |
2015-08-04 | Added distinguished names for GEMV inherited HEMV/SYMV | CNugteren |
2015-08-03 | Abstracted loading of matrix A for GEMV kernel | CNugteren |
2015-07-22 | Added workgroup shuffle option to transpose kernel for AMD GPUs | CNugteren |
2015-07-21 | Transpose kernel now uses vectorized local memory loads and stores | CNugteren |