debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2016-09-12	Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ...	Cedric Nugteren
2016-09-04	The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ...	Cedric Nugteren
2016-08-20	Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvassch...	Cedric Nugteren
2016-08-18	Adapt opencl files for 1.1 OpenCL	D. Van Assche
2016-07-23	Fixed a bug in the new XgemvFastRot kernel related to local memory size	Cedric Nugteren
2016-07-23	Further improvements to the XgemvFastRot kernel, properly enables coalescing now	Cedric Nugteren
2016-07-23	Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab...	Cedric Nugteren
2016-07-10	Now passing alpha/beta to the kernel as arguments as before fp16 support; in ...	Cedric Nugteren
2016-06-16	Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and...	Cedric Nugteren
2016-06-14	Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a...	Cedric Nugteren
2016-06-08	Added global memory synchronisation for better cache performance on ARM Mali ...	Cedric Nugteren
2016-05-22	Prepared the GER kernels and tuner for half-precision support	Cedric Nugteren
2016-05-22	Prepared the GEMV kernels and tuner for half-precision support	Cedric Nugteren
2016-05-18	Merged in latest changes from 0.7.1 release	Cedric Nugteren
2016-05-16	Prepared GEMM and supporting kernels and tuners for half-precision support	Cedric Nugteren
2016-05-14	Set kernel arguments for AXPY as constant memory buffers, making it possible ...	Cedric Nugteren
2016-05-13	Initial experimental version of the half-precision HAXPY routine	Cedric Nugteren
2016-05-12	Initial changes in preparation for half-precision fp16 support	Cedric Nugteren
2016-05-08	Fixed errors in xAXPY and xSCAL tests on AMD hardware	cnugteren
2016-04-30	Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX	Cedric Nugteren
2016-04-27	Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM an...	Cedric Nugteren
2016-04-20	Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines	cnugteren
2016-04-14	Added support for the SASUM/DASUM/ScASUM/DzASUM routines	cnugteren
2016-03-30	Fixed the nrm2 kernel for complex data-types	cnugteren
2016-03-28	Added preliminary support for the xNRM2 routines	Cedric Nugteren
2016-03-06	Added preliminary support for xHPR2 and xSPR2 routines	Cedric Nugteren
2016-03-02	Added preliminary support for xHER2 and xSYR2 routines	Cedric Nugteren
2016-02-28	Fixed a couple of correctness bugs in the Xher kernels	Cedric Nugteren
2016-02-28	Added support for xHER, xHPR, xSYR, and xSPR routines	Cedric Nugteren
2016-02-20	Added support for xGERU and xGERC routines	Cedric Nugteren
2016-02-20	Added XGER routine, kernel, and tuner	Cedric Nugteren
2016-02-08	Separated the GEMM kernel in two parts to reduce string length for MSVC	Cedric Nugteren
2016-02-08	Split-up the XGEMV kernel in two parts	Cedric Nugteren
2016-02-06	Reduced unrolling factor in xgemv kernel to reduce compilation times	CNugteren
2015-10-13	Added guards for routine-specific level-3 pad kernels	CNugteren
2015-10-12	Moved level3 kernel files to a subfolder	CNugteren
2015-09-26	Added TRMV/TBMV/TPMV routines	CNugteren
2015-09-19	Added SBMV and SPMV routines	CNugteren
2015-09-19	Added the HPMV routine	CNugteren
2015-09-19	Added the HBMV routine	CNugteren
2015-09-18	Improved the organization and performance of level 2 routines	CNugteren
2015-09-18	Added first version of banded matrix-vector multiplication	CNugteren
2015-09-14	Added xDOT/xDOTU/xDOTC dot-product routines	CNugteren
2015-08-22	Added the XSWAP, XSCAL and XCOPY level-1 routines	CNugteren
2015-08-22	Re-organized level1 xaxpy kernel	CNugteren
2015-08-13	Fixed a complex data-type bug in the transpose kernel	CNugteren
2015-08-04	Added distinguished names for GEMV inherited HEMV/SYMV	CNugteren
2015-08-03	Abstracted loading of matrix A for GEMV kernel	CNugteren
2015-07-22	Added workgroup shuffle option to transpose kernel for AMD GPUs	CNugteren
2015-07-21	Transpose kernel now uses vectorized local memory loads and stores	CNugteren