debian-clblast
Debian package for CLBlast.
gspr@nonempty.org
path: root/src/kernels
Age
Commit message
Author
2016-06-16
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
Cedric Nugteren
2016-06-14
Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately
Cedric Nugteren
2016-06-08
Added global memory synchronisation for better cache performance on ARM Mali GPUs
Cedric Nugteren
2016-05-22
Prepared the GER kernels and tuner for half-precision support
Cedric Nugteren
2016-05-22
Prepared the GEMV kernels and tuner for half-precision support
Cedric Nugteren
2016-05-18
Merged in latest changes from 0.7.1 release
Cedric Nugteren
2016-05-16
Prepared GEMM and supporting kernels and tuners for half-precision support
Cedric Nugteren
2016-05-14
Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well
Cedric Nugteren
2016-05-13
Initial experimental version of the half-precision HAXPY routine
Cedric Nugteren
2016-05-12
Initial changes in preparation for half-precision fp16 support
Cedric Nugteren
2016-05-08
Fixed errors in xAXPY and xSCAL tests on AMD hardware
cnugteren
2016-04-30
Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX
Cedric Nugteren
2016-04-27
Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX
Cedric Nugteren
2016-04-20
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
cnugteren
2016-04-14
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
cnugteren
2016-03-30
Fixed the nrm2 kernel for complex data-types
cnugteren
2016-03-28
Added preliminary support for the xNRM2 routines
Cedric Nugteren
2016-03-06
Added preliminary support for xHPR2 and xSPR2 routines
Cedric Nugteren
2016-03-02
Added preliminary support for xHER2 and xSYR2 routines
Cedric Nugteren
2016-02-28
Fixed a couple of correctness bugs in the Xher kernels
Cedric Nugteren
2016-02-28
Added support for xHER, xHPR, xSYR, and xSPR routines
Cedric Nugteren
2016-02-20
Added support for xGERU and xGERC routines
Cedric Nugteren
2016-02-20
Added XGER routine, kernel, and tuner
Cedric Nugteren
2016-02-08
Separated the GEMM kernel in two parts to reduce string length for MSVC
Cedric Nugteren
2016-02-08
Split-up the XGEMV kernel in two parts
Cedric Nugteren
2016-02-06
Reduced unrolling factor in xgemv kernel to reduce compilation times
CNugteren
2015-10-13
Added guards for routine-specific level-3 pad kernels
CNugteren
2015-10-12
Moved level3 kernel files to a subfolder
CNugteren
2015-09-26
Added TRMV/TBMV/TPMV routines
CNugteren
2015-09-19
Added SBMV and SPMV routines
CNugteren
2015-09-19
Added the HPMV routine
CNugteren
2015-09-19
Added the HBMV routine
CNugteren
2015-09-18
Improved the organization and performance of level 2 routines
CNugteren
2015-09-18
Added first version of banded matrix-vector multiplication
CNugteren
2015-09-14
Added xDOT/xDOTU/xDOTC dot-product routines
CNugteren
2015-08-22
Added the XSWAP, XSCAL and XCOPY level-1 routines
CNugteren
2015-08-22
Re-organized level1 xaxpy kernel
CNugteren
2015-08-13
Fixed a complex data-type bug in the transpose kernel
CNugteren
2015-08-04
Added distinguished names for GEMV inherited HEMV/SYMV
CNugteren
2015-08-03
Abstracted loading of matrix A for GEMV kernel
CNugteren
2015-07-22
Added workgroup shuffle option to transpose kernel for AMD GPUs
CNugteren
2015-07-21
Transpose kernel now uses vectorized local memory loads and stores
CNugteren
2015-07-19
Triangular GEMM kernels are only compiled when needed
CNugteren
2015-07-19
The kernel source string is now a routine's member variable
CNugteren
2015-07-16
Fixed a bug when using the Xgemm kernel without local memory
CNugteren
2015-07-16
Using mad() instruction for AMD devices like clBLAS does
CNugteren
2015-07-12
Added the HEMM routine, tester, and client
CNugteren
2015-07-07
Added option to set the imaginary part of the diagonal to zero
CNugteren
2015-07-02
Added the TRMM routine, tester, and client
CNugteren
2015-07-02
Added a set-to-one function for kernels
CNugteren