summaryrefslogtreecommitdiff
path: root/src/kernels
AgeCommit message (Collapse)Author
2017-04-07Uses float2 and double2 for base complex data-types instead of a custom ↵Cedric Nugteren
struct; fixes bug on Apple OpenCL
2017-03-19Added an (optional) non-direct implementation of the batched GEMM routineCedric Nugteren
2017-03-19Added batched versions of the pad/copy/transpose kernelsCedric Nugteren
2017-03-11Added initial naive version of the batched GEMM routine based on the direct ↵Cedric Nugteren
GEMM kernel
2017-03-10Added proper testing of the alpha parameter; finalized the batched AXPY ↵Cedric Nugteren
implementation
2017-03-08Implemented a batched version of the AXPY kernelCedric Nugteren
2017-03-08Make batched routines based on offsets instead of a vector of cl_mem objects ↵Cedric Nugteren
- undoing many earlier changes
2017-03-04Added a proper data-preparation function for the TRSM testsCedric Nugteren
2017-02-26Fixed an out-of-bounds memory access when filling a matrix with a constantCedric Nugteren
2017-02-26Fixes division in the kernel for inversion of complex numbersCedric Nugteren
2017-02-25Added PrepareData function for TRSM to create proper test inputCedric Nugteren
2017-02-05Merge branch 'development' into triangular_solversCedric Nugteren
2017-02-05Fixed complex version of the TRSV kernelCedric Nugteren
2017-02-04Improved substition kernels a bit; added complex supportCedric Nugteren
2017-02-04Completed a first STRSV implementationCedric Nugteren
2017-01-29Added first (incomplete) version of TRSV routineCedric Nugteren
2017-01-18Added first version of the TRSM routine based on the diagonal invert kernelCedric Nugteren
2017-01-15Added a first version of the diagonal block invert routine in preparation of ↵Cedric Nugteren
TRSM
2017-01-07Always enables cl_khr_fp64 when running double-precision, not just for ↵Cedric Nugteren
OpenCL 1.1 or lower
2016-12-18Fixed a bug when using offsets in the direct GEMM kernelsCedric Nugteren
2016-10-22Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with ↵Cedric Nugteren
specific tuning parameters (2)
2016-10-22Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with ↵Cedric Nugteren
specific tuning parameters
2016-10-03Fixed a const-correctness issue with complex conjugation in the GEMM direct ↵Cedric Nugteren
kernel
2016-10-03Added functions to load from off-chip to local memory without vector loads ↵Cedric Nugteren
for the GEMM direct kernels
2016-10-03Re-organised GEMM direct kernel and added faster fall-back version for ↵Cedric Nugteren
incomplete rectangles
2016-10-02Specialised the GEMM direct kernel in four ways for ↵Cedric Nugteren
transposing/non-transposing: NN, NT, TN, TT
2016-10-02Split the GEMM direct kernel into two files; set the default tuning target ↵Cedric Nugteren
to 256-256-256
2016-10-01Added padding to the local memory of the GEMM direct kernelCedric Nugteren
2016-09-25Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, ↵Cedric Nugteren
NWGD and KWGD into one WGD parameter
2016-09-25Separated the tuning parameters of the new direct GEMM kernel from the ↵Cedric Nugteren
indirect version
2016-09-25Added a first version of the direct version of GEMM with local memoryCedric Nugteren
2016-09-21Merge branch 'development' into gemm_directCedric Nugteren
2016-09-12Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ↵Cedric Nugteren
can't handle long strings
2016-09-04The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ↵Cedric Nugteren
problems if C contains NaNs
2016-08-20Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into ↵Cedric Nugteren
dvasschemacq-master Conflicts: src/kernels/level1/xaxpy.opencl src/kernels/level2/xgemv.opencl src/kernels/level2/xgemv_fast.opencl src/kernels/level2/xger.opencl src/kernels/level2/xher.opencl src/kernels/level2/xher2.opencl src/kernels/level3/xgemm_part2.opencl
2016-08-18Adapt opencl files for 1.1 OpenCLD. Van Assche
In OpenCL 1.1 __kernel has to be before __attribute__, at least with Vivante compiler.
2016-07-26Merge branch 'development' into gemm_directCedric Nugteren
2016-07-23Fixe a bug in the new XgemvFastRot kernel related to local memory sizeCedric Nugteren
2016-07-23Further improvements to the XgemvFastRot kernel, properly enables coalescing nowCedric Nugteren
2016-07-23Improved the XgemvFastRot kernel by tiled loading of the input matrix A, ↵Cedric Nugteren
enabling better memory performance
2016-07-17Improved the GEMM direct kernel by adding register blocking. Still not fast ↵Cedric Nugteren
though
2016-07-16Created infrastructure to support a direct GEMM kernel; added correct but ↵Cedric Nugteren
slow reference kernel as a place-holder
2016-07-10Now passing alpha/beta to the kernel as arguments as before fp16 support; in ↵Cedric Nugteren
case of fp16 arguments are cast on host and in kernel
2016-06-16Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, ↵Cedric Nugteren
and/or transposing
2016-06-14Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) ↵Cedric Nugteren
and renamed files and functions appropriately
2016-06-08Added global memory synchronisation for better cache performance on ARM Mali ↵Cedric Nugteren
GPUs
2016-05-22Prepared the GER kernels and tuner for half-precision supportCedric Nugteren
2016-05-22Prepared the GEMV kernels and tuner for half-precision supportCedric Nugteren
2016-05-18Merged in latest changes from 0.7.1 releaseCedric Nugteren
2016-05-16Prepared GEMM and supporting kernels and tuners for half-precision supportCedric Nugteren