Age | Commit message | Author |
2017-03-11 | Added initial naive version of the batched GEMM routine based on the direct G... | Cedric Nugteren |
2017-03-10 | Added proper testing of the alpha parameter; finalized the batched AXPY imple... | Cedric Nugteren |
2017-03-08 | Implemented a batched version of the AXPY kernel | Cedric Nugteren |
2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects ... | Cedric Nugteren |
2017-03-04 | Added a proper data-preparation function for the TRSM tests | Cedric Nugteren |
2017-02-26 | Fixed an out-of-bounds memory access when filling a matrix with a constant | Cedric Nugteren |
2017-02-26 | Fixes division in the kernel for inversion of complex numbers | Cedric Nugteren |
2017-02-25 | Added PrepareData function for TRSM to create proper test input | Cedric Nugteren |
2017-02-05 | Merge branch 'development' into triangular_solvers | Cedric Nugteren |
2017-02-05 | Fixed complex version of the TRSV kernel | Cedric Nugteren |
2017-02-04 | Improved substitution kernels a bit; added complex support | Cedric Nugteren |
2017-02-04 | Completed a first STRSV implementation | Cedric Nugteren |
2017-01-29 | Added first (incomplete) version of TRSV routine | Cedric Nugteren |
2017-01-18 | Added first version of the TRSM routine based on the diagonal invert kernel | Cedric Nugteren |
2017-01-15 | Added a first version of the diagonal block invert routine in preparation of ... | Cedric Nugteren |
2017-01-07 | Always enables cl_khr_fp64 when running double-precision, not just for OpenCL... | Cedric Nugteren |
2016-12-18 | Fixed a bug when using offsets in the direct GEMM kernels | Cedric Nugteren |
2016-10-22 | Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with speci... | Cedric Nugteren |
2016-10-22 | Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with speci... | Cedric Nugteren |
2016-10-03 | Fixed a const-correctness issue with complex conjugation in the GEMM direct k... | Cedric Nugteren |
2016-10-03 | Added functions to load from off-chip to local memory without vector loads fo... | Cedric Nugteren |
2016-10-03 | Re-organised GEMM direct kernel and added faster fall-back version for incomp... | Cedric Nugteren |
2016-10-02 | Specialised the GEMM direct kernel in four ways for transposing/non-transposi... | Cedric Nugteren |
2016-10-02 | Split the GEMM direct kernel into two files; set the default tuning target to... | Cedric Nugteren |
2016-10-01 | Added padding to the local memory of the GEMM direct kernel | Cedric Nugteren |
2016-09-25 | Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, ... | Cedric Nugteren |
2016-09-25 | Separated the tuning parameters of the new direct GEMM kernel from the indire... | Cedric Nugteren |
2016-09-25 | Added a first version of the direct version of GEMM with local memory | Cedric Nugteren |
2016-09-21 | Merge branch 'development' into gemm_direct | Cedric Nugteren |
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ... | Cedric Nugteren |
2016-09-04 | The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ... | Cedric Nugteren |
2016-08-20 | Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvassch... | Cedric Nugteren |
2016-08-18 | Adapt OpenCL files for OpenCL 1.1 | D. Van Assche |
2016-07-26 | Merge branch 'development' into gemm_direct | Cedric Nugteren |
2016-07-23 | Fixed a bug in the new XgemvFastRot kernel related to local memory size | Cedric Nugteren |
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren |
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab... | Cedric Nugteren |
2016-07-17 | Improved the GEMM direct kernel by adding register blocking. Still not fast t... | Cedric Nugteren |
2016-07-16 | Created infrastructure to support a direct GEMM kernel; added correct but slo... | Cedric Nugteren |
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in ... | Cedric Nugteren |
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and... | Cedric Nugteren |
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a... | Cedric Nugteren |
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali ... | Cedric Nugteren |
2016-05-22 | Prepared the GER kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Prepared the GEMV kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren |
2016-05-16 | Prepared GEMM and supporting kernels and tuners for half-precision support | Cedric Nugteren |
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible ... | Cedric Nugteren |
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren |
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren |