Age | Commit message | Author |
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dll... | Cedric Nugteren |
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren |
2016-06-28 | Made it possible to build the clients and tests on Windows using Visual Studio | CNugteren |
2016-06-27 | Fixes for the AppVeyor Windows build | Cedric Nugteren |
2016-06-19 | Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' ... | Cedric Nugteren |
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren |
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren |
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren |
2016-06-18 | Removed the template from the Routine base-class | Cedric Nugteren |
2016-06-17 | Removed the precision argument from the routines in favor of a single templat... | Cedric Nugteren |
2016-06-17 | Removed the interface to the cache functions from the Routine class, calls th... | Cedric Nugteren |
2016-06-17 | Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine c... | Cedric Nugteren |
2016-06-17 | Moved the test-for-valid-buffers function from the Routine class to separate ... | Cedric Nugteren |
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and... | Cedric Nugteren |
2016-06-15 | Added some constness to variables related to the GEMM routines | Cedric Nugteren |
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a... | Cedric Nugteren |
2016-06-14 | Moved device vendor and type checks to a common header | Cedric Nugteren |
2016-06-14 | Added support for FP16 on ARM Mali-T628 (officially not supported) | Cedric Nugteren |
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali ... | Cedric Nugteren |
2016-05-26 | Added half-precision tests for the clBLAS reference through conversion to sin... | Cedric Nugteren |
2016-05-25 | Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM | Cedric Nugteren |
2016-05-24 | Added proper argument handling and displaying for half-precision data-types | Cedric Nugteren |
2016-05-22 | Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 | Cedric Nugteren |
2016-05-22 | Fixed tuning results for half-precision; added first results for the xGER ker... | Cedric Nugteren |
2016-05-22 | Prepared the GER kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSB... | Cedric Nugteren |
2016-05-22 | Added first tuning results for the half-precision xGEMV kernels | Cedric Nugteren |
2016-05-22 | Prepared the GEMV kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASU... | Cedric Nugteren |
2016-05-22 | Added first tuning results for the half-precision xDOT kernels | Cedric Nugteren |
2016-05-22 | Added half-precision support for all level 1 routines | Cedric Nugteren |
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren |
2016-05-16 | Added half precision tuning results for supporting kernels (pad, copy, transp... | Cedric Nugteren |
2016-05-16 | Prepared GEMM and supporting kernels and tuners for half-precision support | Cedric Nugteren |
2016-05-15 | Added header with conversions from and to half-precision floating-point | Cedric Nugteren |
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible ... | Cedric Nugteren |
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren |
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren |
2016-05-08 | Fixed errors in xAXPY and xSCAL tests on AMD hardware | cnugteren |
2016-05-02 | Fixed the calculation of the required buffer sizes in case of subvectors and ... | Cedric Nugteren |
2016-05-01 | Made the default xDOT tuning size smaller | Cedric Nugteren |
2016-05-01 | Changed the index buffer of IxAMAX routines to unsigned int for proper buffer... | Cedric Nugteren |
2016-05-01 | Added a program cache (per-context) next to the per-device binary cache | Cedric Nugteren |
2016-04-30 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | Cedric Nugteren |
2016-04-29 | Added FillCache: a function to pre-compile all kernels for a specific device | Cedric Nugteren |
2016-04-28 | Fixed the cache to store binaries instead of OpenCL programs | Cedric Nugteren |
2016-04-27 | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM an... | Cedric Nugteren |
2016-04-27 | Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterp... | Cedric Nugteren |
2016-04-27 | Moved all cache-related functions to a separate file; added a ClearCompiledPr... | Cedric Nugteren |
2016-04-20 | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | cnugteren |