Age | Commit message | Author |
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren |
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab... | Cedric Nugteren |
2016-07-16 | Fixed some more types and type conversions in the clpp11 interface to OpenCL | Cedric Nugteren |
2016-07-16 | Merge pull request #80 from gcp/getdevinfo_fixes | Cedric Nugteren |
2016-07-16 | Removed an unused variable from the copy-transpose-pad function | Cedric Nugteren |
2016-07-13 | Make sure the passed types are large enough. | Gian-Carlo Pascutto |
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in ... | Cedric Nugteren |
2016-07-10 | Added tuning results for AMD Oland and for Intel Graphics HD 530 | Cedric Nugteren |
2016-07-10 | Fixed a bug related to the cache and retrieval of programs based on the OpenC... | Cedric Nugteren |
2016-07-08 | Cache now compares cl_context instead of a pointer to a context; added verbos... | Cedric Nugteren |
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilati... | Cedric Nugteren |
2016-07-06 | Added an option to the performance clients to do a warm-up run before timing | Cedric Nugteren |
2016-07-03 | Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) | Cedric Nugteren |
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto |
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren |
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dll... | Cedric Nugteren |
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren |
2016-06-28 | Made it possible to build the clients and tests on Windows using Visual Studio | CNugteren |
2016-06-27 | Fixes for the AppVeyor Windows build | Cedric Nugteren |
2016-06-19 | Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' ... | Cedric Nugteren |
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren |
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren |
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren |
2016-06-18 | Removed the template from the Routine base-class | Cedric Nugteren |
2016-06-17 | Removed the precision argument from the routines in favor of a single templat... | Cedric Nugteren |
2016-06-17 | Removed the interface to the cache functions from the Routine class, calls th... | Cedric Nugteren |
2016-06-17 | Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine c... | Cedric Nugteren |
2016-06-17 | Moved the test-for-valid-buffers function from the Routine class to separate ... | Cedric Nugteren |
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and... | Cedric Nugteren |
2016-06-15 | Added some constness to variables related to the GEMM routines | Cedric Nugteren |
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a... | Cedric Nugteren |
2016-06-14 | Moved device vendor and type checks to a common header | Cedric Nugteren |
2016-06-14 | Added support for FP16 on ARM Mali-T628 (officially not supported) | Cedric Nugteren |
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali ... | Cedric Nugteren |
2016-05-26 | Added half-precision tests for the clBLAS reference through conversion to sin... | Cedric Nugteren |
2016-05-25 | Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM | Cedric Nugteren |
2016-05-24 | Added proper argument handling and displaying for half-precision data-types | Cedric Nugteren |
2016-05-22 | Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 | Cedric Nugteren |
2016-05-22 | Fixed tuning results for half-precision; added first results for the xGER ker... | Cedric Nugteren |
2016-05-22 | Prepared the GER kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSB... | Cedric Nugteren |
2016-05-22 | Added first tuning results for the half-precision xGEMV kernels | Cedric Nugteren |
2016-05-22 | Prepared the GEMV kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASU... | Cedric Nugteren |
2016-05-22 | Added first tuning results for the half-precision xDOT kernels | Cedric Nugteren |
2016-05-22 | Added half-precision support for all level 1 routines | Cedric Nugteren |
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren |
2016-05-16 | Added half precision tuning results for supporting kernels (pad, copy, transp... | Cedric Nugteren |
2016-05-16 | Prepared GEMM and supporting kernels and tuners for half-precision support | Cedric Nugteren |
2016-05-15 | Added header with conversions from and to half-precision floating-point | Cedric Nugteren |