path: root/src
Age | Commit message | Author
2016-07-08 | Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache | Cedric Nugteren
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen | Cedric Nugteren
2016-07-06 | Added an option to the performance clients to do a warm-up run before timing | Cedric Nugteren
2016-07-03 | Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) | Cedric Nugteren
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto
In LocalMemUsage(), the first call to clGetKernelWorkGroupInfo gets the number of bytes needed to store the result for CL_KERNEL_LOCAL_MEM_SIZE. However, the variable actually passed to receive the value is declared as "auto result = size_t", which in 32-bit mode is 4 bytes regardless of the size reported by the first call. The spec states that the value is a cl_ulong, which is 8 bytes. To prevent stack corruption, make sure a cl_ulong is in fact passed. Also adjust all callers to take the changed type into account.
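The size mismatch described in this commit can be illustrated with a minimal standalone sketch. It deliberately avoids the OpenCL headers so that it compiles anywhere: `FakeKernelInfo` is a hypothetical stand-in for clGetKernelWorkGroupInfo that follows the same two-call query convention and always reports an 8-byte result, and `LocalMemUsageSketch` is an illustrative name, not the library's actual LocalMemUsage() code. The value 4096 is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for a CL_KERNEL_LOCAL_MEM_SIZE query. It reports the
// result size through size_ret and copies the value into param_value when the
// supplied buffer is large enough. Returns 0 on success, -1 on failure.
int FakeKernelInfo(std::size_t param_value_size, void* param_value, std::size_t* size_ret) {
  assert(size_ret != nullptr || param_value != nullptr);
  const std::uint64_t local_mem_bytes = 4096;  // made-up value, 64 bits like cl_ulong
  if (size_ret != nullptr) { *size_ret = sizeof(local_mem_bytes); }
  if (param_value != nullptr) {
    if (param_value_size < sizeof(local_mem_bytes)) { return -1; }  // buffer too small
    std::memcpy(param_value, &local_mem_bytes, sizeof(local_mem_bytes));
  }
  return 0;
}

// Illustrative corrected query: the destination is a fixed 64-bit integer, so
// it is 8 bytes wide even on platforms where size_t is only 4 bytes.
std::uint64_t LocalMemUsageSketch() {
  std::size_t bytes = 0;
  if (FakeKernelInfo(0, nullptr, &bytes) != 0) { throw std::runtime_error("size query failed"); }
  std::uint64_t result = 0;
  if (bytes > sizeof(result)) { throw std::runtime_error("result does not fit"); }
  if (FakeKernelInfo(bytes, &result, nullptr) != 0) { throw std::runtime_error("value query failed"); }
  return result;
}
```

The explicit `bytes > sizeof(result)` check is an addition for illustration: it turns a would-be overflow into an error if the reported size ever exceeds the destination.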
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library | Cedric Nugteren
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren
2016-06-28 | Made it possible to build the clients and tests on Windows using Visual Studio | CNugteren
2016-06-27 | Fixes for the AppVeyor Windows build | Cedric Nugteren
2016-06-19 | Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes) | Cedric Nugteren
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren
2016-06-18 | Removed the template from the Routine base-class | Cedric Nugteren
2016-06-17 | Removed the precision argument from the routines in favor of a single templated function | Cedric Nugteren
2016-06-17 | Removed the interface to the cache functions from the Routine class, calls them directly now | Cedric Nugteren
2016-06-17 | Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class | Cedric Nugteren
2016-06-17 | Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file | Cedric Nugteren
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing | Cedric Nugteren
2016-06-15 | Added some constness to variables related to the GEMM routines | Cedric Nugteren
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately | Cedric Nugteren
2016-06-14 | Moved device vendor and type checks to a common header | Cedric Nugteren
2016-06-14 | Added support for FP16 on ARM Mali-T628 (officially not supported) | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali GPUs | Cedric Nugteren
2016-05-26 | Added half-precision tests for the clBLAS reference through conversion to single-precision | Cedric Nugteren
2016-05-25 | Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM | Cedric Nugteren
2016-05-24 | Added proper argument handling and displaying for half-precision data-types | Cedric Nugteren
2016-05-22 | Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 | Cedric Nugteren
2016-05-22 | Fixed tuning results for half-precision; added first results for the xGER kernels | Cedric Nugteren
2016-05-22 | Prepared the GER kernels and tuner for half-precision support | Cedric Nugteren
2016-05-22 | Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV | Cedric Nugteren
2016-05-22 | Added first tuning results for the half-precision xGEMV kernels | Cedric Nugteren
2016-05-22 | Prepared the GEMV kernels and tuner for half-precision support | Cedric Nugteren
2016-05-22 | Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN | Cedric Nugteren
2016-05-22 | Added first tuning results for the half-precision xDOT kernels | Cedric Nugteren
2016-05-22 | Added half-precision support for all level 1 routines | Cedric Nugteren
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren
2016-05-16 | Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose) | Cedric Nugteren
2016-05-16 | Prepared GEMM and supporting kernels and tuners for half-precision support | Cedric Nugteren
2016-05-15 | Added header with conversions from and to half-precision floating-point | Cedric Nugteren
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well | Cedric Nugteren
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren
2016-05-08 | Fixed errors in xAXPY and xSCAL tests on AMD hardware | cnugteren
2016-05-02 | Fixed the calculation of the required buffer sizes in case of subvectors and submatrices | Cedric Nugteren
2016-05-01 | Made the default xDOT tuning size smaller | Cedric Nugteren
2016-05-01 | Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking | Cedric Nugteren
2016-05-01 | Added a program cache (per-context) next to the per-device binary cache | Cedric Nugteren
2016-04-30 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | Cedric Nugteren