path: root/src
Age | Commit message | Author
2016-07-25 | Moved the XgemvFast and XgemvFastRot tuning database into a separate file | Cedric Nugteren
2016-07-24 | Merge branch 'development' into gemv_performance | Cedric Nugteren
2016-07-24 | Minor improvements after merging in groundwork for custom tuning parameters a... | Cedric Nugteren
2016-07-23 | Fixed a bug in the new XgemvFastRot kernel related to local memory size | Cedric Nugteren
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab... | Cedric Nugteren
2016-07-22 | clblast::Database, clblast::Routine: implement "database overlays" provided b... | Ivan Shapovalov
2016-07-22 | clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, su... | Ivan Shapovalov
2016-07-22 | cl::Kernel: skip NULL entries in waitForEvents | Ivan Shapovalov
2016-07-22 | clblast::RunKernel, cl::Kernel: take const vector as waitForEvents | Ivan Shapovalov
2016-07-22 | xgemm: do not hardcode kernel requirements for internal matrix layout | Ivan Shapovalov
2016-07-17 | Improved the GEMM direct kernel by adding register blocking. Still not fast t... | Cedric Nugteren
2016-07-16 | Created infrastructure to support a direct GEMM kernel; added correct but slo... | Cedric Nugteren
2016-07-16 | Fixed some more types and type conversions in the clpp11 interface to OpenCL | Cedric Nugteren
2016-07-16 | Merge pull request #80 from gcp/getdevinfo_fixes | Cedric Nugteren
2016-07-16 | Removed an unused variable from the copy-transpose-pad function | Cedric Nugteren
2016-07-13 | Make sure the passed types are large enough. | Gian-Carlo Pascutto
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in ... | Cedric Nugteren
2016-07-10 | Added tuning results for AMD Oland and for Intel Graphics HD 530 | Cedric Nugteren
2016-07-10 | Fixed a bug related to the cache and retrieval of programs based on the OpenC... | Cedric Nugteren
2016-07-08 | Cache now compares cl_context instead of a pointer to a context; added verbos... | Cedric Nugteren
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilati... | Cedric Nugteren
2016-07-06 | Added an option to the performance clients to do a warm-up run before timing | Cedric Nugteren
2016-07-03 | Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) | Cedric Nugteren
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dll... | Cedric Nugteren
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren
2016-06-28 | Made it possible to build the clients and tests on Windows using Visual Studio | CNugteren
2016-06-27 | Fixes for the AppVeyor Windows build | Cedric Nugteren
2016-06-19 | Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' ... | Cedric Nugteren
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren
2016-06-18 | Removed the template from the Routine base-class | Cedric Nugteren
2016-06-17 | Removed the precision argument from the routines in favor of a single templat... | Cedric Nugteren
2016-06-17 | Removed the interface to the cache functions from the Routine class, calls th... | Cedric Nugteren
2016-06-17 | Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine c... | Cedric Nugteren
2016-06-17 | Moved the test-for-valid-buffers function from the Routine class to separate ... | Cedric Nugteren
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and... | Cedric Nugteren
2016-06-15 | Added some constness to variables related to the GEMM routines | Cedric Nugteren
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a... | Cedric Nugteren
2016-06-14 | Moved device vendor and type checks to a common header | Cedric Nugteren
2016-06-14 | Added support for FP16 on ARM Mali-T628 (officially not supported) | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali ... | Cedric Nugteren
2016-05-26 | Added half-precision tests for the clBLAS reference through conversion to sin... | Cedric Nugteren
2016-05-25 | Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM | Cedric Nugteren
2016-05-24 | Added proper argument handling and displaying for half-precision data-types | Cedric Nugteren
2016-05-22 | Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 | Cedric Nugteren
2016-05-22 | Fixed tuning results for half-precision; added first results for the xGER ker... | Cedric Nugteren