path: root/src/routine.cc
Age | Commit message | Author
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren
2016-06-18 | Removed the template from the Routine base-class | Cedric Nugteren
2016-06-17 | Removed the interface to the cache functions from the Routine class, calls them directly now | Cedric Nugteren
2016-06-17 | Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class | Cedric Nugteren
2016-06-17 | Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file | Cedric Nugteren
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing | Cedric Nugteren
2016-06-15 | Added some constness to variables related to the GEMM routines | Cedric Nugteren
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately | Cedric Nugteren
2016-06-14 | Moved device vendor and type checks to a common header | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali GPUs | Cedric Nugteren
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren
2016-05-02 | Fixed the calculation of the required buffer sizes in case of subvectors and submatrices | Cedric Nugteren
2016-05-01 | Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking | Cedric Nugteren
2016-05-01 | Added a program cache (per-context) next to the per-device binary cache | Cedric Nugteren
2016-04-28 | Fixed the cache to store binaries instead of OpenCL programs | Cedric Nugteren
2016-04-27 | Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache | Cedric Nugteren
2016-04-09 | Events are now properly implemented using event waiting list and asking the user to wait for event completion | cnugteren
2016-03-14 | Made the library thread-safe by guarding the kernel cache with a mutex | Cedric Nugteren
2015-09-19 | Added infrastructure for packed matrices | CNugteren
2015-09-14 | Added support for the dot buffer and offset argument | CNugteren
2015-07-27 | Now using the new Claduc C++11 OpenCL header | CNugteren
2015-07-19 | Kernel caching is now based on a routine's name | CNugteren
2015-07-19 | The kernel source string is now a routine's member variable | CNugteren
2015-07-16 | Using mad() instruction for AMD devices like clBLAS does | CNugteren
2015-07-13 | Updated interface of the PadCopyTransposeMatrix method | CNugteren
2015-07-08 | Added option to set the imaginary part of the diagonal to zero | CNugteren
2015-06-23 | Added a condition to update only lower/upper triangular parts in the un-pad kernels | CNugteren
2015-06-18 | Now returns program from database by reference | CNugteren
2015-06-16 | Added support for complex conjugate transpose | CNugteren
2015-05-30 | Initial commit of preview version | CNugteren