Age | Commit message | Author
2015-08-18 | Added GCC 4.8 and updated CMake | CNugteren
2015-08-18 | Added initial .travis.yml file | CNugteren
2015-08-13 | Merge pull request #21 from CNugteren/c_api | Cedric Nugteren
    Added a plain C API
2015-08-13 | Added the plain C API | CNugteren
2015-08-13 | Added all supported routines to the C API | CNugteren
2015-08-13 | Fixed a complex data-type bug in the transpose kernel | CNugteren
2015-08-13 | Added SGEMM example using the C API | CNugteren
2015-08-13 | Added initial version of C API with just one routine | CNugteren
2015-08-13 | Added argument m,n,k metadata to JSON files | CNugteren
2015-08-09 | Refactored the tuners, added JSON output | CNugteren
2015-08-04 | Merge pull request #19 from CNugteren/basic_level2_routines | Cedric Nugteren
    Level-2 routines: HEMV and SYMV
2015-08-04 | Added distinguished names for GEMV inherited HEMV/SYMV | CNugteren
2015-08-03 | Abstracted loading of matrix A for GEMV kernel | CNugteren
2015-07-31 | Added HEMV and SYMV | CNugteren
2015-07-31 | Added HEMV and SYMV | CNugteren
2015-07-31 | Added HEMV routine | CNugteren
2015-07-31 | Added SYMV routine | CNugteren
2015-07-31 | Merge pull request #18 from CNugteren/correctness_test_refactoring | Cedric Nugteren
    Refactored the correctness tests
2015-07-31 | Refactored the correctness tests | CNugteren
2015-07-31 | Merge pull request #17 from CNugteren/clblas_external | Cedric Nugteren
    Removed clBLAS sources
2015-07-31 | Updated documentation reflecting removal of clBLAS sources | CNugteren
2015-07-31 | Removed clBLAS source code, now requires separate installation | CNugteren
2015-07-27 | Moved the preferred options of clBLAS (no tests) to the CLBlast CMakeLists file | CNugteren
2015-07-27 | Merge pull request #16 from CNugteren/claduc_header | Cedric Nugteren
    Now using the new Claduc C++11 OpenCL header
2015-07-27 | Now using the new Claduc C++11 OpenCL header | CNugteren
2015-07-24 | Prepared the changelog for the next release | CNugteren
2015-07-24 | Updated to version 0.3.0 | CNugteren
2015-07-24 | Merge pull request #14 from CNugteren/amd_performance | Cedric Nugteren
    Improved performance for AMD GPUs
2015-07-24 | Updated the docs to reflect the performance improvements | CNugteren
2015-07-23 | Updated the performance results, added HD7950 | CNugteren
2015-07-22 | Made the graph script robust against diagnostic system messages | CNugteren
2015-07-22 | Set the correct name for AMD OpenCL devices | CNugteren
2015-07-22 | Updated GEMM tuning results for Tahiti | CNugteren
2015-07-22 | Added workgroup shuffle option to transpose kernel for AMD GPUs | CNugteren
2015-07-21 | Transpose kernel now uses vectorized local memory loads and stores | CNugteren
2015-07-19 | Triangular GEMM kernels are only compiled when needed | CNugteren
2015-07-19 | Kernel caching is now based on a routine's name | CNugteren
2015-07-19 | The kernel source string is now a routine's member variable | CNugteren
2015-07-19 | Fixed complex performance on Intel Iris | CNugteren
2015-07-16 | Fixed a bug when using the Xgemm kernel without local memory | CNugteren
2015-07-16 | Using mad() instruction for AMD devices like clBLAS does | CNugteren
2015-07-15 | Merge pull request #13 from CNugteren/bypass_pre_post_processing | Cedric Nugteren
    Bypass pre/post-processing
2015-07-15 | Updated changelog with pre/post-processing bypass | CNugteren
2015-07-15 | Changed performance graphs to default to column-major | CNugteren
2015-07-15 | Skips pre/post processing kernels if not needed | CNugteren
2015-07-13 | Updated interface of the PadCopyTransposeMatrix method | CNugteren
2015-07-12 | Merge pull request #12 from CNugteren/level_subfolders | Cedric Nugteren
    Added subfolders for the level1/2/3 routines
2015-07-12 | Added subfolders for the level1/2/3 routines | CNugteren
2015-07-12 | Merge pull request #11 from CNugteren/level3_routines_2 | Cedric Nugteren
    Added level-3 routines
2015-07-12 | Added HEMM, HERK, HER2K, and TRMM | CNugteren