Age | Commit message | Author
2018-01-06 | Fixed a performance overhead in database creation: it is again a static variable now as it was before | Cedric Nugteren
2018-01-06 | Added CUDA interface to get temporary-buffer size for GEMM routine | Cedric Nugteren
2018-01-04 | Added a CUDA version of the GEMM temp-buffer optional argument | Cedric Nugteren
2018-01-04 | Updated the generator script to automatically generate the temp-buffer code | Cedric Nugteren
2018-01-03 | Updated the ROADMAP | Cedric Nugteren
2018-01-03 | Added the temp-buffer to the GEMM testers and clients | Cedric Nugteren
2018-01-03 | Added a queue argument to the get-size function when running the tests/clients | Cedric Nugteren
2018-01-01 | Merge pull request #236 from CNugteren/trsm_compilation | Cedric Nugteren
    Fixed compilation of TRSM/Invert for AMD APP
2017-12-31 | Fixed the issue with AMD's APP compiler not being able to compile the invert kernel | Cedric Nugteren
2017-12-31 | Revert "Added a simple test to check compilation of the invert kernels (issue with AMD APP)" | Cedric Nugteren
    This reverts commit 0eb9b35481531d5ddc7e22371a44a12dc0e69c50.
2017-12-31 | Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes" | Cedric Nugteren
    This reverts commit 407ed52cec41445f02e85cb45d08f590960216bb.
2017-12-31 | Made plotting script more flexible: extra argument to set the comparison library | Cedric Nugteren
2017-12-31 | Changed the invert kernel slightly; added part1a/part1b disable-defines | Cedric Nugteren
2017-12-30 | Fixed ifdef's into ifndef's | Cedric Nugteren
2017-12-30 | Added options to disable parts of the invert kernel to find out where the AMD compiler crashes | Cedric Nugteren
2017-12-30 | Added optional temp-buffer argument to C++ interface of GEMM | Cedric Nugteren
2017-12-28 | Added interface to compute the required temporary buffer size for GEMM | Cedric Nugteren
2017-12-28 | Factored out argument processing from the GEMM routine | Cedric Nugteren
2017-12-28 | Refactored GEMM code in preparation of separate temp-buffer computation | Cedric Nugteren
2017-12-27 | Merge pull request #234 from CNugteren/database_compilation_split | Cedric Nugteren
    Database compilation split
2017-12-27 | Added a simple test to check compilation of the invert kernels (issue with AMD APP) | Cedric Nugteren
2017-12-27 | Simplified invert kernel a little | Cedric Nugteren
2017-12-27 | Split the database into multiple small compilation units | Cedric Nugteren
2017-12-26 | Made the database-vector a non-static member | Cedric Nugteren
2017-12-24 | Fixes for the CUDA backend of CLBlast | Cedric Nugteren
2017-12-24 | Fixed linking of the preprocessor test for MSVC | Cedric Nugteren
2017-12-24 | Added a note that the ArrayFire Jenkins servers are down, being switched to buildbot | Cedric Nugteren
2017-12-23 | Fixed unused variable warnings showing up with Clang | Cedric Nugteren
2017-12-23 | Updated the tuning results for the IvyBridge M GT2 GPU | Cedric Nugteren
2017-12-23 | Added defines to disable OpenCL deprecation warnings | Cedric Nugteren
2017-12-23 | Fixed a warning under MSVC | Cedric Nugteren
2017-12-23 | Merge pull request #232 from CNugteren/feature/more_tuners | Cedric Nugteren
    First tuners for the TRSV (block size) and TRSM (invert kernel) routines
2017-12-23 | Now calling main TRSV routine again to fix compilation in MSVC | Cedric Nugteren
2017-12-23 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | Cedric Nugteren
2017-12-23 | Updated the database to use the new TRSV and Invert tuners | Cedric Nugteren
2017-12-23 | Added TRSV block-size tuner | Cedric Nugteren
2017-12-21 | Fixed AppVeyor issue | Cedric Nugteren
2017-12-21 | Fixed AppVeyor issue | Cedric Nugteren
2017-12-21 | Merge branch 'master' into feature/more_tuners | Cedric Nugteren
2017-12-20 | Made plotting script more resilient to missing data | Cedric Nugteren
2017-12-20 | Added tuning results for Apple AMD Radeon Pro 580 | Cedric Nugteren
2017-12-20 | Added try-except to database script parser to skip invalid files | Cedric Nugteren
2017-12-19 | Added skeleton for a tuner for the invert kernel | Cedric Nugteren
2017-12-18 | Reformatted tuning code to make compilation faster | Cedric Nugteren
2017-12-17 | Fixed an issue with the tuner: it was using platform vendor rather than device vendor | Cedric Nugteren
2017-12-17 | Merge pull request #230 from CNugteren/kernel_preprocessor | Cedric Nugteren
    Added an OpenCL kernel preprocessor
2017-12-17 | Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor | Cedric Nugteren
2017-12-17 | Fixed an unnecessary overflow issue on 32-bit systems | Cedric Nugteren
2017-12-16 | Updated the known issues | Cedric Nugteren
2017-12-10 | Fixed for error C1091 in MSVC 2013 | Cedric Nugteren