path: root/src
Age         Commit message  Author
2018-01-04  Added a CUDA version of the GEMM temp-buffer optional argument  Cedric Nugteren
2018-01-04  Updated the generator script to automatically generate the temp-buffer code  Cedric Nugteren
2017-12-30  Added optional temp-buffer argument to C++ interface of GEMM  Cedric Nugteren
2017-12-28  Added interface to compute the required temporary buffer size for GEMM  Cedric Nugteren
2017-12-28  Factored out argument processing from the GEMM routine  Cedric Nugteren
2017-12-28  Refactored GEMM code in preparation of separate temp-buffer computation  Cedric Nugteren
2017-12-27  Split the database into multiple small compilation units  Cedric Nugteren
2017-12-26  Made the database-vector a non-static member  Cedric Nugteren
2017-12-24  Fixes for the CUDA backend of CLBlast  Cedric Nugteren
2017-12-23  Fixed unused variable warnings showing up with Clang  Cedric Nugteren
2017-12-23  Updated the tuning results for the IvyBridge M GT2 GPU  Cedric Nugteren
2017-12-23  Added defines to disable OpenCL deprecation warnings  Cedric Nugteren
2017-12-23  Fixed a warning under MSVC  Cedric Nugteren
2017-12-23  Now calling main TRSV routine again to fix compilation in MSVC  Cedric Nugteren
2017-12-23  Split the invert kernel in two parts to prevent error C1091 in MSVC 2013  Cedric Nugteren
2017-12-23  Updated the database to use the new TRSV and Invert tuners  Cedric Nugteren
2017-12-23  Added TRSV block-size tuner  Cedric Nugteren
2017-12-21  Merge branch 'master' into feature/more_tuners  Cedric Nugteren
2017-12-20  Added tuning results for Apple AMD Radeon Pro 580  Cedric Nugteren
2017-12-19  Added skeleton for a tuner for the invert kernel  Cedric Nugteren
2017-12-18  Reformatted tuning code to make compilation faster  Cedric Nugteren
2017-12-17  Fixed an issue with the tuner: it was using platform vendor rather than devic...  Cedric Nugteren
2017-12-17  Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results...  Cedric Nugteren
2017-12-17  Fixed an unnecessary overflow issue on 32-bit systems  Cedric Nugteren
2017-12-10  Fixed for error C1091 in MSVC 2013  Cedric Nugteren
2017-12-10  Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit  Cedric Nugteren
2017-12-10  Fixed a missing include  Cedric Nugteren
2017-12-10  Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_ST...  Cedric Nugteren
2017-12-10  Fixed an Android compilation issue  Cedric Nugteren
2017-12-09  Completed kernel modifications for pre-processor of all other kernels  Cedric Nugteren
2017-12-09  Made the pre-processor run by default for ARM and Qualcomm GPUs  Cedric Nugteren
2017-12-09  Modified the direct GEMM kernel to support array-to-register promotion  Cedric Nugteren
2017-12-09  Reformatted GEMM kernel to support array-to-register promotion  Cedric Nugteren
2017-12-09  Fixed defines parsing and substituting in pre-processor; fixed some variable ...  Cedric Nugteren
2017-12-07  Added register promotion to the main GEMM kernel  Cedric Nugteren
2017-12-05  Improved array-to-register promotion, now handling function calls as well  Cedric Nugteren
2017-12-03  Added GEMM (direct and in-direct) to the pre-processor testing; modified the ...  Cedric Nugteren
2017-12-03  Added basic bracket parsing in defines and loop expressions  Cedric Nugteren
2017-12-03  Reformated transpose kernels for the pre-processor; extended the amount of tests  Cedric Nugteren
2017-12-03  Improved array to register promotion in the pre-processor  Cedric Nugteren
2017-11-30  Improved the pre-processor's handling of defines; added a special nested defi...  Cedric Nugteren
2017-11-30  Integrated pre-processor in compilation flow, default is still disabled  Cedric Nugteren
2017-11-29  Reformatted unrollable kernel loops and added the new promote_to_registers pr...  Cedric Nugteren
2017-11-29  Extended the preprocessor tests to include CopyFast and CopyPad  Cedric Nugteren
2017-11-29  Improves the array-to-register promotion in the pre-processor  Cedric Nugteren
2017-11-28  Improved the kernel pre-processor in various ways  Cedric Nugteren
2017-11-27  Added simple implementation of array-to-register promotion  Cedric Nugteren
2017-11-26  Improved the for-loop pre-processing  Cedric Nugteren
2017-11-25  Implemented first simple pre-processor: defines parser and loop unrolling bas...  Cedric Nugteren
2017-11-25  Moved string splitting functions; added string character removal function  Cedric Nugteren