path: root/scripts
Age | Commit message | Author
2018-01-06 | Added CUDA interface to get temporary-buffer size for GEMM routine | Cedric Nugteren
2018-01-04 | Added a CUDA version of the GEMM temp-buffer optional argument | Cedric Nugteren
2018-01-04 | Updated the generator script to automatically generate the temp-buffer code | Cedric Nugteren
2017-12-28 | Added interface to compute the required temporary buffer size for GEMM | Cedric Nugteren
2017-12-27 | Split the database into multiple small compilation units | Cedric Nugteren
2017-12-20 | Made plotting script more resilient to missing data | Cedric Nugteren
2017-12-20 | Added tuning results for Apple AMD Radeon Pro 580 | Cedric Nugteren
2017-12-20 | Added try-except to database script parser to skip invalid files | Cedric Nugteren
2017-11-20 | Made the database script properly handle multiple entries for a single device | Cedric Nugteren
2017-11-19 | Minor fix to the database script | Cedric Nugteren
2017-11-19 | Some fixed for the new auto-tuner to be compatible with the Python scripts | Cedric Nugteren
2017-11-06 | Improved the way the database defaults are computed | Cedric Nugteren
2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | Cedric Nugteren
2017-11-02 | Fixed a bug in database compression/decompression | Cedric Nugteren
2017-10-14 | Various fixes to make the host code and sample compile with the CUDA API | Cedric Nugteren
2017-10-12 | CUDA API now takes context and device in instead of stream | Cedric Nugteren
2017-10-11 | Added first (untested) version of a CUDA API | Cedric Nugteren
2017-10-09 | Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately | Cedric Nugteren
2017-10-08 | Moved non-routine-specific API functions and includes to separate files | Cedric Nugteren
2017-09-16 | Improved compilation time of the tuner database | Cedric Nugteren
2017-09-14 | Added architecture layer in the tuning database for better performance on unseen devices | Cedric Nugteren
2017-09-12 | Added database compress and de-compress functions | Cedric Nugteren
2017-09-11 | Database now works with new format of clblast_[property] | Cedric Nugteren
2017-09-06 | Split the database files over multiple directories and files; first step towards separate compilation | Cedric Nugteren
2017-07-02 | Added interface and stubs for the im2col routine | Cedric Nugteren
2017-06-25 | Fixed some Clang and MSVC warnings | Cedric Nugteren
2017-06-21 | Fixes some compilation issues related to the database structure change | Cedric Nugteren
2017-06-20 | Changed the structure of the database to reduce compilation time and save memory | Cedric Nugteren
2017-05-24 | changing "wb" to "w" when saving json file (text mode) - compatibility for Python 3 | Grigori Fursin
2017-05-12 | Added the IxAMIN routines: absolute minimum version of IxAMAX | Cedric Nugteren
2017-05-11 | Minor naming fixes to the benchmark script | Cedric Nugteren
2017-04-23 | Added an option to the database script to remove tuning results from the database | Cedric Nugteren
2017-04-23 | Re-added Titan X (Pascal) tuning results based on more averaging when tuning | Cedric Nugteren
2017-04-21 | Merge branch 'development' into benchmarking | Cedric Nugteren
2017-04-21 | Removed the words SUMMARY from the title of the benchmark script when benchmarking the summary | Cedric Nugteren
2017-04-20 | Updated the settings for the batched benchmarks | Cedric Nugteren
2017-04-17 | Fixed a namespace clash with CUDA FP16 for the half-datatype | Cedric Nugteren
2017-04-17 | Added proper handling of mismatched arguments in the database script | Cedric Nugteren
2017-04-16 | Set proper settings for the benchmarks of batched routines | Cedric Nugteren
2017-04-16 | Merge branch 'development' into benchmarking | Cedric Nugteren
2017-04-16 | Added settings for benchmarking batched routines | Cedric Nugteren
2017-04-14 | Added a benchmark-all script to run multiple benchmarks automatically | Cedric Nugteren
2017-04-14 | Tuned the num-runs settings for the benchmarks | Cedric Nugteren
2017-04-14 | Added output-folder for benchmarking and removed the requirement on X | Cedric Nugteren
2017-04-14 | Made the number of runs a benchmark-specific setting in the benchmark scripts | Cedric Nugteren
2017-04-13 | Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works | Cedric Nugteren
2017-04-11 | Made compilation of the cuBLAS wrapper work properly | Cedric Nugteren
2017-04-10 | Merge branch 'development' into cublas_reference. Conflicts: scripts/generator/generator.py | Cedric Nugteren
2017-04-10 | Removed const-vector-of-const-objects from the database class to remain according to the C++11 standard | Cedric Nugteren
2017-04-06 | Completed the cuBLAS wrapper | Cedric Nugteren