path: root/scripts
Age | Commit message | Author
2016-09-12 | Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all | Cedric Nugteren
2016-09-11 | Complete re-write of the database script. Changed Pandas for the much faster and more convenient plain JSON/dict data-type | Cedric Nugteren
2016-09-10 | Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU | Cedric Nugteren
2016-09-10 | Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination | Cedric Nugteren
2016-09-04 | Refactored the Python C++ generator script; now conforms to the PEP8 style guide | Cedric Nugteren
2016-09-03 | Added tuning results for Intel Broadwell 5500 GT2 GPU | Cedric Nugteren
2016-09-03 | Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs | Cedric Nugteren
2016-08-21 | Also changed the default-default for unknown device types to use the same method as for known device groups | Cedric Nugteren
2016-08-21 | Updated the changelog; refactored the database-get-bests code a bit | Cedric Nugteren
2016-08-15 | Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type | Cedric Nugteren
2016-08-09 | Improved the speed of the new common-best defaults method for the database generation | Cedric Nugteren
2016-08-07 | Added a first version of the database's common-best default calculation | Cedric Nugteren
2016-07-25 | Moved the XgemvFast and XgemvFastRot tuning database into a separate file | Cedric Nugteren
2016-07-24 | Refactored the Python database script: separated functionality into modules, now complies with the PEP8 style, added proper command-line argument parsing, and cleaned up | Cedric Nugteren
2016-07-03 | Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) | Cedric Nugteren
2016-07-02 | Prints the current pandas version and reports the minimum required version | Cedric Nugteren
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library | Cedric Nugteren
2016-06-27 | Moved the performance graph scripts to the 'scripts' subfolder | Cedric Nugteren
2016-06-19 | Minor fix to the database script | Cedric Nugteren
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing | Cedric Nugteren
2016-06-13 | Improved API documentation and added documentation for level-2 and level-3 routines | Cedric Nugteren
2016-06-10 | Added documentation for the matrix-update level-2 family of routines | Cedric Nugteren
2016-06-02 | Added return value to the test binaries (0: success, 1: failure), allowing it to work under CTest properly | Cedric Nugteren
2016-05-26 | Added half-precision tests for the clBLAS reference through conversion to single-precision | Cedric Nugteren
2016-05-26 | Added half-precision tests for the CBLAS reference through conversion to single-precision | Cedric Nugteren
2016-05-25 | Added possibility to run the performance client with half-precision | Cedric Nugteren
2016-05-25 | Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM | Cedric Nugteren
2016-05-22 | Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 | Cedric Nugteren
2016-05-22 | Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV | Cedric Nugteren
2016-05-22 | Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN | Cedric Nugteren
2016-05-18 | Merged in latest changes from 0.7.1 release | Cedric Nugteren
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren
2016-05-08 | Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS | cnugteren
2016-05-08 | Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs | cnugteren
2016-05-08 | Added preliminary generated API documentation | Cedric Nugteren
2016-05-04 | Fixed an issue with linking against the ATLAS BLAS library | Cedric Nugteren
2016-05-01 | Added tuning results for AMD Pitcairn (R9 270X) | Cedric Nugteren
2016-05-01 | Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database | Cedric Nugteren
2016-05-01 | Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking | Cedric Nugteren
2016-04-30 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | Cedric Nugteren
2016-04-29 | Added FillCache: a function to pre-compile all kernels for a specific device | Cedric Nugteren
2016-04-27 | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX | Cedric Nugteren
2016-04-27 | Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX) | Cedric Nugteren
2016-04-27 | Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache | Cedric Nugteren
2016-04-27 | All CLBlast enum constants now have the same raw values as in the cblas standard | Cedric Nugteren
2016-04-20 | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | cnugteren