Age | Commit message | Author
2016-05-15 | Added header with conversions from and to half-precision floating-point | Cedric Nugteren
2016-05-15 | Updated the performance graph for the Radeon M370X AMD GPU | cnugteren
2016-05-15 | Added new tuning results for SGEMM and updated the performance graph for the Radeon M370X AMD GPU | cnugteren
2016-05-15 | Removed comparison to CBLAS for the graph scripts | cnugteren
2016-05-15 | Fixed a bug in the xGEMM routine related to an incorrectly set event | cnugteren
2016-05-15 | Fixed the arguments in the performance graphs to reflect the changes in enum values | cnugteren
2016-05-15 | Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs | cnugteren
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well | Cedric Nugteren
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren
2016-05-12 | Initial changes in preparation for half-precision fp16 support | Cedric Nugteren
2016-05-10 | Fixed links in the README | Cedric Nugteren
2016-05-08 | Prepared the changelog for the next release | Cedric Nugteren
2016-05-08 | Fixes for compilation of the tests under Visual Studio 2015 | CNugteren
2016-05-08 | Updated to version 0.7.0 | Cedric Nugteren
2016-05-08 | Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS | cnugteren
2016-05-08 | Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs | cnugteren
2016-05-08 | Fixed errors in xAXPY and xSCAL tests on AMD hardware | cnugteren
2016-05-08 | Fixed an issue with computing the GFLOPS numbers for the xGEMM performance tests for non-square matrices | cnugteren
2016-05-08 | Added preliminary generated API documentation | Cedric Nugteren
2016-05-07 | Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library | Cedric Nugteren
2016-05-05 | Added printing of indices when testing in verbose mode | Cedric Nugteren
2016-05-05 | Merge pull request #57 from dividiti/development: Locate the C BLAS library before the F77 one. | Cedric Nugteren
2016-05-05 | Locate the C BLAS library before the F77 one. | Anton Lokhmotov
2016-05-04 | Fixed an issue with linking against the ATLAS BLAS library | Cedric Nugteren
2016-05-02 | Added tuning results for AMD Hawaii (R9 290X) | Cedric Nugteren
2016-05-02 | Fixed the calculation of the required buffer sizes in case of subvectors and submatrices | Cedric Nugteren
2016-05-01 | Added tuning results for AMD Pitcairn (R9 270X) | Cedric Nugteren
2016-05-01 | Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database | Cedric Nugteren
2016-05-01 | Made the default xDOT tuning size smaller | Cedric Nugteren
2016-05-01 | Changed the index buffer of IxAMAX routines to unsigned int for proper buffer-size checking | Cedric Nugteren
2016-05-01 | Added a program cache (per-context) next to the per-device binary cache | Cedric Nugteren
2016-04-30 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | Cedric Nugteren
2016-04-29 | Added an example to demonstrate the use of the ClearCache and FillCache functions | Cedric Nugteren
2016-04-29 | Added FillCache: a function to pre-compile all kernels for a specific device | Cedric Nugteren
2016-04-29 | Added sample C programs for the SASUM and DGEMV routines | Cedric Nugteren
2016-04-28 | Fixed the cache to store binaries instead of OpenCL programs | Cedric Nugteren
2016-04-27 | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX | Cedric Nugteren
2016-04-27 | Added missing namespace to the SGEMM example | Cedric Nugteren
2016-04-27 | Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX) | Cedric Nugteren
2016-04-27 | Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache | Cedric Nugteren
2016-04-27 | Relaxed the absolute error margin for floating-point value comparisons to 1e-4 | Cedric Nugteren
2016-04-27 | Added a '-verbose' option to the test binaries to report errors in more detail if needed | Cedric Nugteren
2016-04-27 | All CLBlast enum constants now have the same raw values as in the cblas standard | Cedric Nugteren
2016-04-20 | Merge branch 'level1_routines' into development | cnugteren
2016-04-20 | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | cnugteren
2016-04-20 | Added prototype for IxAMAX routines | cnugteren
2016-04-14 | Updated the reduction-kernel tuner to also tune the epilogue | cnugteren
2016-04-14 | Added support for the SASUM/DASUM/ScASUM/DzASUM routines | cnugteren
2016-04-13 | Added prototype for xASUM routines | cnugteren
2016-04-11 | Fixed the way the defaults are calculated in the database; added warning for non-matching tuner arguments | cnugteren