Age | Commit message | Author
2017-11-16 | Removed dependency on CLTune | Cedric Nugteren
2017-11-16 | Added printing of the best parameters for the new tuner | Cedric Nugteren
2017-11-15 | Added first version of integrated and re-written auto-tuner | Cedric Nugteren
2017-11-15 | Added kernel timing functionality to the utilities | Cedric Nugteren
2017-11-15 | Added exception handle with catch-all | Cedric Nugteren
2017-11-13 | Made the exception dispatch function optionally silent | Cedric Nugteren
2017-11-13 | Moved square-difference utility function for use in the tuners | Cedric Nugteren
2017-11-11 | Factored out the creation of the OpenCL header and the program compilation | Cedric Nugteren
2017-11-09 | Added tuning results for the GeForce GTX750Ti | Cedric Nugteren
2017-11-08 | Updated to CLBlast version 1.2.0 | Cedric Nugteren
2017-11-08 | Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16 | Cedric Nugteren
2017-11-07 | Merge pull request #212 from CNugteren/kernel_selection_tuner | Cedric Nugteren
    GEMM kernel selection tuner
2017-11-07 | Updated the roadmap | Cedric Nugteren
2017-11-07 | Added various GEMM routine tuning results | Cedric Nugteren
2017-11-06 | Improved the way the database defaults are computed | Cedric Nugteren
2017-11-06 | Changed GEMM routine tuner's scoring to use L2 measure instead for better averaging | Cedric Nugteren
2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | Cedric Nugteren
2017-11-02 | Fixed a bug in database compression/decompression | Cedric Nugteren
2017-10-30 | Added collecting and printing of scores for the kernel-selection tuner | Cedric Nugteren
2017-10-30 | Merge branch 'binary_cache_platform_dependent' | Cedric Nugteren
2017-10-29 | Added platform ID to the binary program cache to prevent issues with multi-platform systems | Cedric Nugteren
2017-10-29 | Merge pull request #208 from CNugteren/android_support | Cedric Nugteren
    Added Android support
2017-10-29 | Made it possible to compile the CLBlast performance clients for Android with the NDK | Cedric Nugteren
2017-10-29 | Added Android support using the GNU C++ STL library and the GCC toolchain | Cedric Nugteren
2017-10-28 | Merge branch 'master' into android_support | Cedric Nugteren
2017-10-28 | Added initial version of a GEMM kernel selection tuner | Cedric Nugteren
2017-10-28 | Moved timing function to a separate file | Cedric Nugteren
2017-10-27 | Fixed a bug when using the matrix A-offset argument for the TRSM routine | Cedric Nugteren
2017-10-27 | Reduced TRSM block-size for better numerical stability | Cedric Nugteren
2017-10-27 | Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM | Cedric Nugteren
2017-10-27 | Added a DTRSM C++ interface example | Cedric Nugteren
2017-10-25 | Fixed small bug in (unused) invert tester | Cedric Nugteren
2017-10-25 | Updated roadmap with links to issues and status | Cedric Nugteren
2017-10-25 | Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls | Cedric Nugteren
2017-10-23 | Merge pull request #206 from matze/use-gnuinstall-dirs | Cedric Nugteren
    Use GNUInstallDirs to determine install paths
2017-10-23 | Use GNUInstallDirs to determine install paths | Matthias Vogelgesang
    The GNUInstallDirs module* provides variables that match the install directories for GNU Software and allows users to override them. Without hardcoding paths packagers can choose library paths according to distribution policies (i.e. lib, lib64, lib<arch>, ...).
    * https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-20 | Added first version of a roadmap | Cedric Nugteren
2017-10-20 | Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 | Cedric Nugteren
2017-10-20 | Merge pull request #204 from CNugteren/cuda_api | Cedric Nugteren
    Cuda API to CLBlast
2017-10-18 | Moved CUmodule code from Kernel to Program class to not require re-compilation every time | Cedric Nugteren
2017-10-17 | Fix an incompatibility with CUDA's FP16 definition | Cedric Nugteren
2017-10-17 | Made buffers of batched routines read/write (was: read-only) | Cedric Nugteren
2017-10-17 | CUDA kernel compilation fixes | Cedric Nugteren
2017-10-16 | Added CUDA API documentation | Cedric Nugteren
2017-10-16 | Made all CUDA kernel launches synchronous; removed exception raising | Cedric Nugteren
2017-10-15 | Added a missing OpenCL-to-CUDA function translation | Cedric Nugteren
2017-10-15 | Fixed a small copy-paste typo | Cedric Nugteren
2017-10-15 | Modified test interfaces such that they support either OpenCL or CUDA | Cedric Nugteren
2017-10-15 | Fixes for the CUDA API: first tests pass and the client runs | Cedric Nugteren
2017-10-15 | Added the SM-compute-arch version to the nv compile options | Cedric Nugteren