Age | Commit message | Author
2017-11-07 | Added various GEMM routine tuning results | Cedric Nugteren
2017-11-06 | Improved the way the database defaults are computed | Cedric Nugteren
2017-11-06 | Changed GEMM routine tuner's scoring to use the L2 measure instead, for better averaging | Cedric Nugteren
2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | Cedric Nugteren
2017-11-02 | Fixed a bug in database compression/decompression | Cedric Nugteren
2017-10-30 | Added collecting and printing of scores for the kernel-selection tuner | Cedric Nugteren
2017-10-28 | Added initial version of a GEMM kernel selection tuner | Cedric Nugteren
2017-10-28 | Moved timing function to a separate file | Cedric Nugteren
2017-10-27 | Fixed a bug when using the matrix A-offset argument for the TRSM routine | Cedric Nugteren
2017-10-27 | Reduced TRSM block-size for better numerical stability | Cedric Nugteren
2017-10-27 | Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM | Cedric Nugteren
2017-10-27 | Added a DTRSM C++ interface example | Cedric Nugteren
2017-10-25 | Fixed small bug in (unused) invert tester | Cedric Nugteren
2017-10-25 | Updated roadmap with links to issues and status | Cedric Nugteren
2017-10-25 | Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls | Cedric Nugteren
2017-10-23 | Merge pull request #206 from matze/use-gnuinstall-dirs | Cedric Nugteren
    Use GNUInstallDirs to determine install paths
2017-10-23 | Use GNUInstallDirs to determine install paths | Matthias Vogelgesang
    The GNUInstallDirs module* provides variables that match the install directories for GNU software and allows users to override them. Because paths are no longer hardcoded, packagers can choose library paths according to distribution policies (e.g. lib, lib64, lib<arch>, ...).
    * https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-20 | Added first version of a roadmap | Cedric Nugteren
2017-10-20 | Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 | Cedric Nugteren
2017-10-20 | Merge pull request #204 from CNugteren/cuda_api | Cedric Nugteren
    Cuda API to CLBlast
2017-10-18 | Moved CUmodule code from Kernel to Program class so that it does not require re-compilation every time | Cedric Nugteren
2017-10-17 | Fixed an incompatibility with CUDA's FP16 definition | Cedric Nugteren
2017-10-17 | Made buffers of batched routines read/write (was: read-only) | Cedric Nugteren
2017-10-17 | CUDA kernel compilation fixes | Cedric Nugteren
2017-10-16 | Added CUDA API documentation | Cedric Nugteren
2017-10-16 | Made all CUDA kernel launches synchronous; removed exception raising | Cedric Nugteren
2017-10-15 | Added a missing OpenCL-to-CUDA function translation | Cedric Nugteren
2017-10-15 | Fixed a small copy-paste typo | Cedric Nugteren
2017-10-15 | Modified test interfaces such that they support either OpenCL or CUDA | Cedric Nugteren
2017-10-15 | Fixes for the CUDA API: first tests pass and the client runs | Cedric Nugteren
2017-10-15 | Added the SM-compute-arch version to the nv compile options | Cedric Nugteren
2017-10-15 | Prepared test and client infrastructure for use with the CUDA API | Cedric Nugteren
2017-10-15 | Various fixes to make the first CUDA examples work | Cedric Nugteren
2017-10-14 | Fixed a kernel/attribute order bug in the direct GEMM kernels | Cedric Nugteren
2017-10-14 | Made local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code | Cedric Nugteren
2017-10-14 | Made transpose kernel struct initialisation conform to the C standard | Cedric Nugteren
2017-10-14 | Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON) | Cedric Nugteren
2017-10-14 | Fixed several (not all) CUDA kernel compilation issues | Cedric Nugteren
2017-10-14 | Added DAXPY example for the CUDA API | Cedric Nugteren
2017-10-14 | Various fixes to make the host code and sample compile with the CUDA API | Cedric Nugteren
2017-10-14 | Added first untested CUDA sample | Cedric Nugteren
2017-10-14 | Added OpenCL to CUDA translation header for the kernels | Cedric Nugteren
2017-10-12 | CUDA API now takes context and device instead of stream | Cedric Nugteren
2017-10-11 | Added first (untested) version of a CUDA API | Cedric Nugteren
2017-10-09 | Fixed the Python generator script w.r.t. the recent change of testing direct/indirect GEMM kernels separately | Cedric Nugteren
2017-10-09 | Removed include of clpp11.hpp in places other than utilities.hpp | Cedric Nugteren
2017-10-09 | Made the half-precision header OpenCL-independent | Cedric Nugteren
2017-10-08 | Moved non-routine-specific API functions and includes to separate files | Cedric Nugteren
2017-10-08 | Merge pull request #198 from CNugteren/cuda_api_preparation | Cedric Nugteren
    Cuda API preparation
2017-10-08 | Moved the remaining OpenCL-specific host code to the clpp11.h header where it belongs | Cedric Nugteren