path: root/src/routines
Age | Commit message | Author
2018-01-07 | Added API and tests for new GemmStridedBatched routine | Cedric Nugteren
2018-01-06 | Reduced duplicate code in the batched GEMM implementation | Cedric Nugteren
2018-01-06 | Fixed the CUDA interface: replaced nullptr with 0 | Cedric Nugteren
2017-12-30 | Added optional temp-buffer argument to C++ interface of GEMM | Cedric Nugteren
2017-12-28 | Added interface to compute the required temporary buffer size for GEMM | Cedric Nugteren
2017-12-28 | Factored out argument processing from the GEMM routine | Cedric Nugteren
2017-12-28 | Refactored GEMM code in preparation of separate temp-buffer computation | Cedric Nugteren
2017-12-23 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | Cedric Nugteren
2017-12-23 | Updated the database to use the new TRSV and Invert tuners | Cedric Nugteren
2017-12-23 | Added TRSV block-size tuner | Cedric Nugteren
2017-12-10 | Fixed for error C1091 in MSVC 2013 | Cedric Nugteren
2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren
2017-11-17 | Moved compilation function to separate file; removed dependency of tuners of ... | Cedric Nugteren
2017-11-11 | Factored out the creation of the OpenCL header and the program compilation | Cedric Nugteren
2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning re... | Cedric Nugteren
2017-10-27 | Fixed a bug when using the matrix A-offset argument for the TRSM routine | Cedric Nugteren
2017-10-27 | Reduced TRSM block-size for better numerical stability | Cedric Nugteren
2017-10-27 | Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM | Cedric Nugteren
2017-10-25 | Fixed a bug in TRSM routine due to missing event synchronisations after GEMM ... | Cedric Nugteren
2017-10-17 | Made buffers of batched routines read/write (was: read-only) | Cedric Nugteren
2017-10-09 | Removed include of clpp11.hpp in places other than utilities.hpp | Cedric Nugteren
2017-10-08 | Moved non-routine-specific API functions and includes to separate files | Cedric Nugteren
2017-10-07 | Fixed a small typo | Cedric Nugteren
2017-10-03 | Gemm in-direct implementation now uses only 1 larger instead of max 3 optiona... | Cedric Nugteren
2017-09-19 | Fixed type conversion warnings under MSVC 2013 | Cedric Nugteren
2017-08-31 | Fixed a bug in im2col: process only valid channel IDs | Cedric Nugteren
2017-08-31 | Fixed a bug in im2col confusing first and second workgroup size; made im2col ... | Cedric Nugteren
2017-08-24 | Merge branch 'master' into im_to_col | Cedric Nugteren
2017-08-24 | Completed im2col implementation | Cedric Nugteren
2017-08-21 | Merge pull request #173 from mcian/PSO_params | Cedric Nugteren
2017-08-19 | First version of im2col kernel, unoptimized but working | Cedric Nugteren
2017-08-12 | Merge branch 'master' into im_to_col | Cedric Nugteren
2017-08-12 | Moved functions from the header to the .cpp file to prevent compiling the sam... | Cedric Nugteren
2017-08-09 | Use cltune::SearchMethod enum instead of int values | mcian
2017-07-31 | Restore direct GEMM to previous version | mcian
2017-07-25 | Minor optimization for the direct GEMM kernel: don't ceil m and n unnecessari... | Cedric Nugteren
2017-07-12 | Relaxed requirement on a_ld and b_ld for batched GEMM | Cedric Nugteren
2017-07-02 | Added interface and stubs for the im2col routine | Cedric Nugteren
2017-06-18 | Fixed an overflow bug on 32-bit systems when chosing a GEMM kernel | Cedric Nugteren
2017-05-15 | Fixed an TRSM issue caused by incorrect block size calculation | Cedric Nugteren
2017-05-12 | Added the IxAMIN routines: absolute minimum version of IxAMAX | Cedric Nugteren
2017-05-12 | Fixed a bug in the TRSM routine; tests now pass | Cedric Nugteren
2017-04-14 | Added a new Xaxpy kernel in between the regular and fast version in | Cedric Nugteren
2017-04-07 | Added some missing const-ness | Cedric Nugteren
2017-03-19 | Added an (optional) non-direct implementation of the batched GEMM routine | Cedric Nugteren
2017-03-19 | Added batched versions of the pad/copy/transpose kernels | Cedric Nugteren
2017-03-11 | Added initial naive version of the batched GEMM routine based on the direct G... | Cedric Nugteren
2017-03-10 | Added API and test infrastructure for the batched GEMM routine | Cedric Nugteren
2017-03-08 | Implemented a batched version of the AXPY kernel | Cedric Nugteren
2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects ... | Cedric Nugteren