path: root/src/routines
Age  Commit message  Author
2018-06-01  Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when bar...  Cedric Nugteren
2018-05-31  Added error-checking for half-empty local work group sizes; fixed a minor TRS...  Cedric Nugteren
2018-05-31  Some potential fixes for error -54 when launching TRSV and TRSM kernels  Cedric Nugteren
2018-05-30  Widened Apple OpenCL check, added way to debug too-large-workgroups issue  Cedric Nugteren
2018-05-27  Added a check to return 'NotImplemented' error code in case of systems with <...  Cedric Nugteren
2018-05-27  Made FillMatrix and FillVector functions take a configurable local workgroup ...  Cedric Nugteren
2018-05-01  Now stores a shared_ptr to the Program class in the cache  Cedric Nugteren
2018-04-18  Expressed HER2K as two HERK calls  Cedric Nugteren
2018-04-18  Expressed SYR2K as two SYRK calls  Cedric Nugteren
2018-04-17  Updated HERK and SYRK to follow the GEMM style and functions to make it work ...  Cedric Nugteren
2018-04-15  Fixed some failing tests for GEMM and batched GEMM routines  Cedric Nugteren
2018-04-13  Made GEMM rotation expectations kernel-specific  Cedric Nugteren
2018-03-15  Fixed a failing TRSM test using a CPU with Apple OpenCL  Cedric Nugteren
2018-03-15  Fixed a failing TRSV test using a CPU with Apple OpenCL  Cedric Nugteren
2018-02-02  Implemented the XHAD Hadamard product routine  Cedric Nugteren
2018-01-31  Created the API and stubs for the HAD (hadamard-product) routines  Cedric Nugteren
2018-01-26  Fixed an event synchronisation issue in the batched gemm routines  Cedric Nugteren
2018-01-18  Made the batched routines also chose direct/indirect kernel like the main GEM...  Cedric Nugteren
2018-01-08  Implemented the in-direct version of the strided-batched GEMM kernel  Cedric Nugteren
2018-01-07  Implemented direct version of strided-batched GEMM kernel  Cedric Nugteren
2018-01-07  Added API and tests for new GemmStridedBatched routine  Cedric Nugteren
2018-01-06  Reduced duplicate code in the batched GEMM implementation  Cedric Nugteren
2018-01-06  Fixed the CUDA interface: replaced nullptr with 0  Cedric Nugteren
2017-12-30  Added optional temp-buffer argument to C++ interface of GEMM  Cedric Nugteren
2017-12-28  Added interface to compute the required temporary buffer size for GEMM  Cedric Nugteren
2017-12-28  Factored out argument processing from the GEMM routine  Cedric Nugteren
2017-12-28  Refactored GEMM code in preparation of separate temp-buffer computation  Cedric Nugteren
2017-12-23  Split the invert kernel in two parts to prevent error C1091 in MSVC 2013  Cedric Nugteren
2017-12-23  Updated the database to use the new TRSV and Invert tuners  Cedric Nugteren
2017-12-23  Added TRSV block-size tuner  Cedric Nugteren
2017-12-10  Fixed for error C1091 in MSVC 2013  Cedric Nugteren
2017-12-10  Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit  Cedric Nugteren
2017-11-17  Moved compilation function to separate file; removed dependency of tuners of ...  Cedric Nugteren
2017-11-11  Factored out the creation of the OpenCL header and the program compilation  Cedric Nugteren
2017-11-02  Integrated the GEMM routine tuner for kernel selection; added first tuning re...  Cedric Nugteren
2017-10-27  Fixed a bug when using the matrix A-offset argument for the TRSM routine  Cedric Nugteren
2017-10-27  Reduced TRSM block-size for better numerical stability  Cedric Nugteren
2017-10-27  Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM  Cedric Nugteren
2017-10-25  Fixed a bug in TRSM routine due to missing event synchronisations after GEMM ...  Cedric Nugteren
2017-10-17  Made buffers of batched routines read/write (was: read-only)  Cedric Nugteren
2017-10-09  Removed include of clpp11.hpp in places other than utilities.hpp  Cedric Nugteren
2017-10-08  Moved non-routine-specific API functions and includes to separate files  Cedric Nugteren
2017-10-07  Fixed a small typo  Cedric Nugteren
2017-10-03  Gemm in-direct implementation now uses only 1 larger instead of max 3 optiona...  Cedric Nugteren
2017-09-19  Fixed type conversion warnings under MSVC 2013  Cedric Nugteren
2017-08-31  Fixed a bug in im2col: process only valid channel IDs  Cedric Nugteren
2017-08-31  Fixed a bug in im2col confusing first and second workgroup size; made im2col ...  Cedric Nugteren
2017-08-24  Merge branch 'master' into im_to_col  Cedric Nugteren
2017-08-24  Completed im2col implementation  Cedric Nugteren
2017-08-21  Merge pull request #173 from mcian/PSO_params  Cedric Nugteren