debian-clblast
Debian package for CLBlast.
gspr@nonempty.org
path: root/src/routines
Age | Commit message | Author
2018-01-26 | Fixed an event synchronisation issue in the batched gemm routines | Cedric Nugteren
2018-01-18 | Made the batched routines also chose direct/indirect kernel like the main GEM... | Cedric Nugteren
2018-01-08 | Implemented the in-direct version of the strided-batched GEMM kernel | Cedric Nugteren
2018-01-07 | Implemented direct version of strided-batched GEMM kernel | Cedric Nugteren
2018-01-07 | Added API and tests for new GemmStridedBatched routine | Cedric Nugteren
2018-01-06 | Reduced duplicate code in the batched GEMM implementation | Cedric Nugteren
2018-01-06 | Fixed the CUDA interface: replaced nullptr with 0 | Cedric Nugteren
2017-12-30 | Added optional temp-buffer argument to C++ interface of GEMM | Cedric Nugteren
2017-12-28 | Added interface to compute the required temporary buffer size for GEMM | Cedric Nugteren
2017-12-28 | Factored out argument processing from the GEMM routine | Cedric Nugteren
2017-12-28 | Refactored GEMM code in preparation of separate temp-buffer computation | Cedric Nugteren
2017-12-23 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | Cedric Nugteren
2017-12-23 | Updated the database to use the new TRSV and Invert tuners | Cedric Nugteren
2017-12-23 | Added TRSV block-size tuner | Cedric Nugteren
2017-12-10 | Fixed for error C1091 in MSVC 2013 | Cedric Nugteren
2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren
2017-11-17 | Moved compilation function to separate file; removed dependency of tuners of ... | Cedric Nugteren
2017-11-11 | Factored out the creation of the OpenCL header and the program compilation | Cedric Nugteren
2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning re... | Cedric Nugteren
2017-10-27 | Fixed a bug when using the matrix A-offset argument for the TRSM routine | Cedric Nugteren
2017-10-27 | Reduced TRSM block-size for better numerical stability | Cedric Nugteren
2017-10-27 | Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM | Cedric Nugteren
2017-10-25 | Fixed a bug in TRSM routine due to missing event synchronisations after GEMM ... | Cedric Nugteren
2017-10-17 | Made buffers of batched routines read/write (was: read-only) | Cedric Nugteren
2017-10-09 | Removed include of clpp11.hpp in places other than utilities.hpp | Cedric Nugteren
2017-10-08 | Moved non-routine-specific API functions and includes to separate files | Cedric Nugteren
2017-10-07 | Fixed a small typo | Cedric Nugteren
2017-10-03 | Gemm in-direct implementation now uses only 1 larger instead of max 3 optiona... | Cedric Nugteren
2017-09-19 | Fixed type conversion warnings under MSVC 2013 | Cedric Nugteren
2017-08-31 | Fixed a bug in im2col: process only valid channel IDs | Cedric Nugteren
2017-08-31 | Fixed a bug in im2col confusing first and second workgroup size; made im2col ... | Cedric Nugteren
2017-08-24 | Merge branch 'master' into im_to_col | Cedric Nugteren
2017-08-24 | Completed im2col implementation | Cedric Nugteren
2017-08-21 | Merge pull request #173 from mcian/PSO_params | Cedric Nugteren
2017-08-19 | First version of im2col kernel, unoptimized but working | Cedric Nugteren
2017-08-12 | Merge branch 'master' into im_to_col | Cedric Nugteren
2017-08-12 | Moved functions from the header to the .cpp file to prevent compiling the sam... | Cedric Nugteren
2017-08-09 | Use cltune::SearchMethod enum instead of int values | mcian
2017-07-31 | Restore direct GEMM to previous version | mcian
2017-07-25 | Minor optimization for the direct GEMM kernel: don't ceil m and n unnecessari... | Cedric Nugteren
2017-07-12 | Relaxed requirement on a_ld and b_ld for batched GEMM | Cedric Nugteren
2017-07-02 | Added interface and stubs for the im2col routine | Cedric Nugteren
2017-06-18 | Fixed an overflow bug on 32-bit systems when chosing a GEMM kernel | Cedric Nugteren
2017-05-15 | Fixed an TRSM issue caused by incorrect block size calculation | Cedric Nugteren
2017-05-12 | Added the IxAMIN routines: absolute minimum version of IxAMAX | Cedric Nugteren
2017-05-12 | Fixed a bug in the TRSM routine; tests now pass | Cedric Nugteren
2017-04-14 | Added a new Xaxpy kernel in between the regular and fast version in | Cedric Nugteren
2017-04-07 | Added some missing const-ness | Cedric Nugteren
2017-03-19 | Added an (optional) non-direct implementation of the batched GEMM routine | Cedric Nugteren
2017-03-19 | Added batched versions of the pad/copy/transpose kernels | Cedric Nugteren