debian-clblast
Debian package for CLBlast.
gspr@nonempty.org
path: root/src
| Age | Commit message | Author |
| --- | --- | --- |
| 2017-05-11 | Re-added random tuning for GEMM after accidental removal | Cedric Nugteren |
| 2017-04-23 | Re-added Titan X (Pascal) tuning results based on more averaging when tuning | Cedric Nugteren |
| 2017-04-22 | Increased the default number of runs for the tuner from 2 up to 10 for fast k... | Cedric Nugteren |
| 2017-04-22 | Fixed the direct vs indirect setting for NVIDIA GPUs | Cedric Nugteren |
| 2017-04-21 | Increased the default number of runs for GEMV tuning; updated GEMV tuning res... | Cedric Nugteren |
| 2017-04-20 | Tuned the direct versus indirect GEMM kernel trade-off point for NVIDIA GPUs | Cedric Nugteren |
| 2017-04-17 | Fixed a namespace clash with CUDA FP16 for the half-datatype | Cedric Nugteren |
| 2017-04-16 | Merge branch 'development' into benchmarking | Cedric Nugteren |
| 2017-04-14 | Added a new Xaxpy kernel in between the regular and fast version in | Cedric Nugteren |
| 2017-04-13 | Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now w... | Cedric Nugteren |
| 2017-04-10 | Merge branch 'development' into cublas_reference | Cedric Nugteren |
| 2017-04-10 | Fixed a compilation issue under MSVC and GCC | Cedric Nugteren |
| 2017-04-10 | Removed const-vector-of-const-objects from the database class to remain accor... | Cedric Nugteren |
| 2017-04-07 | Added a special override database for the Apple CPU implementation on OS X: t... | Cedric Nugteren |
| 2017-04-07 | Uses float2 and double2 for base complex data-types instead of a custom struc... | Cedric Nugteren |
| 2017-04-07 | Added some missing const-ness | Cedric Nugteren |
| 2017-04-02 | Layed the groundwork for cuBLAS comparisons in the clients | Cedric Nugteren |
| 2017-04-01 | Separated host-device and device-host memory copies from execution of the CBL... | Cedric Nugteren |
| 2017-03-19 | Added an (optional) non-direct implementation of the batched GEMM routine | Cedric Nugteren |
| 2017-03-19 | Added batched versions of the pad/copy/transpose kernels | Cedric Nugteren |
| 2017-03-14 | Added the possibility to tune batched kernels | Cedric Nugteren |
| 2017-03-11 | Added initial naive version of the batched GEMM routine based on the direct G... | Cedric Nugteren |
| 2017-03-10 | Added API and test infrastructure for the batched GEMM routine | Cedric Nugteren |
| 2017-03-10 | Added proper testing of the alpha parameter; finalized the batched AXPY imple... | Cedric Nugteren |
| 2017-03-10 | Fixed a small compilation bug for MSVC related to a floating-point constant | Cedric Nugteren |
| 2017-03-08 | Implemented a batched version of the AXPY kernel | Cedric Nugteren |
| 2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects ... | Cedric Nugteren |
| 2017-03-05 | Minor fixes to the client w.r.t. the addition of the batch count | Cedric Nugteren |
| 2017-03-05 | Added first naive version of the batched AXPY routine | Cedric Nugteren |
| 2017-03-05 | Adjusted the test-infrastructure to support testing of batched-versions of ro... | Cedric Nugteren |
| 2017-03-05 | Changed the way the test-data is generated: now using a single MT generator a... | Cedric Nugteren |
| 2017-03-05 | Prepared generator for batched routines; added batched AXPY routine interface | Cedric Nugteren |
| 2017-03-04 | Added tuning results for the Radeon HD6750M GPU (Apple OpenCL) | Cedric Nugteren |
| 2017-03-04 | Added a proper data-preparation function for the TRSM tests | Cedric Nugteren |
| 2017-03-01 | Added proper support for the b_offset argument in TRSM | Cedric Nugteren |
| 2017-02-27 | Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorr... | Cedric Nugteren |
| 2017-02-26 | Split the GEMM kernel further up to prevent C1091 in MSVC | Cedric Nugteren |
| 2017-02-26 | Merge branch 'development' into triangular_solvers | Cedric Nugteren |
| 2017-02-26 | Fixed an out-of-bounds memory access when filling a matrix with a constant | Cedric Nugteren |
| 2017-02-26 | Removed half-precision support from the TRSM routine; too unstable | Cedric Nugteren |
| 2017-02-26 | Fixes division in the kernel for inversion of complex numbers | Cedric Nugteren |
| 2017-02-25 | Added PrepareData function for TRSM to create proper test input | Cedric Nugteren |
| 2017-02-24 | Implemented a simple row-major to col-major problem conversion for TRSM | Cedric Nugteren |
| 2017-02-22 | Fixed a few issues with the TRSM routine; some tests still failing | Cedric Nugteren |
| 2017-02-19 | Added data-preparation function for the TRSV tests and special nan/inf checks... | Cedric Nugteren |
| 2017-02-18 | Added tuning parameters for the AMD RX480 GPU (Ellesmere) | Cedric Nugteren |
| 2017-02-18 | Fixed the naming of the C API of OverrideParameters and fixed the description | Cedric Nugteren |
| 2017-02-16 | Added a C interface to the OverrideParameters function; added some in-line co... | Cedric Nugteren |
| 2017-02-16 | Added input-sanity checks for the OverrideParameters function | Cedric Nugteren |
| 2017-02-13 | Added first version of the OverrideParameters function | Cedric Nugteren |