Age | Commit message | Author | |
---|---|---|---|
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren | |
2019-09-04 | Fix out-of-bounds read/write in XhadFaster. Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd. | etomzak | |
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren | |
2018-10-17 | Fixed a bug with the pre-processing and the AXPY kernel | Cedric Nugteren | |
2018-10-15 | Fixed a bug in the XaxpyFaster kernel for specific parameters | Cedric Nugteren | |
2018-02-02 | Implemented the XHAD Hadamard product routine | Cedric Nugteren | |
2017-12-09 | Completed kernel modifications for pre-processor of all other kernels | Cedric Nugteren | |
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly | Cedric Nugteren | |
2017-11-29 | Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels | Cedric Nugteren | |
2017-11-25 | Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions | Cedric Nugteren | |
2017-07-08 | Made the inline keyword in kernels optional; currently only enabled for NVIDIA and ARM GPUs | Cedric Nugteren | |
2017-05-12 | Added the IxAMIN routines: absolute minimum version of IxAMAX | Cedric Nugteren | |
2017-04-14 | Added a new Xaxpy kernel in between the regular and fast version in | Cedric Nugteren | |
2017-03-10 | Added proper testing of the alpha parameter; finalized the batched AXPY implementation | Cedric Nugteren | |
2017-03-08 | Implemented a batched version of the AXPY kernel | Cedric Nugteren | |
2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects, undoing many earlier changes | Cedric Nugteren | |
2016-08-20 | Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master. Conflicts: src/kernels/level1/xaxpy.opencl, src/kernels/level2/xgemv.opencl, src/kernels/level2/xgemv_fast.opencl, src/kernels/level2/xger.opencl, src/kernels/level2/xher.opencl, src/kernels/level2/xher2.opencl, src/kernels/level3/xgemm_part2.opencl | Cedric Nugteren | |
2016-08-18 | Adapt opencl files for 1.1 OpenCL. In OpenCL 1.1 `__kernel` has to be before `__attribute__`, at least with Vivante compiler. | D. Van Assche | |
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16, arguments are cast on host and in kernel | Cedric Nugteren | |
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well | Cedric Nugteren | |
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren | |
2016-05-08 | Fixed errors in xAXPY and xSCAL tests on AMD hardware | cnugteren | |
2016-04-30 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | Cedric Nugteren | |
2016-04-27 | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX | Cedric Nugteren | |
2016-04-20 | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | cnugteren | |
2016-04-14 | Added support for the SASUM/DASUM/ScASUM/DzASUM routines | cnugteren | |
2016-03-30 | Fixed the nrm2 kernel for complex data-types | cnugteren | |
2016-03-28 | Added preliminary support for the xNRM2 routines | Cedric Nugteren | |
2015-09-14 | Added xDOT/xDOTU/xDOTC dot-product routines | CNugteren | |
2015-08-22 | Added the XSWAP, XSCAL and XCOPY level-1 routines | CNugteren | |
2015-08-22 | Re-organized level1 xaxpy kernel | CNugteren | |