path: root/src/kernels/level1
Age | Commit message | Author
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak
Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.
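The failure mode described in this commit message can be illustrated with a minimal host-side sketch in plain C. It is not the actual patch: the function names, the strided indexing, and the value of `WPT` are assumptions made for illustration only. Each emulated work-item handles up to `WPT` elements, and an explicit bounds check keeps it from touching memory past the end of the vectors when there is not enough work to fill every slot.

```c
#include <assert.h>
#include <stddef.h>

#define WPT 4 /* work per thread; hypothetical value chosen for this sketch */

/* Emulates a single work-item of an element-wise (Hadamard) product.
   Indices are strided by the total number of work-items (an assumption). */
static void hadamard_work_item(size_t tid, size_t num_threads, size_t n,
                               const float *x, const float *y, float *z) {
    for (size_t w = 0; w < WPT; ++w) {
        size_t id = tid + w * num_threads;
        if (id < n) { /* guard: skip slots that fall past the end of the data */
            z[id] = x[id] * y[id];
        }
    }
}

/* Runs every emulated work-item sequentially on the host. */
static void hadamard(size_t n, size_t num_threads,
                     const float *x, const float *y, float *z) {
    /* Enough work-item slots must exist to cover all n elements. */
    assert(num_threads * WPT >= n);
    for (size_t tid = 0; tid < num_threads; ++tid) {
        hadamard_work_item(tid, num_threads, n, x, y, z);
    }
}
```

With `n = 10` and 3 work-items there are 12 slots for 10 elements; without the `id < n` check the last two slots would read and write beyond the valid range, which is the kind of overflow a canary region after the buffer is meant to expose.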
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren
2018-10-17 | Fixed a bug with the pre-processing and the AXPY kernel | Cedric Nugteren
2018-10-15 | Fixed a bug in the XaxpyFaster kernel for specific parameters | Cedric Nugteren
2018-02-02 | Implemented the XHAD Hadamard product routine | Cedric Nugteren
2017-12-09 | Completed kernel modifications for pre-processor of all other kernels | Cedric Nugteren
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly | Cedric Nugteren
2017-11-29 | Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels | Cedric Nugteren
2017-11-25 | Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA and ARM GPUs | Cedric Nugteren
2017-05-12 | Added the IxAMIN routines: absolute minimum version of IxAMAX | Cedric Nugteren
2017-04-14 | Added a new Xaxpy kernel in between the regular and fast version in | Cedric Nugteren
2017-03-10 | Added proper testing of the alpha parameter; finalized the batched AXPY implementation | Cedric Nugteren
2017-03-08 | Implemented a batched version of the AXPY kernel | Cedric Nugteren
2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes | Cedric Nugteren
2016-08-20 | Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master | Cedric Nugteren
Conflicts:
    src/kernels/level1/xaxpy.opencl
    src/kernels/level2/xgemv.opencl
    src/kernels/level2/xgemv_fast.opencl
    src/kernels/level2/xger.opencl
    src/kernels/level2/xher.opencl
    src/kernels/level2/xher2.opencl
    src/kernels/level3/xgemm_part2.opencl
2016-08-18 | Adapt opencl files for 1.1 OpenCL | D. Van Assche
In OpenCL 1.1 __kernel has to be before __attribute__, at least with Vivante compiler.
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel | Cedric Nugteren
2016-05-14 | Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well | Cedric Nugteren
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren
2016-05-08 | Fixed errors in xAXPY and xSCAL tests on AMD hardware | cnugteren
2016-04-30 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | Cedric Nugteren
2016-04-27 | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX | Cedric Nugteren
2016-04-20 | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | cnugteren
2016-04-14 | Added support for the SASUM/DASUM/ScASUM/DzASUM routines | cnugteren
2016-03-30 | Fixed the nrm2 kernel for complex data-types | cnugteren
2016-03-28 | Added preliminary support for the xNRM2 routines | Cedric Nugteren
2015-09-14 | Added xDOT/xDOTU/xDOTC dot-product routines | CNugteren
2015-08-22 | Added the XSWAP, XSCAL and XCOPY level-1 routines | CNugteren
2015-08-22 | Re-organized level1 xaxpy kernel | CNugteren