path: root/src/kernels
| Age | Commit message | Author |
|---|---|---|
| 2018-01-07 | Implemented direct version of strided-batched GEMM kernel | Cedric Nugteren |
| 2017-12-31 | Revert "Added options to disable parts of the invert kernel to find out where... | Cedric Nugteren |
| 2017-12-31 | Changed the invert kernel slightly; added part1a/part1b disable-defines | Cedric Nugteren |
| 2017-12-30 | Fixed ifdef's into ifndef's | Cedric Nugteren |
| 2017-12-30 | Added options to disable parts of the invert kernel to find out where the AMD... | Cedric Nugteren |
| 2017-12-27 | Simplified invert kernel a little | Cedric Nugteren |
| 2017-12-23 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | Cedric Nugteren |
| 2017-12-19 | Added skeleton for a tuner for the invert kernel | Cedric Nugteren |
| 2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren |
| 2017-12-09 | Completed kernel modifications for pre-processor of all other kernels | Cedric Nugteren |
| 2017-12-09 | Modified the direct GEMM kernel to support array-to-register promotion | Cedric Nugteren |
| 2017-12-09 | Reformatted GEMM kernel to support array-to-register promotion | Cedric Nugteren |
| 2017-12-09 | Fixed defines parsing and substituting in pre-processor; fixed some variable ... | Cedric Nugteren |
| 2017-12-07 | Added register promotion to the main GEMM kernel | Cedric Nugteren |
| 2017-12-05 | Improved array-to-register promotion, now handling function calls as well | Cedric Nugteren |
| 2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the ... | Cedric Nugteren |
| 2017-12-03 | Added basic bracket parsing in defines and loop expressions | Cedric Nugteren |
| 2017-12-03 | Reformated transpose kernels for the pre-processor; extended the amount of tests | Cedric Nugteren |
| 2017-11-29 | Reformatted unrollable kernel loops and added the new promote_to_registers pr... | Cedric Nugteren |
| 2017-11-25 | Implemented first simple pre-processor: defines parser and loop unrolling bas... | Cedric Nugteren |
| 2017-10-17 | CUDA kernel compilation fixes | Cedric Nugteren |
| 2017-10-15 | Added a missing OpenCL-to-CUDA function translation | Cedric Nugteren |
| 2017-10-15 | Fixes for the CUDA API: first tests pass and the client runs | Cedric Nugteren |
| 2017-10-14 | Fixed a kernel/attribute order bug in the direct GEMM kernels | Cedric Nugteren |
| 2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently cha... | Cedric Nugteren |
| 2017-10-14 | Made transpose kernel struct init proper according to the C standard | Cedric Nugteren |
| 2017-10-14 | Fixed several (not all) CUDA kernel compilation issues | Cedric Nugteren |
| 2017-10-14 | Various fixes to make the host code and sample compile with the CUDA API | Cedric Nugteren |
| 2017-10-14 | Added OpenCL to CUDA translation header for the kernels | Cedric Nugteren |
| 2017-10-03 | Gemm in-direct implementation now uses only 1 larger instead of max 3 optiona... | Cedric Nugteren |
| 2017-09-05 | Fixed a modulo and division issue manifesting on Apple OpenCL for im2col | Cedric Nugteren |
| 2017-08-31 | Fixed a bug in im2col: process only valid channel IDs | Cedric Nugteren |
| 2017-08-31 | Fixed a bug in im2col confusing first and second workgroup size; made im2col ... | Cedric Nugteren |
| 2017-08-24 | Completed im2col implementation | Cedric Nugteren |
| 2017-08-19 | First version of im2col kernel, unoptimized but working | Cedric Nugteren |
| 2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA... | Cedric Nugteren |
| 2017-06-30 | Fixed an if-statement in the direct GEMM kernel causing a bug with specific s... | Cedric Nugteren |
| 2017-05-14 | Fixed a missing synchronization barrier in the invert kernel; fixes TRSM tests | Cedric Nugteren |
| 2017-05-12 | Added the IxAMIN routines: absolute minimum version of IxAMAX | Cedric Nugteren |
| 2017-04-14 | Added a new Xaxpy kernel in between the regular and fast version in | Cedric Nugteren |
| 2017-04-07 | Uses float2 and double2 for base complex data-types instead of a custom struc... | Cedric Nugteren |
| 2017-03-19 | Added an (optional) non-direct implementation of the batched GEMM routine | Cedric Nugteren |
| 2017-03-19 | Added batched versions of the pad/copy/transpose kernels | Cedric Nugteren |
| 2017-03-11 | Added initial naive version of the batched GEMM routine based on the direct G... | Cedric Nugteren |
| 2017-03-10 | Added proper testing of the alpha parameter; finalized the batched AXPY imple... | Cedric Nugteren |
| 2017-03-08 | Implemented a batched version of the AXPY kernel | Cedric Nugteren |
| 2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects ... | Cedric Nugteren |
| 2017-03-04 | Added a proper data-preparation function for the TRSM tests | Cedric Nugteren |
| 2017-02-26 | Fixed an out-of-bounds memory access when filling a matrix with a constant | Cedric Nugteren |
| 2017-02-26 | Fixes division in the kernel for inversion of complex numbers | Cedric Nugteren |