path: root/src/kernels/level3
Age | Commit message | Author
2023-01-17 | Updated according to feedback from CNugteren | Angus, Alexander
2023-01-03 | implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 | Angus, Alexander
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren
2018-10-10 | Fixed pre-processor warnings related to the subgroup shuffling | Cedric Nugteren
2018-09-15 | Fixed an MSVC compilation error due to large strings | Cedric Nugteren
2018-09-15 | Fixed issues with GEMMK=1 kernel and the pre-processor | Cedric Nugteren
2018-07-27 | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | Cedric Nugteren
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-08 | Fixed issues with the pre-processor | Cedric Nugteren
2018-04-07 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | Cedric Nugteren
2018-04-07 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-06 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-03 | Added first version of 2D register tiling kernel with A and C transposed as well | Cedric Nugteren
2018-03-23 | Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler | Cedric Nugteren
2018-03-15 | Fixed a failing TRSM test using a CPU with Apple OpenCL | Cedric Nugteren
2018-01-08 | Implemented the in-direct version of the strided-batched GEMM kernel | Cedric Nugteren
2018-01-07 | Implemented direct version of strided-batched GEMM kernel | Cedric Nugteren
2017-12-31 | Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes" This reverts commit 407ed52cec41445f02e85cb45d08f590960216bb. | Cedric Nugteren
2017-12-31 | Changed the invert kernel slightly; added part1a/part1b disable-defines | Cedric Nugteren
2017-12-30 | Fixed ifdef's into ifndef's | Cedric Nugteren
2017-12-30 | Added options to disable parts of the invert kernel to find out where the AMD compiler crashes | Cedric Nugteren
2017-12-27 | Simplified invert kernel a little | Cedric Nugteren
2017-12-23 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | Cedric Nugteren
2017-12-19 | Added skeleton for a tuner for the invert kernel | Cedric Nugteren
2017-12-10 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | Cedric Nugteren
2017-12-09 | Completed kernel modifications for pre-processor of all other kernels | Cedric Nugteren
2017-12-09 | Modified the direct GEMM kernel to support array-to-register promotion | Cedric Nugteren
2017-12-09 | Reformatted GEMM kernel to support array-to-register promotion | Cedric Nugteren
2017-12-09 | Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels | Cedric Nugteren
2017-12-07 | Added register promotion to the main GEMM kernel | Cedric Nugteren
2017-12-03 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly | Cedric Nugteren
2017-12-03 | Reformated transpose kernels for the pre-processor; extended the amount of tests | Cedric Nugteren
2017-11-29 | Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels | Cedric Nugteren
2017-10-14 | Fixed a kernel/attribute order bug in the direct GEMM kernels | Cedric Nugteren
2017-10-14 | Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code | Cedric Nugteren
2017-10-14 | Made transpose kernel struct init proper according to the C standard | Cedric Nugteren
2017-10-03 | Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers | Cedric Nugteren
2017-07-08 | Made the inline keyword in kernels optional currently only enabled for NVIDIA and ARM GPUs | Cedric Nugteren
2017-06-30 | Fixed an if-statement in the direct GEMM kernel causing a bug with specific sets of input parameters | Cedric Nugteren
2017-05-14 | Fixed a missing synchronization barrier in the invert kernel; fixes TRSM tests | Cedric Nugteren
2017-03-19 | Added an (optional) non-direct implementation of the batched GEMM routine | Cedric Nugteren
2017-03-19 | Added batched versions of the pad/copy/transpose kernels | Cedric Nugteren
2017-03-11 | Added initial naive version of the batched GEMM routine based on the direct GEMM kernel | Cedric Nugteren
2017-03-04 | Added a proper data-preparation function for the TRSM tests | Cedric Nugteren
2017-02-26 | Fixed an out-of-bounds memory access when filling a matrix with a constant | Cedric Nugteren