path: root/src
Age         Commit message  Author
2018-06-03  Merge branch 'master' into CLBlast-267-convgemm  Cedric Nugteren
2018-06-03  Fixes for CUDA version of CLBlast  Cedric Nugteren
2018-06-01  Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when bar...  Cedric Nugteren
2018-05-31  Added error-checking for half-empty local work group sizes; fixed a minor TRS...  Cedric Nugteren
2018-05-31  Some potential fixes for error -54 when launching TRSV and TRSM kernels  Cedric Nugteren
2018-05-30  Widened Apple OpenCL check, added way to debug too-large-workgroups issue  Cedric Nugteren
2018-05-29  Added Apple OpenCL TRSV block size override; removed failing old Intel GPU te...  Cedric Nugteren
2018-05-27  Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes  Cedric Nugteren
2018-05-27  Added a check to return 'NotImplemented' error code in case of systems with <...  Cedric Nugteren
2018-05-27  Made FillMatrix and FillVector functions take a configurable local workgroup ...  Cedric Nugteren
2018-05-23  Added an option in the clients to output timing statistics: minimum, mean, an...  Cedric Nugteren
2018-05-21  Further implemented single-kernel approach of convgemm; extended test to capt...  Cedric Nugteren
2018-05-21  Added method selection option to switch between im2col and single-kernel appr...  Cedric Nugteren
2018-05-19  Moved new convgemm kernel to levelx kernel folder  Cedric Nugteren
2018-05-19  Second version of direct reading from image tensor for convgemm: also with lo...  Cedric Nugteren
2018-05-19  Merge branch 'master' into CLBlast-267-convgemm  Cedric Nugteren
2018-05-19  Added an option to run the routine tuner for a single specific GEMM size  Cedric Nugteren
2018-05-19  Fixed compilation issues  Cedric Nugteren
2018-05-19  The GEMM routine tuner now loads kernel JSON tuning results from disk if avai...  Cedric Nugteren
2018-05-18  Merge branch 'master' into canary_buffer_overflow_protection  Cedric Nugteren
2018-05-17  Added a canary region for overflow detection to the tuners  Cedric Nugteren
2018-05-17  First version of direct reading from image tensor for convgemm: only for edge...  Cedric Nugteren
2018-05-13  Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm...  Cedric Nugteren
2018-05-13  Plugged in the code of strided-batched-gemm into convgemm in preparation of a...  Cedric Nugteren
2018-05-09  Changed temporary convgemm implementation to use batched-strided GEMM  Cedric Nugteren
2018-05-09  Implemented convolution as im2col + GEMM  Cedric Nugteren
2018-05-06  Added convgemm skeleton, test infrastructure, and first reference implementation  Cedric Nugteren
2018-05-05  Added interface of batched convolution as GEMM  Cedric Nugteren
2018-05-01  Now stores a shared_ptr to the Program class in the cache  Cedric Nugteren
2018-04-29  Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups  Cedric Nugteren
2018-04-26  Fixed an access violation when compiled with Visual Studio upon releasing the...  Cedric Nugteren
2018-04-24  Added Intel subgroup shuffle support to the 2D register caching GEMM kernel  Cedric Nugteren
2018-04-24  Added a define to enable subgroup shuffling if supported by the device  Cedric Nugteren
2018-04-20  Fixes for the CUDA API  Cedric Nugteren
2018-04-18  Expressed HER2K as two HERK calls  Cedric Nugteren
2018-04-18  Expressed SYR2K as two SYRK calls  Cedric Nugteren
2018-04-17  Updated HERK and SYRK to follow the GEMM style and functions to make it work ...  Cedric Nugteren
2018-04-15  Fixed some failing tests for GEMM and batched GEMM routines  Cedric Nugteren
2018-04-15  Updated tuning results for the Skylake ULT GT2 GPU with the new kernel  Cedric Nugteren
2018-04-13  Made GEMM rotation expectations kernel-specific  Cedric Nugteren
2018-04-10  Updated database with defaults of GEMMK=0 and KREG=1  Cedric Nugteren
2018-04-08  Extended the maximum number of tuning parameters from 14 to 16  Cedric Nugteren
2018-04-08  Fixed issues with the pre-processor  Cedric Nugteren
2018-04-07  Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel  Cedric Nugteren
2018-04-07  Added tuning results for NVIDIA GeForce 970  Cedric Nugteren
2018-04-07  Added tuning results for NVIDIA GeForce 920MX  Cedric Nugteren
2018-04-07  Added tuning results for Intel HD Graphics 620  Cedric Nugteren
2018-04-07  Extended the GEMM tuner to be able to tune the new 'kernel 1'  Cedric Nugteren
2018-04-07  Fixed a compilation issue for complex datatypes and vload  Cedric Nugteren
2018-04-06  Fixed a compilation issue for complex datatypes and vload  Cedric Nugteren