Age | Commit message | Author
2018-06-03 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-06-03 | Updated to CLBlast version 1.4.0 | Cedric Nugteren
2018-06-03 | Added list of tuners to be run by 'alltuners' target | Cedric Nugteren
2018-06-03 | Fixes for CUDA version of CLBlast | Cedric Nugteren
2018-06-02 | Added MKL as an alternative for CBLAS for correctness and performance comparisons | Cedric Nugteren
2018-06-01 | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present | Cedric Nugteren
2018-05-31 | Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue | Cedric Nugteren
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-05-30 | Widened Apple OpenCL check, added way to debug too-large-workgroups issue | Cedric Nugteren
2018-05-29 | Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README | Cedric Nugteren
2018-05-27 | Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes | Cedric Nugteren
    Apple opencl limitations for TRSV/TRSM now return not-implemented status
2018-05-27 | Merge pull request #286 from CNugteren/runtime_statistics_in_client | Cedric Nugteren
    Runtime statistics in client
2018-05-27 | Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TSRV and TRSM | Cedric Nugteren
2018-05-27 | Made FillMatrix and FillVector functions take a configurable local workgroup size | Cedric Nugteren
2018-05-27 | Added maximum time reporting to the client statistics | Cedric Nugteren
2018-05-23 | Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation | Cedric Nugteren
2018-05-21 | Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code | Cedric Nugteren
2018-05-21 | Added method selection option to switch between im2col and single-kernel approach for convgemm | Cedric Nugteren
2018-05-19 | Merge pull request #285 from CNugteren/size_specific_routine_tuner | Cedric Nugteren
    Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 | Moved new convgemm kernel to levelx kernel folder | Cedric Nugteren
2018-05-19 | Second version of direct reading from image tensor for convgemm: also with local memory support now | Cedric Nugteren
2018-05-19 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-05-19 | Added an option to run the routine tuner for a single specific GEMM size | Cedric Nugteren
2018-05-19 | Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk | Cedric Nugteren
    Routine tuners read kernel JSON from disk
2018-05-19 | Fixed compilation issues | Cedric Nugteren
2018-05-19 | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | Cedric Nugteren
2018-05-19 | Fixed a bug in loading xgemm-direct JSON data from disk | Cedric Nugteren
2018-05-18 | Merge pull request #283 from CNugteren/canary_buffer_overflow_protection | Cedric Nugteren
    Canary buffer overflow protection
2018-05-18 | Merge branch 'master' into canary_buffer_overflow_protection | Cedric Nugteren
2018-05-17 | Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements | Cedric Nugteren
    Better cache behaviour of OpenCL programs
2018-05-17 | Updated the roadmap | Cedric Nugteren
2018-05-17 | Updated README with IWOCL talk and GPU zoo acknowledgment | Cedric Nugteren
2018-05-17 | Added documentation on some details of the GEMM implementation | Cedric Nugteren
2018-05-17 | Fixed a few issues with canary region testing | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the correctness tests | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the tuners | Cedric Nugteren
2018-05-17 | First version of direct reading from image tensor for convgemm: only for edge cases now | Cedric Nugteren
2018-05-13 | Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel | Cedric Nugteren
2018-05-13 | Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel | Cedric Nugteren
2018-05-09 | Changed temporary convgemm implementation to use batched-strided GEMM | Cedric Nugteren
2018-05-09 | Fixed the performance client for convgemm and added GFLOPS measurements | Cedric Nugteren
2018-05-09 | Merge pull request #279 from umar456/ci_links | Cedric Nugteren
    Update ci links to use doman names and build names instead of IP/id
2018-05-09 | Updated the documentation for convgemm to include data layout (NCHW) | Cedric Nugteren
2018-05-09 | Implemented convolution as im2col + GEMM | Cedric Nugteren
2018-05-09 | Split channels/strides testing values off from kernel sizes for more flexibility | Cedric Nugteren
2018-05-08 | Update ci links to use doman names and build names instead of IP/id | Umar Arshad
    Updates the README badges to point to the domain name instead of IP addresses. Also updates the names of the builds to the name of the build instead of the id of the build.
2018-05-06 | Added convgemm skeleton, test infrastructure, and first reference implementation | Cedric Nugteren
2018-05-05 | Added interface of batched convolution as GEMM | Cedric Nugteren
2018-05-01 | Updated README with new badges and paper citation | Cedric Nugteren
2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren