Age | Commit message | Author
2018-07-29 | Removed complex numbers support for CONVGEMM | Cedric Nugteren
2018-07-29 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-07-28 | Merge pull request #304 from CNugteren/CLBlast-300-fix-staggered-indices-AMD-GEMMK1 | Cedric Nugteren
    Fix staggered indices on AMD GPUs for GEMMK == 1 kernel
2018-07-28 | Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kernels to improve performance | Cedric Nugteren
2018-07-27 | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | Cedric Nugteren
2018-07-25 | Added code to report the average tuning results | Cedric Nugteren
2018-07-23 | Merge pull request #297 from tyler-utah/master | Cedric Nugteren
    inline PTX to support subgroup shuffle for Nvidia GPUs
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | forgot to add test cases back in, oops | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-14 | Updated to CLBlast version 1.4.1 | Cedric Nugteren
2018-07-13 | Added tuning results for Intel i5-4970S | Cedric Nugteren
2018-07-13 | Added device-name removal code to handle POCL naming convention | Cedric Nugteren
2018-07-13 | Added tuning results for GeForce GTX 1070 Ti | Cedric Nugteren
2018-07-13 | Added tuning results for HD Graphics 6000 Broadwell GT3 | Cedric Nugteren
2018-07-11 | restored some of the changed tuning files for xgemm | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-07-06 | Updated changelog | Cedric Nugteren
2018-07-06 | Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program | Cedric Nugteren
    Eliminate a temporary Program object
2018-07-06 | Eliminate a temporary Program object | Alastair Murray
    This was causing a crash for me because the temporary Program's destructor called clReleaseProgram on the cl_program it wrapped, and then clBuildProgram was called on that same cl_program (which belongs to the Program owned by the shared_ptr, but is still the same cl_program).
2018-06-28 | Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows | Cedric Nugteren
    Disabled calls to clReleaseProgram under Windows
2018-06-28 | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first | Cedric Nugteren
2018-06-03 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-06-03 | Updated to CLBlast version 1.4.0 | Cedric Nugteren
2018-06-03 | Added list of tuners to be run by 'alltuners' target | Cedric Nugteren
2018-06-03 | Fixes for CUDA version of CLBlast | Cedric Nugteren
2018-06-02 | Added MKL as an alternative for CBLAS for correctness and performance comparisons | Cedric Nugteren
2018-06-01 | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present | Cedric Nugteren
2018-05-31 | Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue | Cedric Nugteren
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-05-30 | Widened Apple OpenCL check, added way to debug too-large-workgroups issue | Cedric Nugteren
2018-05-29 | Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README | Cedric Nugteren
2018-05-27 | Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes | Cedric Nugteren
    Apple opencl limitations for TRSV/TRSM now return not-implemented status
2018-05-27 | Merge pull request #286 from CNugteren/runtime_statistics_in_client | Cedric Nugteren
    Runtime statistics in client
2018-05-27 | Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM | Cedric Nugteren
2018-05-27 | Made FillMatrix and FillVector functions take a configurable local workgroup size | Cedric Nugteren
2018-05-27 | Added maximum time reporting to the client statistics | Cedric Nugteren
2018-05-23 | Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation | Cedric Nugteren
2018-05-21 | Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code | Cedric Nugteren
2018-05-21 | Added method selection option to switch between im2col and single-kernel approach for convgemm | Cedric Nugteren
2018-05-19 | Merge pull request #285 from CNugteren/size_specific_routine_tuner | Cedric Nugteren
    Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 | Moved new convgemm kernel to levelx kernel folder | Cedric Nugteren
2018-05-19 | Second version of direct reading from image tensor for convgemm: also with local memory support now | Cedric Nugteren
2018-05-19 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren
2018-05-19 | Added an option to run the routine tuner for a single specific GEMM size | Cedric Nugteren
2018-05-19 | Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk | Cedric Nugteren
    Routine tuners read kernel JSON from disk
2018-05-19 | Fixed compilation issues | Cedric Nugteren
2018-05-19 | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | Cedric Nugteren
2018-05-19 | Fixed a bug in loading xgemm-direct JSON data from disk | Cedric Nugteren
2018-05-18 | Merge pull request #283 from CNugteren/canary_buffer_overflow_protection | Cedric Nugteren
    Canary buffer overflow protection