Age | Commit message | Author
2018-07-13 | Added device-name removal code to handle POCL naming convention | Cedric Nugteren
2018-07-13 | Added tuning results for GeForce GTX 1070 Ti | Cedric Nugteren
2018-07-13 | Added tuning results for HD Graphics 6000 Broadwell GT3 | Cedric Nugteren
2018-07-06 | Updated changelog | Cedric Nugteren
2018-07-06 | Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program | Cedric Nugteren
    Eliminate a temporary Program object
2018-07-06 | Eliminate a temporary Program object | Alastair Murray
    This was causing a crash for me because the temporary Program destructor called clReleaseProgram on the cl_program with Program, and then clBuildProgram was called on the same cl_program (belonging to the Program owned by the shared_ptr, but it's the same cl_program).
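The failure mode described in this commit message can be sketched without any OpenCL dependency. The code below is an illustration only, not CLBlast's actual code: `FakeHandle` is a hypothetical stand-in for a `cl_program`, the `Program` class is a simplified hypothetical wrapper, the destructor's counter stands in for `clReleaseProgram`, and `Build()` stands in for `clBuildProgram`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a cl_program handle: it only counts releases.
struct FakeHandle {
  int release_count = 0;
  bool released() const { return release_count > 0; }
};

// Hypothetical RAII wrapper (not CLBlast's real Program class).
// The destructor releases the handle unconditionally, as a call to
// clReleaseProgram would.
class Program {
 public:
  explicit Program(FakeHandle* handle) : handle_(handle) {}
  ~Program() { ++handle_->release_count; }
  Program(const Program&) = delete;
  Program& operator=(const Program&) = delete;

  // Stands in for clBuildProgram: only succeeds on a live handle.
  bool Build() const { return !handle_->released(); }

 private:
  FakeHandle* handle_;
};

// Problematic pattern: a temporary wrapper shares the raw handle with the
// wrapper owned by the shared_ptr, and releases it when it goes out of scope.
bool BuildWithTemporary(FakeHandle* handle) {
  auto shared = std::make_shared<Program>(handle);
  { Program temporary(handle); }  // destructor releases the shared handle
  return shared->Build();         // handle was already released: fails
}

// Pattern without the temporary: the handle has a single owner.
bool BuildWithoutTemporary(FakeHandle* handle) {
  auto shared = std::make_shared<Program>(handle);
  return shared->Build();
}
```

With a fresh `FakeHandle`, `BuildWithTemporary` returns `false` and leaves the handle released twice, while `BuildWithoutTemporary` returns `true` and releases it once. With a real driver the equivalent of the first case is undefined behaviour, which is consistent with the crash reported above.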
2018-06-28 | Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows | Cedric Nugteren
    Disabled calls to clReleaseProgram under Windows
2018-06-28 | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first | Cedric Nugteren
2018-06-03 | Updated to CLBlast version 1.4.0 | Cedric Nugteren
2018-06-03 | Added list of tuners to be run by 'alltuners' target | Cedric Nugteren
2018-06-03 | Fixes for CUDA version of CLBlast | Cedric Nugteren
2018-06-02 | Added MKL as an alternative for CBLAS for correctness and performance comparisons | Cedric Nugteren
2018-06-01 | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present | Cedric Nugteren
2018-05-31 | Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue | Cedric Nugteren
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-05-30 | Widened Apple OpenCL check, added way to debug too-large-workgroups issue | Cedric Nugteren
2018-05-29 | Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README | Cedric Nugteren
2018-05-27 | Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes | Cedric Nugteren
    Apple opencl limitations for TRSV/TRSM now return not-implemented status
2018-05-27 | Merge pull request #286 from CNugteren/runtime_statistics_in_client | Cedric Nugteren
    Runtime statistics in client
2018-05-27 | Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM | Cedric Nugteren
2018-05-27 | Made FillMatrix and FillVector functions take a configurable local workgroup size | Cedric Nugteren
2018-05-27 | Added maximum time reporting to the client statistics | Cedric Nugteren
2018-05-23 | Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation | Cedric Nugteren
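The statistics named in this entry, together with the maximum added on 2018-05-27, could be computed along the following lines. This is a minimal sketch, not the client's actual code: `TimingStats` and `ComputeStats` are hypothetical names, and the population form of the standard deviation (dividing by the number of samples) is an assumption, since the log does not say which normalisation is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type: one field per reported statistic.
struct TimingStats {
  double minimum;
  double maximum;
  double mean;
  double standard_deviation;
};

// Computes the statistics over a non-empty list of run times (e.g. in ms).
// Assumption: population standard deviation (divide by N, not N - 1).
TimingStats ComputeStats(const std::vector<double>& timings) {
  assert(!timings.empty());
  const auto minmax = std::minmax_element(timings.begin(), timings.end());

  double sum = 0.0;
  for (const double t : timings) { sum += t; }
  const double mean = sum / static_cast<double>(timings.size());

  double squared_deviations = 0.0;
  for (const double t : timings) { squared_deviations += (t - mean) * (t - mean); }
  const double variance = squared_deviations / static_cast<double>(timings.size());

  return TimingStats{*minmax.first, *minmax.second, mean, std::sqrt(variance)};
}
```

Reporting the minimum alongside the mean is a common choice for GPU benchmarks, because the minimum is less sensitive to one-off slow runs such as warm-up.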
2018-05-19 | Merge pull request #285 from CNugteren/size_specific_routine_tuner | Cedric Nugteren
    Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 | Added an option to run the routine tuner for a single specific GEMM size | Cedric Nugteren
2018-05-19 | Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk | Cedric Nugteren
    Routine tuners read kernel JSON from disk
2018-05-19 | Fixed compilation issues | Cedric Nugteren
2018-05-19 | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | Cedric Nugteren
2018-05-19 | Fixed a bug in loading xgemm-direct JSON data from disk | Cedric Nugteren
2018-05-18 | Merge pull request #283 from CNugteren/canary_buffer_overflow_protection | Cedric Nugteren
    Canary buffer overflow protection
2018-05-18 | Merge branch 'master' into canary_buffer_overflow_protection | Cedric Nugteren
2018-05-17 | Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements | Cedric Nugteren
    Better cache behaviour of OpenCL programs
2018-05-17 | Updated the roadmap | Cedric Nugteren
2018-05-17 | Updated README with IWOCL talk and GPU zoo acknowledgment | Cedric Nugteren
2018-05-17 | Added documentation on some details of the GEMM implementation | Cedric Nugteren
2018-05-17 | Fixed a few issues with canary region testing | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the correctness tests | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the tuners | Cedric Nugteren
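The canary-region technique mentioned in these entries can be illustrated on a plain host buffer. This is a sketch under stated assumptions rather than the code from the commits: the names `kCanarySize`, `kCanaryValue`, `AllocateWithCanary` and `CanaryIntact` are hypothetical, and the real tests and tuners work on device buffers rather than a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameters: number of guard elements and the sentinel value.
constexpr std::size_t kCanarySize = 4;
constexpr float kCanaryValue = 3.14f;

// Allocates `size` usable elements followed by a canary region that is
// filled with the sentinel value.
std::vector<float> AllocateWithCanary(const std::size_t size) {
  std::vector<float> buffer(size + kCanarySize, 0.0f);
  for (std::size_t i = size; i < buffer.size(); ++i) { buffer[i] = kCanaryValue; }
  return buffer;
}

// Returns true if no element past the first `size` elements was overwritten,
// i.e. the code under test did not write beyond the usable region.
bool CanaryIntact(const std::vector<float>& buffer, const std::size_t size) {
  for (std::size_t i = size; i < buffer.size(); ++i) {
    if (buffer[i] != kCanaryValue) { return false; }
  }
  return true;
}
```

A test would allocate the buffer, run the routine on the first `size` elements, and then call `CanaryIntact`. A write to index `size` or beyond that changes the stored value makes the check fail, while writes inside the usable region leave it passing.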
2018-05-09 | Merge pull request #279 from umar456/ci_links | Cedric Nugteren
    Update ci links to use domain names and build names instead of IP/id
2018-05-08 | Update ci links to use domain names and build names instead of IP/id | Umar Arshad
    Updates the README badges to point to the domain name instead of IP addresses. Also updates the names of the builds to the name of the build instead of the id of the build.
2018-05-01 | Updated README with new badges and paper citation | Cedric Nugteren
2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren
2018-04-29 | Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups | Cedric Nugteren
    Intel subgroup shuffling
2018-04-29 | Updated the changelog | Cedric Nugteren
2018-04-29 | Updated the roadmap | Cedric Nugteren
2018-04-26 | Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program | Cedric Nugteren
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-24 | Added a define to enable subgroup shuffling if supported by the device | Cedric Nugteren
2018-04-21 | Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel | Cedric Nugteren
    Added 2D-register-caching GEMM kernel
2018-04-20 | Fixes for the CUDA API | Cedric Nugteren