Age | Commit message | Author
2018-07-13 | Added tuning results for HD Graphics 6000 Broadwell GT3 | Cedric Nugteren
2018-07-06 | Updated changelog | Cedric Nugteren
2018-07-06 | Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program | Cedric Nugteren
2018-07-06 | Eliminate a temporary Program object | Alastair Murray
2018-06-28 | Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows | Cedric Nugteren
2018-06-28 | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the ... | Cedric Nugteren
2018-06-03 | Updated to CLBlast version 1.4.0 | Cedric Nugteren
2018-06-03 | Added list of tuners to be run by 'alltuners' target | Cedric Nugteren
2018-06-03 | Fixes for CUDA version of CLBlast | Cedric Nugteren
2018-06-02 | Added MKL as an alternative for CBLAS for correctness and performance compari... | Cedric Nugteren
2018-06-01 | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when bar... | Cedric Nugteren
2018-05-31 | Added error-checking for half-empty local work group sizes; fixed a minor TRS... | Cedric Nugteren
2018-05-31 | Some potential fixes for error -54 when launching TRSV and TRSM kernels | Cedric Nugteren
2018-05-30 | Widened Apple OpenCL check, added way to debug too-large-workgroups issue | Cedric Nugteren
2018-05-29 | Added Apple OpenCL TRSV block size override; removed failing old Intel GPU te... | Cedric Nugteren
2018-05-27 | Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes | Cedric Nugteren
2018-05-27 | Merge pull request #286 from CNugteren/runtime_statistics_in_client | Cedric Nugteren
2018-05-27 | Added a check to return 'NotImplemented' error code in case of systems with <... | Cedric Nugteren
2018-05-27 | Made FillMatrix and FillVector functions take a configurable local workgroup ... | Cedric Nugteren
2018-05-27 | Added maximum time reporting to the client statistics | Cedric Nugteren
2018-05-23 | Added an option in the clients to output timing statistics: minimum, mean, an... | Cedric Nugteren
2018-05-19 | Merge pull request #285 from CNugteren/size_specific_routine_tuner | Cedric Nugteren
2018-05-19 | Added an option to run the routine tuner for a single specific GEMM size | Cedric Nugteren
2018-05-19 | Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk | Cedric Nugteren
2018-05-19 | Fixed compilation issues | Cedric Nugteren
2018-05-19 | The GEMM routine tuner now loads kernel JSON tuning results from disk if avai... | Cedric Nugteren
2018-05-19 | Fixed a bug in loading xgemm-direct JSON data from disk | Cedric Nugteren
2018-05-18 | Merge pull request #283 from CNugteren/canary_buffer_overflow_protection | Cedric Nugteren
2018-05-18 | Merge branch 'master' into canary_buffer_overflow_protection | Cedric Nugteren
2018-05-17 | Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements | Cedric Nugteren
2018-05-17 | Updated the roadmap | Cedric Nugteren
2018-05-17 | Updated README with IWOCL talk and GPU zoo acknowledgment | Cedric Nugteren
2018-05-17 | Added documentation on some details of the GEMM implementation | Cedric Nugteren
2018-05-17 | Fixed a few issues with canary region testing | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the correctness tests | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the tuners | Cedric Nugteren
2018-05-09 | Merge pull request #279 from umar456/ci_links | Cedric Nugteren
2018-05-08 | Update ci links to use doman names and build names instead of IP/id | Umar Arshad
2018-05-01 | Updated README with new badges and paper citation | Cedric Nugteren
2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren
2018-04-29 | Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups | Cedric Nugteren
2018-04-29 | Updated the changelog | Cedric Nugteren
2018-04-29 | Updated the roadmap | Cedric Nugteren
2018-04-26 | Fixed an access violation when compiled with Visual Studio upon releasing the... | Cedric Nugteren
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-24 | Added a define to enable subgroup shuffling if supported by the device | Cedric Nugteren
2018-04-21 | Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel | Cedric Nugteren
2018-04-20 | Fixes for the CUDA API | Cedric Nugteren
2018-04-18 | Expressed HER2K as two HERK calls | Cedric Nugteren
2018-04-18 | Expressed SYR2K as two SYRK calls | Cedric Nugteren