Age | Commit message | Author
2018-09-15 | Fixed issues with GEMMK=1 kernel and the pre-processor | Cedric Nugteren
2018-09-15 | Added pre-processor test for GEMMK=1 kernel | Cedric Nugteren
2018-09-03 | Merge pull request #316 from ranocha/patch-1 | Cedric Nugteren
2018-09-03 | Add Julia Wrapper | Hendrik Ranocha
2018-08-14 | Merge pull request #312 from CNugteren/CLBlast-311-missing-event-in-trsv-trsm | Cedric Nugteren
2018-08-13 | Made last operation in TRSV and TRSM asynchronous, making the events not null | Cedric Nugteren
2018-08-13 | Small refactoring of events in TRSV substitution routine | Cedric Nugteren
2018-08-09 | Merge pull request #310 from CNugteren/CLBlast-307-netlib-api-static-opencl-vars | Cedric Nugteren
2018-08-07 | Name change of setting to NETLIB_PERSISTENT_OPENCL | Cedric Nugteren
2018-08-05 | Added an option to compile the Netlib API with static OpenCL device and context | Cedric Nugteren
2018-08-02 | Merge pull request #309 from CNugteren/CLBlast-306-omatcopy-conjugate | Cedric Nugteren
2018-07-31 | Merge pull request #308 from CNugteren/CLBlast-301-weird-AMD-Hainan-bug | Cedric Nugteren
2018-07-31 | Fixed issue with not performing complex conjugation under certain cases when ... | Cedric Nugteren
2018-07-31 | Fixed the tests of OMATCOPY to include proper complex conjugation | Cedric Nugteren
2018-07-31 | Fixed an error reporting issue related to the canary region | Cedric Nugteren
2018-07-31 | Added note about AMD southern islands GPU issue and the required workaround | Cedric Nugteren
2018-07-31 | Added Beignet 1.2.1 requirement to the README for IvyBridge GPUs | Cedric Nugteren
2018-07-31 | Updated the tuning results for Intel IvyBridge M GT2 | Cedric Nugteren
2018-07-30 | Merge pull request #305 from CNugteren/CLBlast-303-tuner-check-local-size | Cedric Nugteren
2018-07-29 | Fixed a wrong event issue causing error -57 | Cedric Nugteren
2018-07-28 | Added print statements to indicate the 4 stages of GEMM tuning | Cedric Nugteren
2018-07-28 | The tuners now also check for valid local thread configurations and skip inva... | Cedric Nugteren
2018-07-28 | Merge pull request #304 from CNugteren/CLBlast-300-fix-staggered-indices-AMD-... | Cedric Nugteren
2018-07-28 | Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kern... | Cedric Nugteren
2018-07-27 | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | Cedric Nugteren
2018-07-27 | Fixed a bug: forgot to initialize the shared pointer for the null kernel | Cedric Nugteren
2018-07-27 | Renamed AMD SI workaround defines | Cedric Nugteren
2018-07-25 | Added workaround for weird AMD SI Hainan bug | Cedric Nugteren
2018-07-25 | Added code to report the average tuning results | Cedric Nugteren
2018-07-23 | Merge pull request #297 from tyler-utah/master | Cedric Nugteren
2018-07-16 | moved a two-line macro to a single line | Tyler Sorensen
2018-07-14 | forgot to add test cases back in, oops | Tyler Sorensen
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-14 | Updated to CLBlast version 1.4.1 | Cedric Nugteren
2018-07-13 | Added tuning results for Intel i5-4970S | Cedric Nugteren
2018-07-13 | Added device-name removal code to handle POCL naming convention | Cedric Nugteren
2018-07-13 | Added tuning results for GeForce GTX 1070 Ti | Cedric Nugteren
2018-07-13 | Added tuning results for HD Graphics 6000 Broadwell GT3 | Cedric Nugteren
2018-07-11 | restored some of the changed tuning files for xgemm | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-07-06 | Updated changelog | Cedric Nugteren
2018-07-06 | Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program | Cedric Nugteren
2018-07-06 | Eliminate a temporary Program object | Alastair Murray
2018-06-28 | Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows | Cedric Nugteren
2018-06-28 | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the ... | Cedric Nugteren
2018-06-03 | Updated to CLBlast version 1.4.0 | Cedric Nugteren
2018-06-03 | Added list of tuners to be run by 'alltuners' target | Cedric Nugteren
2018-06-03 | Fixes for CUDA version of CLBlast | Cedric Nugteren
2018-06-02 | Added MKL as an alternative for CBLAS for correctness and performance compari... | Cedric Nugteren
2018-06-01 | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when bar... | Cedric Nugteren