Age | Commit message | Author
2018-05-27 | Added maximum time reporting to the client statistics | Cedric Nugteren
2018-05-23 | Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation | Cedric Nugteren
2018-05-19 | Merge pull request #285 from CNugteren/size_specific_routine_tuner | Cedric Nugteren
    Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 | Added an option to run the routine tuner for a single specific GEMM size | Cedric Nugteren
2018-05-19 | Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk | Cedric Nugteren
    Routine tuners read kernel JSON from disk
2018-05-19 | Fixed compilation issues | Cedric Nugteren
2018-05-19 | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | Cedric Nugteren
2018-05-19 | Fixed a bug in loading xgemm-direct JSON data from disk | Cedric Nugteren
2018-05-18 | Merge pull request #283 from CNugteren/canary_buffer_overflow_protection | Cedric Nugteren
    Canary buffer overflow protection
2018-05-18 | Merge branch 'master' into canary_buffer_overflow_protection | Cedric Nugteren
2018-05-17 | Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements | Cedric Nugteren
    Better cache behaviour of OpenCL programs
2018-05-17 | Updated the roadmap | Cedric Nugteren
2018-05-17 | Updated README with IWOCL talk and GPU zoo acknowledgment | Cedric Nugteren
2018-05-17 | Added documentation on some details of the GEMM implementation | Cedric Nugteren
2018-05-17 | Fixed a few issues with canary region testing | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the correctness tests | Cedric Nugteren
2018-05-17 | Added a canary region for overflow detection to the tuners | Cedric Nugteren
2018-05-09 | Merge pull request #279 from umar456/ci_links | Cedric Nugteren
    Update ci links to use doman names and build names instead of IP/id
2018-05-08 | Update ci links to use doman names and build names instead of IP/id | Umar Arshad
    Updates the README badges to point to the domain name instead of IP addresses. Also updates the names of the builds to the name of the build instead of the id of the build.
2018-05-01 | Updated README with new badges and paper citation | Cedric Nugteren
2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren
2018-04-29 | Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups | Cedric Nugteren
    Intel subgroup shuffling
2018-04-29 | Updated the changelog | Cedric Nugteren
2018-04-29 | Updated the roadmap | Cedric Nugteren
2018-04-26 | Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program | Cedric Nugteren
2018-04-24 | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | Cedric Nugteren
2018-04-24 | Added a define to enable subgroup shuffling if supported by the device | Cedric Nugteren
2018-04-21 | Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel | Cedric Nugteren
    Added 2D-register-caching GEMM kernel
2018-04-20 | Fixes for the CUDA API | Cedric Nugteren
2018-04-18 | Expressed HER2K as two HERK calls | Cedric Nugteren
2018-04-18 | Expressed SYR2K as two SYRK calls | Cedric Nugteren
2018-04-17 | Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel | Cedric Nugteren
2018-04-15 | Fixed some failing tests for GEMM and batched GEMM routines | Cedric Nugteren
2018-04-15 | Updated tuning results for the Skylake ULT GT2 GPU with the new kernel | Cedric Nugteren
2018-04-13 | Made GEMM rotation expectations kernel-specific | Cedric Nugteren
2018-04-10 | Updated database with defaults of GEMMK=0 and KREG=1 | Cedric Nugteren
2018-04-10 | Made it possible to add tuning parameters to the database using the script | Cedric Nugteren
2018-04-10 | Fixed a bug in the compression part of the database script | Cedric Nugteren
2018-04-08 | Extended the maximum number of tuning parameters from 14 to 16 | Cedric Nugteren
2018-04-08 | Fixed issues with the pre-processor | Cedric Nugteren
2018-04-07 | Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel | Cedric Nugteren
2018-04-07 | Added tuning results for NVIDIA GeForce 970 | Cedric Nugteren
2018-04-07 | Added tuning results for NVIDIA GeForce 920MX | Cedric Nugteren
2018-04-07 | Fixed a python3 import error issue with the database script | Cedric Nugteren
2018-04-07 | Added tuning results for Intel HD Graphics 620 | Cedric Nugteren
2018-04-07 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | Cedric Nugteren
2018-04-07 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-06 | Fixed a compilation issue for complex datatypes and vload | Cedric Nugteren
2018-04-03 | Added first version of 2D register tiling kernel with A and C transposed as well | Cedric Nugteren
2018-03-30 | Updated pyclblast to 1.1.0 and uploaded to PyPi | Cedric Nugteren