Age  Commit message  Author
2018-05-09  Split channels/strides testing values off from kernel sizes for more flexibility  Cedric Nugteren
2018-05-06  Added convgemm skeleton, test infrastructure, and first reference implementation  Cedric Nugteren
2018-05-05  Added interface of batched convolution as GEMM  Cedric Nugteren
2018-05-01  Updated README with new badges and paper citation  Cedric Nugteren
2018-04-29  Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups  Cedric Nugteren
2018-04-29  Updated the changelog  Cedric Nugteren
2018-04-29  Updated the roadmap  Cedric Nugteren
2018-04-26  Fixed an access violation when compiled with Visual Studio upon releasing the...  Cedric Nugteren
2018-04-24  Added Intel subgroup shuffle support to the 2D register caching GEMM kernel  Cedric Nugteren
2018-04-24  Added a define to enable subgroup shuffling if supported by the device  Cedric Nugteren
2018-04-21  Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel  Cedric Nugteren
2018-04-20  Fixes for the CUDA API  Cedric Nugteren
2018-04-18  Expressed HER2K as two HERK calls  Cedric Nugteren
2018-04-18  Expressed SYR2K as two SYRK calls  Cedric Nugteren
2018-04-17  Updated HERK and SYRK to follow the GEMM style and functions to make it work ...  Cedric Nugteren
2018-04-15  Fixed some failing tests for GEMM and batched GEMM routines  Cedric Nugteren
2018-04-15  Updated tuning results for the Skylake ULT GT2 GPU with the new kernel  Cedric Nugteren
2018-04-13  Made GEMM rotation expectations kernel-specific  Cedric Nugteren
2018-04-10  Updated database with defaults of GEMMK=0 and KREG=1  Cedric Nugteren
2018-04-10  Made it possible to add tuning parameters to the database using the script  Cedric Nugteren
2018-04-10  Fixed a bug in the compression part of the database script  Cedric Nugteren
2018-04-08  Extended the maximum number of tuning parameters from 14 to 16  Cedric Nugteren
2018-04-08  Fixed issues with the pre-processor  Cedric Nugteren
2018-04-07  Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel  Cedric Nugteren
2018-04-07  Added tuning results for NVIDIA GeForce 970  Cedric Nugteren
2018-04-07  Added tuning results for NVIDIA GeForce 920MX  Cedric Nugteren
2018-04-07  Fixed a python3 import error issue with the database script  Cedric Nugteren
2018-04-07  Added tuning results for Intel HD Graphics 620  Cedric Nugteren
2018-04-07  Extended the GEMM tuner to be able to tune the new 'kernel 1'  Cedric Nugteren
2018-04-07  Fixed a compilation issue for complex datatypes and vload  Cedric Nugteren
2018-04-06  Fixed a compilation issue for complex datatypes and vload  Cedric Nugteren
2018-04-03  Added first version of 2D register tiling kernel with A and C transposed as well  Cedric Nugteren
2018-03-30  Updated pyclblast to 1.1.0 and uploaded to PyPi  Cedric Nugteren
2018-03-30  Merge pull request #255 from kodonnell/py_override  Cedric Nugteren
2018-03-30  Added argument checking for the GEMM tuner: expects m/n to be multiples of MW...  Cedric Nugteren
2018-03-30  Updated the roadmap  Cedric Nugteren
2018-03-30  Merge branch 'CLBlast-227-vivante-compiler-errors'  Cedric Nugteren
2018-03-27  merged  kodonell
2018-03-27  got the generator thing working  kodonell
2018-03-27  moved override_parameters example out of sgemm example  kodonell
2018-03-26  tidying up pyclblast override_parameters api, and added example  kodonell
2018-03-23  Removed arrays as function argument from GEMM kernels for Vivante OpenCL comp...  Cedric Nugteren
2018-03-22  Merge pull request #269 from CNugteren/CLBlast-266-local-mem-constraint  Cedric Nugteren
2018-03-22  Added the OpenCL local memory size constraint to the tuners  Cedric Nugteren
2018-03-21  Re-added support for local memory size constraint checking in the tuner  Cedric Nugteren
2018-03-15  Fixed a failing TRSM test using a CPU with Apple OpenCL  Cedric Nugteren
2018-03-15  Fixed a failing TRSV test using a CPU with Apple OpenCL  Cedric Nugteren
2018-03-15  Fixed breaking preprocessor test on certain platforms due to empty kernel string  Cedric Nugteren
2018-03-15  Added queue-finish commands to PyCLBlast samples and tests  Cedric Nugteren
2018-03-11  Merge pull request #262 from CNugteren/CLBlast-237-tuning-api  Cedric Nugteren