path: root/src
Age | Commit message | Author
2020-02-17 | Catches all exceptions of the tuners | Cedric Nugteren
2019-12-09 | Reduce TestMatrix calls for xgemmstridedbatched. | Tarmo Räntilä
2019-12-09 | Reduce TestMatrix calls for xgemmbatched. | Tarmo Räntilä
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren
2019-05-07 | Added a host-code check to make sure the avc_motion_estimation is available | Cedric Nugteren
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren
2019-05-03 | Remove assert for extention not available in macOS | Umar Arshad
2019-02-09 | Added tuning parameters for Tesla P100 16GB | Cedric Nugteren
2019-02-09 | Added tuning parameters for Xeon E5-2630 v3 and v4 | Cedric Nugteren
2019-01-23 | Added fp32 to fp16 conversion function in Python to make haxpy example work | Cedric Nugteren
2019-01-22 | Added a (non-working) sample of half precision AXPY in Python | Cedric Nugteren
2019-01-22 | Updated pyclblast README, updated to 1.2.0 for half-precision support | Cedric Nugteren
2019-01-22 | Added experimental support for half-precision in pyclblast | Cedric Nugteren
2019-01-19 | Merge pull request #345 from CNugteren/convolution-fixes-and-tuner | Cedric Nugteren
2019-01-19 | Added a few more initial Intel tuning parameters for convgemm | Cedric Nugteren
2019-01-05 | Added a check to prevent the stride of matrix C being set to 0 for the stride... | Cedric Nugteren
2018-12-31 | Added convgemm to the CLBlast database, added initial parameters for Skylake GPU | Cedric Nugteren
2018-12-31 | Added support for the convgemm tuner in the tuner database | Cedric Nugteren
2018-12-31 | Added the forgotten batch dimension to the tuner to get correct kernel execut... | Cedric Nugteren
2018-12-18 | Fix the xconvgemm tuner | Koichi Akabe
2018-12-18 | Added first version of a tuner for the ConvGemm direct kernel | Cedric Nugteren
2018-12-18 | Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel | Koichi Akabe
2018-11-30 | Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel | Cedric Nugteren
2018-11-19 | Remove unnecessary attribute of inline function | Koichi Akabe
2018-11-12 | Add kernel_mode option to im2col, col2im, and convgemm functions | Koichi Akabe
2018-11-07 | Changed col2im to append to the existing im-buffer | Cedric Nugteren
2018-11-01 | Added new col2im routine to the documentation | Cedric Nugteren
2018-10-30 | Fix col2im implementation | Koichi Akabe
2018-10-23 | Added groundwork for col2im algorithm plus first non-working version of kerne... | Cedric Nugteren
2018-10-22 | Some name changes in im2col code | Cedric Nugteren
2018-10-17 | Fixed a bug with the pre-processing and the AXPY kernel | Cedric Nugteren
2018-10-15 | Fixed a bug in the XaxpyFaster kernel for specific parameters | Cedric Nugteren
2018-10-14 | Merge pull request #319 from CNugteren/convgemm_multi_kernel | Cedric Nugteren
2018-10-13 | Made tuning API more flexible: disregards any extra parameter values | Cedric Nugteren
2018-10-10 | Fixed pre-processor warnings related to the subgroup shuffling | Cedric Nugteren
2018-09-16 | Merge branch 'master' into convgemm_multi_kernel | Cedric Nugteren
2018-09-15 | Fixed an MSVC compilation error due to large strings | Cedric Nugteren
2018-09-15 | Fixed an MSVC compilation error due to large strings | Cedric Nugteren
2018-09-15 | Disabled Intel subgroup shuffling for double-precision | Cedric Nugteren
2018-09-15 | Fixed issues with GEMMK=1 kernel and the pre-processor | Cedric Nugteren
2018-09-07 | Added xCONVGEMM as im2col plus a batched GEMM kernel | Cedric Nugteren
2018-08-13 | Made last operation in TRSV and TRSM asynchronous, making the events not null | Cedric Nugteren
2018-08-13 | Small refactoring of events in TRSV substitution routine | Cedric Nugteren
2018-08-07 | Name change of setting to NETLIB_PERSISTENT_OPENCL | Cedric Nugteren
2018-08-05 | Added an option to compile the Netlib API with static OpenCL device and context | Cedric Nugteren
2018-08-02 | Merge pull request #309 from CNugteren/CLBlast-306-omatcopy-conjugate | Cedric Nugteren
2018-07-31 | Merge pull request #308 from CNugteren/CLBlast-301-weird-AMD-Hainan-bug | Cedric Nugteren