debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2020-02-17	Catches all exceptions of the tuners	Cedric Nugteren

2019-12-15	Merge pull request #372 from trantila/master	Cedric Nugteren
	Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-09	Reduce TestMatrix calls for xgemmstridedbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the offset of the last batch.
2019-12-09	Reduce TestMatrix calls for xgemmbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the maximal found offset.
2019-09-06	Added notion of fixes in XhadFaster	Cedric Nugteren

2019-09-06	Merge pull request #368 from etomzak/master	Cedric Nugteren
	Fix out-of-bounds read/write in XhadFaster
2019-09-04	Fix out-of-bounds read/write in XhadFaster	etomzak
	Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.
2019-05-19	Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin	Cedric Nugteren
	Fixed a bug in the absolute-min index kernel
2019-05-19	Fixed a bug in the absolute-min index kernel	Cedric Nugteren

2019-05-16	Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix	Cedric Nugteren
	intel shuffle extension fix
2019-05-11	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	Cedric Nugteren

2019-05-08	Changed back to cl_intel_subgroups as suggested	Cedric Nugteren

2019-05-07	Added a host-code check to make sure the avc_motion_estimation is available	Cedric Nugteren

2019-05-07	Enabled avc_motion_estimation extension for Intel subgroup shuffling	Cedric Nugteren

2019-05-06	Merge pull request #356 from umar456/osx_assert	Cedric Nugteren
	Remove assert for extension not available in macOS
2019-05-03	Remove assert for extension not available in macOS	Umar Arshad
	The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.
2019-02-09	Added tuning parameters for Tesla P100 16GB	Cedric Nugteren

2019-02-09	Added tuning parameters for Xeon E5-2630 v3 and v4	Cedric Nugteren

2019-01-26	Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support	Cedric Nugteren
	PyCLBlast half precision support
2019-01-23	Added fp32 to fp16 conversion function in Python to make haxpy example work	Cedric Nugteren

2019-01-22	Added a (non-working) sample of half precision AXPY in Python	Cedric Nugteren

2019-01-22	Updated pyclblast README, updated to 1.2.0 for half-precision support	Cedric Nugteren

2019-01-22	Added experimental support for half-precision in pyclblast	Cedric Nugteren

2019-01-19	Merge pull request #345 from CNugteren/convolution-fixes-and-tuner	Cedric Nugteren
	Convolution with single kernel
2019-01-19	Added documentation on the convgemm routine	Cedric Nugteren

2019-01-19	Added a few more initial Intel tuning parameters for convgemm	Cedric Nugteren

2019-01-05	Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine	Cedric Nugteren
2018-12-31	Added convgemm to the CLBlast database, added initial parameters for Skylake GPU	Cedric Nugteren

2018-12-31	Added support for the convgemm tuner in the tuner database	Cedric Nugteren

2018-12-31	Added the forgotten batch dimension to the tuner to get correct kernel executions	Cedric Nugteren
2018-12-23	Merge pull request #343 from vbkaisetsu/feature/convgemm-single	Cedric Nugteren
	Fix single kernel version of convgemm
2018-12-22	Merge branch 'master' into convolution-fixes-and-tuner	Cedric Nugteren

2018-12-21	Update changelog	Koichi Akabe

2018-12-18	Update the documentation	Koichi Akabe

2018-12-18	Fix the xconvgemm tuner	Koichi Akabe

2018-12-18	Added first version of a tuner for the ConvGemm direct kernel	Cedric Nugteren

2018-12-18	Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel	Koichi Akabe

2018-12-17	Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests	Cedric Nugteren
	Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17	Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm	Koichi Akabe

2018-12-04	Updated to version 1.5.0	Cedric Nugteren

2018-12-01	Updated the roadmap document	Cedric Nugteren

2018-12-01	Added a FAQ document	Cedric Nugteren

2018-12-01	Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG	Cedric Nugteren
	Fixed an issue for the GEMMK == 1 kernel
2018-11-30	Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel	Cedric Nugteren

2018-11-19	Merge pull request #335 from vbkaisetsu/patch-1	Cedric Nugteren
	Remove unnecessary qualifier of inline function
2018-11-19	Remove unnecessary attribute of inline function	Koichi Akabe

2018-11-17	Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip	Cedric Nugteren
	Add im2colflip and col2imflip functions
2018-11-12	Add kernel_mode option to im2col, col2im, and convgemm functions	Koichi Akabe

2018-11-09	Merge pull request #331 from CNugteren/CLBlast-270-col2im	Cedric Nugteren
	Implements col2im routine
2018-11-07	Changed col2im to append to the existing im-buffer	Cedric Nugteren