debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2020-05-10	Made sure that the global workgroup size is a multiple of the local size in the tuners	Cedric Nugteren
2020-05-10	Added logging of local/global workgroup sizes when running the tuners	Cedric Nugteren

2020-05-10	Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines	Cedric Nugteren
	PyCLBlast: add missing batched routines
2020-05-10	Updated PyCLBlast version number	Cedric Nugteren

2020-05-10	Added a sample to demonstrate a batched routine	Cedric Nugteren

2020-05-10	Added pyclblast bindings for the 3 batched routines	Cedric Nugteren

2020-05-04	Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner	Cedric Nugteren
	Move queue creation out of the tuner loop
2020-05-03	Move queue creation out of the tuner loop	Cedric Nugteren

2020-03-15	Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin	Cedric Nugteren
	Change amax/amin behaviour
2020-03-08	Update API documentation	Cedric Nugteren

2020-03-08	Made it more likely (but no guarantees) for amax/amin to return the first index	Cedric Nugteren

2020-03-08	Added sample to play around with XAMAX routine	Cedric Nugteren

2020-03-08	Silenced a new OpenCL warning message	Cedric Nugteren

2020-02-18	Updated to version 1.5.1	Cedric Nugteren

2020-02-18	Merge pull request #376 from CNugteren/fix_tuner_exception_catching	Cedric Nugteren
	Catches all exceptions of the tuners
2020-02-17	Catches all exceptions of the tuners	Cedric Nugteren

2019-12-15	Merge pull request #372 from trantila/master	Cedric Nugteren
	Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-09	Reduce TestMatrix calls for xgemmstridedbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the offset of the last batch.
2019-12-09	Reduce TestMatrix calls for xgemmbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the maximal found offset.
2019-09-06	Added notion of fixes in XhadFaster	Cedric Nugteren

2019-09-06	Merge pull request #368 from etomzak/master	Cedric Nugteren
	Fix out-of-bounds read/write in XhadFaster
2019-09-04	Fix out-of-bounds read/write in XhadFaster	etomzak
	Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.
2019-05-19	Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin	Cedric Nugteren
	Fixed a bug in the absolute-min index kernel
2019-05-19	Fixed a bug in the absolute-min index kernel	Cedric Nugteren

2019-05-16	Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix	Cedric Nugteren
	Intel shuffle extension fix
2019-05-11	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	Cedric Nugteren

2019-05-08	Changed back to cl_intel_subgroups as suggested	Cedric Nugteren

2019-05-07	Added a host-code check to make sure the avc_motion_estimation is available	Cedric Nugteren

2019-05-07	Enabled avc_motion_estimation extension for Intel subgroup shuffling	Cedric Nugteren

2019-05-06	Merge pull request #356 from umar456/osx_assert	Cedric Nugteren
	Remove assert for extension not available on macOS
2019-05-03	Remove assert for extension not available on macOS	Umar Arshad
	The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.
2019-02-09	Added tuning parameters for Tesla P100 16GB	Cedric Nugteren

2019-02-09	Added tuning parameters for Xeon E5-2630 v3 and v4	Cedric Nugteren

2019-01-26	Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support	Cedric Nugteren
	PyCLBlast half precision support
2019-01-23	Added fp32 to fp16 conversion function in Python to make haxpy example work	Cedric Nugteren

2019-01-22	Added a (non-working) sample of half precision AXPY in Python	Cedric Nugteren

2019-01-22	Updated pyclblast README, updated to 1.2.0 for half-precision support	Cedric Nugteren

2019-01-22	Added experimental support for half-precision in pyclblast	Cedric Nugteren

2019-01-19	Merge pull request #345 from CNugteren/convolution-fixes-and-tuner	Cedric Nugteren
	Convolution with single kernel
2019-01-19	Added documentation on the convgemm routine	Cedric Nugteren

2019-01-19	Added a few more initial Intel tuning parameters for convgemm	Cedric Nugteren

2019-01-05	Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine	Cedric Nugteren
2018-12-31	Added convgemm to the CLBlast database, added initial parameters for Skylake GPU	Cedric Nugteren

2018-12-31	Added support for the convgemm tuner in the tuner database	Cedric Nugteren

2018-12-31	Added the forgotten batch dimension to the tuner to get correct kernel executions	Cedric Nugteren
2018-12-23	Merge pull request #343 from vbkaisetsu/feature/convgemm-single	Cedric Nugteren
	Fix single kernel version of convgemm
2018-12-22	Merge branch 'master' into convolution-fixes-and-tuner	Cedric Nugteren

2018-12-21	Update changelog	Koichi Akabe

2018-12-18	Update the documentation	Koichi Akabe

2018-12-18	Fix the xconvgemm tuner	Koichi Akabe