debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2023-05-07	AMAX/AMIN integer testing and bug fixes (#457)	Cedric Nugteren
	* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result
	* Perform proper integer-output testing in XAMAX tests
	* A few changes towards getting it ready for a PR
	* Also fix compilation for clBLAS and cuBLAS references
	* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
	* A few small fixes related to the AMAX tests
2023-01-17	Updated according to feedback from CNugteren	Angus, Alexander

2023-01-03	implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731	Angus, Alexander

2022-06-24	Fix typo in comment	Cedric Nugteren
	Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-04-22	sum fix	Justin Graham

2020-03-08	Made it more likely (but no guarantees) for amax/amin to return the first index	Cedric Nugteren

2019-09-04	Fix out-of-bounds read/write in XhadFaster	etomzak
	Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced.
	It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127).
	This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage).
	Courtesy of Codeplay Software Ltd.
2019-05-19	Fixed a bug in the absolute-min index kernel	Cedric Nugteren

2019-05-08	Changed back to cl_intel_subgroups as suggested	Cedric Nugteren

2019-05-07	Enabled avc_motion_estimation extension for Intel subgroup shuffling	Cedric Nugteren

2018-12-18	Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel	Koichi Akabe

2018-11-19	Remove unnecessary attribute of inline function	Koichi Akabe

2018-11-12	Add kernel_mode option to im2col, col2im, and convgemm functions	Koichi Akabe

2018-11-07	Changed col2im to append to the existing im-buffer	Cedric Nugteren

2018-11-01	Added new col2im routine to the documentation	Cedric Nugteren

2018-10-30	Fix col2im implementation	Koichi Akabe

2018-10-23	Added groundwork for col2im algorithm plus first non-working version of kernel and test	Cedric Nugteren

2018-10-17	Fixed a bug with the pre-processing and the AXPY kernel	Cedric Nugteren

2018-10-15	Fixed a bug in the XaxpyFaster kernel for specific parameters	Cedric Nugteren

2018-10-14	Merge pull request #319 from CNugteren/convgemm_multi_kernel	Cedric Nugteren
	First im2col+GEMM implementation of convolution
2018-10-10	Fixed pre-processor warnings related to the subgroup shuffling	Cedric Nugteren

2018-09-16	Merge branch 'master' into convgemm_multi_kernel	Cedric Nugteren

2018-09-15	Fixed an MSVC compilation error due to large strings	Cedric Nugteren

2018-09-15	Fixed issues with GEMMK=1 kernel and the pre-processor	Cedric Nugteren

2018-09-07	Added xCONVGEMM as im2col plus a batched GEMM kernel	Cedric Nugteren

2018-07-29	Merge branch 'master' into CLBlast-267-convgemm	Cedric Nugteren

2018-07-28	Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kernels to improve performance	Cedric Nugteren

2018-07-27	Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel	Cedric Nugteren

2018-07-16	moved a two-line macro to a single line	Tyler Sorensen

2018-07-14	Applied feedback from Cedric from first pull request	Tyler Sorensen

2018-07-11	added inline ptx to support shuffle on Nvidia GPUs	Tyler Sorensen

2018-06-03	Merge branch 'master' into CLBlast-267-convgemm	Cedric Nugteren

2018-05-31	Some potential fixes for error -54 when launching TRSV and TRSM kernels	Cedric Nugteren

2018-05-21	Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code	Cedric Nugteren

2018-05-21	Added method selection option to switch between im2col and single-kernel approach for convgemm	Cedric Nugteren

2018-05-19	Moved new convgemm kernel to levelx kernel folder	Cedric Nugteren

2018-05-19	Second version of direct reading from image tensor for convgemm: also with local memory support now	Cedric Nugteren

2018-05-17	First version of direct reading from image tensor for convgemm: only for edge cases now	Cedric Nugteren

2018-05-13	Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel	Cedric Nugteren

2018-05-13	Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel	Cedric Nugteren

2018-04-24	Added Intel subgroup shuffle support to the 2D register caching GEMM kernel	Cedric Nugteren

2018-04-08	Fixed issues with the pre-processor	Cedric Nugteren

2018-04-07	Extended the GEMM tuner to be able to tune the new 'kernel 1'	Cedric Nugteren

2018-04-07	Fixed a compilation issue for complex datatypes and vload	Cedric Nugteren

2018-04-06	Fixed a compilation issue for complex datatypes and vload	Cedric Nugteren

2018-04-03	Added first version of 2D register tiling kernel with A and C transposed as well	Cedric Nugteren

2018-03-23	Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler	Cedric Nugteren

2018-03-15	Fixed a failing TRSM test using a CPU with Apple OpenCL	Cedric Nugteren

2018-03-15	Fixed a failing TRSV test using a CPU with Apple OpenCL	Cedric Nugteren

2018-02-02	Implemented the XHAD Hadamard product routine	Cedric Nugteren