debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2023-05-07	AMAX/AMIN integer testing and bug fixes (#457)	Cedric Nugteren
	* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result
	* Perform proper integer-output testing in XAMAX tests
	* A few changes towards getting it ready for a PR
	* Also fix compilation for clBLAS and cuBLAS references
	* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
	* A few small fixes related to the AMAX tests
2023-01-17	Updated according to feedback from CNugteren	Angus, Alexander

2023-01-03	implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731	Angus, Alexander
2022-04-13	android.hpp: custom header guard of _clang_	danyougle
	In order not to have ambiguous definitions, exclude the functions for other compilers
2019-12-09	Reduce TestMatrix calls for xgemmstridedbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the offset of the last batch.
2019-12-09	Reduce TestMatrix calls for xgemmbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the maximal found offset.
2019-05-11	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	Cedric Nugteren

2019-05-08	Changed back to cl_intel_subgroups as suggested	Cedric Nugteren

2019-05-07	Added a host-code check to make sure the avc_motion_estimation is available	Cedric Nugteren

2018-11-12	Add kernel_mode option to im2col, col2im, and convgemm functions	Koichi Akabe

2018-10-30	Fix col2im implementation	Koichi Akabe

2018-09-16	Merge branch 'master' into convgemm_multi_kernel	Cedric Nugteren

2018-09-15	Disabled Intel subgroup shuffling for double-precision	Cedric Nugteren

2018-07-29	Merge branch 'master' into CLBlast-267-convgemm	Cedric Nugteren

2018-07-23	Merge pull request #297 from tyler-utah/master	Cedric Nugteren
	inline PTX to support subgroup shuffle for Nvidia GPUs
2018-07-14	Applied feedback from Cedric from first pull request	Tyler Sorensen

2018-07-13	Added device-name removal code to handle POCL naming convention	Cedric Nugteren

2018-07-11	added inline ptx to support shuffle on Nvidia GPUs	Tyler Sorensen

2018-06-03	Merge branch 'master' into CLBlast-267-convgemm	Cedric Nugteren

2018-05-23	Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation	Cedric Nugteren
2018-05-19	Merge branch 'master' into CLBlast-267-convgemm	Cedric Nugteren

2018-05-18	Merge branch 'master' into canary_buffer_overflow_protection	Cedric Nugteren

2018-05-17	Added a canary region for overflow detection to the tuners	Cedric Nugteren

2018-05-06	Added convgemm skeleton, test infrastructure, and first reference implementation	Cedric Nugteren

2018-05-01	Now stores a shared_ptr to the Program class in the cache	Cedric Nugteren

2018-04-24	Added a define to enable subgroup shuffling if supported by the device	Cedric Nugteren

2018-03-06	First version of the tuning API, added interface for copy-kernel, added sample	Cedric Nugteren

2018-02-11	Fixed a minor typo	Cedric Nugteren

2017-12-24	Fixes for the CUDA backend of CLBlast	Cedric Nugteren

2017-12-23	Added TRSV block-size tuner	Cedric Nugteren

2017-12-17	Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor	Cedric Nugteren
2017-12-10	Fixed a missing include	Cedric Nugteren

2017-12-09	Made the pre-processor run by default for ARM and Qualcomm GPUs	Cedric Nugteren

2017-11-30	Integrated pre-processor in compilation flow, default is still disabled	Cedric Nugteren

2017-11-25	Moved string splitting functions; added string character removal function	Cedric Nugteren

2017-11-22	Made parameter override in the clients a command-line argument and added support for multi-kernel routines	Cedric Nugteren
2017-11-19	Added compilation timing and better compilation error reporting	Cedric Nugteren

2017-11-19	Revived the GEMM routine tuner; minor formatting changes	Cedric Nugteren

2017-11-17	Moved compilation function to separate file; removed dependency of tuners of the CLBlast library	Cedric Nugteren
2017-11-15	Added first version of integrated and re-written auto-tuner	Cedric Nugteren

2017-11-15	Added kernel timing functionality to the utilities	Cedric Nugteren

2017-11-15	Added exception handle with catch-all	Cedric Nugteren

2017-11-13	Made the exception dispatch function optionally silent	Cedric Nugteren

2017-11-13	Moved square-difference utility function for use in the tuners	Cedric Nugteren

2017-11-07	Merge pull request #212 from CNugteren/kernel_selection_tuner	Cedric Nugteren
	GEMM kernel selection tuner
2017-11-02	Integrated the GEMM routine tuner for kernel selection; added first tuning results	Cedric Nugteren
2017-10-30	Added collecting and printing of scores for the kernel-selection tuner	Cedric Nugteren

2017-10-29	Added Android support using the GNU C++ STL library and the GCC toolchain	Cedric Nugteren

2017-10-28	Merge branch 'master' into android_support	Cedric Nugteren

2017-10-28	Added initial version of a GEMM kernel selection tuner	Cedric Nugteren