debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2023-01-21	Add tuning results for Adreno 730	Cedric Nugteren

2023-01-17	Updated according to feedback from CNugteren	Angus, Alexander

2023-01-03	implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731	Angus, Alexander

2022-09-22	Update PyCLBlast version number	Cedric Nugteren

2022-06-24	Fix typo in comment	Cedric Nugteren
	Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-05-23	Fix API inconsistency in cupp11.hpp	Cedric Nugteren
	The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. As a result, the code does not compile at all if CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-16	Merge pull request #432 from justingra/sum-fix	Cedric Nugteren
	sum fix
2022-04-25	Add tuning results for Adreno 540	Cedric Nugteren

2022-04-25	Add tuning results for Radeon RX 6500 XT	Cedric Nugteren

2022-04-25	Add tuning results for Radeon RX 6800 XT	Cedric Nugteren

2022-04-22	sum fix	Justin Graham

2022-04-13	android.hpp: custom header guard of _clang_	danyougle
	In order not to have ambiguous definitions, exclude the functions for other compilers
2021-08-27	Add Quadro T2000 tuning parameters for the Tesla T4	Cedric Nugteren

2021-08-27	Remove Tesla T4 tuning results	Cedric Nugteren

2021-08-19	Add tuning results for NVIDIA Tesla V100	Cedric Nugteren

2021-08-19	Add tuning results for NVIDIA Tesla T4	Cedric Nugteren

2021-08-19	Add tuning results for NVIDIA Quadro T2000	Cedric Nugteren

2021-08-19	Add tuning results for NVIDIA Quadro GV100	Cedric Nugteren

2021-08-19	Add tuning results for Intel Core i9-9980HK	Cedric Nugteren

2021-08-19	Add tuning results for NVIDIA A100	Cedric Nugteren

2021-05-22	Fix issue with printing out-of-bounds local/global sizes for level 1 tuners	Cedric Nugteren

2021-03-13	set the correct flop count for xgemm	JishinMaster

2021-02-05	Fix Windows paths in pyclblast	Cedric Nugteren

2021-02-04	Added second Windows library path	Cedric Nugteren

2021-01-30	Add library path for Windows as well	Cedric Nugteren

2021-01-29	Add library dir on Linux for pyclblast	Cedric Nugteren

2021-01-21	Update pyclblast package version number	Cedric Nugteren

2021-01-20	Use reference types to prevent unnecessary copying	Jerry James

2020-10-10	Add tuning results for TITAN RTX	Cedric Nugteren

2020-10-10	Add tuning results for Radeon RX Vega	Cedric Nugteren

2020-06-07	Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG	Pradeep Garigipati

2020-06-05	Fix Program::GetIR to handle programs with multiple devices	Pradeep Garigipati

2020-05-11	Increase display width of the local/global sizes	Cedric Nugteren

2020-05-10	Made sure that the global workgroup size is a multiple of the local size in the tuners	Cedric Nugteren

2020-05-10	Added logging of local/global workgroup sizes when running the tuners	Cedric Nugteren

2020-05-10	Updated PyCLBlast version number	Cedric Nugteren

2020-05-10	Added a sample to demonstrate a batched routine	Cedric Nugteren

2020-05-10	Added pyclblast bindings for the 3 batched routines	Cedric Nugteren

2020-05-03	Move queue creation out of the tuner loop	Cedric Nugteren

2020-03-08	Made it more likely (but no guarantees) for amax/amin to return the first index	Cedric Nugteren

2020-03-08	Silenced a new OpenCL warning message	Cedric Nugteren

2020-02-17	Catches all exceptions of the tuners	Cedric Nugteren

2019-12-09	Reduce TestMatrix calls for xgemmstridedbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the offset of the last batch.
2019-12-09	Reduce TestMatrix calls for xgemmbatched.	Tarmo Räntilä
	Replace the looped test by a single one with the maximal found offset.
2019-09-04	Fix out-of-bounds read/write in XhadFaster	etomzak
	Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.
2019-05-19	Fixed a bug in the absolute-min index kernel	Cedric Nugteren

2019-05-11	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	Cedric Nugteren

2019-05-08	Changed back to cl_intel_subgroups as suggested	Cedric Nugteren

2019-05-07	Added a host-code check to make sure the avc_motion_estimation is available	Cedric Nugteren

2019-05-07	Enabled avc_motion_estimation extension for Intel subgroup shuffling	Cedric Nugteren