debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2020-06-07	Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG	Pradeep Garigipati

2020-06-05	Fix Program::GetIR to handle programs with multiple devices	Pradeep Garigipati

2020-03-08	Silenced a new OpenCL warning message	Cedric Nugteren

2019-05-11	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	Cedric Nugteren

2019-05-03	Remove assert for extension not available in macOS	Umar Arshad
	The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.
2018-08-13	Made last operation in TRSV and TRSM asynchronous, making the events not null	Cedric Nugteren

2018-07-29	Fixed a wrong event issue causing error -57	Cedric Nugteren

2018-07-27	Fixed a bug: forgot to initialize the shared pointer for the null kernel	Cedric Nugteren

2018-07-27	Renamed AMD SI workaround defines	Cedric Nugteren

2018-07-25	Added workaround for weird AMD SI Hainan bug	Cedric Nugteren

2018-07-14	Applied feedback from Cedric from first pull request	Tyler Sorensen

2018-07-11	added inline ptx to support shuffle on Nvidia GPUs	Tyler Sorensen

2018-06-28	Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first	Cedric Nugteren

2018-05-01	Now stores a shared_ptr to the Program class in the cache	Cedric Nugteren

2018-04-26	Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program	Cedric Nugteren

2017-12-30	Added optional temp-buffer argument to C++ interface of GEMM	Cedric Nugteren

2017-12-23	Added defines to disable OpenCL deprecation warnings	Cedric Nugteren

2017-12-09	Made the pre-processor run by default for ARM and Qualcomm GPUs	Cedric Nugteren

2017-11-20	Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated	Cedric Nugteren

2017-11-19	Some fixes for the new auto-tuner to be compatible with the Python scripts	Cedric Nugteren

2017-10-29	Added Android support using the GNU C++ STL library and the GCC toolchain	Cedric Nugteren

2017-10-28	Merge branch 'master' into android_support	Cedric Nugteren

2017-10-17	Made buffers of batched routines read/write (was: read-only)	Cedric Nugteren

2017-10-08	Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs	Cedric Nugteren

2017-10-07	Synchronizes clpp11.h with CLCudaAPI 9.0	Cedric Nugteren

2017-09-26	Added missing headers	Cedric Nugteren

2017-09-23	Made database-caching no longer dependent on device name but on device/platform IDs	Cedric Nugteren

2017-09-16	Fixed an issue with the NVIDIA compute capability not being retrieved properly	Cedric Nugteren

2017-09-14	Added a guard against missing AMD and NVIDIA extensions	Cedric Nugteren

2017-09-10	Added the new vendor-architecture-name hierarchy to the tuners as well	Cedric Nugteren

2017-09-08	Introduced the notion of a device-architecture for the database and added device and architecture name mappings	Cedric Nugteren

2017-04-07	Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance	Cedric Nugteren

2017-03-08	Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes	Cedric Nugteren

2017-01-24	Routine, Cache: generalize, reduce amount of copying in fast path	Ivan Shapovalov
	Implement a generalized Cache<K, V>. Two variants are provided: the first one is based on std::map, using C++14-specific transparent std::less<> and generalized std::map::find() to allow searching by tuple of references. The second one is based on std::vector and O(n) lookup, but remains C++11-compliant.
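	The second variant described above (std::vector storage with O(n) lookup, C++11 only) might look roughly like the sketch below. This is not the code from the commit: the class name `VectorCache`, its `Get`/`Store` methods, and the `in_cache` out-parameter are all hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a C++11-compliant cache:
// entries live in a std::vector and lookup is a linear scan.
template <typename Key, typename Value>
class VectorCache {
 public:
  // Returns a copy of the cached value, or a default-constructed Value on a
  // miss. If in_cache is non-null, it reports whether the key was found.
  Value Get(const Key &key, bool *in_cache) const {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const auto &entry : entries_) {
      if (entry.first == key) {
        if (in_cache) { *in_cache = true; }
        return entry.second;
      }
    }
    if (in_cache) { *in_cache = false; }
    return Value();
  }

  // Stores a value unless the key is already present (first writer wins).
  void Store(Key key, Value value) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const auto &entry : entries_) {
      if (entry.first == key) { return; }
    }
    entries_.emplace_back(std::move(key), std::move(value));
  }

  // Number of cached entries.
  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::pair<Key, Value>> entries_;
};
```

	A linear scan is usually acceptable when the cache holds only a handful of entries; the std::map variant mentioned in the commit message trades the C++11 restriction for logarithmic lookup.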
2017-01-24	src/clpp11.hpp: check pointers before clRelease*()	Ivan Shapovalov
	This is to avoid spurious "induced" errors on destruction, if construction failed for some reason.
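	As a rough illustration of the idea, the sketch below guards a release call inside a std::shared_ptr deleter so that a handle which was never created is not released. It does not use OpenCL: `FakeHandle`, `FakeRelease`, and `MakeOwner` are hypothetical stand-ins for an OpenCL handle type, a clRelease*() call, and the wrapper's constructor.

```cpp
#include <memory>

// Hypothetical stand-in for an opaque OpenCL handle such as cl_program.
using FakeHandle = int *;

// Counts release calls so the effect of the guard can be observed.
static int g_release_calls = 0;

// Hypothetical stand-in for a clRelease*() function.
void FakeRelease(FakeHandle) { ++g_release_calls; }

// Creates an owner whose handle starts out null. The deleter releases the
// handle only if it was actually set, then frees the storage for the handle.
std::shared_ptr<FakeHandle> MakeOwner() {
  return std::shared_ptr<FakeHandle>(new FakeHandle(nullptr),
                                     [](FakeHandle *handle) {
                                       if (*handle) { FakeRelease(*handle); }
                                       delete handle;
                                     });
}
```

	If construction fails before the handle is assigned, destruction then skips the release call instead of reporting a second, misleading error.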
2017-01-24	src/clpp11.hpp: do not store program source/binary in Program	Ivan Shapovalov
	The stored source/binary does not seem to serve any purpose, yet its presence makes Program a heavy (not pure refcounted) object, which is undesired esp. because it is copied from the cache in the hot path.
2016-11-20	Forced OpenCL 1.1 compilation and disabled a deprecation warning	Cedric Nugteren

2016-10-22	treewide: use C++ exceptions properly	Ivan Shapovalov
	Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to only use C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at top level to preserve compatibility with the existing error code-based API. Note that we deliberately do not catch C++ runtime errors (such as `std::bad_alloc`) nor logic errors (aka failed assertions) because no actual handling can ever happen for such errors. However, in the C interface we do catch _all_ exceptions (...) and convert them into a wild-card error code.
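	The two catch policies described above can be sketched as follows. This is an illustration under assumptions, not the library's code: `LibraryError`, `CallCpp`, `CallC`, and the numeric values of `kSuccess` and `kUnexpectedError` are hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status codes; the values are placeholders.
constexpr int kSuccess = 0;
constexpr int kUnexpectedError = -9999;

// Hypothetical library exception that carries an error code.
class LibraryError : public std::runtime_error {
 public:
  LibraryError(int code, const std::string &what)
      : std::runtime_error(what), code_(code) {}
  int code() const noexcept { return code_; }

 private:
  int code_;
};

// C++ entry point: converts only the library's own exceptions into error
// codes. Anything else (std::bad_alloc, logic errors) propagates to the caller.
template <typename F>
int CallCpp(F &&body) {
  try {
    body();
    return kSuccess;
  } catch (const LibraryError &e) {
    return e.code();
  }
}

// C entry point: no exception may cross the C boundary, so everything is
// caught and unknown exceptions map to a wild-card error code.
template <typename F>
int CallC(F &&body) noexcept {
  try {
    body();
    return kSuccess;
  } catch (const LibraryError &e) {
    return e.code();
  } catch (...) {
    return kUnexpectedError;
  }
}
```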
2016-10-22	src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter	Ivan Shapovalov

2016-10-22	src/clpp11.hpp: GetInfoString: avoid reallocation	Ivan Shapovalov

2016-10-22	src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo()	Ivan Shapovalov

2016-09-27	Updated to version 8.0 of the CLCudaAPI header	Cedric Nugteren

2016-07-22	clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS	Ivan Shapovalov

2016-07-22	cl::Kernel: skip NULL entries in waitForEvents	Ivan Shapovalov

2016-07-22	clblast::RunKernel, cl::Kernel: take const vector as waitForEvents	Ivan Shapovalov

2016-07-16	Fixed some more types and type conversions in the clpp11 interface to OpenCL	Cedric Nugteren

2016-07-13	Make sure the passed types are large enough.	Gian-Carlo Pascutto
	Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies.
2016-07-06	Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen	Cedric Nugteren

2016-07-02	Ensure clGetKernelWorkGroupInfo return value fits.	Gian-Carlo Pascutto
	In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo to get the "bytes" amount needed to store the result from CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an "auto result = size_t", which in 32-bit mode is 4 bytes, regardless of the previous return value. The spec describes that it will actually be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a cl_ulong. Also adjust all callers to take the changed type into account.
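	The size mismatch can be reproduced without OpenCL. In the sketch below, `FakeGetInfo` is a hypothetical mock of the two-step query pattern (first ask for the reply size, then fetch the reply), `fake_cl_ulong` stands in for cl_ulong, and the value 49152 is made up. The real clGetKernelWorkGroupInfo has a different signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Stand-in for cl_ulong, which is 8 bytes regardless of the host word size.
using fake_cl_ulong = std::uint64_t;

// Hypothetical mock of an OpenCL-style info query. It reports the reply size
// through size_ret and copies the reply into buffer when one is supplied.
int FakeGetInfo(std::size_t buffer_size, void *buffer, std::size_t *size_ret) {
  const fake_cl_ulong value = 49152;  // made-up local memory size in bytes
  if (size_ret) { *size_ret = sizeof(value); }
  if (buffer) {
    if (buffer_size < sizeof(value)) { return -30; }  // buffer too small
    std::memcpy(buffer, &value, sizeof(value));
  }
  return 0;
}

// Fixed pattern: the destination has the width of the reply type, so an
// 8-byte reply never lands in a 4-byte size_t on a 32-bit build.
fake_cl_ulong LocalMemUsage() {
  std::size_t bytes = 0;
  FakeGetInfo(0, nullptr, &bytes);  // step 1: query the reply size
  fake_cl_ulong result = 0;
  if (bytes > sizeof(result)) {
    throw std::runtime_error("reply does not fit in the destination");
  }
  FakeGetInfo(bytes, &result, nullptr);  // step 2: fetch the reply
  return result;
}
```

	Callers then receive the wider type, which matches the commit's note about adjusting them for the changed return type.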
2016-07-02	Fixed some memory leaks related to events not properly cleaned-up	Cedric Nugteren