debian-clblast - Debian package for CLBlast.

Age	Commit message	Author
2016-07-25	Moved the XgemvFast and XgemvFastRot tuning database into a separate file	Cedric Nugteren

2016-07-24	Merge branch 'development' into gemv_performance	Cedric Nugteren

2016-07-24	Minor improvements after merging in groundwork for custom tuning parameters and kernels	Cedric Nugteren
2016-07-23	Fixed a bug in the new XgemvFastRot kernel related to local memory size	Cedric Nugteren

2016-07-23	Further improvements to the XgemvFastRot kernel, properly enables coalescing now	Cedric Nugteren

2016-07-23	Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance	Cedric Nugteren
2016-07-22	clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation	Ivan Shapovalov
2016-07-22	clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS	Ivan Shapovalov
2016-07-22	cl::Kernel: skip NULL entries in waitForEvents	Ivan Shapovalov

2016-07-22	clblast::RunKernel, cl::Kernel: take const vector as waitForEvents	Ivan Shapovalov

2016-07-22	xgemm: do not hardcode kernel requirements for internal matrix layout	Ivan Shapovalov
	Do not hardcode the knowledge about "A and C col-major, B row-major". This allows for easier reuse of the DoGemm() routine with different kernels.
2016-07-17	Improved the GEMM direct kernel by adding register blocking. Still not fast though	Cedric Nugteren
2016-07-16	Created infrastructure to support a direct GEMM kernel; added correct but slow reference kernel as a place-holder	Cedric Nugteren
2016-07-16	Fixed some more types and type conversions in the clpp11 interface to OpenCL	Cedric Nugteren

2016-07-16	Merge pull request #80 from gcp/getdevinfo_fixes	Cedric Nugteren
	Make sure the passed types are large enough.
2016-07-16	Removed an unused variable from the copy-transpose-pad function	Cedric Nugteren

2016-07-13	Make sure the passed types are large enough.	Gian-Carlo Pascutto
	Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies.
2016-07-10	Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel	Cedric Nugteren
2016-07-10	Added tuning results for AMD Oland and for Intel Graphics HD 530	Cedric Nugteren

2016-07-10	Fixed a bug related to the cache and retrieval of programs based on the OpenCL context	Cedric Nugteren
2016-07-08	Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache	Cedric Nugteren
2016-07-06	Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen	Cedric Nugteren
2016-07-06	Added an option to the performance clients to do a warm-up run before timing	Cedric Nugteren

2016-07-03	Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)	Cedric Nugteren

2016-07-02	Ensure clGetKernelWorkGroupInfo return value fits.	Gian-Carlo Pascutto
	In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo to get the "bytes" amount needed to store the result from CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an "auto result = size_t", which in 32-bit mode is 4 bytes, regardless of the previous return value. The spec describes that it will actually be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a cl_ulong. Also adjust all callers to take the changed type into account.
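	The size mismatch described in this commit can be sketched without an OpenCL installation. The block below is a minimal illustration, not the CLBlast code: `FakeGetInfo`, `fake_ulong`, `QueryLocalMemUsage` and the value 49152 are hypothetical stand-ins, with `FakeGetInfo` only mimicking the "ask for the size, then fetch the value" pattern of an info query that returns an 8-byte result.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for cl_ulong: always 8 bytes, on 32-bit and 64-bit builds.
using fake_ulong = std::uint64_t;

// Hypothetical stand-in for an OpenCL-style info query (not the real API).
// It reports how many bytes its answer needs and, when given a destination,
// copies exactly that many bytes into it.
int FakeGetInfo(std::size_t value_size, void* value, std::size_t* size_ret) {
  const fake_ulong local_mem_bytes = 49152;  // illustrative value only
  if (size_ret != nullptr) { *size_ret = sizeof(local_mem_bytes); }
  if (value != nullptr) {
    if (value_size < sizeof(local_mem_bytes)) { return -1; }  // destination too small
    std::memcpy(value, &local_mem_bytes, sizeof(local_mem_bytes));
  }
  return 0;
}

// Queries the required size first, then reads into a variable whose width is
// fixed at 8 bytes. Using size_t here instead would provide only 4 bytes of
// storage on a 32-bit build while the query still writes 8.
fake_ulong QueryLocalMemUsage() {
  std::size_t bytes = 0;
  FakeGetInfo(0, nullptr, &bytes);       // first call: how many bytes are needed?
  fake_ulong result = 0;                 // destination matches the reported width
  FakeGetInfo(bytes, &result, nullptr);  // second call: fetch the value itself
  return result;
}
```

	Declaring the destination with the fixed-width type keeps its size independent of the platform's `size_t`, which is the point the commit message makes about 32-bit mode.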
2016-07-02	Fixed some memory leaks related to events not properly cleaned-up	Cedric Nugteren

2016-06-30	Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library	Cedric Nugteren
2016-06-29	Updated to version 6.0 of the CLCudaAPI header	Cedric Nugteren

2016-06-28	Made it possible to build the clients and tests on Windows using Visual Studio	CNugteren

2016-06-27	Fixes for the AppVeyor Windows build	Cedric Nugteren

2016-06-19	Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes)	Cedric Nugteren
2016-06-19	Renamed all C++ source files to .cpp to match the .hpp extension better	Cedric Nugteren

2016-06-18	Moved all headers into the source tree, changed headers to .hpp extension	Cedric Nugteren

2016-06-18	Clean-up of the routine class, moved RunKernel to the routine/common file	Cedric Nugteren

2016-06-18	Removed the template from the Routine base-class	Cedric Nugteren

2016-06-17	Removed the precision argument from the routines in favor of a single templated function	Cedric Nugteren
2016-06-17	Removed the interface to the cache functions from the Routine class, calls them directly now	Cedric Nugteren
2016-06-17	Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class	Cedric Nugteren
2016-06-17	Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file	Cedric Nugteren
2016-06-16	Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing	Cedric Nugteren
2016-06-15	Added some constness to variables related to the GEMM routines	Cedric Nugteren

2016-06-14	Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately	Cedric Nugteren
2016-06-14	Moved device vendor and type checks to a common header	Cedric Nugteren

2016-06-14	Added support for FP16 on ARM Mali-T628 (officially not supported)	Cedric Nugteren

2016-06-08	Added global memory synchronisation for better cache performance on ARM Mali GPUs	Cedric Nugteren
2016-05-26	Added half-precision tests for the clBLAS reference through conversion to single-precision	Cedric Nugteren
2016-05-25	Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM	Cedric Nugteren

2016-05-24	Added proper argument handling and displaying for half-precision data-types	Cedric Nugteren

2016-05-22	Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2	Cedric Nugteren

2016-05-22	Fixed tuning results for half-precision; added first results for the xGER kernels	Cedric Nugteren