Age | Commit message | Author | |
---|---|---|---|
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen | |
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen | |
2018-06-28 | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first | Cedric Nugteren | |
2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren | |
2018-04-26 | Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program | Cedric Nugteren | |
2017-12-30 | Added optional temp-buffer argument to C++ interface of GEMM | Cedric Nugteren | |
2017-12-23 | Added defines to disable OpenCL deprecation warnings | Cedric Nugteren | |
2017-12-09 | Made the pre-processor run by default for ARM and Qualcomm GPUs | Cedric Nugteren | |
2017-11-20 | Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated | Cedric Nugteren | |
2017-11-19 | Some fixes for the new auto-tuner to be compatible with the Python scripts | Cedric Nugteren | |
2017-10-29 | Added Android support using the GNU C++ STL library and the GCC toolchain | Cedric Nugteren | |
2017-10-28 | Merge branch 'master' into android_support | Cedric Nugteren | |
2017-10-17 | Made buffers of batched routines read/write (was: read-only) | Cedric Nugteren | |
2017-10-08 | Moved the remaining OpenCL-specific host code to the clpp11.h header where it belongs | Cedric Nugteren | |
2017-10-07 | Synchronizes clpp11.h with CLCudaAPI 9.0 | Cedric Nugteren | |
2017-09-26 | Added missing headers | Cedric Nugteren | |
2017-09-23 | Made database-caching no longer dependent on device name but on device/platform IDs | Cedric Nugteren | |
2017-09-16 | Fixed an issue with the NVIDIA compute capability not being retrieved properly | Cedric Nugteren | |
2017-09-14 | Added a guard against missing AMD and NVIDIA extensions | Cedric Nugteren | |
2017-09-10 | Added the new vendor-architecture-name hierarchy to the tuners as well | Cedric Nugteren | |
2017-09-08 | Introduced the notion of a device-architecture for the database and added device and architecture name mappings | Cedric Nugteren | |
2017-04-07 | Added a special override database for the Apple CPU implementation on OS X: this makes the tests work, but does not focus on good performance | Cedric Nugteren | |
2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects, undoing many earlier changes | Cedric Nugteren | |
2017-01-24 | Routine, Cache: generalize, reduce amount of copying in fast path | Ivan Shapovalov | |
| | Implement a generalized `Cache<K, V>`. Two variants are provided: the first one is based on `std::map`, using the C++14-specific transparent `std::less<>` and the generalized `std::map::find()` to allow searching by a tuple of references. The second one is based on `std::vector` with O(n) lookup, but remains C++11-compliant. | | |
2017-01-24 | src/clpp11.hpp: check pointers before clRelease*() | Ivan Shapovalov | |
| | This is to avoid spurious "induced" errors on destruction if construction failed for some reason. | | |
2017-01-24 | src/clpp11.hpp: do not store program source/binary in Program | Ivan Shapovalov | |
| | The stored source/binary does not seem to serve any purpose, yet its presence makes Program a heavy (not purely refcounted) object, which is undesirable, especially because it is copied from the cache in the hot path. | | |
2016-11-20 | Forced OpenCL 1.1 compilation and disabled a deprecation warning | Cedric Nugteren | |
2016-10-22 | treewide: use C++ exceptions properly | Ivan Shapovalov | |
| | Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to only use C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at top level to preserve compatibility with the existing error code-based API. Note that we deliberately do not catch C++ runtime errors (such as `std::bad_alloc`) nor logic errors (aka failed assertions) because no actual handling can ever happen for such errors. However, in the C interface we do catch _all_ exceptions (`...`) and convert them into a wild-card error code. | | |
2016-10-22 | src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter | Ivan Shapovalov | |
2016-10-22 | src/clpp11.hpp: GetInfoString: avoid reallocation | Ivan Shapovalov | |
2016-10-22 | src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo() | Ivan Shapovalov | |
2016-09-27 | Updated to version 8.0 of the CLCudaAPI header | Cedric Nugteren | |
2016-07-22 | clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS | Ivan Shapovalov | |
2016-07-22 | cl::Kernel: skip NULL entries in waitForEvents | Ivan Shapovalov | |
2016-07-22 | clblast::RunKernel, cl::Kernel: take const vector as waitForEvents | Ivan Shapovalov | |
2016-07-16 | Fixed some more types and type conversions in the clpp11 interface to OpenCL | Cedric Nugteren | |
2016-07-13 | Make sure the passed types are large enough. | Gian-Carlo Pascutto | |
| | Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies. | | |
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen | Cedric Nugteren | |
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto | |
| | In `LocalMemUsage()`, a first call to `clGetKernelWorkGroupInfo` gets the number of bytes ("bytes") needed to store the result of `CL_KERNEL_LOCAL_MEM_SIZE`. However, the value actually passed is `auto result = size_t`, which in 32-bit mode is 4 bytes, regardless of the previously returned size. The specification says the result is a `cl_ulong`, which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a `cl_ulong`. Also adjust all callers to take the changed type into account. | | |
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren | |
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren | |
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren | |
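
The 2017-01-24 `Cache<K, V>` commit describes two lookup strategies. The sketch below is a hypothetical illustration of that idea, not CLBlast's actual code: the class names `VectorCache` and `MapCache`, their `Get`/`Store` methods, and the mutex locking are all assumptions. It shows a C++11-friendly vector with linear search, and a C++14 `std::map` whose transparent `std::less<>` lets `find()` accept a tuple of references, so a lookup does not need to build a key that copies strings.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical C++11-compatible variant: O(n) linear search over a vector.
template <typename Key, typename Value>
class VectorCache {
 public:
  // Copies the cached value into *out and returns true on a hit.
  bool Get(const Key& key, Value* out) const {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const auto& entry : entries_) {
      if (entry.first == key) {
        *out = entry.second;
        return true;
      }
    }
    return false;
  }

  // Inserts a new entry, or overwrites the value of an existing key.
  void Store(Key key, Value value) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& entry : entries_) {
      if (entry.first == key) {
        entry.second = std::move(value);
        return;
      }
    }
    entries_.emplace_back(std::move(key), std::move(value));
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::pair<Key, Value>> entries_;
};

// Hypothetical C++14 variant: std::less<> is transparent, so map::find()
// accepts any type comparable with Key (e.g. a tuple of references).
template <typename Key, typename Value>
class MapCache {
 public:
  template <typename KeyLike>
  bool Get(const KeyLike& key_like, Value* out) const {
    std::lock_guard<std::mutex> lock(mutex_);
    const auto it = map_.find(key_like);  // heterogeneous lookup, no Key built
    if (it == map_.end()) { return false; }
    *out = it->second;
    return true;
  }

  void Store(Key key, Value value) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = map_.find(key);
    if (it != map_.end()) {
      it->second = std::move(value);
    } else {
      map_.emplace(std::move(key), std::move(value));
    }
  }

 private:
  mutable std::mutex mutex_;
  std::map<Key, Value, std::less<>> map_;
};
```

With `MapCache<std::tuple<std::string, int>, double>`, a call such as `cache.Get(std::forward_as_tuple(name, precision), &value)` compares stored tuples against references to `name` and `precision`, so the lookup itself copies neither; the vector variant trades that for O(n) search but needs nothing beyond C++11.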
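
Two of the 2016-10-22 and 2017-01-24 commits concern releasing OpenCL objects held in a `std::shared_ptr`: check the pointer before calling `clRelease*()`, and never throw from the deleter. The following is a minimal sketch of that pattern under stated assumptions: `FakeHandle`, `FakeRelease` and `MakeShared` are hypothetical stand-ins so the example runs without an OpenCL driver, and error reporting via `stderr` is an illustrative choice, not necessarily what clpp11.hpp does.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Hypothetical stand-in for an OpenCL object and a clRelease*() call.
struct FakeHandle { int id; };
static int g_release_count = 0;

// Returns 0 on success, mimicking CL_SUCCESS (hypothetical).
int FakeRelease(FakeHandle* handle) {
  ++g_release_count;
  delete handle;
  return 0;
}

// Stores the raw handle behind a shared_ptr, as a refcounted wrapper would.
// The deleter (1) skips the release when construction never produced a
// handle, avoiding spurious "induced" errors, and (2) never throws, since a
// shared_ptr deleter must not throw and it runs inside a destructor.
std::shared_ptr<FakeHandle*> MakeShared(FakeHandle* raw) {
  return std::shared_ptr<FakeHandle*>(new FakeHandle*(raw), [](FakeHandle** p) {
    if (*p != nullptr) {
      const int status = FakeRelease(*p);
      if (status != 0) {
        std::fprintf(stderr, "release failed with status %d\n", status);
      }
    }
    delete p;
  });
}
```

Copies of the returned `shared_ptr` share one control block, so the release function runs exactly once when the last copy goes out of scope, and not at all for a null handle.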
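
The 2016-10-22 "treewide: use C++ exceptions properly" commit describes throwing internally, converting library exceptions to status codes at the C++ API boundary, and catching everything (`...`) in the C interface. A minimal sketch of that layering follows; `StatusCode`, its values, `StatusError`, `ScaleVector`, `Scale` and `CScale` are hypothetical names chosen for illustration and do not reflect CLBlast's real enum or functions.

```cpp
#include <cassert>
#include <exception>

// Hypothetical status codes; not CLBlast's actual values.
enum class StatusCode : int {
  kSuccess = 0,
  kInvalidArgument = -1,
  kUnknownError = -999,  // wild-card code used only by the C interface
};

// Hypothetical library exception carrying a status code.
class StatusError : public std::exception {
 public:
  explicit StatusError(StatusCode code) : code_(code) {}
  const char* what() const noexcept override { return "status error"; }
  StatusCode code() const noexcept { return code_; }

 private:
  StatusCode code_;
};

// Internal code: RAII-style, reports failures by throwing.
void ScaleVector(float* data, int n, float alpha) {
  if (data == nullptr || n < 0) { throw StatusError(StatusCode::kInvalidArgument); }
  for (int i = 0; i < n; ++i) { data[i] *= alpha; }
}

// C++ API boundary: translate only the library's own exceptions. Anything
// else (e.g. std::bad_alloc) deliberately propagates to the caller.
StatusCode Scale(float* data, int n, float alpha) {
  try {
    ScaleVector(data, n, alpha);
    return StatusCode::kSuccess;
  } catch (const StatusError& e) {
    return e.code();
  }
}

// C API boundary: no exception may cross into C, so catch everything.
extern "C" int CScale(float* data, int n, float alpha) {
  try {
    return static_cast<int>(Scale(data, n, alpha));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnknownError);
  }
}
```

The design keeps error handling in one place per boundary: internal code never checks return codes, the C++ layer preserves an error-code API, and only the C layer pays for a catch-all.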
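
The 2016-07-02 and 2016-07-13 commits fix out parameters that were too small for the reply, such as a 4-byte `size_t` receiving the 8-byte `cl_ulong` for `CL_KERNEL_LOCAL_MEM_SIZE` on 32-bit builds. The sketch below shows one defensive pattern: query the reply size first and refuse a result type of a different size. It is hypothetical: `MockQueryLocalMemSize` imitates the size/value/size_ret convention of OpenCL info queries so the example runs without a driver, and the reply value and error code are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

constexpr int kMockSuccess = 0;
constexpr int kMockInvalidValue = -30;  // hypothetical error code

// Hypothetical mock of an OpenCL-style info query whose reply is always a
// 64-bit unsigned integer (like cl_ulong).
int MockQueryLocalMemSize(std::size_t param_value_size, void* param_value,
                          std::size_t* param_value_size_ret) {
  const std::uint64_t reply = 16384;  // made-up local memory usage in bytes
  if (param_value_size_ret != nullptr) { *param_value_size_ret = sizeof(reply); }
  if (param_value != nullptr) {
    if (param_value_size < sizeof(reply)) { return kMockInvalidValue; }
    std::memcpy(param_value, &reply, sizeof(reply));
  }
  return kMockSuccess;
}

// Asks for the reply size first, then only writes into a T of exactly that
// size, passing sizeof(T) (not the reported size) as the buffer size.
template <typename T>
T GetLocalMemSize() {
  std::size_t bytes = 0;
  if (MockQueryLocalMemSize(0, nullptr, &bytes) != kMockSuccess) {
    throw std::runtime_error("size query failed");
  }
  if (bytes != sizeof(T)) {
    throw std::runtime_error("result type does not match the reply size");
  }
  T result{};
  if (MockQueryLocalMemSize(sizeof(T), &result, nullptr) != kMockSuccess) {
    throw std::runtime_error("value query failed");
  }
  return result;
}
```

Calling `GetLocalMemSize<std::uint64_t>()` succeeds, while `GetLocalMemSize<std::uint32_t>()` throws instead of letting the query write 8 bytes into a 4-byte variable.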