path: root/src/clpp11.hpp
Age | Commit message | Author
2020-06-07 | Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG | Pradeep Garigipati
2020-06-05 | Fix Program::GetIR to handle programs with multiple devices | Pradeep Garigipati
2020-03-08 | Silenced a new OpenCL warning message | Cedric Nugteren
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren
2019-05-03 | Remove assert for extension not available in macOS | Umar Arshad
The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.
2018-08-13 | Made last operation in TRSV and TRSM asynchronous, making the events not null | Cedric Nugteren
2018-07-29 | Fixed a wrong event issue causing error -57 | Cedric Nugteren
2018-07-27 | Fixed a bug: forgot to initialize the shared pointer for the null kernel | Cedric Nugteren
2018-07-27 | Renamed AMD SI workaround defines | Cedric Nugteren
2018-07-25 | Added workaround for weird AMD SI Hainan bug | Cedric Nugteren
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen
2018-06-28 | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first | Cedric Nugteren
2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren
2018-04-26 | Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program | Cedric Nugteren
2017-12-30 | Added optional temp-buffer argument to C++ interface of GEMM | Cedric Nugteren
2017-12-23 | Added defines to disable OpenCL deprecation warnings | Cedric Nugteren
2017-12-09 | Made the pre-processor run by default for ARM and Qualcomm GPUs | Cedric Nugteren
2017-11-20 | Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated | Cedric Nugteren
2017-11-19 | Some fixes for the new auto-tuner to be compatible with the Python scripts | Cedric Nugteren
2017-10-29 | Added Android support using the GNU C++ STL library and the GCC toolchain | Cedric Nugteren
2017-10-28 | Merge branch 'master' into android_support | Cedric Nugteren
2017-10-17 | Made buffers of batched routines read/write (was: read-only) | Cedric Nugteren
2017-10-08 | Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs | Cedric Nugteren
2017-10-07 | Synchronizes clpp11.h with CLCudaAPI 9.0 | Cedric Nugteren
2017-09-26 | Added missing headers | Cedric Nugteren
2017-09-23 | Made database-caching no longer dependent on device name but on device/platform IDs | Cedric Nugteren
2017-09-16 | Fixed an issue with the NVIDIA compute capability not being retrieved properly | Cedric Nugteren
2017-09-14 | Added a guard against missing AMD and NVIDIA extensions | Cedric Nugteren
2017-09-10 | Added the new vendor-architecture-name hierarchy to the tuners as well | Cedric Nugteren
2017-09-08 | Introduced the notion of a device-architecture for the database and added device and architecture name mappings | Cedric Nugteren
2017-04-07 | Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance | Cedric Nugteren
2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes | Cedric Nugteren
2017-01-24 | Routine, Cache: generalize, reduce amount of copying in fast path | Ivan Shapovalov
Implement a generalized Cache<K, V>. Two variants are provided: the first one is based on std::map, using C++14-specific transparent std::less<> and generalized std::map::find() to allow searching by tuple of references. The second one is based on std::vector and O(n) lookup, but remains C++11-compliant.
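The C++11-compliant variant described above (a std::vector with O(n) lookup, searchable by a tuple of references) could look roughly like the sketch below. This is not the code from the commit: the class name `VectorCache` and its `Get`/`Store`/`Size` methods are hypothetical, and locking is left out for brevity.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical C++11-compatible cache: a vector of key-value pairs with
// linear (O(n)) lookup. Not the actual class from the commit.
template <typename Key, typename Value>
class VectorCache {
 public:
  // Looks up 'key' and copies the stored value into 'out' on a hit.
  // 'U' can be any type comparable to Key with operator==, for example a
  // std::tuple of references, so the caller does not have to copy the
  // key components just to search.
  template <typename U>
  bool Get(const U& key, Value& out) const {
    for (const auto& entry : entries_) {
      if (entry.first == key) {
        out = entry.second;
        return true;
      }
    }
    return false;
  }

  // Stores a pair; an existing entry with an equal key is overwritten.
  void Store(Key key, Value value) {
    for (auto& entry : entries_) {
      if (entry.first == key) {
        entry.second = std::move(value);
        return;
      }
    }
    entries_.emplace_back(std::move(key), std::move(value));
  }

  std::size_t Size() const { return entries_.size(); }

 private:
  std::vector<std::pair<Key, Value>> entries_;
};
```

With a key type such as `std::tuple<std::string, int>`, a lookup can pass `std::tie(name, precision)`, which builds a tuple of references rather than copying the string.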
2017-01-24 | src/clpp11.hpp: check pointers before clRelease*() | Ivan Shapovalov
This is to avoid spurious "induced" errors on destruction, if construction failed for some reason.
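A minimal sketch of the idea, using stand-ins instead of real OpenCL types so that it compiles without an OpenCL SDK. `FakeHandle`, `FakeRelease`, and `HandleOwner` are hypothetical names; in the real header the handle would be something like `cl_program` and the release call `clReleaseProgram`.

```cpp
#include <memory>

// Stand-ins for an OpenCL object and its clRelease*() call (hypothetical).
struct FakeHandle { int id; };
using Handle = FakeHandle*;

static int g_release_calls = 0;  // counts how often the release call ran

inline int FakeRelease(Handle handle) {
  (void)handle;
  ++g_release_calls;
  return 0;  // status code, ignored by the deleter below
}

// Reference-counted owner of a handle. If construction failed and the handle
// is still null, the deleter skips the release call, so destruction does not
// raise a second, "induced" error.
class HandleOwner {
 public:
  explicit HandleOwner(Handle raw)
      : handle_(new Handle(raw), [](Handle* h) {
          if (*h != nullptr) { FakeRelease(*h); }  // only release valid handles
          delete h;                                // never throws
        }) {}

  bool Valid() const { return *handle_ != nullptr; }

 private:
  std::shared_ptr<Handle> handle_;
};
```

The deleter deliberately ignores the status code instead of throwing, since an exception escaping a std::shared_ptr deleter would typically terminate the program.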
2017-01-24 | src/clpp11.hpp: do not store program source/binary in Program | Ivan Shapovalov
The stored source/binary does not seem to serve any purpose, yet its presence makes Program a heavy (not pure refcounted) object, which is undesired esp. because it is copied from the cache in the hot path.
2016-11-20 | Forced OpenCL 1.1 compilation and disabled a deprecation warning | Cedric Nugteren
2016-10-22 | treewide: use C++ exceptions properly | Ivan Shapovalov
Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to only use C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at top level to preserve compatibility with the existing error code-based API. Note that we deliberately do not catch C++ runtime errors (such as `std::bad_alloc`) nor logic errors (aka failed assertions) because no actual handling can ever happen for such errors. However, in the C interface we do catch _all_ exceptions (...) and convert them into a wild-card error code.
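The two boundaries described above can be sketched as follows. All names and values here (`StatusCode`, `StatusError`, `CallCpp`, `CallC`, the numeric codes) are made up for illustration and are not taken from the library.

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical status codes; the numeric values are placeholders.
enum class StatusCode { kSuccess = 0, kInvalidValue = -1, kUnknownError = -2 };

// Hypothetical library exception that carries a status code.
class StatusError : public std::exception {
 public:
  explicit StatusError(StatusCode code) : code_(code) {}
  StatusCode code() const { return code_; }
  const char* what() const noexcept override { return "status error"; }

 private:
  StatusCode code_;
};

// C++ API boundary: library exceptions become status codes. Anything else
// (std::bad_alloc, logic errors) is deliberately left to propagate.
template <typename Function>
StatusCode CallCpp(Function&& routine) {
  try {
    routine();
    return StatusCode::kSuccess;
  } catch (const StatusError& e) {
    return e.code();
  }
}

// C API boundary: no exception may cross into C code, so everything is
// caught and unknown exceptions map to a wild-card error code.
template <typename Function>
int CallC(Function&& routine) noexcept {
  try {
    routine();
    return static_cast<int>(StatusCode::kSuccess);
  } catch (const StatusError& e) {
    return static_cast<int>(e.code());
  } catch (...) {
    return static_cast<int>(StatusCode::kUnknownError);
  }
}
```

The internal code can then throw freely and rely on RAII for cleanup, while callers of the existing error-code API see no change.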
2016-10-22 | src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter | Ivan Shapovalov
2016-10-22 | src/clpp11.hpp: GetInfoString: avoid reallocation | Ivan Shapovalov
2016-10-22 | src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo() | Ivan Shapovalov
2016-09-27 | Updated to version 8.0 of the CLCudaAPI header | Cedric Nugteren
2016-07-22 | clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS | Ivan Shapovalov
2016-07-22 | cl::Kernel: skip NULL entries in waitForEvents | Ivan Shapovalov
2016-07-22 | clblast::RunKernel, cl::Kernel: take const vector as waitForEvents | Ivan Shapovalov
2016-07-16 | Fixed some more types and type conversions in the clpp11 interface to OpenCL | Cedric Nugteren
2016-07-13 | Make sure the passed types are large enough. | Gian-Carlo Pascutto
Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies.
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen | Cedric Nugteren
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo to get the "bytes" amount needed to store the result from CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an "auto result = size_t", which in 32-bit mode is 4 bytes, regardless of the previous return value. The spec describes that it will actually be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a cl_ulong. Also adjust all callers to take the changed type into account.
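The size mismatch can be illustrated without an OpenCL SDK by using a stand-in for the query. In this sketch `FakeUlong` and `FakeGetInfo` are hypothetical substitutes for `cl_ulong` and `clGetKernelWorkGroupInfo`, the value 4096 is made up, and this `LocalMemUsage` is a simplified model of the fixed function, not the code from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using FakeUlong = std::uint64_t;  // stand-in for cl_ulong, always 8 bytes
static_assert(sizeof(FakeUlong) == 8, "the stand-in must be 8 bytes wide");

// Stand-in for an OpenCL info query with the usual two-call pattern: a call
// with a null 'value' reports the required size, and a call with a buffer
// writes the result. Too-small buffers are rejected here; code that passes
// the reported size together with a smaller variable would instead be
// written past its end.
inline int FakeGetInfo(std::size_t value_size, void* value, std::size_t* size_ret) {
  const FakeUlong local_mem_size = 4096;  // made-up result
  if (size_ret != nullptr) { *size_ret = sizeof(local_mem_size); }
  if (value != nullptr) {
    if (value_size < sizeof(local_mem_size)) { return -1; }
    std::memcpy(value, &local_mem_size, sizeof(local_mem_size));
  }
  return 0;
}

// Simplified model of the fixed query: the result variable has the type the
// query actually writes (8 bytes), not size_t, which is 4 bytes on 32-bit
// targets.
inline FakeUlong LocalMemUsage() {
  std::size_t bytes = 0;
  FakeGetInfo(0, nullptr, &bytes);       // first call: ask for the size
  FakeUlong result = 0;
  if (bytes > sizeof(result)) { return 0; }  // refuse to overrun 'result'
  FakeGetInfo(bytes, &result, nullptr);  // second call: fetch the value
  return result;
}
```

Callers then receive an 8-byte value on every platform, which is why the commit also adjusts them to the changed type.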
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren