Age | Commit message | Author | |
---|---|---|---|
2017-12-28 | Added interface to compute the required temporary buffer size for GEMM | Cedric Nugteren | |
2017-12-09 | Made the pre-processor run by default for ARM and Qualcomm GPUs | Cedric Nugteren | |
2017-11-30 | Integrated pre-processor in compilation flow, default is still disabled | Cedric Nugteren | |
2017-11-11 | Factored out the creation of the OpenCL header and the program compilation | Cedric Nugteren | |
2017-11-07 | Merge pull request #212 from CNugteren/kernel_selection_tuner | Cedric Nugteren | |
| | GEMM kernel selection tuner | | |
2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | Cedric Nugteren | |
2017-10-29 | Added platform ID to the binary program cache to prevent issues with multi-platform systems | Cedric Nugteren | |
2017-10-14 | Added OpenCL to CUDA translation header for the kernels | Cedric Nugteren | |
2017-10-08 | Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs | Cedric Nugteren | |
2017-10-07 | Synchronizes clpp11.h with CLCudaAPI 9.0 | Cedric Nugteren | |
2017-09-24 | Updated database override function to work with the new database storage format | Cedric Nugteren | |
2017-09-23 | Made program and binary databases dependent on the routine parameters on top of the name | Cedric Nugteren | |
2017-09-23 | Made database-caching no longer dependent on device name but on device/platform IDs | Cedric Nugteren | |
2017-09-06 | Split the database files over multiple directories and files; first step towards separate compilation | Cedric Nugteren | |
2017-07-08 | Made the inline keyword in kernels optional; currently only enabled for NVIDIA and ARM GPUs | Cedric Nugteren | |
2017-05-26 | Fixes inability to run GEMM on multiple identical GPUs (issue #155) | Kirill Mavreshko | |
2017-04-10 | Removed const-vector-of-const-objects from the database class to remain compliant with the C++11 standard | Cedric Nugteren | |
2017-02-26 | Merge branch 'development' into triangular_solvers | Cedric Nugteren | |
2017-02-13 | Added first version of the OverrideParameters function | Cedric Nugteren | |
2017-02-12 | Split the database into several smaller cached per-kernel databases (in preparation for per-kernel database overrides) | Cedric Nugteren | |
2017-01-24 | Database: pass Device instead of Queue for clarity | Ivan Shapovalov | |
2017-01-24 | Routine: cache the database instance as well | Ivan Shapovalov | |
| | This does not change much, but will become useful in the next commits when plugin support is introduced. | | |
2017-01-24 | Routine, Cache: generalize, reduce amount of copying in fast path | Ivan Shapovalov | |
| | Implement a generalized Cache<K, V>. Two variants are provided: the first is based on std::map, using the C++14 transparent comparator std::less<> and the generalized (heterogeneous) std::map::find() to allow searching by a tuple of references. The second is based on std::vector with O(n) lookup, but remains C++11-compliant. | | |
2017-01-24 | Routine: fix semi-warm routine construction (when binary is in cache) | Ivan Shapovalov | |
| | There was a missing return statement in the semi-warm path that made CLBlast continue to the cold path after a cache hit. | | |
2017-01-20 | Routine: use PrecisionSupported<>() instead of duplicating the check | Ivan Shapovalov | |
2016-10-22 | Routine: get rid of ::SetUp() | Ivan Shapovalov | |
| | Since we now use C++ exceptions inside the implementation (and exceptions can be thrown from constructors), there is no need for a separate Routine::SetUp() function. For this, we also change how the kernel source string is constructed: the kernel-specific source code is now passed to the Routine constructor via an initializer_list of C strings, which avoids unnecessary data copying while also working around MSVC 2013's string-length limit (error C1091). | | |
2016-10-22 | treewide: use C++ exceptions properly | Ivan Shapovalov | |
| | Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to use only C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at the top level to preserve compatibility with the existing error-code-based API. Note that we deliberately catch neither C++ runtime errors (such as `std::bad_alloc`) nor logic errors (i.e. failed assertions), because no meaningful handling is possible for such errors. In the C interface, however, we do catch _all_ exceptions (`catch (...)`) and convert them into a wild-card error code. | | |
2016-10-14 | Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data | Cedric Nugteren | |
2016-09-21 | It is now possible to set the OpenCL compiler options through an environment variable | Cedric Nugteren | |
2016-07-22 | clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation | Ivan Shapovalov | |
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen | Cedric Nugteren | |
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren | |
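The 2017-01-24 commit "Routine, Cache: generalize, reduce amount of copying in fast path" describes two cache variants. The following is a minimal sketch of both ideas, not CLBlast's actual code. The class names `MapCache` and `VectorCache`, their `Get`/`Store` methods, and the (kernel name, device index) key are hypothetical. Because the map variant relies on `std::less<>`, the sketch as a whole needs C++14.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical sketch, not CLBlast's real Cache<K, V>.
// Variant 1 (needs C++14): std::map with the transparent comparator std::less<>.
// map::find then accepts any type comparable with Key, such as a tuple of
// references, so a lookup does not have to build (copy) a full Key first.
template <typename Key, typename Value>
class MapCache {
 public:
  template <typename LookupKey>
  bool Get(const LookupKey& key, Value* out) const {
    const auto it = map_.find(key);  // heterogeneous lookup, O(log n)
    if (it == map_.end()) { return false; }
    *out = it->second;
    return true;
  }
  // emplace keeps an existing entry if the key is already present
  void Store(Key key, Value value) {
    map_.emplace(std::move(key), std::move(value));
  }
 private:
  std::map<Key, Value, std::less<>> map_;
};

// Variant 2 (C++11 technique): a vector of key/value pairs with an O(n) scan.
// std::tuple's operator== also compares tuples with different element types,
// so the same tuple-of-references lookup key works here too.
template <typename Key, typename Value>
class VectorCache {
 public:
  template <typename LookupKey>
  bool Get(const LookupKey& key, Value* out) const {
    for (const auto& entry : entries_) {  // linear search
      if (entry.first == key) { *out = entry.second; return true; }
    }
    return false;
  }
  void Store(Key key, Value value) {
    entries_.emplace_back(std::move(key), std::move(value));
  }
 private:
  std::vector<std::pair<Key, Value>> entries_;
};
```

With a key type of `std::tuple<std::string, int>`, a caller can look up `std::tie(name, device)` and neither variant copies the kernel name just to perform the search. The map variant requires C++14 but gives logarithmic lookups. The vector variant is slower for large caches but avoids the C++14 dependency.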
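The 2016-10-22 commit "treewide: use C++ exceptions properly" describes using exceptions internally and converting them into error codes only at the API boundary. The following is a minimal sketch of that pattern, assuming a hypothetical `Status` enum and `BlasError` exception type. These are not CLBlast's real names or values, and `CheckSize`, `Compute` and `ComputeC` are illustrative stand-ins for internal code, the C++ API and the C API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes; CLBlast's real status enum uses different names.
enum class Status : int { kSuccess = 0, kInvalidValue = -1, kUnknownError = -2 };

// Hypothetical internal exception type carrying the code to report at the boundary
class BlasError : public std::runtime_error {
 public:
  BlasError(Status status, const std::string& message)
      : std::runtime_error(message), status_(status) {}
  Status status() const { return status_; }
 private:
  Status status_;
};

// Internal code reports problems by throwing, so RAII cleanup runs automatically
void CheckSize(int n) {
  if (n <= 0) { throw BlasError(Status::kInvalidValue, "n must be positive"); }
}

// C++ API boundary: only the library's own exception type becomes an error code;
// anything else (e.g. std::bad_alloc) is deliberately left to propagate
Status Compute(int n) {
  try {
    CheckSize(n);
    return Status::kSuccess;
  } catch (const BlasError& e) {
    return e.status();
  }
}

// C API boundary: no exception may cross into C code, so catch (...) maps
// every remaining exception to a wild-card error code
extern "C" int ComputeC(int n) {
  try {
    return static_cast<int>(Compute(n));
  } catch (...) {
    return static_cast<int>(Status::kUnknownError);
  }
}
```

Keeping the `try`/`catch` at the outermost layer means internal functions stay free of error-code plumbing, while callers of the existing error-code-based API see no change.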