Age | Commit message | Author | |
---|---|---|---|
2016-10-22 | Routine: get rid of ::SetUp() | Ivan Shapovalov | |
| | Since we now use C++ exceptions inside the implementation (and exceptions can be thrown from constructors), there is no need for a separate `Routine::SetUp()` function. For this, we also change the way the kernel source string is constructed. The kernel-specific source code is now passed to the `Routine` constructor via an `initializer_list` of C strings. This avoids unnecessary data copying while also working around MSVC 2013's compiler error C1091 (compiler limit on string length). | | |
2016-10-22 | treewide: use C++ exceptions properly | Ivan Shapovalov | |
| | Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to use only C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at the top level to preserve compatibility with the existing error-code-based API. Note that we deliberately catch neither C++ runtime errors (such as `std::bad_alloc`) nor logic errors (i.e. failed assertions), because no actual handling can ever happen for such errors. However, in the C interface we do catch _all_ exceptions (`...`) and convert them into a wild-card error code. | | |
2016-10-22 | src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter | Ivan Shapovalov | |
2016-10-22 | src/clpp11.hpp: GetInfoString: avoid reallocation | Ivan Shapovalov | |
2016-10-22 | src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo() | Ivan Shapovalov | |
2016-10-14 | Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data | Cedric Nugteren | |
2016-10-13 | Added tuning results for Intel HD Graphics IvyBridge GPU | Cedric Nugteren | |
2016-10-12 | Removed a spurious #ifdef | Cedric Nugteren | |
2016-10-12 | Fixed missing line ending | Cedric Nugteren | |
2016-10-10 | Added support for compiling the library, the client, and the samples under MSVC 2013 | Cedric Nugteren | |
2016-10-10 | Fixed an issue with const members of structs in the database | Cedric Nugteren | |
2016-10-10 | Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015 | Cedric Nugteren | |
2016-10-10 | First fixes towards compilation on Visual Studio 2013 | Cedric Nugteren | |
2016-10-10 | Updated the tuning results for the GTX 750 Ti GPU | Cedric Nugteren | |
2016-10-10 | Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs | Cedric Nugteren | |
2016-10-08 | Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results | Cedric Nugteren | |
2016-10-06 | Added first tuning results for the single-kernel direct GEMM implementation | Cedric Nugteren | |
2016-10-06 | Added a kernel selection database to select between the direct and indirect GEMM kernels | Cedric Nugteren | |
2016-10-03 | Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel | Cedric Nugteren | |
2016-10-03 | Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels | Cedric Nugteren | |
2016-10-03 | Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles | Cedric Nugteren | |
2016-10-02 | Set the default number of runs for all kernels to at least 2 runs | Cedric Nugteren | |
2016-10-02 | Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT | Cedric Nugteren | |
2016-10-02 | Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256 | Cedric Nugteren | |
2016-10-01 | Added padding to the local memory of the GEMM direct kernel | Cedric Nugteren | |
2016-10-01 | Added a default num-runs to the tuner, averaging over 10 runs by default for the GEMM direct kernel | Cedric Nugteren | |
2016-10-01 | Merge branch 'development' into gemm_direct | Cedric Nugteren | |
2016-09-27 | Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 | Cedric Nugteren | |
2016-09-27 | Updated to version 8.0 of the CLCudaAPI header | Cedric Nugteren | |
2016-09-27 | Fixed the local memory size computation for the GEMM tuners | Cedric Nugteren | |
2016-09-27 | Now generates test/client/tuner data using a fixed seed to enable reproducibility of results | Cedric Nugteren | |
2016-09-25 | Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter | Cedric Nugteren | |
2016-09-25 | Separated the tuning parameters of the new direct GEMM kernel from the indirect version | Cedric Nugteren | |
2016-09-25 | Added a first version of the direct version of GEMM with local memory | Cedric Nugteren | |
2016-09-21 | Merge branch 'development' into gemm_direct | Cedric Nugteren | |
2016-09-21 | It is now possible to set the OpenCL compiler options through an environment variable | Cedric Nugteren | |
2016-09-12 | Split the XGEMM kernel up further: now in 3 parts. This is done because MSVC can't handle long strings | Cedric Nugteren | |
2016-09-12 | Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all | Cedric Nugteren | |
2016-09-11 | Complete re-write of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data type | Cedric Nugteren | |
2016-09-10 | Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU | Cedric Nugteren | |
2016-09-10 | Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination | Cedric Nugteren | |
2016-09-06 | Split GEMM tuning in two parts: a small set of tuning parameters which is explored exhaustively and a larger set which is explored randomly | Cedric Nugteren | |
2016-09-04 | The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs | Cedric Nugteren | |
2016-09-03 | Added tuning results for Intel Broadwell 5500 GT2 GPU | Cedric Nugteren | |
2016-09-03 | Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs | Cedric Nugteren | |
2016-08-27 | test/correctness: read platform and device from environment | Ivan Shapovalov | |
| | Support passing the environment variables `CLBLAST_PLATFORM` and `CLBLAST_DEVICE` to the test executables instead of the `-platform` and `-device` arguments. This is for use with `ctest`. | | |
2016-08-22 | Merge branch 'database_defaults' into development | Cedric Nugteren | |
2016-08-21 | Also changed the default-default for unknown device types to use the same method as for known device groups | Cedric Nugteren | |
2016-08-21 | Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance of evaluating more likely parameter combinations | Cedric Nugteren | |
2016-08-20 | Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master | Cedric Nugteren | |
| | Conflicts: `src/kernels/level1/xaxpy.opencl`, `src/kernels/level2/xgemv.opencl`, `src/kernels/level2/xgemv_fast.opencl`, `src/kernels/level2/xger.opencl`, `src/kernels/level2/xher.opencl`, `src/kernels/level2/xher2.opencl`, `src/kernels/level3/xgemm_part2.opencl` | | |
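
The 2016-10-22 commit "Routine: get rid of ::SetUp()" passes kernel source to the constructor as an `initializer_list` of C strings. The C++ sketch below illustrates only that idea. The class name `Routine` and its `source()` accessor are hypothetical and do not reproduce CLBlast's actual class. The sketch assumes that each piece is a separate string literal, short enough to stay below MSVC's string-length limit, and that the pieces are concatenated exactly once at construction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <initializer_list>
#include <string>

// Hypothetical class (not CLBlast's actual Routine): the kernel source arrives
// as a list of C-string pieces, e.g. several shorter string literals.
class Routine {
 public:
  explicit Routine(std::initializer_list<const char*> source_parts) {
    // Measure first so that the final string is allocated only once.
    std::size_t total = 0;
    for (const char* part : source_parts) { total += std::strlen(part); }
    source_.reserve(total);
    // The list holds only pointers; the characters are copied once, into source_.
    for (const char* part : source_parts) { source_ += part; }
  }

  const std::string& source() const { return source_; }

 private:
  std::string source_;
};
```

For example, `Routine routine({"__kernel void Foo() ", "{ }"});` yields the single source string `"__kernel void Foo() { }"`.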
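
The 2016-10-22 commit "treewide: use C++ exceptions properly" throws internally and converts exceptions to error codes only at the API boundary. The minimal C++ sketch below shows that pattern. All names (`StatusCode`, `Error`, `RoutineImpl`, `Routine`, `CRoutine`) and status values are hypothetical, not CLBlast's API. The C++ entry point translates only the library's own error type and lets `std::bad_alloc` and `std::logic_error` propagate. The C entry point catches everything with `(...)`.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical status codes; the values are illustrative only.
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnknownError = -999 };

// Hypothetical internal error type that carries a status code.
class Error : public std::runtime_error {
 public:
  Error(StatusCode status, const std::string& message)
      : std::runtime_error(message), status_(status) {}
  StatusCode status() const { return status_; }

 private:
  StatusCode status_;
};

// Internal code reports failures only by throwing, never by returning codes.
void RoutineImpl(int n) {
  if (n < 0) { throw Error(StatusCode::kInvalidArgument, "n must be non-negative"); }
  if (n > 1000) { throw std::bad_alloc(); }  // stand-in for running out of memory
}

// C++ API boundary: library errors become status codes; std::bad_alloc and
// std::logic_error are deliberately not caught and reach the caller.
StatusCode Routine(int n) {
  try {
    RoutineImpl(n);
    return StatusCode::kSuccess;
  } catch (const Error& e) {
    return e.status();
  }
}

// C API boundary: no exception may escape into C code, so catch everything
// and map it to a wild-card error code.
extern "C" int CRoutine(int n) {
  try {
    return static_cast<int>(Routine(n));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnknownError);
  }
}
```

The split keeps ordinary failures cheap to report through the existing code-based API. Unrecoverable conditions remain visible to C++ callers, and the C interface never lets an exception cross the language boundary.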
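
The 2016-09-04 GEMM commit relies on IEEE 754 behaviour: `0 * NaN` is NaN, so computing `beta * C` with `beta == 0` still propagates any NaNs already present in C. The plain C++ reference loop below illustrates the special case. It is not CLBlast's OpenCL kernel. The name `NaiveGemm`, the row-major layout, and the absence of transposition are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference GEMM, row-major: C = alpha * A * B + beta * C,
// where A is m-by-k, B is k-by-n and C is m-by-n.
void NaiveGemm(std::size_t m, std::size_t n, std::size_t k, float alpha,
               const std::vector<float>& a, const std::vector<float>& b,
               float beta, std::vector<float>& c) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) { acc += a[i * k + p] * b[p * n + j]; }
      // When beta is zero, C is not read at all: 0.0f * NaN would still be NaN.
      c[i * n + j] = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c[i * n + j];
    }
  }
}
```

With `beta = 0`, a NaN-filled C is simply overwritten by `alpha * A * B`. With any nonzero `beta`, the NaN in C still propagates into the result, as IEEE arithmetic requires.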