Age | Commit message | Author |
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ... | Cedric Nugteren |
2016-09-12 | Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are n... | Cedric Nugteren |
2016-09-11 | Complete re-write of the database script. Changed Pandas for the much faster ... | Cedric Nugteren |
2016-09-10 | Updated database based on exhaustive tuning results for GEMM for the R9 M370X... | Cedric Nugteren |
2016-09-10 | Updated the database script to remove duplicate entries: keeps only the best-... | Cedric Nugteren |
2016-09-06 | Split GEMM tuning in two parts: a small set of tuning parameters which is exp... | Cedric Nugteren |
2016-09-04 | The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ... | Cedric Nugteren |
2016-09-03 | Added tuning results for Intel Broadwell 5500 GT2 GPU | Cedric Nugteren |
2016-09-03 | Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to h... | Cedric Nugteren |
2016-08-27 | test/correctness: read platform and device from environment | Ivan Shapovalov |
2016-08-22 | Merge branch 'database_defaults' into development | Cedric Nugteren |
2016-08-21 | Also changed the default-default for unknown device types to use the same met... | Cedric Nugteren |
2016-08-21 | Increased the ratio of GEMM tuning results to explore; reduced the tuning sea... | Cedric Nugteren |
2016-08-20 | Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvassch... | Cedric Nugteren |
2016-08-18 | Adapt OpenCL files for OpenCL 1.1 | D. Van Assche |
2016-08-15 | Updated the database script to calculate the relative best performance of tun... | Cedric Nugteren |
2016-07-25 | Removed all old tuning results for the XgemvFastRot kernel; re-added for a co... | Cedric Nugteren |
2016-07-25 | Moved the XgemvFast and XgemvFastRot tuning database into a separate file | Cedric Nugteren |
2016-07-24 | Merge branch 'development' into gemv_performance | Cedric Nugteren |
2016-07-24 | Minor improvements after merging in groundwork for custom tuning parameters a... | Cedric Nugteren |
2016-07-23 | Fixed a bug in the new XgemvFastRot kernel related to local memory size | Cedric Nugteren |
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren |
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab... | Cedric Nugteren |
2016-07-22 | clblast::Database, clblast::Routine: implement "database overlays" provided b... | Ivan Shapovalov |
2016-07-22 | clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, su... | Ivan Shapovalov |
2016-07-22 | cl::Kernel: skip NULL entries in waitForEvents | Ivan Shapovalov |
2016-07-22 | clblast::RunKernel, cl::Kernel: take const vector as waitForEvents | Ivan Shapovalov |
2016-07-22 | xgemm: do not hardcode kernel requirements for internal matrix layout | Ivan Shapovalov |
2016-07-16 | Fixed some more types and type conversions in the clpp11 interface to OpenCL | Cedric Nugteren |
2016-07-16 | Merge pull request #80 from gcp/getdevinfo_fixes | Cedric Nugteren |
2016-07-16 | Removed an unused variable from the copy-transpose-pad function | Cedric Nugteren |
2016-07-13 | Make sure the passed types are large enough. | Gian-Carlo Pascutto |
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in ... | Cedric Nugteren |
2016-07-10 | Added tuning results for AMD Oland and for Intel Graphics HD 530 | Cedric Nugteren |
2016-07-10 | Fixed a bug related to the cache and retrieval of programs based on the OpenC... | Cedric Nugteren |
2016-07-08 | Cache now compares cl_context instead of a pointer to a context; added verbos... | Cedric Nugteren |
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilati... | Cedric Nugteren |
2016-07-06 | Added an option to the performance clients to do a warm-up run before timing | Cedric Nugteren |
2016-07-03 | Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) | Cedric Nugteren |
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto |
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren |
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dll... | Cedric Nugteren |
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren |
2016-06-28 | Made it possible to build the clients and tests on Windows using Visual Studio | CNugteren |
2016-06-27 | Fixes for the AppVeyor Windows build | Cedric Nugteren |
2016-06-19 | Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' ... | Cedric Nugteren |
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren |
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren |
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren |
2016-06-18 | Removed the template from the Routine base-class | Cedric Nugteren |