Date | Commit message | Author |
2019-12-09 | Reduce TestMatrix calls for xgemmstridedbatched. | Tarmo Räntilä |
2019-12-09 | Reduce TestMatrix calls for xgemmbatched. | Tarmo Räntilä |
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren |
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren |
2019-05-07 | Added a host-code check to make sure the avc_motion_estimation is available | Cedric Nugteren |
2018-11-12 | Add kernel_mode option to im2col, col2im, and convgemm functions | Koichi Akabe |
2018-10-30 | Fix col2im implementation | Koichi Akabe |
2018-09-16 | Merge branch 'master' into convgemm_multi_kernel | Cedric Nugteren |
2018-09-15 | Disabled Intel subgroup shuffling for double-precision | Cedric Nugteren |
2018-07-29 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren |
2018-07-23 | Merge pull request #297 from tyler-utah/master | Cedric Nugteren |
2018-07-14 | Applied feedback from Cedric from first pull request | Tyler Sorensen |
2018-07-13 | Added device-name removal code to handle POCL naming convention | Cedric Nugteren |
2018-07-11 | added inline ptx to support shuffle on Nvidia GPUs | Tyler Sorensen |
2018-06-03 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren |
2018-05-23 | Added an option in the clients to output timing statistics: minimum, mean, an... | Cedric Nugteren |
2018-05-19 | Merge branch 'master' into CLBlast-267-convgemm | Cedric Nugteren |
2018-05-18 | Merge branch 'master' into canary_buffer_overflow_protection | Cedric Nugteren |
2018-05-17 | Added a canary region for overflow detection to the tuners | Cedric Nugteren |
2018-05-06 | Added convgemm skeleton, test infrastructure, and first reference implementation | Cedric Nugteren |
2018-05-01 | Now stores a shared_ptr to the Program class in the cache | Cedric Nugteren |
2018-04-24 | Added a define to enable subgroup shuffling if supported by the device | Cedric Nugteren |
2018-03-06 | First version of the tuning API, added interface for copy-kernel, added sample | Cedric Nugteren |
2018-02-11 | Fixed a minor typo | Cedric Nugteren |
2017-12-24 | Fixes for the CUDA backend of CLBlast | Cedric Nugteren |
2017-12-23 | Added TRSV block-size tuner | Cedric Nugteren |
2017-12-17 | Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results... | Cedric Nugteren |
2017-12-10 | Fixed a missing include | Cedric Nugteren |
2017-12-09 | Made the pre-processor run by default for ARM and Qualcomm GPUs | Cedric Nugteren |
2017-11-30 | Integrated pre-processor in compilation flow, default is still disabled | Cedric Nugteren |
2017-11-25 | Moved string splitting functions; added string character removal function | Cedric Nugteren |
2017-11-22 | Made parameter override in the clients a command-line argument and added supp... | Cedric Nugteren |
2017-11-19 | Added compilation timing and better compilation error reporting | Cedric Nugteren |
2017-11-19 | Revived the GEMM routine tuner; minor formatting changes | Cedric Nugteren |
2017-11-17 | Moved compilation function to separate file; removed dependency of tuners of ... | Cedric Nugteren |
2017-11-15 | Added first version of integrated and re-written auto-tuner | Cedric Nugteren |
2017-11-15 | Added kernel timing functionality to the utilities | Cedric Nugteren |
2017-11-15 | Added exception handle with catch-all | Cedric Nugteren |
2017-11-13 | Made the exception dispatch function optionally silent | Cedric Nugteren |
2017-11-13 | Moved square-difference utility function for use in the tuners | Cedric Nugteren |
2017-11-07 | Merge pull request #212 from CNugteren/kernel_selection_tuner | Cedric Nugteren |
2017-11-02 | Integrated the GEMM routine tuner for kernel selection; added first tuning re... | Cedric Nugteren |
2017-10-30 | Added collecting and printing of scores for the kernel-selection tuner | Cedric Nugteren |
2017-10-29 | Added Android support using the GNU C++ STL library and the GCC toolchain | Cedric Nugteren |
2017-10-28 | Merge branch 'master' into android_support | Cedric Nugteren |
2017-10-28 | Added initial version of a GEMM kernel selection tuner | Cedric Nugteren |
2017-10-28 | Moved timing function to a separate file | Cedric Nugteren |
2017-10-15 | Various fixes to make the first CUDA examples work | Cedric Nugteren |
2017-10-12 | CUDA API now takes context and device in instead of stream | Cedric Nugteren |
2017-10-11 | Added first (untested) version of a CUDA API | Cedric Nugteren |