Age | Commit message | Author | |
---|---|---|---|
2020-02-17 | Catches all exceptions of the tuners | Cedric Nugteren | |
2019-12-15 | Merge pull request #372 from trantila/master | Cedric Nugteren | |
Reduced number of TestMatrix calls for the batched xgemm routines. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmstridedbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the offset of the last batch. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the maximal found offset. | |||
2019-09-06 | Added notion of fixes in XhadFaster | Cedric Nugteren | |
2019-09-06 | Merge pull request #368 from etomzak/master | Cedric Nugteren | |
Fix out-of-bounds read/write in XhadFaster | |||
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak | |
Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd. | |||
2019-05-19 | Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin | Cedric Nugteren | |
Fixed a bug in the absolute-min index kernel | |||
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren | |
2019-05-16 | Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix | Cedric Nugteren | |
Intel shuffle extension fix | |||
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren | |
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren | |
2019-05-07 | Added a host-code check to make sure the avc_motion_estimation is available | Cedric Nugteren | |
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren | |
2019-05-06 | Merge pull request #356 from umar456/osx_assert | Cedric Nugteren | |
Remove assert for extension not available in macOS | |||
2019-05-03 | Remove assert for extension not available in macOS | Umar Arshad | |
The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused runtime failures in debug builds. | |||
2019-02-09 | Added tuning parameters for Tesla P100 16GB | Cedric Nugteren | |
2019-02-09 | Added tuning parameters for Xeon E5-2630 v3 and v4 | Cedric Nugteren | |
2019-01-26 | Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support | Cedric Nugteren | |
PyCLBlast half precision support | |||
2019-01-23 | Added fp32 to fp16 conversion function in Python to make haxpy example work | Cedric Nugteren | |
2019-01-22 | Added a (non-working) sample of half precision AXPY in Python | Cedric Nugteren | |
2019-01-22 | Updated pyclblast README, updated to 1.2.0 for half-precision support | Cedric Nugteren | |
2019-01-22 | Added experimental support for half-precision in pyclblast | Cedric Nugteren | |
2019-01-19 | Merge pull request #345 from CNugteren/convolution-fixes-and-tuner | Cedric Nugteren | |
Convolution with single kernel | |||
2019-01-19 | Added documentation on the convgemm routine | Cedric Nugteren | |
2019-01-19 | Added a few more initial Intel tuning parameters for convgemm | Cedric Nugteren | |
2019-01-05 | Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine | Cedric Nugteren | |
2018-12-31 | Added convgemm to the CLBlast database, added initial parameters for Skylake GPU | Cedric Nugteren | |
2018-12-31 | Added support for the convgemm tuner in the tuner database | Cedric Nugteren | |
2018-12-31 | Added the forgotten batch dimension to the tuner to get correct kernel executions | Cedric Nugteren | |
2018-12-23 | Merge pull request #343 from vbkaisetsu/feature/convgemm-single | Cedric Nugteren | |
Fix single kernel version of convgemm | |||
2018-12-22 | Merge branch 'master' into convolution-fixes-and-tuner | Cedric Nugteren | |
2018-12-21 | Update changelog | Koichi Akabe | |
2018-12-18 | Update the documentation | Koichi Akabe | |
2018-12-18 | Fix the xconvgemm tuner | Koichi Akabe | |
2018-12-18 | Added first version of a tuner for the ConvGemm direct kernel | Cedric Nugteren | |
2018-12-18 | Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel | Koichi Akabe | |
2018-12-17 | Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests | Cedric Nugteren | |
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm | |||
2018-12-17 | Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm | Koichi Akabe | |
2018-12-04 | Updated to version 1.5.0 | Cedric Nugteren | |
2018-12-01 | Updated the roadmap document | Cedric Nugteren | |
2018-12-01 | Added a FAQ document | Cedric Nugteren | |
2018-12-01 | Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG | Cedric Nugteren | |
Fixed an issue for the GEMMK == 1 kernel | |||
2018-11-30 | Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel | Cedric Nugteren | |
2018-11-19 | Merge pull request #335 from vbkaisetsu/patch-1 | Cedric Nugteren | |
Remove unnecessary qualifier of inline function | |||
2018-11-19 | Remove unnecessary attribute of inline function | Koichi Akabe | |
2018-11-17 | Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip | Cedric Nugteren | |
Add im2colflip and col2imflip functions | |||
2018-11-12 | Add kernel_mode option to im2col, col2im, and convgemm functions | Koichi Akabe | |
2018-11-09 | Merge pull request #331 from CNugteren/CLBlast-270-col2im | Cedric Nugteren | |
Implements col2im routine | |||
2018-11-07 | Changed col2im to append to the existing im-buffer | Cedric Nugteren | |