Age | Commit message | Author | |
---|---|---|---|
2020-05-10 | Updated PyCLBlast version number | Cedric Nugteren | |
2020-05-10 | Added a sample to demonstrate a batched routine | Cedric Nugteren | |
2020-05-10 | Added pyclblast bindings for the 3 batched routines | Cedric Nugteren | |
2020-05-04 | Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner | Cedric Nugteren | |
Move queue creation out of the tuner loop | |||
2020-05-03 | Move queue creation out of the tuner loop | Cedric Nugteren | |
2020-03-15 | Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin | Cedric Nugteren | |
Change amax/amin behaviour | |||
2020-03-08 | Update API documentation | Cedric Nugteren | |
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren | |
2020-03-08 | Added sample to play around with XAMAX routine | Cedric Nugteren | |
2020-03-08 | Silenced a new OpenCL warning message | Cedric Nugteren | |
2020-02-18 | Updated to version 1.5.1 | Cedric Nugteren | |
2020-02-18 | Merge pull request #376 from CNugteren/fix_tuner_exception_catching | Cedric Nugteren | |
Catches all exceptions of the tuners | |||
2020-02-17 | Catches all exceptions of the tuners | Cedric Nugteren | |
2019-12-15 | Merge pull request #372 from trantila/master | Cedric Nugteren | |
Reduced number of TestMatrix calls for the batched xgemm routines. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmstridedbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the offset of the last batch. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the maximal found offset. | |||
2019-09-06 | Added notion of fixes in XhadFaster | Cedric Nugteren | |
2019-09-06 | Merge pull request #368 from etomzak/master | Cedric Nugteren | |
Fix out-of-bounds read/write in XhadFaster | |||
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak | |
Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd. | |||
2019-05-19 | Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin | Cedric Nugteren | |
Fixed a bug in the absolute-min index kernel | |||
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren | |
2019-05-16 | Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix | Cedric Nugteren | |
intel shuffle extension fix | |||
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren | |
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren | |
2019-05-07 | Added a host-code check to make sure the avc_motion_estimation is available | Cedric Nugteren | |
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren | |
2019-05-06 | Merge pull request #356 from umar456/osx_assert | Cedric Nugteren | |
Remove assert for extension not available in macOS | |||
2019-05-03 | Remove assert for extension not available in macOS | Umar Arshad | |
The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime. | |||
2019-02-09 | Added tuning parameters for Tesla P100 16GB | Cedric Nugteren | |
2019-02-09 | Added tuning parameters for Xeon E5-2630 v3 and v4 | Cedric Nugteren | |
2019-01-26 | Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support | Cedric Nugteren | |
PyCLBlast half precision support | |||
2019-01-23 | Added fp32 to fp16 conversion function in Python to make haxpy example work | Cedric Nugteren | |
2019-01-22 | Added a (non-working) sample of half precision AXPY in Python | Cedric Nugteren | |
2019-01-22 | Updated pyclblast README, updated to 1.2.0 for half-precision support | Cedric Nugteren | |
2019-01-22 | Added experimental support for half-precision in pyclblast | Cedric Nugteren | |
2019-01-19 | Merge pull request #345 from CNugteren/convolution-fixes-and-tuner | Cedric Nugteren | |
Convolution with single kernel | |||
2019-01-19 | Added documentation on the convgemm routine | Cedric Nugteren | |
2019-01-19 | Added a few more initial Intel tuning parameters for convgemm | Cedric Nugteren | |
2019-01-05 | Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine | Cedric Nugteren | |
2018-12-31 | Added convgemm to the CLBlast database, added initial parameters for Skylake GPU | Cedric Nugteren | |
2018-12-31 | Added support for the convgemm tuner in the tuner database | Cedric Nugteren | |
2018-12-31 | Added the forgotten batch dimension to the tuner to get correct kernel executions | Cedric Nugteren | |
2018-12-23 | Merge pull request #343 from vbkaisetsu/feature/convgemm-single | Cedric Nugteren | |
Fix single kernel version of convgemm | |||
2018-12-22 | Merge branch 'master' into convolution-fixes-and-tuner | Cedric Nugteren | |
2018-12-21 | Update changelog | Koichi Akabe | |
2018-12-18 | Update the documentation | Koichi Akabe | |
2018-12-18 | Fix the xconvgemm tuner | Koichi Akabe | |
2018-12-18 | Added first version of a tuner for the ConvGemm direct kernel | Cedric Nugteren | |
2018-12-18 | Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel | Koichi Akabe | |
2018-12-17 | Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests | Cedric Nugteren | |
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm |