Age | Commit message | Author | |
---|---|---|---|
2020-08-16 | Added FUNDING.yml file | Cedric Nugteren | |
2020-06-07 | Merge pull request #392 from 9prady9/fix_Program_getIR | Cedric Nugteren | |
Fix Program::GetIR to handle programs with multiple devices | |||
2020-06-07 | Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG | Pradeep Garigipati | |
2020-06-05 | Fix Program::GetIR to handle programs with multiple devices | Pradeep Garigipati | |
2020-05-13 | Merge pull request #389 from CNugteren/CLBlast-385-version-defines | Cedric Nugteren | |
Added version number defines | |||
2020-05-12 | Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering | Cedric Nugteren | |
2020-05-11 | Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure | Cedric Nugteren | |
Fixed tuners global workgroup size | |||
2020-05-11 | Increase display width of the local/global sizes | Cedric Nugteren | |
2020-05-10 | Made sure that the global workgroup size is a multiple of the local size in the tuners | Cedric Nugteren | |
2020-05-10 | Added logging of local/global workgroup sizes when running the tuners | Cedric Nugteren | |
2020-05-10 | Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines | Cedric Nugteren | |
PyCLBlast: add missing batched routines | |||
2020-05-10 | Updated PyCLBlast version number | Cedric Nugteren | |
2020-05-10 | Added a sample to demonstrate a batched routine | Cedric Nugteren | |
2020-05-10 | Added pyclblast bindings for the 3 batched routines | Cedric Nugteren | |
2020-05-04 | Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner | Cedric Nugteren | |
Move queue creation out of the tuner loop | |||
2020-05-03 | Move queue creation out of the tuner loop | Cedric Nugteren | |
2020-03-15 | Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin | Cedric Nugteren | |
Change amax/amin behaviour | |||
2020-03-08 | Update API documentation | Cedric Nugteren | |
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren | |
2020-03-08 | Added sample to play around with XAMAX routine | Cedric Nugteren | |
2020-03-08 | Silenced a new OpenCL warning message | Cedric Nugteren | |
2020-02-18 | Updated to version 1.5.1 | Cedric Nugteren | |
2020-02-18 | Merge pull request #376 from CNugteren/fix_tuner_exception_catching | Cedric Nugteren | |
Catches all exceptions of the tuners | |||
2020-02-17 | Catches all exceptions of the tuners | Cedric Nugteren | |
2019-12-15 | Merge pull request #372 from trantila/master | Cedric Nugteren | |
Reduced number of TestMatrix calls for the batched xgemm routines. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmstridedbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the offset of the last batch. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the maximal found offset. | |||
2019-09-06 | Added notion of fixes in XhadFaster | Cedric Nugteren | |
2019-09-06 | Merge pull request #368 from etomzak/master | Cedric Nugteren | |
Fix out-of-bounds read/write in XhadFaster | |||
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak | |
Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd. | |||
2019-05-19 | Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin | Cedric Nugteren | |
Fixed a bug in the absolute-min index kernel | |||
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren | |
2019-05-16 | Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix | Cedric Nugteren | |
intel shuffle extension fix | |||
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren | |
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren | |
2019-05-07 | Added a host-code check to make sure the avc_motion_estimation is available | Cedric Nugteren | |
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren | |
2019-05-06 | Merge pull request #356 from umar456/osx_assert | Cedric Nugteren | |
Remove assert for extension not available in macOS | |||
2019-05-03 | Remove assert for extension not available in macOS | Umar Arshad | |
The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime. | |||
2019-02-09 | Added tuning parameters for Tesla P100 16GB | Cedric Nugteren | |
2019-02-09 | Added tuning parameters for Xeon E5-2630 v3 and v4 | Cedric Nugteren | |
2019-01-26 | Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support | Cedric Nugteren | |
PyCLBlast half precision support | |||
2019-01-23 | Added fp32 to fp16 conversion function in Python to make haxpy example work | Cedric Nugteren | |
2019-01-22 | Added a (non-working) sample of half precision AXPY in Python | Cedric Nugteren | |
2019-01-22 | Updated pyclblast README, updated to 1.2.0 for half-precision support | Cedric Nugteren | |
2019-01-22 | Added experimental support for half-precision in pyclblast | Cedric Nugteren | |
2019-01-19 | Merge pull request #345 from CNugteren/convolution-fixes-and-tuner | Cedric Nugteren | |
Convolution with single kernel | |||
2019-01-19 | Added documentation on the convgemm routine | Cedric Nugteren | |
2019-01-19 | Added a few more initial Intel tuning parameters for convgemm | Cedric Nugteren | |
2019-01-05 | Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine | Cedric Nugteren | |