Age | Commit message | Author | |
---|---|---|---|
2020-10-10 | Add tuning results for Radeon RX Vega | Cedric Nugteren | |
2020-10-05 | Merge pull request #400 from baryluk/patch-6 | Cedric Nugteren | |
Allow single graph / subplot on plot | |||
2020-10-05 | Allow single graph / subplot on plot | Witold Baryluk | |
`plt.subplots` tries to be special and returns either an array or a single object depending on the number of subplots. That is not actually helpful, and IMHO bad design. Make the result always an `ndarray`. The `and not type(axes) is np.ndarray` check is there just in case matplotlib decides to make its behavior more uniform. For now, work around it. Also, there is really no need for `ndarray.flat`. Confirmed to work with existing benchmarks (i.e. rows=2, cols=3) and with single graphs (rows=1, cols=1). | |||
2020-10-04 | Merge pull request #399 from baryluk/patch-3 | Cedric Nugteren | |
Fix a typo in benchmark when running fp 16 vs 32 | |||
2020-10-04 | Fix a typo in benchmark when running fp 16 vs 32 | Witold Baryluk | |
The intention here was to limit the iteration range to common indexes only. Fix that. | |||
2020-10-04 | Merge pull request #397 from baryluk/patch-1 | Cedric Nugteren | |
Fix Python SyntaxWarning | |||
2020-10-04 | Merge pull request #398 from baryluk/patch-2 | Cedric Nugteren | |
Fix --load_from_disk argument help message | |||
2020-10-04 | Fix --load_from_disk argument help message | Witold Baryluk | |
2020-10-04 | Fix Python SyntaxWarning | Witold Baryluk | |
There is no guarantee that all empty string objects are the same object, or that they share an object with the `""` literal. | |||
2020-10-03 | Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script | Cedric Nugteren | |
Fix a Python 3 bug in the benchmark script | |||
2020-10-02 | Fix a Python 3 bug in the benchmark script | Cedric Nugteren | |
2020-08-16 | Added FUNDING.yml file | Cedric Nugteren | |
2020-06-07 | Merge pull request #392 from 9prady9/fix_Program_getIR | Cedric Nugteren | |
Fix Program::GetIR to handle programs with multiple devices | |||
2020-06-07 | Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG | Pradeep Garigipati | |
2020-06-05 | Fix Program::GetIR to handle programs with multiple devices | Pradeep Garigipati | |
2020-05-13 | Merge pull request #389 from CNugteren/CLBlast-385-version-defines | Cedric Nugteren | |
Added version number defines | |||
2020-05-12 | Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering | Cedric Nugteren | |
2020-05-11 | Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure | Cedric Nugteren | |
Fixed tuners global workgroup size | |||
2020-05-11 | Increase display width of the local/global sizes | Cedric Nugteren | |
2020-05-10 | Made sure that the global workgroup size is a multiple of the local size in the tuners | Cedric Nugteren | |
2020-05-10 | Added logging of local/global workgroup sizes when running the tuners | Cedric Nugteren | |
2020-05-10 | Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines | Cedric Nugteren | |
PyCLBlast: add missing batched routines | |||
2020-05-10 | Updated PyCLBlast version number | Cedric Nugteren | |
2020-05-10 | Added a sample to demonstrate a batched routine | Cedric Nugteren | |
2020-05-10 | Added pyclblast bindings for the 3 batched routines | Cedric Nugteren | |
2020-05-04 | Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner | Cedric Nugteren | |
Move queue creation out of the tuner loop | |||
2020-05-03 | Move queue creation out of the tuner loop | Cedric Nugteren | |
2020-03-15 | Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin | Cedric Nugteren | |
Change amax/amin behaviour | |||
2020-03-08 | Update API documentation | Cedric Nugteren | |
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren | |
2020-03-08 | Added sample to play around with XAMAX routine | Cedric Nugteren | |
2020-03-08 | Silenced a new OpenCL warning message | Cedric Nugteren | |
2020-02-18 | Updated to version 1.5.1 | Cedric Nugteren | |
2020-02-18 | Merge pull request #376 from CNugteren/fix_tuner_exception_catching | Cedric Nugteren | |
Catches all exceptions of the tuners | |||
2020-02-17 | Catches all exceptions of the tuners | Cedric Nugteren | |
2019-12-15 | Merge pull request #372 from trantila/master | Cedric Nugteren | |
Reduced number of TestMatrix calls for the batched xgemm routines. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmstridedbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the offset of the last batch. | |||
2019-12-09 | Reduce TestMatrix calls for xgemmbatched. | Tarmo Räntilä | |
Replace the looped test by a single one with the maximal found offset. | |||
2019-09-06 | Added notion of fixes in XhadFaster | Cedric Nugteren | |
2019-09-06 | Merge pull request #368 from etomzak/master | Cedric Nugteren | |
Fix out-of-bounds read/write in XhadFaster | |||
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak | |
Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd. | |||
2019-05-19 | Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin | Cedric Nugteren | |
Fixed a bug in the absolute-min index kernel | |||
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren | |
2019-05-16 | Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix | Cedric Nugteren | |
intel shuffle extension fix | |||
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren | |
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren | |
2019-05-07 | Added a host-code check to make sure the avc_motion_estimation is available | Cedric Nugteren | |
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren | |
2019-05-06 | Merge pull request #356 from umar456/osx_assert | Cedric Nugteren | |
Remove assert for extension not available in macOS | |||
2019-05-03 | Remove assert for extension not available in macOS | Umar Arshad | |
The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime. |
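
The 2020-05-10 tuner fix in the log above made the global workgroup size a multiple of the local size. OpenCL kernel launches with an explicit local size generally need this divisibility. The following is a minimal Python sketch of the usual round-up approach, not CLBlast's actual tuner code (which is C++). The helper name `round_up_to_multiple` and the sizes are hypothetical, chosen only for illustration.

```python
def round_up_to_multiple(global_size, local_size):
    """Return the smallest multiple of local_size that is >= global_size."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Integer ceiling division, then scale back up to a full multiple.
    return ((global_size + local_size - 1) // local_size) * local_size


# Per-dimension rounding for a 2D launch (illustrative values only).
global_sizes = [1000, 64]
local_sizes = [32, 8]
padded = [round_up_to_multiple(g, l) for g, l in zip(global_sizes, local_sizes)]
print(padded)  # [1024, 64]
```

A kernel launched with the padded size would normally need its own bounds check so that the extra work-items do nothing.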
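
The "Fix Python SyntaxWarning" commit in the log above notes that empty string objects are not guaranteed to be the same object. The sketch below shows the safer comparison, assuming the original code compared a value against the `""` literal with `is`. The function name `is_blank` is hypothetical and does not come from the benchmark script.

```python
def is_blank(value):
    """Return True when value is an empty string, using equality, not identity."""
    # `value is ""` tests object identity, which depends on interpreter
    # internals; Python 3.8+ emits a SyntaxWarning for `is` with a literal.
    return value == ""


# A string built at runtime compares equal to "" regardless of object identity.
runtime_empty = "abc"[:0]
print(is_blank(runtime_empty))  # True
print(is_blank("fp16"))         # False
```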