Age Commit message Author
2020-10-10Add tuning results for Radeon RX VegaCedric Nugteren
2020-10-05Merge pull request #400 from baryluk/patch-6Cedric Nugteren
Allow single graph / subplot on plot
2020-10-05Allow single graph / subplot on plotWitold Baryluk
`plt.subplots` tries to be special and returns either an array or a single object depending on the number of subplots. This is not actually helpful, and IMHO bad design. Make the result always an `ndarray`. The `and not type(axes) is np.ndarray` check is there just in case matplotlib decides to make its behavior more uniform; for now, work around it. Also, there is no real need for `ndarray.flat`. Confirmed to work with existing benchmarks (e.g. rows=2, cols=3) and with single graphs (rows=1, cols=1).
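The normalization idea can be sketched without matplotlib. This is only an illustration, not the code from the commit: `flatten_axes` and `FakeAxes` are hypothetical names, and the sketch returns a plain list rather than the `ndarray` the commit uses.

```python
def flatten_axes(axes):
    """Return a flat list of axes objects.

    Accepts a single object, a 1-D sequence, or a nested (2-D) sequence,
    which are the shapes a subplot call may hand back.
    """
    # A lone axes object is not iterable, so wrap it in a list.
    if not hasattr(axes, "__iter__"):
        return [axes]
    flat = []
    for item in axes:
        # Recurse so that rows of a 2-D grid are flattened as well.
        flat.extend(flatten_axes(item))
    return flat


class FakeAxes:
    """Hypothetical stand-in for a matplotlib Axes object."""


single = FakeAxes()                                          # rows=1, cols=1
grid = [[FakeAxes() for _ in range(3)] for _ in range(2)]    # rows=2, cols=3

print(len(flatten_axes(single)), len(flatten_axes(grid)))
```

As an alternative, `plt.subplots` accepts `squeeze=False`, which makes it return a 2-D array regardless of the number of subplots.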
2020-10-04Merge pull request #399 from baryluk/patch-3Cedric Nugteren
Fix a typo in benchmark when running fp 16 vs 32
2020-10-04Fix a typo in benchmark when running fp 16 vs 32Witold Baryluk
The intention here was to limit the iteration range to common indexes only. Fix that.
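A minimal sketch of restricting a loop to the indexes two result lists have in common. The function name and the sample numbers are made up for illustration and are not taken from the benchmark script.

```python
def paired_results(results_a, results_b):
    """Pair results by position, stopping at the end of the shorter list."""
    # Only indexes valid for BOTH lists are visited.
    common = min(len(results_a), len(results_b))
    return [(results_a[i], results_b[i]) for i in range(common)]


fp16 = [1.0, 2.0, 3.0]  # hypothetical measurements
fp32 = [1.5, 2.5]       # one entry fewer than fp16

pairs = paired_results(fp16, fp32)
print(pairs)
```

Iterating over `range(len(results_a))` alone would raise an `IndexError` here, because `fp32` has no third entry.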
2020-10-04Merge pull request #397 from baryluk/patch-1Cedric Nugteren
Fix Python SyntaxWarning
2020-10-04Merge pull request #398 from baryluk/patch-2Cedric Nugteren
Fix --load_from_disk argument help message
2020-10-04Fix --load_from_disk argument help messageWitold Baryluk
2020-10-04Fix Python SyntaxWarningWitold Baryluk
There is no guarantee that all empty string objects are the same object, or that they share an object with the `""` literal.
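For illustration, a small sketch of the value-based comparison. The helper name `is_empty` is hypothetical and not from the benchmark script. Python 3.8 and later emit a `SyntaxWarning` for `is` used with a literal such as `""`.

```python
def is_empty(text):
    """Check for an empty string by value.

    `text is ""` would compare object identity, which the language
    does not guarantee for equal strings.
    """
    return text == ""


# An empty string built at run time rather than written as a literal.
runtime_empty = "".join([])

print(is_empty(runtime_empty), is_empty("x"))
```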
2020-10-03Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-scriptCedric Nugteren
Fix a Python 3 bug in the benchmark script
2020-10-02Fix a Python 3 bug in the benchmark scriptCedric Nugteren
2020-08-16Added FUNDING.yml fileCedric Nugteren
2020-06-07Merge pull request #392 from 9prady9/fix_Program_getIRCedric Nugteren
Fix Program::GetIR to handle programs with multiple devices
2020-06-07Add a cautionary note in Program::GetIR and mention the fix in CHANGELOGPradeep Garigipati
2020-06-05Fix Program::GetIR to handle programs with multiple devicesPradeep Garigipati
2020-05-13Merge pull request #389 from CNugteren/CLBlast-385-version-definesCedric Nugteren
Added version number defines
2020-05-12Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numberingCedric Nugteren
2020-05-11Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failureCedric Nugteren
Fixed tuners global workgroup size
2020-05-11Increase display width of the local/global sizesCedric Nugteren
2020-05-10Made sure that the global workgroup size is a multiple of the local size in the tunersCedric Nugteren
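One common way to satisfy this constraint is to round the global size up to the next multiple of the local size. The sketch below assumes that approach; the log does not say how the tuners actually adjust the size, and the function name is hypothetical.

```python
def round_up_to_multiple(global_size, local_size):
    """Round global_size up to the nearest multiple of local_size."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Integer ceiling division, then scale back up.
    return ((global_size + local_size - 1) // local_size) * local_size


print(round_up_to_multiple(100, 32))
print(round_up_to_multiple(64, 32))
```

Rounding up means some work-items fall outside the real problem size, so a kernel launched this way needs its own bounds check.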
2020-05-10Added logging of local/global workgroup sizes when running the tunersCedric Nugteren
2020-05-10Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routinesCedric Nugteren
PyCLBlast: add missing batched routines
2020-05-10Updated PyCLBlast version numberCedric Nugteren
2020-05-10Added a sample to demonstrate a batched routineCedric Nugteren
2020-05-10Added pyclblast bindings for the 3 batched routinesCedric Nugteren
2020-05-04Merge pull request #383 from CNugteren/CLBlast-382-improve-tunerCedric Nugteren
Move queue creation out of the tuner loop
2020-05-03Move queue creation out of the tuner loopCedric Nugteren
2020-03-15Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-aminCedric Nugteren
Change amax/amin behaviour
2020-03-08Update API documentationCedric Nugteren
2020-03-08Made it more likely (but no guarantees) for amax/amin to return the first indexCedric Nugteren
2020-03-08Added sample to play around with XAMAX routineCedric Nugteren
2020-03-08Silenced a new OpenCL warning messageCedric Nugteren
2020-02-18Updated to version 1.5.1Cedric Nugteren
2020-02-18Merge pull request #376 from CNugteren/fix_tuner_exception_catchingCedric Nugteren
Catches all exceptions of the tuners
2020-02-17Catches all exceptions of the tunersCedric Nugteren
2019-12-15Merge pull request #372 from trantila/masterCedric Nugteren
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-09Reduce TestMatrix calls for xgemmstridedbatched.Tarmo Räntilä
Replace the looped test by a single one with the offset of the last batch.
2019-12-09Reduce TestMatrix calls for xgemmbatched.Tarmo Räntilä
Replace the looped test by a single one with the maximal found offset.
2019-09-06Added notion of fixes in XhadFasterCedric Nugteren
2019-09-06Merge pull request #368 from etomzak/masterCedric Nugteren
Fix out-of-bounds read/write in XhadFaster
2019-09-04Fix out-of-bounds read/write in XhadFasteretomzak
Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.
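The class of bug described here can be sketched in plain Python. This is a simulation under stated assumptions, not the OpenCL kernel: the index mapping `thread_id * wpt + w` and the function name are hypothetical, and only the element-wise product is modelled.

```python
def hadamard_guarded(x, y, num_threads, wpt):
    """Element-wise product where each simulated thread handles wpt items."""
    n = len(x)
    z = [0.0] * n
    for thread_id in range(num_threads):
        for w in range(wpt):
            idx = thread_id * wpt + w  # assumed mapping, for illustration
            # Guard: num_threads * wpt may exceed n, so skip the excess.
            if idx < n:
                z[idx] = x[idx] * y[idx]
    return z


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.0, 2.0, 2.0, 2.0]

# 2 threads * 3 items = 6 slots for only 5 elements.
z = hadamard_guarded(x, y, num_threads=2, wpt=3)
print(z)
```

Without the `idx < n` check, the last simulated thread would touch index 5, one past the end of the buffers.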
2019-05-19Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iaminCedric Nugteren
Fixed a bug in the absolute-min index kernel
2019-05-19Fixed a bug in the absolute-min index kernelCedric Nugteren
2019-05-16Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fixCedric Nugteren
intel shuffle extension fix
2019-05-11Added a function to set the OpenCL kernel standard, either 1.1 or 1.2Cedric Nugteren
2019-05-08Changed back to cl_intel_subgroups as suggestedCedric Nugteren
2019-05-07Added a host-code check to make sure the avc_motion_estimation is availableCedric Nugteren
2019-05-07Enabled avc_motion_estimation extension for Intel subgroup shufflingCedric Nugteren
2019-05-06Merge pull request #356 from umar456/osx_assertCedric Nugteren
Remove assert for extension not available in macOS
2019-05-03Remove assert for extension not available in macOSUmar Arshad
The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.