Age | Commit message | Author
2021-01-12 | Drop erroneous suggests for libclblast-dev | Gard Spreemann
    These were introduced through hamfisted copy-pasting.
2021-01-10 | Changelog updated for release [debian/1.5.1-2] | Gard Spreemann
2021-01-10 | Watch file | Gard Spreemann
2021-01-10 | Git to Salsa | Gard Spreemann
2020-12-22 | Changelog for initial upload. [gspr/new-queue-20201222] [debian/1.5.1-1] | Gard Spreemann
2020-12-22 | Alphabetically order entries. | Gard Spreemann
2020-12-22 | VCS URLs | Gard Spreemann
2020-12-22 | Close ITP. | Gard Spreemann
2020-12-22 | Initial packaging. | Gard Spreemann
2020-12-22 | Merge tag '1.5.1' into debian/sid | Gard Spreemann
2020-12-22 | Initial packaging. | Gard Spreemann
2020-02-18 | Updated to version 1.5.1 | Cedric Nugteren
2020-02-18 | Merge pull request #376 from CNugteren/fix_tuner_exception_catching | Cedric Nugteren
    Catches all exceptions of the tuners
2020-02-17 | Catches all exceptions of the tuners | Cedric Nugteren
2019-12-15 | Merge pull request #372 from trantila/master | Cedric Nugteren
    Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-09 | Reduce TestMatrix calls for xgemmstridedbatched. | Tarmo Räntilä
    Replace the looped test by a single one with the offset of the last batch.
2019-12-09 | Reduce TestMatrix calls for xgemmbatched. | Tarmo Räntilä
    Replace the looped test by a single one with the maximal found offset.
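
The idea behind the two TestMatrix commits above can be sketched in a few lines. This is only an illustration, not CLBlast's code: `required_size`, `check_looped` and `check_single` are hypothetical names, and the size formula assumes that every batch shares the same dimensions and leading dimension and differs only in its offset. Under that assumption the required buffer size grows with the offset, so testing the largest offset once gives the same answer as testing every batch.

```python
def required_size(rows, cols, ld, offset):
    """Elements a buffer must hold for a matrix stored with leading dimension ld.

    Hypothetical formula for illustration; assumes ld >= rows.
    """
    return offset + ld * (cols - 1) + rows


def check_looped(buffer_size, rows, cols, ld, offsets):
    """One bounds test per batch (the looped approach)."""
    return all(required_size(rows, cols, ld, o) <= buffer_size for o in offsets)


def check_single(buffer_size, rows, cols, ld, offsets):
    """A single bounds test using the maximal offset (offsets must be non-empty)."""
    return required_size(rows, cols, ld, max(offsets)) <= buffer_size


offsets = [0, 64, 32]
print(check_looped(80, 4, 4, 4, offsets), check_single(80, 4, 4, 4, offsets))
print(check_looped(79, 4, 4, 4, offsets), check_single(79, 4, 4, 4, offsets))
```

For the strided-batched variant the offsets increase with the batch index, so the maximal offset is simply that of the last batch.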
2019-09-06 | Added notion of fixes in XhadFaster | Cedric Nugteren
2019-09-06 | Merge pull request #368 from etomzak/master | Cedric Nugteren
    Fix out-of-bounds read/write in XhadFaster
2019-09-04 | Fix out-of-bounds read/write in XhadFaster | etomzak
    Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.
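
The class of bug described above can be sketched with a small host-side simulation. This is not the XhadFaster kernel: the contiguous indexing scheme, the `WPT` value, the `CANARY` sentinel and the function names are all assumptions chosen for illustration. Each simulated thread processes `WPT` consecutive items; without a bounds check the last thread runs past `n` and overwrites the canary region, while the guarded version stops at `n`.

```python
WPT = 4          # work items per thread (hypothetical value)
CANARY = -1.0    # sentinel written past the end of the real data


def hadamard(x, y, z, n, num_threads, guarded):
    """Element-wise product z = x * y, simulating num_threads threads."""
    for tid in range(num_threads):
        for w in range(WPT):
            i = tid * WPT + w
            if guarded and i >= n:
                break  # not enough work left for this thread
            z[i] = x[i] * y[i]


def canary_intact(z, n):
    """True if nothing was written beyond the first n elements of z."""
    return all(v == CANARY for v in z[n:])


n = 6                                   # not a multiple of WPT
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0, 9.0]   # last two entries are garbage
y = [2.0] * n + [9.0, 9.0]

z = [CANARY] * 8
hadamard(x, y, z, n, num_threads=2, guarded=False)
print(canary_intact(z, n))

z = [CANARY] * 8
hadamard(x, y, z, n, num_threads=2, guarded=True)
print(canary_intact(z, n))
```

The unguarded run also shows why the fix can help performance: the trailing iterations do real work on garbage data.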
2019-05-19 | Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin | Cedric Nugteren
    Fixed a bug in the absolute-min index kernel
2019-05-19 | Fixed a bug in the absolute-min index kernel | Cedric Nugteren
2019-05-16 | Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix | Cedric Nugteren
    intel shuffle extension fix
2019-05-11 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | Cedric Nugteren
2019-05-08 | Changed back to cl_intel_subgroups as suggested | Cedric Nugteren
2019-05-07 | Added a host-code check to make sure the avc_motion_estimation is available | Cedric Nugteren
2019-05-07 | Enabled avc_motion_estimation extension for Intel subgroup shuffling | Cedric Nugteren
2019-05-06 | Merge pull request #356 from umar456/osx_assert | Cedric Nugteren
    Remove assert for extension not available in macOS
2019-05-03 | Remove assert for extension not available in macOS | Umar Arshad
    The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.
2019-02-09 | Added tuning parameters for Tesla P100 16GB | Cedric Nugteren
2019-02-09 | Added tuning parameters for Xeon E5-2630 v3 and v4 | Cedric Nugteren
2019-01-26 | Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support | Cedric Nugteren
    PyCLBlast half precision support
2019-01-23 | Added fp32 to fp16 conversion function in Python to make haxpy example work | Cedric Nugteren
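
For readers unfamiliar with what an fp32-to-fp16 conversion involves, here is a minimal standard-library sketch. It is not the function added to pyclblast in the commit above; `float_to_half_bits` and `half_bits_to_float` are hypothetical names. It relies on the `e` format character of Python's `struct` module, which packs a float as IEEE 754 binary16.

```python
import struct


def float_to_half_bits(value):
    """Round a float to IEEE 754 half precision and return its 16-bit pattern.

    struct raises OverflowError if the value is too large for half precision.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits):
    """Interpret a 16-bit pattern as an IEEE 754 half-precision number."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


print(hex(float_to_half_bits(1.0)))
print(half_bits_to_float(0x3800))
```

Keeping the result as a 16-bit integer pattern is a design choice for this sketch: it makes the representation explicit and easy to place in a buffer of 16-bit values.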
2019-01-22 | Added a (non-working) sample of half precision AXPY in Python | Cedric Nugteren
2019-01-22 | Updated pyclblast README, updated to 1.2.0 for half-precision support | Cedric Nugteren
2019-01-22 | Added experimental support for half-precision in pyclblast | Cedric Nugteren
2019-01-19 | Merge pull request #345 from CNugteren/convolution-fixes-and-tuner | Cedric Nugteren
    Convolution with single kernel
2019-01-19 | Added documentation on the convgemm routine | Cedric Nugteren
2019-01-19 | Added a few more initial Intel tuning parameters for convgemm | Cedric Nugteren
2019-01-05 | Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine | Cedric Nugteren
2018-12-31 | Added convgemm to the CLBlast database, added initial parameters for Skylake GPU | Cedric Nugteren
2018-12-31 | Added support for the convgemm tuner in the tuner database | Cedric Nugteren
2018-12-31 | Added the forgotten batch dimension to the tuner to get correct kernel executions | Cedric Nugteren
2018-12-23 | Merge pull request #343 from vbkaisetsu/feature/convgemm-single | Cedric Nugteren
    Fix single kernel version of convgemm
2018-12-22 | Merge branch 'master' into convolution-fixes-and-tuner | Cedric Nugteren
2018-12-21 | Update changelog | Koichi Akabe
2018-12-18 | Update the documentation | Koichi Akabe
2018-12-18 | Fix the xconvgemm tuner | Koichi Akabe
2018-12-18 | Added first version of a tuner for the ConvGemm direct kernel | Cedric Nugteren
2018-12-18 | Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel | Koichi Akabe