Age Commit message Author
2016-09-22Fixed a bug waiting for an invalid event in case of an unsuccessful CLBlast ↵Cedric Nugteren
call in the tests and samples
2016-09-21It is now possible to set the OpenCL compiler options through an ↵Cedric Nugteren
environment variable
2016-09-21Merge branch 'master' into developmentCedric Nugteren
2016-09-20Merge pull request #100 from gpu/masterCedric Nugteren
Fixed link in README.md
2016-09-20Fixed link in README.mdMarco Hutter
The GitHub link could be https://github.com/gpu (without "s"), but the website should be OK, too
2016-09-13Merge pull request #99 from CNugteren/developmentCedric Nugteren
Update to version 0.9.0
2016-09-13Updated to version 0.9.0Cedric Nugteren
2016-09-13Renamed the DEFAULT_DEVICE and DEFAULT_PLATFORM env variables to be in line ↵Cedric Nugteren
with recent usages of CLBLAST_DEVICE and CLBLAST_PLATFORM
2016-09-13Merge pull request #98 from intelfx/no-ignored-attributesCedric Nugteren
CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings
2016-09-13CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warningsIvan Shapovalov
2016-09-12Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ↵Cedric Nugteren
can't handle long strings
2016-09-12Merge branch 'database_rewrite' into developmentCedric Nugteren
2016-09-12Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are ↵Cedric Nugteren
now automatically taken from 32-bit if there are no entries at all
2016-09-11Complete re-write of the database script. Changed Pandas for the much faster ↵Cedric Nugteren
and more convenient plain JSON/dict data-type
2016-09-10Merge branch 'xgemm_tuner_exhaustive' into developmentCedric Nugteren
2016-09-10Updated database based on exhaustive tuning results for GEMM for the R9 ↵Cedric Nugteren
M370X GPU
2016-09-10Updated the database script to remove duplicate entries: keeps only the ↵Cedric Nugteren
best-performing cases for a specific parameter combination
2016-09-06Split GEMM tuning in two parts: a small set of tuning parameters which is ↵Cedric Nugteren
explored exhaustively and a larger set which is explored randomly
2016-09-04Refactored the Python C++ generator script; now conforms to the PEP8 style guideCedric Nugteren
2016-09-04The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ↵Cedric Nugteren
problems if C contains NaNs
2016-09-03Added tuning results for Intel Broadwell 5500 GT2 GPUCedric Nugteren
2016-09-03Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to ↵Cedric Nugteren
handle duplicate entries of different runs
2016-08-27Merge pull request #93 from intelfx/test-read-environmentCedric Nugteren
test/correctness: read platform and device from environment
2016-08-27test/correctness: read platform and device from environmentIvan Shapovalov
Support passing environment variables CLBLAST_PLATFORM and CLBLAST_DEVICE instead of -platform and -device arguments to test executables. This is for `ctest`.
2016-08-22Merge branch 'database_defaults' into developmentCedric Nugteren
2016-08-21Also changed the default-default for unknown device types to use the same ↵Cedric Nugteren
method as for known device groups
2016-08-21Increased the ratio of GEMM tuning results to explore; reduced the tuning ↵Cedric Nugteren
search space to have a better chance to evaluate more likely parameter combinations
2016-08-21Updated the changelog; refactored the database-get-bests code a bitCedric Nugteren
2016-08-20Merge branch 'development' of github.com:CNugteren/CLBlast into developmentCedric Nugteren
Conflicts: README.md
2016-08-20Merge branch 'dvasschemacq-master' into developmentCedric Nugteren
2016-08-20Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into ↵Cedric Nugteren
dvasschemacq-master Conflicts: src/kernels/level1/xaxpy.opencl src/kernels/level2/xgemv.opencl src/kernels/level2/xgemv_fast.opencl src/kernels/level2/xger.opencl src/kernels/level2/xher.opencl src/kernels/level2/xher2.opencl src/kernels/level3/xgemm_part2.opencl
2016-08-18Adapt opencl files for 1.1 OpenCLD. Van Assche
In OpenCL 1.1, __kernel has to come before __attribute__, at least with the Vivante compiler.
2016-08-15Updated the database script to calculate the relative best performance of ↵Cedric Nugteren
tuning results common for a device/vendor type
2016-08-09Improved the speed of the new common-best defaults method for the database ↵Cedric Nugteren
generation
2016-08-07Added a first version of the database's common-best default calculationCedric Nugteren
2016-07-28Minor update regarding the previous CMake export/install target changesCedric Nugteren
2016-07-28Merge pull request #86 from intelfx/cmakeCedric Nugteren
CMakeLists.txt: provide a find_package() config for dependent projects
2016-07-28.appveyor.yml: move {OPENCL,CLBLAST}_ROOT out of source treeIvan Shapovalov
The reasoning is the same as in the previous commit: CMake does not like having the OpenCL header path inside the source tree. CLBLAST_ROOT is moved for uniformity.
2016-07-28.travis.yml: use OpenCL ICD Loader and headers shipped by distroIvan Shapovalov
Using our own headers causes problems with CMake, which does not like having the OpenCL header path inside the source tree. While at it, use the distro's universal OpenCL loader as well.
2016-07-28CMakeLists.txt: use target_include_directories()Ivan Shapovalov
2016-07-28CMakeLists.txt: provide a find_package() config for dependent projectsIvan Shapovalov
2016-07-26Merge branch 'gemv_performance' into developmentCedric Nugteren
2016-07-25Removed all old tuning results for the XgemvFastRot kernel; re-added for a ↵Cedric Nugteren
couple of devices
2016-07-25Moved the XgemvFast and XgemvFastRot tuning database into a separate fileCedric Nugteren
2016-07-24Merge branch 'development' into gemv_performanceCedric Nugteren
2016-07-24Minor improvements after merging in groundwork for custom tuning parameters ↵Cedric Nugteren
and kernels
2016-07-24Merge pull request #84 from intelfx/device-specific-kernelsCedric Nugteren
Groundwork for device-specific routines
2016-07-24Refactored the Python database script: separated functionality into modules, ↵Cedric Nugteren
now complies with the PEP8 style, added proper command-line argument parsing, and cleaned up
2016-07-23Fixed a bug in the new XgemvFastRot kernel related to local memory sizeCedric Nugteren
2016-07-23Further improvements to the XgemvFastRot kernel, properly enables coalescing nowCedric Nugteren