summaryrefslogtreecommitdiff
path: root/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'README.md')
-rw-r--r--README.md19
1 files changed, 15 insertions, 4 deletions
diff --git a/README.md b/README.md
index ddd841e2..b9631ea0 100644
--- a/README.md
+++ b/README.md
@@ -99,20 +99,26 @@ The CLBlast library will be tuned in the future for the most commonly used OpenC
* NVIDIA GPUs:
- GRID K520
- GeForce GTX 480
+ - GeForce GTX 670
- GeForce GTX 680
+ - GeForce GTX 750
- GeForce GTX 750 Ti
- GeForce GTX 980
+ - GeForce GTX 1070
- GeForce GTX Titan
- GeForce GTX Titan X
- Tesla K20m
- Tesla K40m
* AMD GPUs:
- - Tahiti
+ - AMD Radeon R9 M370X Compute Engine
- Hawaii
+ - Oland
- Pitcairn
- - Radeon R9 M370X Compute Engine
+ - Tahiti
* Intel GPUs:
+ - HD Graphics 530
- HD Graphics Haswell Ultrabook GT2 Mobile
+ - HD Graphics 5500 BroadWell U-Processor GT2
- HD Graphics Skylake ULT GT2
- Iris
- Iris Pro
@@ -130,7 +136,7 @@ If your device is not (yet) among this list or if you want to tune CLBlast for s
Note that CLBlast's tuners are based on the [CLTune auto-tuning library](https://github.com/CNugteren/CLTune), which has to be installed separately (requires version 2.3.1 or higher).
-Compiling with `-DTUNERS=ON` will generate a number of tuners, each named `clblast_tuner_xxxxx`, in which `xxxxx` corresponds to a `.opencl` kernel file as found in `src/kernels`. These kernels corresponds to routines (e.g. `xgemm`) or to common pre-processing or post-processing kernels (`copy` and `transpose`). Running such a tuner will test a number of parameter-value combinations on your device and report which one gave the best performance. Running `make alltuners` runs all tuners for all precisions in one go. You can set the default device and platform for `alltuners` by setting the `DEFAULT_DEVICE` and `DEFAULT_PLATFORM` environmental variables before running CMake.
+Compiling with `-DTUNERS=ON` will generate a number of tuners, each named `clblast_tuner_xxxxx`, in which `xxxxx` corresponds to a `.opencl` kernel file as found in `src/kernels`. These kernels corresponds to routines (e.g. `xgemm`) or to common pre-processing or post-processing kernels (`copy` and `transpose`). Running such a tuner will test a number of parameter-value combinations on your device and report which one gave the best performance. Running `make alltuners` runs all tuners for all precisions in one go. You can set the default device and platform for `alltuners` by setting the `CLBLAST_DEVICE` and `CLBLAST_PLATFORM` environmental variables before running CMake.
The tuners output a JSON-file with the results. The best results need to be added to `src/database/kernels/xxxxx.hpp` in the appropriate section. However, this can be done automatically based on the JSON-data using a Python script in `scripts/database/database.py`. If you want the found parameters to be included in future releases of CLBlast, please attach the JSON files to the corresponding issue on GitHub or [email the main author](http://www.cedricnugteren.nl).
@@ -162,7 +168,7 @@ To build these tests, another BLAS library is needed to serve as a reference. Th
Afterwards, executables in the form of `clblast_test_xxxxx` are available, in which `xxxxx` is the name of a routine (e.g. `xgemm`). Note that CLBlast is tested for correctness against [clBLAS](http://github.com/clMathLibraries/clBLAS) and/or a regular CPU BLAS library. If both are installed on your system, setting the command-line option `-clblas 1` or `-cblas 1` will select the library to test against for the `clblast_test_xxxxx` executables. All tests have a `-verbose` option to enable additional diagnostic output. They also have a `-full_test` option to increase coverage further.
-All tests can be run directly together in one go through the `make alltests` target or using CTest (`make test` or `ctest`). In the latter case the output is less verbose. Both cases allow you to set the default device and platform to non-zero by setting the `DEFAULT_DEVICE` and `DEFAULT_PLATFORM` environmental variables before running CMake.
+All tests can be run directly together in one go through the `make alltests` target or using CTest (`make test` or `ctest`). In the latter case the output is less verbose. Both cases allow you to set the default device and platform to non-zero by setting the `CLBLAST_DEVICE` and `CLBLAST_PLATFORM` environmental variables before running CMake.
Compiling the performance tests/clients (optional)
@@ -180,6 +186,8 @@ The folder `doc/performance` contains some PDF files with performance results on
Note that the CLBlast library provides pre-tuned parameter-values for some devices only: if your device is not among these, then out-of-the-box performance might be poor. See above under `Using the tuners` to find out how to tune for your device.
+In case performance is still sub-optimal or something else is wrong, CLBlast can be build in verbose mode for (performance) debugging by specifying `-DVERBOSE=ON` to CMake.
+
Supported routines
-------------
@@ -278,6 +286,9 @@ The contributing authors (code, pull requests, testing) so far are:
* [Dragan Djuric](https://github.com/blueberry)
* [Marco Hutter](https://github.com/gpus)
* [Hugh Perkins](https://github.com/hughperkins)
+* [Gian-Carlo Pascutto](https://github.com/gcp)
+* [Ivan Shapovalov](https://github.com/intelfx)
+* [Dimitri Van Assche](https://github.com/dvasschemacq)
Tuning and testing on a variety of OpenCL devices was made possible by: