author Cedric Nugteren <web@cedricnugteren.nl> 2017-10-16 21:54:23 +0200
committer Cedric Nugteren <web@cedricnugteren.nl> 2017-10-16 21:54:42 +0200
commit 03760f80eb7eb07450da379d129ba64d92bfcc41
tree 81b1466c86e9bbb3c4dc52f223b21d21c55d6092
parent 0719f1448655192d2ce6c17ee51c770ef16dd120
Added CUDA API documentation
-rw-r--r-- CHANGELOG | 4
-rw-r--r-- README.md | 14
2 files changed, 17 insertions, 1 deletions
diff --git a/CHANGELOG b/CHANGELOG
index bb2013a6..a2416dd3 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,5 +1,9 @@
Development (next version)
+- Added a CUDA API to CLBlast:
+ * The library and kernels can be compiled with the CUDA driver API and NVRTC (requires CUDA 7.5)
+ * Two CUDA API sample programs are added: SGEMM and DAXPY
+ * All correctness tests and performance clients work on CUDA like they did for OpenCL
- Kernels are now cached based on their tuning parameters: fits the use-case of 'OverrideParameters'
- Improved performance for small GEMM problems by going from 3 to 1 optional temporary buffers
- Various minor fixes and enhancements
diff --git a/README.md b/README.md
index c13770f6..dac47fce 100644
--- a/README.md
+++ b/README.md
@@ -99,11 +99,23 @@ To get started quickly, a couple of stand-alone example programs are included in
cmake -DSAMPLES=ON ..
+For all of CLBlast's APIs, it is possible to optionally set an OS environmental variable `CLBLAST_BUILD_OPTIONS` to pass specific build options to the OpenCL compiler.
+
+
+Using the library (Netlib API)
+-------------
+
There is also a Netlib CBLAS C API available. This is however not recommended for full control over performance, since at every call it will copy all buffers to and from the OpenCL device. Especially for level 1 and level 2 BLAS functions performance will be impacted severely. However, it can be useful if you don't want to touch OpenCL at all. You can set the default device and platform by setting the `CLBLAST_DEVICE` and `CLBLAST_PLATFORM` environmental variables. This API can be used as follows after providing the `-DNETLIB=ON` flag to CMake:
#include <clblast_netlib_c.h>
-For all of CLBlast's APIs, it is possible to optionally set an OS environmental variable `CLBLAST_BUILD_OPTIONS` to pass specific build options to the OpenCL compiler.
+
+Using the library (CUDA API)
+-------------
+
+There is also a CUDA API of CLBlast available. Enabling this compiles the whole library for CUDA and thus replaces the OpenCL API. It is based upon the CUDA runtime and NVRTC APIs, requiring NVIDIA CUDA 7.5 or higher. The CUDA version of the library can be used as follows after providing the `-DCUDA=ON -DOPENCL=OFF` flags to CMake:
+
+ #include <clblast_cuda.h>
Using the tuners (optional)
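
Note on the Netlib API paragraph in the README hunk above: it says the default device and platform are selected through the `CLBLAST_DEVICE` and `CLBLAST_PLATFORM` environment variables. A minimal sketch of setting them from inside a program follows. The helper `set_clblast_defaults` is hypothetical and not part of CLBlast, and the sketch assumes (unconfirmed by this diff) that the library reads the variables when its routines are first called.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes setenv() on POSIX systems */
#include <stdlib.h>

/* Hypothetical helper, not part of CLBlast: provides default values for the
 * two environment variables named in the README. Passing 0 as the last
 * argument to setenv() keeps any value the user has already exported.
 * Returns 0 on success, -1 if either variable could not be set. */
int set_clblast_defaults(const char *platform, const char *device) {
    if (setenv("CLBLAST_PLATFORM", platform, 0) != 0) { return -1; }
    if (setenv("CLBLAST_DEVICE", device, 0) != 0) { return -1; }
    return 0;
}
```

Such a helper would be called once at the start of `main`, before the first Netlib CBLAS routine from `<clblast_netlib_c.h>`; exporting the variables in the shell before launching the program is the simpler route when no in-program default is needed.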