author: Cedric Nugteren <web@cedricnugteren.nl>, 2018-03-11 15:38:33 +0100
committer: GitHub <noreply@github.com>, 2018-03-11 15:38:33 +0100
commit: 934893972ee0b8d279ad24e3867ca8af99e170ec
tree: 2525761df72c21e0a9a724dff3d84dbfa1de55c0 (/doc)
parent: bcf12084319ed6eb687e2308fcb050eaad7c95ec
parent: 903deaf36812616ce82ea94afb880fd16ad6cf0b
Merge pull request #262 from CNugteren/CLBlast-237-tuning-api
CLBlast #237: Tuning API
Diffstat (limited to 'doc')
-rw-r--r-- doc/api.md | 74
-rw-r--r-- doc/tuning.md | 24
2 files changed, 98 insertions, 0 deletions
diff --git a/doc/api.md b/doc/api.md
index 0fbdeaa0..a60e16ce 100644
--- a/doc/api.md
+++ b/doc/api.md
@@ -3497,3 +3497,77 @@ Arguments to OverrideParameters (C++ version):
* `const std::string &kernel_name`: The target kernel name. This has to be one of the existing CLBlast kernels (Xaxpy, Xdot, Xgemv, XgemvFast, XgemvFastRot, Xgemv, Xger, Copy, Pad, Transpose, Padtranspose, Xgemm, or XgemmDirect). If this argument is incorrect, this function will return with the `clblast::kInvalidOverrideKernel` status-code.
* `const Precision precision`: The CLBlast precision enum to set the new parameters for.
* `const std::unordered_map<std::string,size_t> &parameters`: An unordered map of strings to integers. This has to contain all the tuning parameters for a specific kernel as reported by the included tuners (e.g. `{ {"COPY_DIMX",8}, {"COPY_DIMY",32}, {"COPY_VW",4}, {"COPY_WPT",8} }` for the `Copy` kernel). If this argument is incorrect, this function will return with the `clblast::kMissingOverrideParameter` status-code.
+
+
+
+Tune<kernel_name>: Run the tuner for a particular kernel (advanced usage)
+-------------
+
+The CLBlast kernels can be tuned using the tuning binaries, but also programmatically through an API. This is only recommended for advanced usage; see [the tuning docs](tuning.md) for more information.
+
+C++ API:
+```
+// Tunes the "Xaxpy" kernel, used for many level-1 routines such as XAXPY, XCOPY, and XSWAP
+template <typename T>
+StatusCode PUBLIC_API TuneXaxpy(cl_command_queue* queue, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Xdot" kernel, used for level-1 reduction routines such as XDOT, XMAX, and XSUM
+template <typename T>
+StatusCode PUBLIC_API TuneXdot(cl_command_queue* queue, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Xgemv" kernel, used for matrix-vector level-2 routines such as XGEMV, XGBMV, and XHEMV
+template <typename T>
+StatusCode PUBLIC_API TuneXgemv(cl_command_queue* queue, const size_t m, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Xger" kernel, used for matrix update level-2 routines such as XGER, XHER, and XSYR2
+template <typename T>
+StatusCode PUBLIC_API TuneXger(cl_command_queue* queue, const size_t m, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Xgemm" kernel, used for most level-3 routines such as XGEMM, XSYMM, and XHER2K
+template <typename T>
+StatusCode PUBLIC_API TuneXgemm(cl_command_queue* queue, const size_t m, const size_t n, const size_t k,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "XgemmDirect" kernel, used for most level-3 routines such as XGEMM, XSYMM, and XHER2K
+template <typename T>
+StatusCode PUBLIC_API TuneXgemmDirect(cl_command_queue* queue, const size_t m, const size_t n, const size_t k,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Copy" kernel, used for most level-3 routines such as XGEMM, XSYMM, and XHER2K
+template <typename T>
+StatusCode PUBLIC_API TuneCopy(cl_command_queue* queue, const size_t m, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Pad" kernel, used for most level-3 routines such as XGEMM, XSYMM, and XHER2K
+template <typename T>
+StatusCode PUBLIC_API TunePad(cl_command_queue* queue, const size_t m, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Transpose" kernel, used for most level-3 routines such as XGEMM, XSYMM, and XHER2K
+template <typename T>
+StatusCode PUBLIC_API TuneTranspose(cl_command_queue* queue, const size_t m, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Padtranspose" kernel, used for most level-3 routines such as XGEMM, XSYMM, and XHER2K
+template <typename T>
+StatusCode PUBLIC_API TunePadtranspose(cl_command_queue* queue, const size_t m, const size_t n,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+
+// Tunes the "Invert" kernel, used for the level-3 routine XTRSM
+template <typename T>
+StatusCode PUBLIC_API TuneInvert(cl_command_queue* queue, const size_t m, const size_t n, const size_t k,
+ const double fraction, std::unordered_map<std::string,size_t> &parameters);
+```
+
+Arguments to Tune<kernel_name> (C++ version):
+
+* `cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to tune the kernel for.
+* `const size_t m`: The routine argument `m` to tune for (not applicable to all kernels).
+* `const size_t n`: The routine argument `n` to tune for.
+* `const size_t k`: The routine argument `k` to tune for (not applicable to all kernels).
+* `const double fraction`: A value between 0.0 and 1.0 which determines the fraction of the tuning search space to explore.
+* `std::unordered_map<std::string,size_t> &parameters`: An unordered map of strings to integers. This will return the best found tuning parameters.
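
The `fraction` argument described above can be derived from a budget of configurations to try. The following is a minimal sketch, assuming that `fraction` times the total number of configurations is roughly the number the tuner explores; the helper `FractionForBudget` and both of its parameters are hypothetical and not part of CLBlast. The total would come from a run of the corresponding tuner binary on the target device.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of CLBlast): converts a budget of
// configurations into a value for the `fraction` tuning argument.
// `total_configurations` is assumed to be the size of the search space
// as reported by the regular tuner binary for this kernel and device.
double FractionForBudget(const std::size_t total_configurations,
                         const std::size_t budget) {
  // An unknown or empty search space: explore everything.
  if (total_configurations == 0) { return 1.0; }
  const double fraction = static_cast<double>(budget) /
                          static_cast<double>(total_configurations);
  // The API expects a value between 0.0 and 1.0.
  return std::min(1.0, fraction);
}
```

The result could then be passed as the `fraction` argument of one of the `Tune<kernel_name>` functions, for example a budget of 250 out of 1000 configurations gives 0.25.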
diff --git a/doc/tuning.md b/doc/tuning.md
index 88c4fc4c..ebf3cb0c 100644
--- a/doc/tuning.md
+++ b/doc/tuning.md
@@ -100,6 +100,14 @@ In summary, tuning the entire library for your device can be done as follows (st
After the kernels are tuned, you can run the `clblast_tuner_routine_xgemm` tuner to optimize the high-level GEMM routine, i.e. selecting which method to use: the direct kernel or the in-direct kernel.
+Tuning using the API (advanced users only)
+-------------
+
+Apart from running the tuning binaries, it is also possible to run the tuners programmatically through the CLBlast API. This could be useful if you want to tune for non-standard arguments (e.g. a rectangular or very small matrix). The tuning results can then also be set programmatically using `OverrideParameters`.
+
+The tuning API does not perform any disk or stdout I/O, so it is not possible to track progress. Running the regular tuner binaries should give an idea of the number of configurations to explore for a particular device, which in turn indicates a good value for the `fraction` argument (see the [API documentation](api.md) for more details).
+
+
Inspecting and changing tuning parameters at run-time
-------------
@@ -120,3 +128,19 @@ Tuning OpenCL compiler options
-------------
For all of CLBlast's APIs, it is possible to optionally set an OS environmental variable `CLBLAST_BUILD_OPTIONS` to pass specific build options to the OpenCL compiler. Also make sure this is set in the same way when running the tuners.
+
+
+Which kernels are used for which routines?
+-------------
+
+To find out which tuners to run for which routines, you can use the table below. The kernel names correspond to the tuner binaries, the tuner API, and to the arguments for `OverrideParameters` and `RetrieveParameters`.
+
+| Routines | Kernel(s) / Tuner(s) |
+| -------------------------------------------------------------------------|---------------------------------|
+| AXPY COPY SCAL SWAP OMATCOPY AXPYBATCHED | Xaxpy |
+| AMAX ASUM DOT DOTC DOTU NRM2 SUM MAX MIN AMIN | Xdot |
+| GBMV GEMV HBMV HEMV HPMV SBMV SPMV SYMV TBMV TPMV TRMV TRSV | Xgemv |
+| GER GERC GERU HER HER2 HPR HPR2 SPR SPR2 SYR SYR2 | Xger |
+| GEMM HEMM HER2K HERK SYMM SYR2K SYRK TRMM GEMMBATCHED GEMMSTRIDEDBATCHED | Xgemm XgemmDirect Copy Pad Transpose Padtranspose |
+| TRSM | Xgemm XgemmDirect Copy Pad Transpose Padtranspose Invert |
+| IM2COL | Copy |
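
The routine-to-kernel table above can also be expressed as a small lookup, which may be convenient when deciding programmatically which tuners to run. This is only a sketch: the `Group` struct and the functions `KernelTable` and `KernelsForRoutine` are hypothetical names and not part of CLBlast; only the routine and kernel names are taken from the table.

```cpp
#include <string>
#include <vector>

// One row of the table: a set of routines and the kernels they use.
struct Group {
  std::vector<std::string> routines;
  std::vector<std::string> kernels;
};

// Hypothetical lookup table mirroring the documentation table.
const std::vector<Group>& KernelTable() {
  static const std::vector<Group> table = {
    {{"AXPY", "COPY", "SCAL", "SWAP", "OMATCOPY", "AXPYBATCHED"}, {"Xaxpy"}},
    {{"AMAX", "ASUM", "DOT", "DOTC", "DOTU", "NRM2", "SUM", "MAX", "MIN", "AMIN"},
     {"Xdot"}},
    {{"GBMV", "GEMV", "HBMV", "HEMV", "HPMV", "SBMV", "SPMV", "SYMV", "TBMV",
      "TPMV", "TRMV", "TRSV"},
     {"Xgemv"}},
    {{"GER", "GERC", "GERU", "HER", "HER2", "HPR", "HPR2", "SPR", "SPR2", "SYR",
      "SYR2"},
     {"Xger"}},
    {{"GEMM", "HEMM", "HER2K", "HERK", "SYMM", "SYR2K", "SYRK", "TRMM",
      "GEMMBATCHED", "GEMMSTRIDEDBATCHED"},
     {"Xgemm", "XgemmDirect", "Copy", "Pad", "Transpose", "Padtranspose"}},
    {{"TRSM"},
     {"Xgemm", "XgemmDirect", "Copy", "Pad", "Transpose", "Padtranspose", "Invert"}},
    {{"IM2COL"}, {"Copy"}},
  };
  return table;
}

// Returns the kernels (and thus tuners) for a routine name without its
// precision prefix, or an empty vector if the routine is not in the table.
std::vector<std::string> KernelsForRoutine(const std::string& routine) {
  for (const auto& group : KernelTable()) {
    for (const auto& name : group.routines) {
      if (name == routine) { return group.kernels; }
    }
  }
  return {};
}
```

For example, `KernelsForRoutine("TRSM")` yields the six GEMM-related kernels plus `Invert`, so all seven tuners are relevant for that routine.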