Merge pull request #212 from CNugteren/kernel_selection_tuner

GEMM kernel selection tuner
author: Cedric Nugteren <web@cedricnugteren.nl> 2017-11-07 22:20:13 +0100
committer: GitHub <noreply@github.com> 2017-11-07 22:20:13 +0100
commit: b18cc9d3f18accf88c9551c98c51b99add57b96c (patch)
tree: a9017ad18e161647b05ba6c597dfe8ae5125298b /README.md
parent: 061b1c571b86714f1d323563a9ac587a850ecddc (diff)
parent: 6fe9916231a0c6316e3427aaed3be281080a2692 (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/README.md b/README.md
index 8321c2ce..8a0fe17a 100644
--- a/README.md
+++ b/README.md
@@ -196,6 +196,8 @@ In summary, tuning the entire library for your device can be done as follows (st
 
 Alternatively, you can also supply your tuning parameters programmatically through the CLBlast API. This is especially useful if you tune for specific non-standard arguments (e.g. a rectangular or a very small matrix). To do so, you can call the `OverrideParameters` function which will set new parameters for a specific kernel. At the first next call of the target routine, CLBlast will compile a new binary and use it together with the new parameters from then on. Until `OverrideParameters` is called again of course. See the [API documentation](doc/clblast.md#overrideparameters-override-tuning-parameters-auxiliary-function) for more details.
 
+After the kernels are tuned, you can run the `clblast_tuner_routine_xgemm` tuner to optimize the high-level GEMM routine, i.e. selecting which method to use: the direct kernel or the indirect kernel.
+
 
 Compiling the correctness tests (optional)
 -------------