2 files changed, 15 insertions, 0 deletions
diff --git a/README.md b/README.md
index 3c8ceee7..2084e51e 100644
--- a/README.md
+++ b/README.md
@@ -78,6 +78,7 @@ More detailed documentation is available in separate files:
 * [Tuning for better performance](doc/tuning.md)
 * [Testing the library for correctness](doc/testing.md)
 * [Bindings / wrappers for other languages](doc/bindings.md)
+* [Glossary with some terms explained](doc/glossary.md)
 
 
 Known issues
diff --git a/doc/glossary.md b/doc/glossary.md
new file mode 100644
index 00000000..821ffc69
--- /dev/null
+++ b/doc/glossary.md
@@ -0,0 +1,14 @@
+CLBlast: Glossary
+================
+
+This document describes some commonly used terms in CLBlast documentation and code. For other information about CLBlast, see the [main README](../README.md).
+
+* __BLAS__: The set of 'Basic Linear Algebra Subroutines'.
+* __Netlib BLAS__: The official BLAS API definition, with __CBLAS__ providing the C headers. 
+* __OpenCL__: The open compute language, a Khronos standard for heterogeneous and parallel computing, e.g. on GPUs.
+* __kernel__: An OpenCL parallel program that runs on the target device.
+* __clBLAS__: Another OpenCL BLAS library, maintained by AMD.
+* __cuBLAS__: The main CUDA BLAS library, maintained by NVIDIA.
+* __GEMM__: The 'GEneral Matrix Multiplication' routine.
+* __Direct GEMM__: Computing GEMM using a single generic kernel which handles all cases (e.g. all kinds of matrix sizes).
+* __Indirect GEMM__: Computing GEMM using multiple kernels: the main GEMM kernel and a few pre-processing and post-processing kernels. The main kernel makes several assumptions (e.g. sizes need to be multiples of 32), which the other kernels make sure are satisfied. The main kernel is often faster than the generic kernel of the direct approach, but the cost of pre-processing and post-processing kernels can sometimes be high for small sizes or particular devices.