path: root/include
Age | Commit message | Author
2017-10-12 | CUDA API now takes context and device in instead of stream | Cedric Nugteren
2017-10-11 | Added first (untested) version of a CUDA API | Cedric Nugteren
2017-10-09 | Made the half-precision header OpenCL-independent | Cedric Nugteren
2017-07-02 | Added interface and stubs for the im2col routine | Cedric Nugteren
2017-05-12 | Added the IxAMIN routines: absolute minimum version of IxAMAX | Cedric Nugteren
2017-04-17 | Fixed a namespace clash with CUDA FP16 for the half-datatype | Cedric Nugteren
2017-04-07 | Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance | Cedric Nugteren
2017-03-10 | Added API and test infrastructure for the batched GEMM routine | Cedric Nugteren
2017-03-08 | Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes | Cedric Nugteren
2017-03-05 | Added first naive version of the batched AXPY routine | Cedric Nugteren
2017-03-05 | Prepared generator for batched routines; added batched AXPY routine interface | Cedric Nugteren
2017-02-26 | Merge branch 'development' into triangular_solvers | Cedric Nugteren
2017-02-26 | Removed half-precision support from the TRSM routine; too unstable | Cedric Nugteren
2017-02-18 | Fixed the naming of the C API of OverrideParameters and fixed the description | Cedric Nugteren
2017-02-16 | Added a C interface to the OverrideParameters function; added some in-line comments to the API | Cedric Nugteren
2017-02-16 | Added input-sanity checks for the OverrideParameters function | Cedric Nugteren
2017-02-13 | Added first version of the OverrideParameters function | Cedric Nugteren
2016-11-22 | Minor changes to ensure full compatibility with the Netlib CBLAS API | Cedric Nugteren
2016-11-20 | Made functions with scalar-buffers as output properly return values | Cedric Nugteren
2016-10-25 | Renamed the include and source files of the Netlib CBLAS API | Cedric Nugteren
2016-10-25 | Fixed some issues preventing the Netlib CBLAS API from linking correctly | Cedric Nugteren
2016-10-25 | Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast | Cedric Nugteren
2016-10-25 | Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes | Cedric Nugteren
2016-10-25 | Merge branch 'development' into netlib_blas_api | Cedric Nugteren
    Conflicts: scripts/generator/generator.py scripts/generator/generator/routine.py
2016-10-22 | All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects | Cedric Nugteren
2016-10-22 | Added extra error codes to reflect the more detailed error reporting of OpenCL functions | Cedric Nugteren
2016-10-22 | treewide: use C++ exceptions properly | Ivan Shapovalov
    Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to only use C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at top level to preserve compatibility with the existing error code-based API. Note that we deliberately do not catch C++ runtime errors (such as `std::bad_alloc`) nor logic errors (aka failed assertions) because no actual handling can ever happen for such errors. However, in the C interface we do catch _all_ exceptions (...) and convert them into a wild-card error code.
2016-10-16 | Merge branch 'development' into netlib_blas_api | Cedric Nugteren
2016-10-15 | Added documentation and minor refactoring for the recent support of static library compilation | Cedric Nugteren
2016-10-14 | Fixes for static lib compilation on Windows | Shehzan Mohammed
2016-10-10 | Added support for compiling the library, the client, and the samples under MSVC 2013 | Cedric Nugteren
2016-10-05 | Made non-standard types void-pointers in the Netlib BLAS interface | Cedric Nugteren
2016-10-05 | Added first version of Netlib BLAS API header | Cedric Nugteren
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library | Cedric Nugteren
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren
2016-06-18 | Clean-up of the routine class, moved RunKernel to the routine/common file | Cedric Nugteren
2016-06-18 | Removed the template from the Routine base-class | Cedric Nugteren
2016-06-17 | Removed the precision argument from the routines in favor of a single templated function | Cedric Nugteren
2016-06-17 | Removed the interface to the cache functions from the Routine class, calls them directly now | Cedric Nugteren
2016-06-17 | Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class | Cedric Nugteren
2016-06-17 | Moved the ErrorIn function from the Routine class to the utilities header | Cedric Nugteren
2016-06-17 | Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file | Cedric Nugteren
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing | Cedric Nugteren
2016-06-15 | Added some constness to variables related to the GEMM routines | Cedric Nugteren
2016-06-14 | Moved device vendor and type checks to a common header | Cedric Nugteren
2016-06-08 | Added global memory synchronisation for better cache performance on ARM Mali GPUs | Cedric Nugteren
2016-06-01 | Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2' | Cedric Nugteren
2016-05-26 | Added half-precision tests for the clBLAS reference through conversion to single-precision | Cedric Nugteren
2016-05-25 | Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM | Cedric Nugteren
2016-05-22 | Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 | Cedric Nugteren
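
The 2016-10-22 commit "treewide: use C++ exceptions properly" describes a layered error-handling scheme: internal code throws, the C++ API catches the library's own exceptions and returns an error code, and the C interface catches everything. The following is a minimal sketch of that idea only, not the library's actual code. All names here (`StatusCode`, `LibraryError`, `ScaleInternal`, `Scale`, `scale_c`) are hypothetical, and the `std::logic_error` thrown for a negative size is merely a stand-in for a failed internal assertion.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical status codes; not the library's real enum.
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnexpectedError = -2 };

// Hypothetical library-specific exception that carries a status code.
class LibraryError : public std::exception {
 public:
  LibraryError(StatusCode status, std::string message)
      : status_(status), message_(std::move(message)) {}
  StatusCode status() const noexcept { return status_; }
  const char* what() const noexcept override { return message_.c_str(); }

 private:
  StatusCode status_;
  std::string message_;
};

// Internal code reports failures only by throwing.
void ScaleInternal(float* data, int n, float alpha) {
  if (data == nullptr) {
    throw LibraryError(StatusCode::kInvalidArgument, "null buffer");
  }
  if (n < 0) {
    // Stand-in for a failed assertion (a logic error, per the commit message).
    throw std::logic_error("negative size");
  }
  for (int i = 0; i < n; ++i) { data[i] *= alpha; }
}

// C++ API boundary: converts only the library's own exception into a code.
// Logic errors and std::bad_alloc are deliberately left to propagate.
StatusCode Scale(float* data, int n, float alpha) {
  try {
    ScaleInternal(data, n, alpha);
    return StatusCode::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C boundary: no exception may cross it, so everything else is mapped
// to a single wild-card error code.
extern "C" int scale_c(float* data, int n, float alpha) {
  try {
    return static_cast<int>(Scale(data, n, alpha));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}
```

In this sketch a C++ caller of `Scale` still sees the `std::logic_error`, while a C caller of `scale_c` only ever receives an integer code. That matches the split the commit message draws between the two interfaces.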