path: root/src/routines/level2
Age | Commit message | Author
2018-05-31Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issueCedric Nugteren
2018-05-27Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSMCedric Nugteren
2018-05-27Made FillMatrix and FillVector functions take a configurable local workgroup sizeCedric Nugteren
2017-12-23Updated the database to use the new TRSV and Invert tunersCedric Nugteren
2017-10-27Added GEMV synchronisation for the TRSV routine: similar bug as in TRSMCedric Nugteren
2017-04-07Added some missing const-nessCedric Nugteren
2017-02-27Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constantsCedric Nugteren
2017-02-26Merge branch 'development' into triangular_solversCedric Nugteren
2017-02-13Fixed a small bug in GEMV: unused kernel in parameter listCedric Nugteren
2017-02-05Merge branch 'development' into triangular_solversCedric Nugteren
2017-02-04Improved substitution kernels a bit; added complex supportCedric Nugteren
2017-02-04Completed a first STRSV implementationCedric Nugteren
2017-02-04Added row-major support for TRSVCedric Nugteren
2017-01-29Added first (incomplete) version of TRSV routineCedric Nugteren
2017-01-24Routine, Cache: generalize, reduce amount of copying in fast pathIvan Shapovalov
Implement a generalized Cache<K, V>. Two variants are provided: the first one is based on std::map, using C++14-specific transparent std::less<> and generalized std::map::find() to allow searching by tuple of references. The second one is based on std::vector and O(n) lookup, but remains C++11-compliant.
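The two lookup strategies described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the class names `MapCache` and `VectorCache`, the `Key`/`KeyRef` aliases, the choice of a (device name, routine name) key, and the `int` value type are all assumptions made for the example.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key layout: (device name, routine name).
// Key owns its strings; KeyRef only borrows them for a lookup.
using Key = std::tuple<std::string, std::string>;
using KeyRef = std::tuple<const std::string&, const std::string&>;

// Variant 1 (needs C++14): std::map with the transparent comparator
// std::less<>, which enables the templated std::map::find overload.
class MapCache {
 public:
  void Store(Key key, int value) { map_.emplace(std::move(key), value); }

  // Looks up by a tuple of references; no owning Key is constructed.
  const int* Find(const KeyRef& key) const {
    auto it = map_.find(key);
    return it == map_.end() ? nullptr : &it->second;
  }

 private:
  std::map<Key, int, std::less<>> map_;
};

// Variant 2 (C++11-compliant): std::vector with an O(n) linear scan.
class VectorCache {
 public:
  void Store(Key key, int value) {
    entries_.emplace_back(std::move(key), value);
  }

  // Compares each stored key against the tuple of references.
  const int* Find(const KeyRef& key) const {
    for (const auto& entry : entries_) {
      if (entry.first == key) return &entry.second;
    }
    return nullptr;
  }

 private:
  std::vector<std::pair<Key, int>> entries_;
};
```

Both variants rely on the standard tuple comparison operators accepting tuples with different element types, which is what lets a tuple of references be compared against a stored tuple of strings without copying.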
2017-01-20Added prototype for the TRSV routineCedric Nugteren
2016-10-22Routine: get rid of ::SetUp()Ivan Shapovalov
Since we now use C++ exceptions inside the implementation (and exceptions can be thrown from constructors), there is no need for a separate Routine::SetUp() function. For this, we also change the way how the kernel source string is constructed. The kernel-specific source code is now passed to the Routine ctor via an initializer_list of C strings to avoid unnecessary data copying while also working around C1091 of MSVC 2013.
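A rough sketch of the constructor-based setup this message describes, under stated assumptions: `RoutineSketch` and its members are hypothetical names, and the failure condition (no source text supplied) is invented only to show a constructor reporting an error by throwing instead of relying on a separate set-up call.

```cpp
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a routine base class. All set-up work happens
// in the constructor, so no separate SetUp() step is needed.
class RoutineSketch {
 public:
  // The kernel source arrives as several C strings, so the caller never
  // has to build one large string object before the call.
  RoutineSketch(const std::string& name,
                std::initializer_list<const char*> sources)
      : name_(name) {
    for (const char* part : sources) source_ += part;
    // A failure during construction is reported by throwing.
    if (source_.empty()) {
      throw std::runtime_error("no kernel source given for " + name_);
    }
  }

  const std::string& source() const { return source_; }

 private:
  std::string name_;
  std::string source_;
};
```

A caller would write something like `RoutineSketch r("Xgemv", {part1, part2});`, where each part is a separate string literal.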
2016-10-22treewide: use C++ exceptions properlyIvan Shapovalov
Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to only use C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at top level to preserve compatibility with the existing error code-based API. Note that we deliberately do not catch C++ runtime errors (such as `std::bad_alloc`) nor logic errors (aka failed assertions) because no actual handling can ever happen for such errors. However, in the C interface we do catch _all_ exceptions (...) and convert them into a wild-card error code.
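The layering described here can be illustrated with a small sketch. Everything named below (`StatusCode` and its values, `SketchError`, `GemvImpl`, `Gemv`, `SketchGemvC`) is hypothetical and chosen for the example. The `std::length_error` branch exists only to simulate a standard-library exception that the C++ layer lets through.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical status codes; the values are placeholders.
enum class StatusCode { kSuccess = 0, kInvalidDimension = -1, kUnexpectedError = -2 };

// Library-specific exception that carries the status code to report.
class SketchError : public std::exception {
 public:
  SketchError(StatusCode status, std::string message)
      : status_(status), message_(std::move(message)) {}
  StatusCode status() const { return status_; }
  const char* what() const noexcept override { return message_.c_str(); }

 private:
  StatusCode status_;
  std::string message_;
};

// Internal implementation: reports problems only by throwing.
void GemvImpl(int m, int n) {
  if (m <= 0 || n <= 0) {
    throw SketchError(StatusCode::kInvalidDimension, "m and n must be positive");
  }
  if (m > (1 << 20)) {
    throw std::length_error("simulated standard-library failure");
  }
}

// C++ entry point: converts library exceptions to status codes and
// deliberately lets other exceptions propagate to the caller.
StatusCode Gemv(int m, int n) {
  try {
    GemvImpl(m, n);
    return StatusCode::kSuccess;
  } catch (const SketchError& e) {
    return e.status();
  }
}

// C entry point: no exception may cross the C boundary, so everything
// that is still in flight is mapped to a wild-card error code.
extern "C" int SketchGemvC(int m, int n) {
  try {
    return static_cast<int>(Gemv(m, n));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}
```

The design point is that the implementation can throw freely, the C++ API translates only the errors it understands, and the C API is the only layer with a catch-all.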
2016-07-25Moved the XgemvFast and XgemvFastRot tuning database into a separate fileCedric Nugteren
2016-07-23Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performanceCedric Nugteren
2016-07-10Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernelCedric Nugteren
2016-06-19Renamed all C++ source files to .cpp to match the .hpp extension betterCedric Nugteren
2016-06-18Moved all headers into the source tree, changed headers to .hpp extensionCedric Nugteren
2016-06-18Removed the template from the Routine base-classCedric Nugteren
2016-06-17Removed the precision argument from the routines in favor of a single templated functionCedric Nugteren
2016-06-17Removed the interface to the cache functions from the Routine class, calls them directly nowCedric Nugteren
2016-06-17Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine classCedric Nugteren
2016-06-17Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate fileCedric Nugteren
2016-05-22Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2Cedric Nugteren
2016-05-22Prepared the GER kernels and tuner for half-precision supportCedric Nugteren
2016-05-22Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMVCedric Nugteren
2016-05-22Prepared the GEMV kernels and tuner for half-precision supportCedric Nugteren
2016-04-28Fixed the cache to store binaries instead of OpenCL programsCedric Nugteren
2016-04-09Events are now properly implemented using event waiting list and asking the user to wait for event completioncnugteren
2016-04-04Removed redundant queue synchronisation statementscnugteren
2016-03-06Fixed a bug in the GER-family of routines due to incorrect division of the workgroup sizeCedric Nugteren
2016-03-06Added preliminary support for xHPR2 and xSPR2 routinesCedric Nugteren
2016-03-02Added preliminary support for xHER2 and xSYR2 routinesCedric Nugteren
2016-02-28Fixed a couple of correctness bugs in the Xher kernelsCedric Nugteren
2016-02-28Added support for xHER, xHPR, xSYR, and xSPR routinesCedric Nugteren
2016-02-20Added support for xGERU and xGERC routinesCedric Nugteren
2016-02-20Added XGER routine, kernel, and tunerCedric Nugteren
2016-02-08Split-up the XGEMV kernel in two partsCedric Nugteren
2015-09-26Added TRMV/TBMV/TPMV routinesCNugteren
2015-09-19Added SBMV and SPMV routinesCNugteren
2015-09-19Added the HPMV routineCNugteren
2015-09-19Added the HBMV routineCNugteren
2015-09-18Improved the organization and performance of level 2 routinesCNugteren
2015-09-18Added first version of banded matrix-vector multiplicationCNugteren
2015-08-04Added distinguished names for GEMV inherited HEMV/SYMVCNugteren