Age | Commit message | Author |
2017-05-11 | Re-added random tuning for GEMM after accidental removal | Cedric Nugteren |
2017-04-22 | Increased the default number of runs for the tuner from 2 up to 10 for fast k... | Cedric Nugteren |
2017-04-21 | Increased the default number of runs for GEMV tuning; updated GEMV tuning res... | Cedric Nugteren |
2017-04-17 | Fixed a namespace clash with CUDA FP16 for the half-datatype | Cedric Nugteren |
2017-04-14 | Added a new Xaxpy kernel in between the regular and fast version in | Cedric Nugteren |
2017-03-14 | Added the possibility to tune batched kernels | Cedric Nugteren |
2017-03-05 | Changed the way the test-data is generated: now using a single MT generator a... | Cedric Nugteren |
2016-11-27 | Made it possible to use the command-line environmental variables for each exe... | Cedric Nugteren |
2016-10-22 | Moved files around a bit; created a utilities subfolder | Cedric Nugteren |
2016-10-03 | Re-organised GEMM direct kernel and added faster fall-back version for incomp... | Cedric Nugteren |
2016-10-02 | Set the default number of runs for all kernels to at least 2 runs | Cedric Nugteren |
2016-10-02 | Specialised the GEMM direct kernel in four ways for transposing/non-transposi... | Cedric Nugteren |
2016-10-02 | Split the GEMM direct kernel into two files; set the default tuning target to... | Cedric Nugteren |
2016-10-01 | Added padding to the local memory of the GEMM direct kernel | Cedric Nugteren |
2016-10-01 | Added default num-runs to the tuner adding averaging over 10 runs as a defaul... | Cedric Nugteren |
2016-10-01 | Merge branch 'development' into gemm_direct | Cedric Nugteren |
2016-09-27 | Added an option to run tuned kernels multiple times to average execution time... | Cedric Nugteren |
2016-09-27 | Fixed the local memory size computation for the GEMM tuners | Cedric Nugteren |
2016-09-27 | Now generates test/client/tuner data using a fixed seed to enable reproducabi... | Cedric Nugteren |
2016-09-25 | Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, ... | Cedric Nugteren |
2016-09-12 | Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ... | Cedric Nugteren |
2016-09-06 | Split GEMM tuning in two parts: a small set of tuning parameters which is exp... | Cedric Nugteren |
2016-08-21 | Increased the ratio of GEMM tuning results to explore; reduced the tuning sea... | Cedric Nugteren |
2016-07-25 | Moved the XgemvFast and XgemvFastRot tuning database into a separate file | Cedric Nugteren |
2016-07-23 | Fixed a bug in the new XgemvFastRot kernel related to local memory size | Cedric Nugteren |
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren |
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enab... | Cedric Nugteren |
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in ... | Cedric Nugteren |
2016-06-19 | Renamed all C++ source files to .cpp to match the .hpp extension better | Cedric Nugteren |
2016-06-18 | Moved all headers into the source tree, changed headers to .hpp extension | Cedric Nugteren |
2016-06-16 | Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and... | Cedric Nugteren |
2016-06-14 | Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) a... | Cedric Nugteren |
2016-05-22 | Prepared the GER kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Prepared the GEMV kernels and tuner for half-precision support | Cedric Nugteren |
2016-05-22 | Added half-precision support for all level 1 routines | Cedric Nugteren |
2016-05-16 | Prepared GEMM and supporting kernels and tuners for half-precision support | Cedric Nugteren |
2016-05-15 | Added header with conversions from and to half-precision floating-point | Cedric Nugteren |
2016-05-13 | Initial experimental version of the half-precision HAXPY routine | Cedric Nugteren |
2016-05-01 | Made the default xDOT tuning size smaller | Cedric Nugteren |
2016-04-14 | Updated the reduction-kernel tuner to also tune the epilogue | cnugteren |
2016-02-28 | Added support for xHER, xHPR, xSYR, and xSPR routines | Cedric Nugteren |
2016-02-20 | Added XGER routine, kernel, and tuner | Cedric Nugteren |
2016-02-08 | Separated the GEMM kernel in two parts to reduce string length for MSVC | Cedric Nugteren |
2016-02-08 | Split-up the XGEMV kernel in two parts | Cedric Nugteren |
2016-02-06 | Reduced the maximum workgroup-size for GEMV kernels further | CNugteren |
2016-02-06 | Reduced unrolling factor in xgemv kernel to reduce compilation times | CNugteren |
2015-10-28 | Now sets local memory size in xgemv tuner properly | CNugteren |
2015-10-25 | Fixed an arguments-related bug in the GEMV tuner | CNugteren |
2015-10-12 | Moved level3 kernel files to a subfolder | CNugteren |
2015-09-18 | Added first version of banded matrix-vector multiplication | CNugteren |