Age | Commit message | Author |
Added support for performance testing against cuBLAS
|
works
|
Conflicts:
	scripts/generator/generator.py
|
Patch to make tests complete on Apple's CPU implementation
|
according to the C++11 standard
|
this makes the test work, it does not focus on good performance
|
struct; fixes bug on Apple OpenCL
|
Benchmark scripts re-written in Python/Matplotlib
|
CBLAS reference code is now separated from device-host copies
|
CBLAS reference code; for fair timing and code de-duplication
|
Added a first batched version of the GEMM routine
|
GEMM kernel
|
Added the batched version of the AXPY routine
|
implementation