path: root/src
Age  Commit message  Author
2016-10-03  Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel  Cedric Nugteren
2016-10-03  Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels  Cedric Nugteren
2016-10-03  Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles  Cedric Nugteren
2016-10-02  Set the default number of runs for all kernels to at least 2 runs  Cedric Nugteren
2016-10-02  Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT  Cedric Nugteren
2016-10-02  Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256  Cedric Nugteren
2016-10-01  Added padding to the local memory of the GEMM direct kernel  Cedric Nugteren
2016-10-01  Added default num-runs to the tuner adding averaging over 10 runs as a default for the GEMM direct kernel  Cedric Nugteren
2016-10-01  Merge branch 'development' into gemm_direct  Cedric Nugteren
2016-09-27  Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0  Cedric Nugteren
2016-09-27  Updated to version 8.0 of the CLCudaAPI header  Cedric Nugteren
2016-09-27  Fixed the local memory size computation for the GEMM tuners  Cedric Nugteren
2016-09-27  Now generates test/client/tuner data using a fixed seed to enable reproducibility of results  Cedric Nugteren
2016-09-25  Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter  Cedric Nugteren
2016-09-25  Separated the tuning parameters of the new direct GEMM kernel from the indirect version  Cedric Nugteren
2016-09-25  Added a first version of the direct version of GEMM with local memory  Cedric Nugteren
2016-09-21  Merge branch 'development' into gemm_direct  Cedric Nugteren
2016-09-21  It is now possible to set the OpenCL compiler options through an environmental variable  Cedric Nugteren
2016-09-12  Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings  Cedric Nugteren
2016-09-12  Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all  Cedric Nugteren
2016-09-11  Complete re-write of the database script. Changed Pandas for the much faster and convenient plain JSON/dict data-type  Cedric Nugteren
2016-09-10  Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU  Cedric Nugteren
2016-09-10  Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameters combination  Cedric Nugteren
2016-09-06  Split GEMM tuning in two parts: a small set of tuning parameters which is explored exhaustively and a larger set which is explored randomly  Cedric Nugteren
2016-09-04  The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs  Cedric Nugteren
2016-09-03  Added tuning results for Intel Broadwell 5500 GT2 GPU  Cedric Nugteren
2016-09-03  Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs  Cedric Nugteren
2016-08-27  test/correctness: read platform and device from environment  Ivan Shapovalov
    Support passing environment variables CLBLAST_PLATFORM and CLBLAST_DEVICE instead of -platform and -device arguments to test executables. This is for `ctest`.
2016-08-22  Merge branch 'database_defaults' into development  Cedric Nugteren
2016-08-21  Also changed the default-default for unknown device types to use the same method as for known device groups  Cedric Nugteren
2016-08-21  Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations  Cedric Nugteren
2016-08-20  Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master  Cedric Nugteren
    Conflicts:
        src/kernels/level1/xaxpy.opencl
        src/kernels/level2/xgemv.opencl
        src/kernels/level2/xgemv_fast.opencl
        src/kernels/level2/xger.opencl
        src/kernels/level2/xher.opencl
        src/kernels/level2/xher2.opencl
        src/kernels/level3/xgemm_part2.opencl
2016-08-18  Adapt opencl files for 1.1 OpenCL  D. Van Assche
    In OpenCL 1.1 __kernel has to be before __attribute__, at least with Vivante compiler.
2016-08-15  Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type  Cedric Nugteren
2016-07-26  Fixed issues related to the recent changes in the Xgemm infrastructure  Cedric Nugteren
2016-07-26  Merge branch 'development' into gemm_direct  Cedric Nugteren
2016-07-25  Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices  Cedric Nugteren
2016-07-25  Moved the XgemvFast and XgemvFastRot tuning database into a separate file  Cedric Nugteren
2016-07-24  Merge branch 'development' into gemv_performance  Cedric Nugteren
2016-07-24  Minor improvements after merging in groundwork for custom tuning parameters and kernels  Cedric Nugteren
2016-07-23  Fixed a bug in the new XgemvFastRot kernel related to local memory size  Cedric Nugteren
2016-07-23  Further improvements to the XgemvFastRot kernel, properly enables coalescing now  Cedric Nugteren
2016-07-23  Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance  Cedric Nugteren
2016-07-22  clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation  Ivan Shapovalov
2016-07-22  clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS  Ivan Shapovalov
2016-07-22  cl::Kernel: skip NULL entries in waitForEvents  Ivan Shapovalov
2016-07-22  clblast::RunKernel, cl::Kernel: take const vector as waitForEvents  Ivan Shapovalov
2016-07-22  xgemm: do not hardcode kernel requirements for internal matrix layout  Ivan Shapovalov
    Do not hardcode the knowledge about "A and C col-major, B row-major". This allows for easier reuse of the DoGemm() routine with different kernels.
2016-07-17  Improved the GEMM direct kernel by adding register blocking. Still not fast though  Cedric Nugteren
2016-07-16  Created infrastructure to support a direct GEMM kernel; added correct but slow reference kernel as a place-holder  Cedric Nugteren