Age  Commit message  Author
2016-10-03  Re-organised GEMM direct kernel and added faster fall-back version for incomp...  Cedric Nugteren
2016-10-02  Set the default number of runs for all kernels to at least 2 runs  Cedric Nugteren
2016-10-02  Specialised the GEMM direct kernel in four ways for transposing/non-transposi...  Cedric Nugteren
2016-10-02  Split the GEMM direct kernel into two files; set the default tuning target to...  Cedric Nugteren
2016-10-01  Added padding to the local memory of the GEMM direct kernel  Cedric Nugteren
2016-10-01  Added default num-runs to the tuner adding averaging over 10 runs as a defaul...  Cedric Nugteren
2016-10-01  Merge branch 'development' into gemm_direct  Cedric Nugteren
2016-09-27  Added an option to run tuned kernels multiple times to average execution time...  Cedric Nugteren
2016-09-27  Updated to version 8.0 of the CLCudaAPI header  Cedric Nugteren
2016-09-27  Fixed the local memory size computation for the GEMM tuners  Cedric Nugteren
2016-09-27  Now generates test/client/tuner data using a fixed seed to enable reproducabi...  Cedric Nugteren
2016-09-27  Added more relaxed error checking for the half-precision tests  Cedric Nugteren
2016-09-27  Merge pull request #103 from dividiti/link_clblas_with_pthread  Cedric Nugteren
2016-09-26  Use cross-platform thread lib idiom instead of *nix-specific pthread.  Anton Lokhmotov
2016-09-26  Link clBLAS together with pthread.  Anton Lokhmotov
2016-09-25  Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, ...  Cedric Nugteren
2016-09-25  Separated the tuning parameters of the new direct GEMM kernel from the indire...  Cedric Nugteren
2016-09-25  Added a first version of the direct version of GEMM with local memory  Cedric Nugteren
2016-09-25  Updated AppVeyor script to fix an issue with changes in the latest AppVeyor s...  Cedric Nugteren
2016-09-25  Fix another issue with the packaging in the AppVeyor script  Cedric Nugteren
2016-09-25  Fix an issue with the packaging in the AppVeyor script  Cedric Nugteren
2016-09-25  Updated AppVeyor script to fix an issue with changes in the latest AppVeyor s...  Cedric Nugteren
2016-09-24  Merge pull request #101 from dividiti/add_ref_includes_to_test_correctness_co...  Cedric Nugteren
2016-09-24  Add path to ref library header when building tests.  Anton Lokhmotov
2016-09-22  Fixed a bug waiting for an invalid event in case of a non-succesfull CLBlast ...  Cedric Nugteren
2016-09-21  Merge branch 'development' into gemm_direct  Cedric Nugteren
2016-09-21  It is now possible to set the OpenCL compiler options through an environmenta...  Cedric Nugteren
2016-09-21  Merge branch 'master' into development  Cedric Nugteren
2016-09-20  Merge pull request #100 from gpu/master  Cedric Nugteren
2016-09-20  Fixed link in README.md  Marco Hutter
2016-09-13  Merge pull request #99 from CNugteren/development  Cedric Nugteren
2016-09-13  Updated to version 0.9.0  Cedric Nugteren
2016-09-13  Renamed the DEFAULT_DEVICE and DEFAULT_PLATFORM env variables to be in line w...  Cedric Nugteren
2016-09-13  Merge pull request #98 from intelfx/no-ignored-attributes  Cedric Nugteren
2016-09-13  CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings  Ivan Shapovalov
2016-09-12  Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC ...  Cedric Nugteren
2016-09-12  Merge branch 'database_rewrite' into development  Cedric Nugteren
2016-09-12  Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are n...  Cedric Nugteren
2016-09-11  Complete re-write of the database script. Changed Pandas for the much faster ...  Cedric Nugteren
2016-09-10  Merge branch 'xgemm_tuner_exhaustive' into development  Cedric Nugteren
2016-09-10  Updated database based on exhaustive tuning results for GEMM for the R9 M370X...  Cedric Nugteren
2016-09-10  Updated the database script to remove duplicate entries: keeps only the best-...  Cedric Nugteren
2016-09-06  Split GEMM tuning in two parts: a small set of tuning parameters which is exp...  Cedric Nugteren
2016-09-04  Refactored the Python C++ generator script; now confirms to the PEP8 styleguide  Cedric Nugteren
2016-09-04  The GEMM kernel no longer adds beta*C in case beta is zero; this would cause ...  Cedric Nugteren
2016-09-03  Added tuning results for Intel Broadwell 5500 GT2 GPU  Cedric Nugteren
2016-09-03  Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to h...  Cedric Nugteren
2016-08-27  Merge pull request #93 from intelfx/test-read-environment  Cedric Nugteren
2016-08-27  test/correctness: read platform and device from environment  Ivan Shapovalov
2016-08-22  Merge branch 'database_defaults' into development  Cedric Nugteren