Age | Commit message | Author
2016-08-21 | Also changed the default-default for unknown device types to use the same method as for known device groups | Cedric Nugteren
2016-08-21 | Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations | Cedric Nugteren
2016-08-21 | Updated the changelog; refactored the database-get-bests code a bit | Cedric Nugteren
2016-08-15 | Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type | Cedric Nugteren
2016-08-09 | Improved the speed of the new common-best defaults method for the database generation | Cedric Nugteren
2016-08-07 | Added a first version of the database's common-best default calculation | Cedric Nugteren
2016-07-28 | Minor update regarding the previous CMake export/install target changes | Cedric Nugteren
2016-07-28 | Merge pull request #86 from intelfx/cmake | Cedric Nugteren
    CMakeLists.txt: provide a find_package() config for dependent projects
2016-07-28 | .appveyor.yml: move {OPENCL,CLBLAST}_ROOT out of source tree | Ivan Shapovalov
    Reasoning is the same as in previous commit: CMake does not like having OpenCL header path inside of the source tree. CLBLAST_ROOT is moved for uniformity.
2016-07-28 | .travis.yml: use OpenCL ICD Loader and headers shipped by distro | Ivan Shapovalov
    Using our own headers causes problems with CMake which does not like having OpenCL header path inside of the source tree. While at it, use distro's universal OpenCL loader as well.
2016-07-28 | CMakeLists.txt: use target_include_directories() | Ivan Shapovalov
2016-07-28 | CMakeLists.txt: provide a find_package() config for dependent projects | Ivan Shapovalov
2016-07-26 | Merge branch 'gemv_performance' into development | Cedric Nugteren
2016-07-25 | Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices | Cedric Nugteren
2016-07-25 | Moved the XgemvFast and XgemvFastRot tuning database into a separate file | Cedric Nugteren
2016-07-24 | Merge branch 'development' into gemv_performance | Cedric Nugteren
2016-07-24 | Minor improvements after merging in groundwork for custom tuning parameters and kernels | Cedric Nugteren
2016-07-24 | Merge pull request #84 from intelfx/device-specific-kernels | Cedric Nugteren
    Groundwork for device-specific routines
2016-07-24 | Refactored the Python database script: separated functionality in modules, now complies to the PEP8 style, added proper command-line argument parsing, and cleaned-up | Cedric Nugteren
2016-07-23 | Fixed a bug in the new XgemvFastRot kernel related to local memory size | Cedric Nugteren
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance | Cedric Nugteren
2016-07-22 | clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation | Ivan Shapovalov
2016-07-22 | clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS | Ivan Shapovalov
2016-07-22 | cl::Kernel: skip NULL entries in waitForEvents | Ivan Shapovalov
2016-07-22 | clblast::RunKernel, cl::Kernel: take const vector as waitForEvents | Ivan Shapovalov
2016-07-22 | xgemm: do not hardcode kernel requirements for internal matrix layout | Ivan Shapovalov
    Do not hardcode the knowledge about "A and C col-major, B row-major". This allows for easier reuse of the DoGemm() routine with different kernels.
2016-07-22 | CMakeLists.txt: use ${clblast_SOURCE_DIR} instead of ${CMAKE_SOURCE_DIR} | Ivan Shapovalov
2016-07-16 | Fixed some more types and type conversions in the clpp11 interface to OpenCL | Cedric Nugteren
2016-07-16 | Merge pull request #80 from gcp/getdevinfo_fixes | Cedric Nugteren
    Make sure the passed types are large enough.
2016-07-16 | Removed an unused variable from the copy-transpose-pad function | Cedric Nugteren
2016-07-13 | Make sure the passed types are large enough. | Gian-Carlo Pascutto
    Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies.
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel | Cedric Nugteren
2016-07-10 | Added tuning results for AMD Oland and for Intel Graphics HD 530 | Cedric Nugteren
2016-07-10 | Fixed a bug related to the cache and retrieval of programs based on the OpenCL context | Cedric Nugteren
2016-07-08 | Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache | Cedric Nugteren
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen | Cedric Nugteren
2016-07-06 | Added an option to the performance clients to do a warm-up run before timing | Cedric Nugteren
2016-07-04 | Fixed a linking issue with the tuners on Visual Studio | CNugteren
2016-07-03 | Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) | Cedric Nugteren
2016-07-03 | Merge pull request #76 from gcp/fix_local_mem_size | Cedric Nugteren
    Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto
    In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo to get the "bytes" amount needed to store the result from CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an "auto result = size_t", which in 32-bit mode is 4 bytes, regardless of the previous return value. The spec describes that it will actually be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a cl_ulong. Also adjust all callers to take the changed type into account.
2016-07-02 | Prints the current pandas version and reports the minimum required version | Cedric Nugteren
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library | Cedric Nugteren
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren
2016-06-28 | Prepared the changelog for the next release | Cedric Nugteren
2016-06-28 | Updated to version 0.8.0 | Cedric Nugteren
2016-06-28 | Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' (2) | Cedric Nugteren
2016-06-28 | Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' | Cedric Nugteren