Age | Commit message | Author | |
---|---|---|---|
2016-08-21 | Also changed the default-default for unknown device types to use the same method as for known device groups | Cedric Nugteren | |
2016-08-21 | Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations | Cedric Nugteren | |
2016-08-21 | Updated the changelog; refactored the database-get-bests code a bit | Cedric Nugteren | |
2016-08-15 | Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type | Cedric Nugteren | |
2016-08-09 | Improved the speed of the new common-best defaults method for the database generation | Cedric Nugteren | |
2016-08-07 | Added a first version of the database's common-best default calculation | Cedric Nugteren | |
2016-07-28 | Minor update regarding the previous CMake export/install target changes | Cedric Nugteren | |
2016-07-28 | Merge pull request #86 from intelfx/cmake | Cedric Nugteren | |
CMakeLists.txt: provide a find_package() config for dependent projects | |||
2016-07-28 | .appveyor.yml: move {OPENCL,CLBLAST}_ROOT out of source tree | Ivan Shapovalov | |
Reasoning is the same as in previous commit: CMake does not like having OpenCL header path inside of the source tree. CLBLAST_ROOT is moved for uniformity. | |||
2016-07-28 | .travis.yml: use OpenCL ICD Loader and headers shipped by distro | Ivan Shapovalov | |
Using our own headers causes problems with CMake which does not like having OpenCL header path inside of the source tree. While at it, use distro's universal OpenCL loader as well. | |||
2016-07-28 | CMakeLists.txt: use target_include_directories() | Ivan Shapovalov | |
2016-07-28 | CMakeLists.txt: provide a find_package() config for dependent projects | Ivan Shapovalov | |
2016-07-26 | Merge branch 'gemv_performance' into development | Cedric Nugteren | |
2016-07-25 | Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices | Cedric Nugteren | |
2016-07-25 | Moved the XgemvFast and XgemvFastRot tuning database into a separate file | Cedric Nugteren | |
2016-07-24 | Merge branch 'development' into gemv_performance | Cedric Nugteren | |
2016-07-24 | Minor improvements after merging in groundwork for custom tuning parameters and kernels | Cedric Nugteren | |
2016-07-24 | Merge pull request #84 from intelfx/device-specific-kernels | Cedric Nugteren | |
Groundwork for device-specific routines | |||
2016-07-24 | Refactored the Python database script: separated functionality in modules, now complies to the PEP8 style, added proper command-line argument parsing, and cleaned-up | Cedric Nugteren | |
2016-07-23 | Fixed a bug in the new XgemvFastRot kernel related to local memory size | Cedric Nugteren | |
2016-07-23 | Further improvements to the XgemvFastRot kernel, properly enables coalescing now | Cedric Nugteren | |
2016-07-23 | Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance | Cedric Nugteren | |
2016-07-22 | clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation | Ivan Shapovalov | |
2016-07-22 | clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS | Ivan Shapovalov | |
2016-07-22 | cl::Kernel: skip NULL entries in waitForEvents | Ivan Shapovalov | |
2016-07-22 | clblast::RunKernel, cl::Kernel: take const vector as waitForEvents | Ivan Shapovalov | |
2016-07-22 | xgemm: do not hardcode kernel requirements for internal matrix layout | Ivan Shapovalov | |
Do not hardcode the knowledge about "A and C col-major, B row-major". This allows for easier reuse of the DoGemm() routine with different kernels. | |||
2016-07-22 | CMakeLists.txt: use ${clblast_SOURCE_DIR} instead of ${CMAKE_SOURCE_DIR} | Ivan Shapovalov | |
2016-07-16 | Fixed some more types and type conversions in the clpp11 interface to OpenCL | Cedric Nugteren | |
2016-07-16 | Merge pull request #80 from gcp/getdevinfo_fixes | Cedric Nugteren | |
Make sure the passed types are large enough. | |||
2016-07-16 | Removed an unused variable from the copy-transpose-pad function | Cedric Nugteren | |
2016-07-13 | Make sure the passed types are large enough. | Gian-Carlo Pascutto | |
Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies. | |||
2016-07-10 | Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel | Cedric Nugteren | |
2016-07-10 | Added tuning results for AMD Oland and for Intel Graphics HD 530 | Cedric Nugteren | |
2016-07-10 | Fixed a bug related to the cache and retrieval of programs based on the OpenCL context | Cedric Nugteren | |
2016-07-08 | Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache | Cedric Nugteren | |
2016-07-06 | Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen | Cedric Nugteren | |
2016-07-06 | Added an option to the performance clients to do a warm-up run before timing | Cedric Nugteren | |
2016-07-04 | Fixed a linking issue with the tuners on Visual Studio | CNugteren | |
2016-07-03 | Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) | Cedric Nugteren | |
2016-07-03 | Merge pull request #76 from gcp/fix_local_mem_size | Cedric Nugteren | |
Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems | |||
2016-07-02 | Ensure clGetKernelWorkGroupInfo return value fits. | Gian-Carlo Pascutto | |
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo to get the "bytes" amount needed to store the result from CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an "auto result = size_t", which in 32-bit mode is 4 bytes, regardless of the previous return value. The spec describes that it will actually be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a cl_ulong. Also adjust all callers to take the changed type into account. | |||
2016-07-02 | Prints the current pandas version and reports the minimum required version | Cedric Nugteren | |
2016-07-02 | Fixed some memory leaks related to events not properly cleaned-up | Cedric Nugteren | |
2016-06-30 | Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library | Cedric Nugteren | |
2016-06-29 | Updated to version 6.0 of the CLCudaAPI header | Cedric Nugteren | |
2016-06-28 | Prepared the changelog for the next release | Cedric Nugteren | |
2016-06-28 | Updated to version 0.8.0 | Cedric Nugteren | |
2016-06-28 | Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' (2) | Cedric Nugteren | |
2016-06-28 | Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' | Cedric Nugteren | |