author     Cedric Nugteren <web@cedricnugteren.nl> 2016-09-04 17:21:16 +0200
committer  Cedric Nugteren <web@cedricnugteren.nl> 2016-09-04 17:21:16 +0200
commit     b30b26b89e52eceb06f5661622c3de0312206ab4
tree       22fa403c54e5039cb8e34723d1e47007c71dcba5 /CHANGELOG
parent     521bf6cdfc650f82488c1e07918eeabd7b328a78
The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs
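Why the beta == 0 case matters: under IEEE 754 arithmetic, 0 * NaN is NaN, so evaluating alpha*A*B + beta*C literally lets a NaN already present in C leak into the result even when beta is zero. The reference BLAS documentation states that when BETA is zero, C need not be set on input, so callers may legitimately pass uninitialised (possibly NaN) memory. The sketch below illustrates the guarded update on the host in plain C++. It is not CLBlast's OpenCL kernel; the function names `gemm_reference` and `contains_nan`, the row-major layout and the absence of transposes are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference GEMM for illustration only (not CLBlast's
// OpenCL kernel): computes C = alpha * A * B + beta * C for row-major,
// non-transposed matrices, with A (m x k), B (k x n) and C (m x n).
void gemm_reference(std::size_t m, std::size_t n, std::size_t k, float alpha,
                    const std::vector<float>& a, const std::vector<float>& b,
                    float beta, std::vector<float>& c) {
  assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        acc += a[i * k + p] * b[p * n + j];
      }
      float& out = c[i * n + j];
      if (beta == 0.0f) {
        // Do not read C at all: beta * out would be 0 * NaN == NaN if the
        // (possibly uninitialised) input C holds a NaN.
        out = alpha * acc;
      } else {
        out = alpha * acc + beta * out;
      }
    }
  }
}

// Hypothetical helper used to check results: true if any element is NaN.
bool contains_nan(const std::vector<float>& v) {
  for (float x : v) {
    if (std::isnan(x)) return true;
  }
  return false;
}
```

For example, with A = [1 2], B = [3 4]^T, alpha = 1, beta = 0 and C initialised to NaN, the guarded version yields C = 11 instead of NaN. Branching on an exact comparison with zero, rather than always multiplying, mirrors the condition named in the commit message; any non-zero beta still takes the regular path.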
Diffstat (limited to 'CHANGELOG')
-rw-r--r--  CHANGELOG  5
1 files changed, 3 insertions, 2 deletions
diff --git a/CHANGELOG b/CHANGELOG
index 9b027e6d..10cde25d 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,15 +1,16 @@
Development version (next release)
- Updated to version 6.0 of the CLCudaAPI C++11 OpenCL header
+- Improved performance significantly of rotated GEMV computations
+- Improved performance of unseen/un-tuned devices by a better default tuning parameter selection
- Fixed proper MSVC dllimport and dllexport declarations
- Fixed memory leaks related to events not being released
- Fixed a bug with a size_t and cl_ulong mismatch on 32-bit systems
- Fixed a bug related to the cache and retrieval of programs based on the OpenCL context
- Fixed a performance issue (caused by fp16 support) by optimizing alpha/beta parameter passing to kernels
- Fixed a bug in the OpenCL kernels: now placing __kernel before __attribute__
+- Fixed a bug in level-3 routines when beta is zero and matrix C contains NaNs
- Added an option (-warm_up) to do a warm-up run before timing in the performance clients
-- Improved performance significantly of rotated GEMV computations
-- Improved performance of unseen/un-tuned devices by a better default tuning parameter selection
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see README)