# ########################################################################
# Copyright 2013 Advanced Micro Devices, Inc.
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
# http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ########################################################################

clBLAS Readme

Version:       1.10
Release Date:  April 2013

ChangeLog:
____________
Current Version:
New:
  * New Level 1 routines added (an 'x' implies all 4 precisions)
        xSWAP, xCOPY, xSCAL, CSSCAL, ZDSCAL, xAXPY, SDOT, DDOT, 
        CDOTU, ZDOTU, CDOTC, ZDOTC, xROTG, SROTMG, DROTMG,
        SROT, DROT, CSROT, ZDROT, SROTM, DROTM, SNRM2, DNRM2,
        SCNRM2, DZNRM2, ixAMAX, SASUM, DASUM, SCASUM, DZASUM
  * Samples have been added for the new functions 
  * This release was tested using the 9.012 runtime driver and the 2.8 APP SDK
Fixed:
  * Failures in *trsm functions with clMAGMA tests
Known Issues:
  * Failures and hangs in ztrmm, *trsv, *tpsv functions on Southern Islands GPU devices
  * Failures in zgemm functions on Northern Islands GPU devices
  * These failures and hangs are expected to be fixed in upcoming AMD graphics driver
        versions. It is strongly recommended that users keep their graphics drivers
        up to date.
		
____________
Version 1.8.291:
Fixed:
  * Failures in the following functions: ssyr2, ssyr2k, strsm, strsv, ssyrk, cher, 
        ctrsv, csymm, cher2, ztrmm on Southern Islands GPU devices.
  * Failures in the following functions: dsyr, dsyr2, dgemv, dsyrk,
        dsyr2k, zsyr2k on Trinity platforms. 
Known Issues:
  * Failures in *trsm functions with clMAGMA tests
  
____________
Version 1.8.269 (Beta, clMAGMA support):
New:
  * No new routines
  * This release was tested using the 8.961 runtime driver and the 2.6 APP SDK

Known Issues:
  * The clBLASTune executable has been observed to hang on Windows.  If 
        this happens, abort execution of the tune program; it is not required 
        for correct operation of the BLAS routines (as of 8.872).
  * clBLAS can return invalid results on CPU devices (as 
        of 8.961).  The CPU device is primarily a test/debug device, and GPU 
		devices are unaffected.
  * clBLAS can return invalid results for double precision functions (dsyr, 
        dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of 
        8.961).
  * clBLAS can return invalid results (ssyr2, ssyr2k, strsm, strsv, ssyrk, cher, 
        ctrsv, csymm, cher2, ztrmm) on Southern Islands GPU devices (as of 8.961).

____________
Version 1.7 (Beta, clMAGMA support):
New:
  * New Level 3 routines added (an 'x' implies all 4 precisions)
		CHER2K, ZHER2K
  * New Level 2 routines added (an 'x' implies all 4 precisions)
        xTPMV, xTPSV, SSPMV, DSPMV, CHPMV, ZHPMV, SSPR, DSPR, CHPR, ZHPR, 
        SSPR2, DSPR2, CHPR2, ZHPR2, xGBMV, CHBMV, ZHBMV, SSBMV, DSBMV, 
        xTBMV, xTBSV
  * Samples have been added for the new functions, but are not fully tested 
  * This release was tested using the 8.951 runtime driver and the 2.6 APP SDK
  * Note that documentation is incomplete for the new functions

Known Issues:
  * The clBLASTune executable has been observed to hang on Windows.  If 
        this happens, abort execution of the tune program; it is not required 
        for correct operation of the BLAS routines (as of 8.872).
  * clBLAS can return invalid results on CPU devices that support AVX (as 
        of 8.951).  CPU devices that support up to SSE3 are unaffected.  The 
        CPU device is primarily a test/debug device, and GPU devices are 
        unaffected.
  * clBLAS can return invalid results for double precision functions (dsyr, 
        dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of 
        8.951).
  * clBLAS can return invalid results (ssyr, ssyr2, strsv, ctrsv, ssyrk, 
        ssyr2k, ztrmm) on Southern Islands GPU devices (as of 8.951).

____________
Version 1.6:
New:
  * New Level 3 routines added (an 'x' implies all 4 precisions)
        CSYRK, ZSYRK, CSYR2K, ZSYR2K, CHEMM, ZHEMM, CHERK, ZHERK, xSYMM
  * New Level 2 routines added (an 'x' implies all 4 precisions)
        CGEMV, ZGEMV, xTRMV, xTRSV, CHEMV, ZHEMV, SGER, DGER, CGERU, ZGERU, 
		CGERC, ZGERC, CHER, ZHER, CHER2, ZHER2, SSYR, DSYR, SSYR2, DSYR2
  * For all the original functions prior to 1.6, a new API has been introduced
        with an *Ex suffix.  These extended APIs add new parameters that allow
        users to specify an offset to a matrix argument.  This allows efficient
        sub-matrix indexing within a clBLAS routine without requiring expensive
        sub-matrix copy operations.
  * Samples have been added for the new functions
  * Preview: Support for AMD Radeon(TM) HD7000 series GPUs
  * This release was tested using the 8.92 runtime driver and the 2.6 APP SDK
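
The offset parameters of the *Ex API described above can be illustrated with
a little index arithmetic. This is only a sketch, not clBLAS code: the helper
name submatrix_offset is hypothetical, and it assumes that the offset is
counted in elements from the start of the buffer, which this changelog does
not state. Under that assumption, a sub-matrix that starts at (row, col) of a
matrix with leading dimension ld begins at the offset printed below.

```shell
#!/usr/bin/env bash
# Illustrative only (hypothetical helper, not part of clBLAS): prints the
# element offset of the sub-matrix that starts at (row, col) inside a larger
# matrix with leading dimension ld.
# "row" selects row-major storage, "col" selects column-major storage.
submatrix_offset() {
    local order="$1" row="$2" col="$3" ld="$4"
    case "$order" in
        row) echo $(( row * ld + col )) ;;
        col) echo $(( col * ld + row )) ;;
        *)   echo "unknown order: $order" >&2; return 1 ;;
    esac
}

# Sub-matrix starting at row 2, column 3 of a matrix with leading dimension 10.
submatrix_offset row 2 3 10   # prints 23
submatrix_offset col 2 3 10   # prints 32
```

Passing such an offset together with the unchanged leading dimension lets a
routine address the sub-matrix in place, which is what avoids the copy.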

Known Issues:
  * The clBLASTune executable has been observed to hang on Windows.  If this
        happens, abort execution of the tune program; it is not required for 
		correct operation of the BLAS routines (as of 8.872).
  * The CPU device for clBLAS is not functioning for this release (as of 
        8.872).  The CPU device is primarily a test/debug device, and GPU 
		devices are unaffected.

____________
Version 1.4:
New:
  * New Level 3 routines added
        SSYRK, DSYRK, SSYR2K, DSYR2K
  * New Level 2 routines added
        SGEMV, DGEMV, SSYMV, DSYMV
  * The image support functions (clblasAddScratchImage, 
        clblasRemoveScratchImage) have been deprecated.  Images are no 
		longer required for the highest performance.
  * InstallShield is now used for APPML libraries.  The default install 
        location has changed from c:\amd\clBLAS to 
		C:\Program Files (x86)\AMD\clBLAS.  It is recommended that previous 
		versions of clBLAS are uninstalled first.
  * Samples have been added for the new functions
  * This release was tested using the 8.872 runtime driver and the 2.5 APP SDK

Known Issues:
  * The clBLASTune executable has been observed to hang on Windows.  If this
        happens, abort execution of the tune program; it is not required for 
		correct operation of the BLAS routines (as of 8.872).
  * The CPU device for clBLAS is not functioning for this release (as of 
        8.872).  The CPU device is primarily a test/debug device, and GPU 
		devices are unaffected.


____________
Version 1.2:
New:
  * The library now supports both 32- and 64-bit Windows and Linux operating 
        systems.
  * xTRSM routines are available in 1.2.
  * clBLAS routines return clblasStatus error codes instead of native 
        OpenCL error codes.

Fixed:
  * xTRMM routines were not properly handling implicit unit diagonal 
        elements and implicit off-diagonal zero values specified by the BLAS 
        parameters SIDE, UPLO and DIAG.
  * Possible crash with CPU device on 32-bit systems.
  * The clblasDgemm routine returned an invalid event as its last argument.
		
Known Issues:
  * The clBLASTune executable has been observed to hang on Windows.  If this
        happens, abort execution of the tune program; it is not required for 
		correct operation of the BLAS routines (as of 8.872).
  * The CPU device for clBLAS is not functioning for this release (as of 
        8.872).  The CPU device is primarily a test/debug device, and GPU 
		devices are unaffected.
		
____________________
Version 1.0:
  * Initial release

Known Issues:
  * Available only on Linux64.
  * xTRMM routines were not properly handling implicit unit diagonal elements 
        and implicit off-diagonal zero values specified by the BLAS parameters
		SIDE, UPLO and DIAG
  * clblasDgemm returned an invalid event as its last argument
	  
_____________
Building the Samples:

To install the Linux versions of clBLAS, uncompress the initial download, then 
execute the install script.

For example:

	tar -xf clBLAS-${version}-Linux.tar.gz
		- This extracts three files into the local directory, one being an
		  executable bash script.

	sudo mkdir /opt/clBLAS-${version}
		- This pre-creates the install directory with proper permissions
		  in /opt if it is to be installed there. (This is the default.)

	./install-clBLAS-${version}.sh
		- This prints an EULA and uncompresses files into the chosen install
		  directory.

	cd ${installDir}/bin64
	export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clBLASLibDir}
		- Export the library directories so that the client program can
		  resolve all of its external library dependencies; you can create
		  a bash script to help automate this procedure.

	./example_sgemm
		- Run a simple client; one example is provided for each supported
		  main BLAS function family.
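
The export step above can be wrapped in a small helper script. The following
is a minimal sketch, not a script shipped with clBLAS: the function name
clblas_env is hypothetical, and the two directories must be supplied by the
user.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the clBLAS package): appends the OpenCL and
# clBLAS library directories to LD_LIBRARY_PATH so that the sample clients can
# resolve their shared-library dependencies.
clblas_env() {
    local opencl_lib_dir="$1"   # directory containing the OpenCL library
    local clblas_lib_dir="$2"   # directory containing the clBLAS library
    if [ -z "$opencl_lib_dir" ] || [ -z "$clblas_lib_dir" ]; then
        echo "usage: clblas_env <OpenCLLibDir> <clBLASLibDir>" >&2
        return 1
    fi
    # ${VAR:+...} avoids a leading ':' when LD_LIBRARY_PATH is empty or unset;
    # an empty path component would make the loader search the current directory.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${opencl_lib_dir}:${clblas_lib_dir}"
}
```

Source the file that defines the function, then call
clblas_env ${OpenCLLibDir} ${clBLASLibDir} before running a sample. The
function must be sourced rather than executed so that the export takes effect
in the current shell.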

The sample program does not ship with native build files; instead, a CMake 
file is shipped, and the user generates a native build file for their system.

For example:

	cd ${installDir}

	mkdir samplesBin/
		- This creates a sibling directory to the samples directory that
		  houses the native makefiles and the generated files from the
		  build.

	cd samplesBin/
	ccmake ../samples/
		- ccmake is a curses-based cmake program; it takes a parameter
		  that specifies the location of the source code to compile.
		- Press 'c' to configure for the platform; ensure that the
		  dependencies on external libraries are satisfied, including
		  paths to 'ATI Stream SDK'.
		- After dependencies are satisfied, press 'c' again to finalize
		  configuration. Then, press 'g' to generate a makefile and
		  exit ccmake.

	make help
		- Look at the options available for make.

	make
		- Build the sample client program.

	./example_sgemm
		- Run a simple client; one example is provided for each supported
		  main BLAS function family.
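
The build steps above can also be scripted for unattended use. The sketch
below replaces the interactive ccmake session with plain cmake, which
configures using default settings. It assumes that the external dependencies
are found automatically; if they are not, they still have to be set as
described above. The function name build_samples and the DRY_RUN switch are
hypothetical and not part of clBLAS.

```shell
#!/usr/bin/env bash
# Sketch of a scripted sample build (hypothetical helper, not shipped with
# clBLAS). The function body runs in a subshell, so the caller's working
# directory is left unchanged.
build_samples() (
    install_dir="$1"            # clBLAS install directory (${installDir} above)
    if [ -z "$install_dir" ]; then
        echo "usage: build_samples <installDir>" >&2
        exit 1
    fi
    # With DRY_RUN=1 the commands are printed instead of executed.
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    $run mkdir -p "${install_dir}/samplesBin" &&
    $run cd "${install_dir}/samplesBin" &&
    $run cmake ../samples/ &&
    $run make
)
```

Calling build_samples ${installDir} with DRY_RUN=1 set first shows the
commands without touching the file system.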