# ######################################################################## # Copyright 2013 Advanced Micro Devices, Inc. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ######################################################################## clBLAS Readme Version: 1.10 Release Date: April 2013 ChangeLog: ____________ Current Version: New: * New Level 1 routines added (an 'x' implies all 4 precisions) xSWAP, xCOPY, xSCAL, CSSCAL, ZDSCAL, xAXPY, SDOT, DDOT, CDOTU, ZDOTU, CDOTC, ZDOTC, xROTG, SROTMG, DROTMG, SROT, DROT, CSROT, ZDROT, SROTM, DROTM, SNRM2, DNRM2, SCNRM2, DZNRM2, ixAMAX, SASUM, DASUM, SCASUM, DZASUM * Samples have been added for the new functions * This release tested using the 9.012 runtime driver and the 2.8 APPSDK Fixed: * Failures in *trsm functions with clMAGMA tests Known Issues: * Failures & hangs in ztrmm, *trsv, *tpsv functions on Southern Island GPU devices * Failures in zgemm functions on Northern Island GPU devices * Failures & hangs are expected to be fixed in the upcoming AMD graphics driver versions. It is strongly recommended that users keep their graphics driver versions up to date. ____________ Version 1.8.291: Fixed: * Failures in the following functions: ssyr2, ssyr2k, strsm, strsv, ssyrk, cher, ctrsv, csymm, cher2, ztrmm on Southern Island GPU devices. * Failures in the following functions: dsyr, dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k on Trinity platforms. Known Issues: * Failures in *trsm functions with clMAGMA tests ____________ Version 1.8.269 (Beta, clMAGMA support): New: * No new routines * This release tested using the 8.961 runtime driver and the 2.6 APPSDK Known Issues: * The clBLASTune executable has been observed to hang on Windows. If this happens, abort execution of the tune program; it is not required for correct operation of the BLAS routines (as of 8.872). * clBLAS can return invalid results on CPU devices (as of 8.961). The CPU device is primarily a test/debug device, and GPU devices are unaffected. * clBLAS can return invalid results for double precision functions (dsyr, dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of 8.961). * clBLAS can return invalid results (ssyr2, ssyr2k, strsm, strsv, ssyrk, cher, ctrsv, csymm, cher2, ztrmm) on Southern Island GPU devices (as of 8.961). ____________ Version 1.7 (Beta, clMAGMA support): New: * New Level 3 routines added (an 'x' implies all 4 precisions) CHER2K, ZHER2K * New Level 2 routines added (an 'x' implies all 4 precisions) xTPMV, xTPSV, SSPVM, DSPMV, CHPMV, ZHPMV, SSPR, DSPR, CHPR, ZHPR, SSPR2, DSPR2, CHPR2, ZHPR2, xGBMV, CHBMV, ZHBMV, SSBMV, DSBMV, xTBMV, xTBSV * Samples have been added for the new functions, but are not fully tested * This release tested using the 8.951 runtime driver and the 2.6 APPSDK * Note that documentation is incomplete for the new functions Known Issues: * The clBLASTune executable has been observed to hang on Windows. If this happens, abort execution of the tune program; it is not required for correct operation of the BLAS routines (as of 8.872). * clBLAS can return invalid results on CPU devices that support AVX (as of 8.951). CPU devices that support up to SSE3 are unaffected. The CPU device is primarily a test/debug device, and GPU devices are unaffected. * clBLAS can return invalid results for double precision functions (dsyr, dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of 8.951). * clBLAS can return invalid results (ssyr, ssyr2, strsv, ctrsv, ssyrk, ssyr2k, ztrmm) on Southern Island GPU devices (as of 8.951). ____________ Version 1.6: New: * New Level 3 routines added (an 'x' implies all 4 precisions) CSYRK, ZSYRK, CSYR2K, ZSYR2K, CHEMM, ZHEMM, CHERK, ZHERK, xSYMM * New Level 2 routines added (an 'x' implies all 4 precisions) CGEMV, ZGEMV, xTRMV, xTRSV, CHEMV, ZHEMV, SGER, DGER, CGERU, ZGERU, CGERC, ZGERC, CHER, ZHER, CHER2, ZHER2, SSYR, DSYR, SSYR2, DSYR2 * For all the original functions prior to 1.6, a new API has been introduced with an *Ex suffix. These extended API's add new parameters that allow users to specify an offset to a matrix argument. This allows efficient sub-matrix indexing within a clBLAS routine without requiring expensive sub-matrix copy operations. * Samples have been added for the new functions * Preview: Support for AMD Radeon™ HD7000 series GPUs * This release tested using the 8.92 runtime driver and the 2.6 APP SDK Known Issues: * The clBLASTune executable has been observed to hang on Windows. If this happens, abort execution of the tune program; it is not required for correct operation of the BLAS routines (as of 8.872). * The CPU device for clBLAS is not functioning for this release (as of 8.872). The CPU device is primarily a test/debug device, and GPU devices are unaffected. ____________ Version 1.4: New: * New Level 3 routines added SSYRK, DSYRK, SSYR2K, DSYR2K * New Level 2 routines added SGEMV, DGEMV, SSYMV, DSYMV * The image support functions (clblasAddScratchImage, clblasRemoveScratchImage) have been deprecated. Images are no longer required for the highest performance. * InstallShield is now used for APPML libraries. The default install location has changed from c:\amd\clBLAS to C:\Program Files (x86)\AMD\clBLAS. It is recommended that previous versions of clBLAS are uninstalled first. * Samples have been added for the new functions * This release tested using the 8.872 runtime driver and the 2.5 APP SDK Known Issues: * The clBLASTune executable has been observed to hang on Windows. If this happens, abort execution of the tune program; it is not required for correct operation of the BLAS routines (as of 8.872). * The CPU device for clBLAS is not functioning for this release (as of 8.872). The CPU device is primarily a test/debug device, and GPU devices are unaffected. ____________ Version 1.2: * The library now supports both 32- and 64-bit Windows and Linux operating systems. * xTRSM routines are available in 1.2. * clBLAS routines return clBLASStatus error codes, instead of native OpenCL error codes Fixed: * xTRMM routines were not properly handling implicit unit diagonal elements and implicit off-diagonal zero values specified by the BLAS parameters SIDE, UPLO and DIAG. * Possible crash with CPU device on 32-bit systems. * clblasDgemm routine return an invalid event as its last argument. * clBLAS routines return clblasStatus error codes, instead of native OpenCL error codes. Known Issues: * The clBLASTune executable has been observed to hang on Windows. If this happens, abort execution of the tune program; it is not required for correct operation of the BLAS routines (as of 8.872). * The CPU device for clBLAS is not functioning for this release (as of 8.872). The CPU device is primarily a test/debug device, and GPU devices are unaffected. ____________________ Version 1.0: * Initial release Known Issues: * Available only on Linux64. * xTRMM routines were not properly handling implicit unit diagonal elements and implicit off-diagonal zero values specified by the BLAS parameters SIDE, UPLO and DIAG * clblasDgemm returned an invalid event as its last argument _____________ Building the Samples: To install the Linux versions of clBLAS, uncompress the initial download, then execute the install script. For example: tar -xf clBLAS-${version}-Linux.tar.gz - This installs three files into the local directory, one being an executable bash script. sudo mkdir /opt/clBLAS-${version} - This pre-creates the install directory with proper permissions in /opt if it is to be installed there. (This is the default.) ./install-clBLAS-${version}.sh - This prints an EULA and uncompresses files into the chosen install directory. cd ${installDir}/bin64 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clBLASLibDir} - Be sure to export library dependencies to resolve all external linkages to the client program; you can create a bash script to help automate this procedure. ./example_sgemm - Run a simple client; one example is provided for each supported main BLAS function family. The sample program does not ship with native build files; instead, a CMake file is shipped, and the user generates a native build file for their system. For example: cd ${installDir} mkdir samplesBin/ - This creates a sister directory to the samples directory that houses the native makefiles and the generated files from the build. cd samplesBin/ ccmake ../samples/ - ccmake is a curses-based cmake program; it takes a parameter that specifies the location of the source code to compile. - Hit 'c' to configure for the platform; ensure that the dependencies to external libraries are satisfied, including paths to 'ATI Stream SDK'. - After dependencies are satisfied, hit 'c' again to finalize configuration. Then, hit 'g' to generate a makefile and exit ccmake. make help - Look at the options available for make. make - Build the sample client program. ./example_sgemm - Run a simple client; one example is provided for each supported main BLAS function family.