-rw-r--r--  CHANGELOG                         | 1
-rw-r--r--  README.md                         | 1
-rw-r--r--  doc/routines.md                   | 1
-rw-r--r--  doc/tuning.md                     | 2
-rw-r--r--  src/kernels/levelx/col2im.opencl  | 2
5 files changed, 5 insertions, 2 deletions
@@ -11,6 +11,7 @@ Development (next version)
 - Various minor fixes and enhancements
 - Added non-BLAS routines:
   * SCONVGEMM/DCONVGEMM/HCONVGEMM (convolution as im2col followed by batched GEMM)
+  * SCOL2IM/DCOL2IM/CCOL2IM/ZCOL2IM/HCOL2IM (col2im transform as used in machine learning)
 
 Version 1.4.1
 - Fixed an access violation under Windows upon releasing the OpenCL program when the driver is already unloaded
@@ -124,6 +124,7 @@ The main contributing authors (code, pull requests, testing) are:
 * [Shehzan Mohammed](https://shehzan10.github.io)
 * [Marco Cianfriglia](https://github.com/mcian)
 * [Kodonnell](https://github.com/kodonnell)
+* [Koichi Akabe](https://github.com/vbkaisetsu)
 * Everyone else listed as a [GitHub contributor](https://github.com/CNugteren/CLBlast/graphs/contributors)
 
 Tuning and testing on a variety of OpenCL devices was made possible by:
diff --git a/doc/routines.md b/doc/routines.md
index 7c6a1eb9..a4cb5e57 100644
--- a/doc/routines.md
+++ b/doc/routines.md
@@ -93,6 +93,7 @@ In addition, some extra non-BLAS routines are also supported by CLBlast, classif
 | xHAD | ✔ | ✔ | ✔ | ✔ | ✔ | (Hadamard product)
 | xOMATCOPY | ✔ | ✔ | ✔ | ✔ | ✔ | (Out-of-place copying/transposing/scaling of matrices)
 | xIM2COL | ✔ | ✔ | ✔ | ✔ | ✔ | (Image to column transform as used to express convolution as GEMM)
+| xCOL2IM | ✔ | ✔ | ✔ | ✔ | ✔ | (Column to image transform as used in machine learning)
 | xCONVGEMM | ✔ | ✔ | - | - | ✔ | (Experimental, implemented as im2col followed by batched GEMM)
 
 Some less commonly used BLAS routines are not yet supported by CLBlast. They are xROTG, xROTMG, xROT, xROTM, xTBSV, and xTPSV.
diff --git a/doc/tuning.md b/doc/tuning.md
index 6243d135..6b52f4a2 100644
--- a/doc/tuning.md
+++ b/doc/tuning.md
@@ -235,4 +235,4 @@ To find out which tuners to run for which routines, you can use the table below.
 | GER GERC GERU HER HER2 HPR HPR2 SPR SPR2 SYR SYR2 | Xger |
 | GEMM HEMM HER2K HERK SYMM SYR2K SYRK TRMM GEMMBATCHED GEMMSTRIDEDBATCHED | Xgemm XgemmDirect Copy Pad Transpose Padtranspose |
 | TRSM | Xgemm XgemmDirect Copy Pad Transpose Padtranspose Invert |
-| IM2COL | Copy |
+| IM2COL COL2IM | Copy |
diff --git a/src/kernels/levelx/col2im.opencl b/src/kernels/levelx/col2im.opencl
index 44908ca1..5cadeec6 100644
--- a/src/kernels/levelx/col2im.opencl
+++ b/src/kernels/levelx/col2im.opencl
@@ -80,7 +80,7 @@ void col2im(const int input_h, const int input_w, const int channels,
     }
   }
 
-  // Sets the input value
+  // Sets the resulting value
   const int input_index = w_index + input_w * (h_index + input_h * c_id);
   im_buffer[input_index + im_offset] = val;
 }
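
Note on the last hunk: the store in `col2im` writes to a channel-major image, with `input_index = w_index + input_w * (h_index + input_h * c_id)`. The sketch below is a host-side illustration of that indexing, not the CLBlast kernel. `image_index` mirrors the expression from the diff; `col2im_reference` is a hypothetical helper that assumes stride 1, no padding, no dilation, and the conventional `[channel][kh][kw][out_h][out_w]` column layout, which may differ from what CLBlast uses. It also scatters column values into the image, whereas the kernel appears to accumulate one `val` per image element before storing it.

```python
def image_index(w_index, h_index, c_id, input_w, input_h):
    """Flat offset of element (c_id, h_index, w_index) in a channel-major image.

    Same expression as `input_index` in the kernel hunk above.
    """
    return w_index + input_w * (h_index + input_h * c_id)


def col2im_reference(col, channels, input_h, input_w, kernel_h, kernel_w):
    """Naive col2im sketch (assumed: stride 1, no padding, no dilation).

    `col` is a flat list in the assumed [channel][kh][kw][out_h][out_w] layout.
    Overlapping patches are summed into the image.
    """
    output_h = input_h - kernel_h + 1
    output_w = input_w - kernel_w + 1
    im = [0.0] * (channels * input_h * input_w)
    for c_id in range(channels):
        for kh in range(kernel_h):
            for kw in range(kernel_w):
                # Row of the column matrix for this (channel, kh, kw) triple
                row = (c_id * kernel_h + kh) * kernel_w + kw
                for oh in range(output_h):
                    for ow in range(output_w):
                        col_index = (row * output_h + oh) * output_w + ow
                        # Image pixel touched by this patch element
                        dst = image_index(ow + kw, oh + kh, c_id, input_w, input_h)
                        im[dst] += col[col_index]
    return im
```

With a column buffer of all ones, each image element ends up holding the number of patches that cover it, which gives a quick sanity check of the indexing.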