-rw-r--r--  CHANGELOG  1
-rw-r--r--  README.md  1
-rw-r--r--  doc/routines.md  1
-rw-r--r--  doc/tuning.md  2
-rw-r--r--  src/kernels/levelx/col2im.opencl  2
5 files changed, 5 insertions, 2 deletions
diff --git a/CHANGELOG b/CHANGELOG
index 18c9051d..4a17a47b 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -11,6 +11,7 @@ Development (next version)
- Various minor fixes and enhancements
- Added non-BLAS routines:
* SCONVGEMM/DCONVGEMM/HCONVGEMM (convolution as im2col followed by batched GEMM)
+ * SCOL2IM/DCOL2IM/CCOL2IM/ZCOL2IM/HCOL2IM (col2im transform as used in machine learning)
Version 1.4.1
- Fixed an access violation under Windows upon releasing the OpenCL program when the driver is already unloaded
diff --git a/README.md b/README.md
index 85dc4386..9464e6e7 100644
--- a/README.md
+++ b/README.md
@@ -124,6 +124,7 @@ The main contributing authors (code, pull requests, testing) are:
* [Shehzan Mohammed](https://shehzan10.github.io)
* [Marco Cianfriglia](https://github.com/mcian)
* [Kodonnell](https://github.com/kodonnell)
+* [Koichi Akabe](https://github.com/vbkaisetsu)
* Everyone else listed as a [GitHub contributor](https://github.com/CNugteren/CLBlast/graphs/contributors)
Tuning and testing on a variety of OpenCL devices was made possible by:
diff --git a/doc/routines.md b/doc/routines.md
index 7c6a1eb9..a4cb5e57 100644
--- a/doc/routines.md
+++ b/doc/routines.md
@@ -93,6 +93,7 @@ In addition, some extra non-BLAS routines are also supported by CLBlast, classif
| xHAD | ✔ | ✔ | ✔ | ✔ | ✔ | (Hadamard product)
| xOMATCOPY | ✔ | ✔ | ✔ | ✔ | ✔ | (Out-of-place copying/transposing/scaling of matrices)
| xIM2COL | ✔ | ✔ | ✔ | ✔ | ✔ | (Image to column transform as used to express convolution as GEMM)
+| xCOL2IM | ✔ | ✔ | ✔ | ✔ | ✔ | (Column to image transform as used in machine learning)
| xCONVGEMM | ✔ | ✔ | - | - | ✔ | (Experimental, implemented as im2col followed by batched GEMM)
Some less commonly used BLAS routines are not yet supported by CLBlast. They are xROTG, xROTMG, xROT, xROTM, xTBSV, and xTPSV.
diff --git a/doc/tuning.md b/doc/tuning.md
index 6243d135..6b52f4a2 100644
--- a/doc/tuning.md
+++ b/doc/tuning.md
@@ -235,4 +235,4 @@ To find out which tuners to run for which routines, you can use the table below.
| GER GERC GERU HER HER2 HPR HPR2 SPR SPR2 SYR SYR2 | Xger |
| GEMM HEMM HER2K HERK SYMM SYR2K SYRK TRMM GEMMBATCHED GEMMSTRIDEDBATCHED | Xgemm XgemmDirect Copy Pad Transpose Padtranspose |
| TRSM | Xgemm XgemmDirect Copy Pad Transpose Padtranspose Invert |
-| IM2COL | Copy |
+| IM2COL COL2IM | Copy |
diff --git a/src/kernels/levelx/col2im.opencl b/src/kernels/levelx/col2im.opencl
index 44908ca1..5cadeec6 100644
--- a/src/kernels/levelx/col2im.opencl
+++ b/src/kernels/levelx/col2im.opencl
@@ -80,7 +80,7 @@ void col2im(const int input_h, const int input_w, const int channels,
}
}
- // Sets the input value
+ // Sets the resulting value
const int input_index = w_index + input_w * (h_index + input_h * c_id);
im_buffer[input_index + im_offset] = val;
}