diff options
author | Cedric Nugteren <web@cedricnugteren.nl> | 2017-03-10 21:15:29 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2017-03-10 21:15:29 +0100 |
commit | de3500ed18ddb39261ffa270f460909571276462 (patch) | |
tree | b515368fcd1e39afb5805f67796b082ccc8066f9 /doc | |
parent | 37228c90988509acef9e8a892a752300b7645210 (diff) | |
parent | 3846f44eaf389ee24a698d4947e5c16bd14c3d0e (diff) |
Merge pull request #141 from CNugteren/axpy_batched
Added the batched version of the AXPY routine
Diffstat (limited to 'doc')
-rw-r--r-- | doc/clblast.md | 66 |
1 files changed, 66 insertions, 0 deletions
diff --git a/doc/clblast.md b/doc/clblast.md index 1d7c0df2..120c0c2c 100644 --- a/doc/clblast.md +++ b/doc/clblast.md @@ -2903,6 +2903,72 @@ Requirements for OMATCOPY: +xAXPYBATCHED: Batched version of AXPY +------------- + +As AXPY, but multiple operations are batched together for better performance. + +C++ API: +``` +template <typename T> +StatusCode AxpyBatched(const size_t n, + const T *alphas, + const cl_mem x_buffer, const size_t *x_offsets, const size_t x_inc, + cl_mem y_buffer, const size_t *y_offsets, const size_t y_inc, + const size_t batch_count, + cl_command_queue* queue, cl_event* event) +``` + +C API: +``` +CLBlastStatusCode CLBlastSaxpyBatched(const size_t n, + const float *alphas, + const cl_mem x_buffer, const size_t *x_offsets, const size_t x_inc, + cl_mem y_buffer, const size_t *y_offsets, const size_t y_inc, + const size_t batch_count, + cl_command_queue* queue, cl_event* event) +CLBlastStatusCode CLBlastDaxpyBatched(const size_t n, + const double *alphas, + const cl_mem x_buffer, const size_t *x_offsets, const size_t x_inc, + cl_mem y_buffer, const size_t *y_offsets, const size_t y_inc, + const size_t batch_count, + cl_command_queue* queue, cl_event* event) +CLBlastStatusCode CLBlastCaxpyBatched(const size_t n, + const cl_float2 *alphas, + const cl_mem x_buffer, const size_t *x_offsets, const size_t x_inc, + cl_mem y_buffer, const size_t *y_offsets, const size_t y_inc, + const size_t batch_count, + cl_command_queue* queue, cl_event* event) +CLBlastStatusCode CLBlastZaxpyBatched(const size_t n, + const cl_double2 *alphas, + const cl_mem x_buffer, const size_t *x_offsets, const size_t x_inc, + cl_mem y_buffer, const size_t *y_offsets, const size_t y_inc, + const size_t batch_count, + cl_command_queue* queue, cl_event* event) +CLBlastStatusCode CLBlastHaxpyBatched(const size_t n, + const cl_half *alphas, + const cl_mem x_buffer, const size_t *x_offsets, const size_t x_inc, + cl_mem y_buffer, const size_t *y_offsets, const size_t y_inc, + const size_t batch_count, + cl_command_queue* queue, cl_event* event) +``` + +Arguments to AXPYBATCHED: + +* `const size_t n`: Integer size argument. This value must be positive. +* `const T *alphas`: Input scalar constants. +* `const cl_mem x_buffer`: OpenCL buffer to store the input x vector. +* `const size_t *x_offsets`: The offsets in elements from the start of the input x vector. +* `const size_t x_inc`: Stride/increment of the input x vector. This value must be greater than 0. +* `cl_mem y_buffer`: OpenCL buffer to store the output y vector. +* `const size_t *y_offsets`: The offsets in elements from the start of the output y vector. +* `const size_t y_inc`: Stride/increment of the output y vector. This value must be greater than 0. +* `const size_t batch_count`: Number of batches. This value must be positive. +* `cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on. +* `cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument. + + + ClearCache: Resets the cache of compiled binaries (auxiliary function) ------------- |