Age | Commit message | Author |
|
the tuners
|
Replace the looped test with a single one that uses the offset of the last batch.
|
Replace the looped test with a single one that uses the maximal found offset.
|
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).
This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).
Courtesy of Codeplay Software Ltd.
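
The overflow described above can be illustrated with a small host-side sketch. This is not the actual OpenCL kernel: it is a plain C++ emulation, and the names hadamard, canary_intact, kCanary and kCanaryValue are hypothetical and chosen only for this example. It assumes each emulated thread handles up to WPT consecutive elements, and it shows how a guard region after the real data reveals writes past the end when the bounds check is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical guard-region size and fill value for this sketch only
// (not the values used by the real test suite).
constexpr std::size_t kCanary = 8;
constexpr float kCanaryValue = -1.0f;

// Emulates `threads` work-items that each process up to `wpt` consecutive
// elements of an element-wise product z = x * y over n elements.
// The output holds n real elements followed by kCanary guard elements.
// With guarded == false the loop trusts that every thread has wpt items,
// which mirrors the assumption described in the commit message.
std::vector<float> hadamard(const std::vector<float>& x,
                            const std::vector<float>& y,
                            std::size_t threads, std::size_t wpt,
                            bool guarded) {
  assert(wpt > 0 && x.size() == y.size());
  const std::size_t n = x.size();
  std::vector<float> z(n + kCanary, kCanaryValue);
  for (std::size_t tid = 0; tid < threads; ++tid) {
    for (std::size_t w = 0; w < wpt; ++w) {
      const std::size_t id = tid * wpt + w;
      if (guarded && id >= n) { break; }  // the fix: stop at the end of the data
      if (id >= z.size()) { break; }      // keeps this host-side sketch memory-safe
      // On a device, out-of-range reads would return garbage; zero stands in here.
      const float a = id < n ? x[id] : 0.0f;
      const float b = id < n ? y[id] : 0.0f;
      z[id] = a * b;
    }
  }
  return z;
}

// Returns true when every guard element after the first n is untouched.
bool canary_intact(const std::vector<float>& z, std::size_t n) {
  for (std::size_t i = n; i < z.size(); ++i) {
    if (z[i] != kCanaryValue) { return false; }
  }
  return true;
}
```

With 10 elements, 3 threads and WPT = 4, the unguarded variant touches indices 10 and 11 and corrupts the guard region, while the guarded variant leaves it intact. A check of this kind only catches writes that land inside the guard region, which is consistent with the note above that a larger kCanarySize was needed to observe the problem.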
|
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused failures during debug builds at runtime.
|
Convolution with single kernel
|
strided-batched-GEMM routine
|
executions
|
kernel and test
|
First im2col+GEMM implementation of convolution
|