Age | Commit message | Author |
|
based on assumptions
|
|
changed transpose kernel code
|
|
optional temporary buffers
|
|
kernel 2d instead of 3d
|
|
NVIDIA and ARM GPUs
|
|
sets of input parameters
|
|
struct; fixes bug on Apple OpenCL
|
|
GEMM kernel
|
|
implementation
|
|
- undoing many earlier changes
|
|
TRSM
|
|
OpenCL 1.1 or lower
|
|
specific tuning parameters (2)
|
|
specific tuning parameters
|
|
kernel
|
|
for the GEMM direct kernels
|
|
incomplete rectangles
|
|
transposing/non-transposing: NN, NT, TN, TT
|
|
to 256-256-256
|
|
NWGD and KWGD into one WGD parameter
|