Age | Commit message | Author |
|
and 2015
|
|
GEMM kernels
|
|
incomplete rectangles
|
|
transposing/non-transposing: NN, NT, TN, TT
|
|
to 256-256-256
|
|
NWGD and KWGD into one WGD parameter
|
|
indirect version
|
|
can't handle long strings
|
|
Do not hardcode the knowledge about "A and C col-major, B row-major".
This allows for easier reuse of the DoGemm() routine with different
kernels.
|
|
though
|
|
slow reference kernel as a place-holder
|
|
case of fp16 arguments are cast on host and in kernel
|
|
templated function
|
|
them directly now
|
|
class
|
|
functions in a separate file
|
|
and/or transposing
|
|
and renamed files and functions appropriately
|
|
user to wait for event completion
|
|
K40 and Iris supported now