Age | Commit message | Author |
|
and 2015
|
|
GEMM kernels
|
|
incomplete rectangles
|
|
transposing/non-transposing: NN, NT, TN, TT
|
|
to 256-256-256
|
|
NWGD and KWGD into one WGD parameter
|
|
indirect version
|
|
can't handle long strings
|
|
enabling better memory performance
|
|
support empty LWS
|
|
Do not hardcode the assumption "A and C col-major, B row-major".
This makes it easier to reuse the DoGemm() routine with different
kernels.
|
|
though
|
|
slow reference kernel as a place-holder
|
|
case of fp16 arguments are cast on host and in kernel
|
|
compilation and kernel execution to screen
|
|
templated function
|
|
them directly now
|
|
class
|
|
functions in a separate file
|
|
and/or transposing
|
|
and renamed files and functions appropriately
|
|
HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
|
|
to transfer half-precision values as well
|
|
buffersize checking
|
|
user to wait for event completion
|