Age | Commit message | Author |
|
https://jira-dc.qualcomm.com/jira/browse/OSR-8731
|
|
Resolves https://github.com/CNugteren/CLBlast/issues/440
|
|
kernels to improve performance
|
changed transpose kernel code
|
NVIDIA and ARM GPUs
|
|
struct; fixes bug on Apple OpenCL
|
TRSM
|
|
OpenCL 1.1 or lower
|
problems if C contains NaNs
|
|
slow reference kernel as a place-holder
|
|
case of fp16 arguments are cast on host and in kernel
|
|
to transfer half-precision values as well
|
and IxAMAX
|