Age | Commit message | Author |
|
|
couple of devices
|
|
and kernels
|
|
Groundwork for device-specific routines
|
|
now complies with the PEP8 style, added proper command-line argument parsing, and cleaned up
|
|
enabling better memory performance
|
|
by routine implementation
|
|
support empty LWS
|
|
Do not hardcode the knowledge about "A and C col-major, B row-major".
This allows for easier reuse of the DoGemm() routine with different
kernels.
|
|
though
|
|
slow reference kernel as a place-holder
|
|
Make sure the passed types are large enough.
|
|
Make sure all out parameters that are passed to functions such
as clGetDeviceInfo are large enough to contain the replies.
|
|
case of fp16 arguments are cast on host and in kernel
|
|
OpenCL context
|
|
verbose print statements to the cache
|
|
compilation and kernel execution to screen
|
|
Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems
|
|
In LocalMemUsage(), a first call to clGetKernelWorkGroupInfo queries
the number of bytes needed to store the result for
CL_KERNEL_LOCAL_MEM_SIZE. However, the buffer actually passed is
declared as "auto result = size_t", which is only 4 bytes in 32-bit
mode, regardless of the size returned by the first call. The spec
defines the result as a cl_ulong, which is 8 bytes. To prevent stack
corruption, make sure we are in fact passing a cl_ulong.
Also adjust all callers to take the changed type into account.
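
The size mismatch described in this commit message can be illustrated with a minimal sketch. It does not use the real OpenCL API: MockGetLocalMemSize is a hypothetical stand-in for a two-step info query such as clGetKernelWorkGroupInfo, cl_ulong_t is an assumed alias for the 64-bit cl_ulong type, and the value 16384 is arbitrary. The point shown is only that the receiving variable should be declared with the documented 8-byte type rather than size_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed stand-in for OpenCL's cl_ulong (a 64-bit unsigned integer).
using cl_ulong_t = std::uint64_t;

// Hypothetical mock of an OpenCL-style info query; this is NOT the real API.
// With a null 'value' it only reports the required size through 'size_ret'.
// Otherwise it copies the reply if the caller's buffer is large enough.
// Returns 0 on success and -1 if the buffer is too small.
int MockGetLocalMemSize(std::size_t value_size, void* value,
                        std::size_t* size_ret) {
  const cl_ulong_t reply = 16384;  // arbitrary illustrative value
  if (size_ret != nullptr) { *size_ret = sizeof(reply); }
  if (value == nullptr) { return 0; }
  if (value_size < sizeof(reply)) { return -1; }
  std::memcpy(value, &reply, sizeof(reply));
  return 0;
}

// Queries the required size first, then reads the reply into a variable of
// the documented 64-bit type, so the write never exceeds the variable.
cl_ulong_t LocalMemUsageSketch() {
  std::size_t bytes = 0;
  MockGetLocalMemSize(0, nullptr, &bytes);

  cl_ulong_t result = 0;  // 8 bytes on both 32-bit and 64-bit builds
  static_assert(sizeof(result) == 8, "reply must fit in a 64-bit value");
  if (bytes > sizeof(result)) { return 0; }  // unexpected reply size

  MockGetLocalMemSize(bytes, &result, nullptr);
  return result;
}
```

Had result been declared as size_t on a 32-bit build, it would hold only 4 bytes while the query reports 8. In this mock an undersized buffer is rejected with -1; a query that trusted the caller's size argument would write past the end of the variable instead.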
|
|
declspec(dllimport) when not building the library
|
|
'artifact'
|
|
is present
|
|