Age | Commit message | Author |
|
Replace the looped test by a single one with the offset of the last batch.
|
Replace the looped test by a single one with the maximal found offset.
|
inline PTX to support subgroup shuffle for Nvidia GPUs
|
and standard-deviation
|
results based on kernel pre-processor
|
support for multi-kernel routines
|
the CLBlast library
|
GEMM kernel selection tuner
|
results
|