Age | Commit message | Author | |
---|---|---|---|
2022-05-23 | Fix API inconsistency in cupp11.hpp | Cedric Nugteren | |
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code fail to compile if CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that. | |||
2022-05-17 | Merge pull request #437 from umar456/blas_fix | Cedric Nugteren | |
Add logic to find intel OpenMP on oneMKL. | |||
2022-05-16 | Merge pull request #432 from justingra/sum-fix | Cedric Nugteren | |
sum fix | |||
2022-05-15 | Add logic to find intel OpenMP on oneMKL. | Umar Arshad | |
2022-05-13 | dev version | Justin Graham | |
2022-05-13 | changelog message | Justin Graham | |
2022-04-25 | Merge pull request #436 from CNugteren/add_tuning_results | Cedric Nugteren | |
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU | |||
2022-04-25 | Add tuning results for Adreno 540 | Cedric Nugteren | |
2022-04-25 | Add tuning results for Radeon RX 6500 XT | Cedric Nugteren | |
2022-04-25 | Add tuning results for Radeon RX 6800 XT | Cedric Nugteren | |
2022-04-25 | Merge pull request #434 from CNugteren/update_test_status_machines | Cedric Nugteren | |
Remove old test machines and add new ones | |||
2022-04-25 | Remove old test machines and add new ones | Cedric Nugteren | |
2022-04-22 | sum fix | Justin Graham | |
2022-04-14 | Merge pull request #431 from danyougle/patch-2 | Cedric Nugteren | |
android.hpp: custom header guard _clang_ | |||
2022-04-13 | android.hpp: custom header guard of _clang_ | danyougle | |
In order not to have ambiguous definitions, exclude the functions for other compilers | |||
2022-04-13 | Merge pull request #430 from danyougle/patch-1 | Cedric Nugteren | |
add AMD OCL SDK light path in ENV section | |||
2022-04-13 | add AMD OCL SDK light path in ENV section | danyougle | |
2021-08-27 | Merge pull request #425 from CNugteren/tesla_t4_correctness | Cedric Nugteren | |
Tesla T4 tuning parameters | |||
2021-08-27 | Add Quadro T2000 tuning parameters for the Tesla T4 | Cedric Nugteren | |
2021-08-27 | Remove Tesla T4 tuning results | Cedric Nugteren | |
2021-08-24 | Merge pull request #424 from gspr/gspr/prebuilt | Cedric Nugteren | |
Update documentation to reflect CLBlast in Debian & Ubuntu | |||
2021-08-24 | PPA for older Ubuntus | Gard Spreemann | |
2021-08-24 | Let the installation documentation reflect the fact that CLBlast is now in Debian and Ubuntu | Gard Spreemann | |
2021-08-20 | Merge pull request #423 from CNugteren/new_tuning_results | Cedric Nugteren | |
New tuning results for 1 Intel CPU and 5 NVIDIA GPUs | |||
2021-08-19 | Added a note on clock frequencies for tuning | Cedric Nugteren | |
2021-08-19 | Updated README and tuning list | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Tesla V100 | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Tesla T4 | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Quadro T2000 | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Quadro GV100 | Cedric Nugteren | |
2021-08-19 | Add tuning results for Intel Core i9-9980HK | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA A100 | Cedric Nugteren | |
2021-05-23 | Merge pull request #419 from CNugteren/fix_tuner_out_of_bounds_access | Cedric Nugteren | |
Fix tuner printing issue | |||
2021-05-22 | Fix issue with printing out-of-bounds local/global sizes for level 1 tuners | Cedric Nugteren | |
2021-04-30 | Merge pull request #417 from gspr/gspr/capitalization-typo | Cedric Nugteren | |
Correct capitalization typo | |||
2021-04-30 | Correct capitalization typo | Gard Spreemann | |
The CLBlastConfig.cmake file was installed to a directory named CLBLast (notice the second capital L), which can cause issues for CMake's search path when looking for CLBlast on the system. This commit also fixes other occurrences of the wrong capitalization, all of them purely cosmetic (i.e. in comments). | |||
2021-03-15 | Merge pull request #416 from JishinMaster/master | Cedric Nugteren | |
set the correct flop count for xgemm | |||
2021-03-13 | set the correct flop count for xgemm | JishinMaster | |
2021-02-06 | Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix | Cedric Nugteren | |
Fix Windows paths in pyclblast | |||
2021-02-05 | Fix Windows paths in pyclblast | Cedric Nugteren | |
2021-02-04 | Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs | Cedric Nugteren | |
Add library dir on Linux for pyclblast | |||
2021-02-04 | Added second Windows library path | Cedric Nugteren | |
2021-01-30 | Add library path for Windows as well | Cedric Nugteren | |
2021-01-29 | Add library dir on Linux for pyclblast | Cedric Nugteren | |
2021-01-21 | Update pyclblast package version number | Cedric Nugteren | |
2021-01-21 | Merge pull request #410 from jamesjer/master | Cedric Nugteren | |
Use reference types to prevent unnecessary copying | |||
2021-01-20 | Use reference types to prevent unnecessary copying | Jerry James | |
2021-01-19 | Updated to version 1.5.2 | Cedric Nugteren | |
2020-10-10 | Add tuning results for TITAN RTX | Cedric Nugteren | |
2020-10-10 | Add tuning results for Radeon RX Vega | Cedric Nugteren | |