Age | Commit message | Author | |
---|---|---|---|
2023-05-21 | Intel HD Graphics 770 and AMD RX 6600 XT tuning results (#474) | Cedric Nugteren | |
* Add tuning results for AMD Radeon RX 6600 XT * Add tuning results for Intel HD Graphics 770 * Update list of tuned devices | |||
2023-05-17 | Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 (#468) | Cedric Nugteren | |
* Add tuning results for AMD Radeon RX 5700 XT * Add tuning results for NVIDIA GeForce RTX 2080 Ti * Add tuning results for NVIDIA GeForce RTX 3090 | |||
2023-05-10 | Fixes an issue under Android when the driver was already unloaded (#462) | Cedric Nugteren | |
2023-05-10 | TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) | Cedric Nugteren | |
2023-05-07 | TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) | Cedric Nugteren | |
2023-05-07 | AMAX/AMIN integer testing and bug fixes (#457) | Cedric Nugteren | |
* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result * Perform proper integer-output testing in XAMAX tests * A few changes towards getting it ready for a PR * Also fix compilation for clBLAS and cuBLAS references * Fix a bug that would only use the real part of complex numbers in the amax/amin routines * A few small fixes related to the AMAX tests | |||
2023-01-21 | Add tuning results for Intel FPGA emulation device | Cedric Nugteren | |
2023-01-21 | Add tuning results for Radeon Pro 450 | Cedric Nugteren | |
2023-01-21 | Add tuning results for Adreno 740 | Cedric Nugteren | |
2023-01-21 | Add tuning results for Adreno 730 | Cedric Nugteren | |
2023-01-17 | Updated according to feedback from CNugteren | Angus, Alexander | |
2023-01-03 | implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 | Angus, Alexander | |
2022-09-22 | Update PyCLBlast version number | Cedric Nugteren | |
2022-06-24 | Fix typo in comment | Cedric Nugteren | |
Resolves https://github.com/CNugteren/CLBlast/issues/440 | |||
2022-05-23 | Fix API inconsistency in cupp11.hpp | Cedric Nugteren | |
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that. | |||
2022-05-16 | Merge pull request #432 from justingra/sum-fix | Cedric Nugteren | |
sum fix | |||
2022-04-25 | Add tuning results for Adreno 540 | Cedric Nugteren | |
2022-04-25 | Add tuning results for Radeon RX 6500 XT | Cedric Nugteren | |
2022-04-25 | Add tuning results for Radeon RX 6800 XT | Cedric Nugteren | |
2022-04-22 | sum fix | Justin Graham | |
2022-04-13 | android.hpp: custom header guard of _clang_ | danyougle | |
In order not to have ambiguous definitions, exclude the functions for other compilers | |||
2021-08-27 | Add Quadro T2000 tuning parameters for the Tesla T4 | Cedric Nugteren | |
2021-08-27 | Remove Tesla T4 tuning results | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Tesla V100 | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Tesla T4 | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Quadro T2000 | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA Quadro GV100 | Cedric Nugteren | |
2021-08-19 | Add tuning results for Intel Core i9-9980HK | Cedric Nugteren | |
2021-08-19 | Add tuning results for NVIDIA A100 | Cedric Nugteren | |
2021-05-22 | Fix issue with printing out-of-bounds local/global sizes for level 1 tuners | Cedric Nugteren | |
2021-03-13 | set the correct flop count for xgemm | JishinMaster | |
2021-02-05 | Fix Windows paths in pyclblast | Cedric Nugteren | |
2021-02-04 | Added second Windows library path | Cedric Nugteren | |
2021-01-30 | Add library path for Windows as well | Cedric Nugteren | |
2021-01-29 | Add library dir on Linux for pyclblast | Cedric Nugteren | |
2021-01-21 | Update pyclblast package version number | Cedric Nugteren | |
2021-01-20 | Use reference types to prevent unnecessary copying | Jerry James | |
2020-10-10 | Add tuning results for TITAN RTX | Cedric Nugteren | |
2020-10-10 | Add tuning results for Radeon RX Vega | Cedric Nugteren | |
2020-06-07 | Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG | Pradeep Garigipati | |
2020-06-05 | Fix Program::GetIR to handle programs with multiple devices | Pradeep Garigipati | |
2020-05-11 | Increase display width of the local/global sizes | Cedric Nugteren | |
2020-05-10 | Made sure that the global workgroup size is a multiple of the local size in the tuners | Cedric Nugteren | |
2020-05-10 | Added logging of local/global workgroup sizes when running the tuners | Cedric Nugteren | |
2020-05-10 | Updated PyCLBlast version number | Cedric Nugteren | |
2020-05-10 | Added a sample to demonstrate a batched routine | Cedric Nugteren | |
2020-05-10 | Added pyclblast bindings for the 3 batched routines | Cedric Nugteren | |
2020-05-03 | Move queue creation out of the tuner loop | Cedric Nugteren | |
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren | |
2020-03-08 | Silenced a new OpenCL warning message | Cedric Nugteren | |