Age | Commit message | Author | |
---|---|---|---|
2021-08-19 | Add tuning results for NVIDIA A100 | Cedric Nugteren | |
2021-05-23 | Merge pull request #419 from CNugteren/fix_tuner_out_of_bounds_access | Cedric Nugteren | |
Fix tuner printing issue | |||
2021-05-22 | Fix issue with printing out-of-bounds local/global sizes for level 1 tuners | Cedric Nugteren | |
2021-04-30 | Merge pull request #417 from gspr/gspr/capitalization-typo | Cedric Nugteren | |
Correct capitalization typo | |||
2021-04-30 | Correct capitalization typo | Gard Spreemann | |
The CLBlastConfig.cmake file was installed to a directory named CLBLast (note the second capital L), which can cause issues for CMake's search path when looking for CLBlast on the system. This commit also fixes other occurrences of the wrong capitalization, all of them purely cosmetic (i.e. in comments). | |||
2021-03-15 | Merge pull request #416 from JishinMaster/master | Cedric Nugteren | |
set the correct flop count for xgemm | |||
2021-03-13 | set the correct flop count for xgemm | JishinMaster | |
2021-02-06 | Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix | Cedric Nugteren | |
Fix Windows paths in pyclblast | |||
2021-02-05 | Fix Windows paths in pyclblast | Cedric Nugteren | |
2021-02-04 | Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs | Cedric Nugteren | |
Add library dir on Linux for pyclblast | |||
2021-02-04 | Added second Windows library path | Cedric Nugteren | |
2021-01-30 | Add library path for Windows as well | Cedric Nugteren | |
2021-01-29 | Add library dir on Linux for pyclblast | Cedric Nugteren | |
2021-01-21 | Update pyclblast package version number | Cedric Nugteren | |
2021-01-21 | Merge pull request #410 from jamesjer/master | Cedric Nugteren | |
Use reference types to prevent unnecessary copying | |||
2021-01-20 | Use reference types to prevent unnecessary copying | Jerry James | |
2021-01-19 | Updated to version 1.5.2 | Cedric Nugteren | |
2020-10-10 | Add tuning results for TITAN RTX | Cedric Nugteren | |
2020-10-10 | Add tuning results for Radeon RX Vega | Cedric Nugteren | |
2020-10-05 | Merge pull request #400 from baryluk/patch-6 | Cedric Nugteren | |
Allow single graph / subplot on plot | |||
2020-10-05 | Allow single graph / subplot on plot | Witold Baryluk | |
`plt.subplots` returns either an array or a single object depending on the number of subplots, which is not actually helpful and IMHO bad design. Make the result always an `ndarray`. The `and not type(axes) is np.ndarray` check is there just in case matplotlib decides to make its behavior more uniform; for now, work around it. Also, there is no real need for `ndarray.flat`. Confirmed to work with existing benchmarks (i.e. rows=2, cols=3) and with single graphs (rows=1, cols=1). | |||
2020-10-04 | Merge pull request #399 from baryluk/patch-3 | Cedric Nugteren | |
Fix a typo in benchmark when running fp 16 vs 32 | |||
2020-10-04 | Fix a typo in benchmark when running fp 16 vs 32 | Witold Baryluk | |
The intention here was to limit the iteration range to common indexes only. Fix that. | |||
2020-10-04 | Merge pull request #397 from baryluk/patch-1 | Cedric Nugteren | |
Fix Python SyntaxWarning | |||
2020-10-04 | Merge pull request #398 from baryluk/patch-2 | Cedric Nugteren | |
Fix --load_from_disk argument help message | |||
2020-10-04 | Fix --load_from_disk argument help message | Witold Baryluk | |
2020-10-04 | Fix Python SyntaxWarning | Witold Baryluk | |
There is no guarantee that all empty string objects are identical to, or share an object with, the `""` literal. | |||
2020-10-03 | Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script | Cedric Nugteren | |
Fix a Python 3 bug in the benchmark script | |||
2020-10-02 | Fix a Python 3 bug in the benchmark script | Cedric Nugteren | |
2020-08-16 | Added FUNDING.yml file | Cedric Nugteren | |
2020-06-07 | Merge pull request #392 from 9prady9/fix_Program_getIR | Cedric Nugteren | |
Fix Program::GetIR to handle programs with multiple devices | |||
2020-06-07 | Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG | Pradeep Garigipati | |
2020-06-05 | Fix Program::GetIR to handle programs with multiple devices | Pradeep Garigipati | |
2020-05-13 | Merge pull request #389 from CNugteren/CLBlast-385-version-defines | Cedric Nugteren | |
Added version number defines | |||
2020-05-12 | Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering | Cedric Nugteren | |
2020-05-11 | Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure | Cedric Nugteren | |
Fixed tuners global workgroup size | |||
2020-05-11 | Increase display width of the local/global sizes | Cedric Nugteren | |
2020-05-10 | Made sure that the global workgroup size is a multiple of the local size in the tuners | Cedric Nugteren | |
2020-05-10 | Added logging of local/global workgroup sizes when running the tuners | Cedric Nugteren | |
2020-05-10 | Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines | Cedric Nugteren | |
PyCLBlast: add missing batched routines | |||
2020-05-10 | Updated PyCLBlast version number | Cedric Nugteren | |
2020-05-10 | Added a sample to demonstrate a batched routine | Cedric Nugteren | |
2020-05-10 | Added pyclblast bindings for the 3 batched routines | Cedric Nugteren | |
2020-05-04 | Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner | Cedric Nugteren | |
Move queue creation out of the tuner loop | |||
2020-05-03 | Move queue creation out of the tuner loop | Cedric Nugteren | |
2020-03-15 | Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin | Cedric Nugteren | |
Change amax/amin behaviour | |||
2020-03-08 | Update API documentation | Cedric Nugteren | |
2020-03-08 | Made it more likely (but no guarantees) for amax/amin to return the first index | Cedric Nugteren | |
2020-03-08 | Added sample to play around with XAMAX routine | Cedric Nugteren | |
2020-03-08 | Silenced a new OpenCL warning message | Cedric Nugteren | |
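
For the amax/amin entry dated 2020-03-08, the following is a minimal sketch of what "return the first index" could mean when several elements tie for the largest absolute value. It is plain sequential Python written for illustration only, not CLBlast's OpenCL kernel, and the function name `first_index_of_max_abs` is hypothetical. A parallel reduction may combine partial results in a different order, which is presumably why the commit message offers no guarantee.

```python
def first_index_of_max_abs(values):
    """Return the lowest index whose absolute value is maximal.

    Illustrative reference behaviour only (hypothetical helper,
    not part of CLBlast or pyclblast).
    """
    if not values:
        raise ValueError("values must be non-empty")
    best_index = 0
    best_value = abs(values[0])
    for index in range(1, len(values)):
        candidate = abs(values[index])
        # Strict '>' keeps the earliest index when several entries tie.
        if candidate > best_value:
            best_index = index
            best_value = candidate
    return best_index
```

With the input `[1.0, -3.0, 3.0, 2.0]`, the entries at indices 1 and 2 tie in absolute value, and the strict comparison makes the sketch pick the earlier one.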