- Feb 10, 2019
-
-
psychocrypt authored
Combine the shared memory for a hash within one struct. Reduce the shared memory footprint per hash by 64 bytes.
-
psychocrypt authored
- rename variables like `b` and `bb` to more meaningful names.
-
- Feb 09, 2019
-
-
psychocrypt authored
Optimize cn_gpu
-
- Feb 07, 2019
-
-
psychocrypt authored
cryptonight_turtle is just cryptonight_v8 with a different scratchpad size, iteration count and mask value. We now use the new mechanism to describe such derived POWs.
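A minimal sketch of how a derived POW could be described as a parent plus three overridden values. The `pow_base`, `pow_desc`, `cn_v8` and `derive_pow` names are hypothetical and are not taken from xmr-stak; the parent values are the commonly cited cryptonight_v8 parameters and should be treated as an assumption, and the derived values in the usage note are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag for the kernel family a variant reuses. */
typedef enum { POW_BASE_CRYPTONIGHT_V8 } pow_base;

/* Hypothetical descriptor holding the three values a derived POW overrides. */
typedef struct {
    const char *name;
    pow_base    base;
    size_t      scratchpad_bytes;
    uint32_t    iterations;
    uint32_t    mask;
} pow_desc;

/* Parent parameters as commonly cited for cryptonight_v8 (assumption). */
static const pow_desc cn_v8 = {
    "cryptonight_v8", POW_BASE_CRYPTONIGHT_V8,
    2u * 1024u * 1024u, 0x80000u, 0x1FFFF0u
};

/* Copy the parent and override only scratchpad size, iterations and mask. */
static pow_desc derive_pow(const pow_desc *parent, const char *name,
                           size_t scratchpad_bytes, uint32_t iterations,
                           uint32_t mask)
{
    pow_desc d = *parent;
    d.name = name;
    d.scratchpad_bytes = scratchpad_bytes;
    d.iterations = iterations;
    d.mask = mask;
    return d;
}
```

A derived entry would then be created with a call such as `derive_pow(&cn_v8, "example_derived", 256u * 1024u, 0x10000u, 0x1FFF0u)`, where the three numbers are placeholders rather than confirmed parameters of any coin.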
-
psychocrypt authored
@xmrig reported that the driver 19.2.1 for Vega also creates invalid results if `pragma unroll` is used for the Groestl algorithm.
-
- Feb 06, 2019
-
-
psychocrypt authored
- use the user-defined unroll - auto suggestion: - only tune for cn_gpu if it is the user's main currency (after a fork) - set unroll to 1 for cn_gpu
-
- Feb 04, 2019
-
-
psychocrypt authored
If `comp_mode` is used, the code does not compile. - fix compile issue - fix wrong conditions to handle `comp_mode`
-
- Feb 02, 2019
-
-
psychocrypt authored
Windows driver creates wrong code if unroll is used.
-
- Feb 01, 2019
-
-
psychocrypt authored
Use the algorithm names from `cryptonight.hpp` instead of numbers within the OpenCL kernel.
-
- Jan 30, 2019
-
-
psychocrypt authored
- fix broken turtle coin - fix non cn_gpu algorithms
-
fireice-uk authored
Co-authored-by: psychocrypt <psychocryptHPC@gmail.com>
Co-authored-by: fireice-uk <fireice-uk@users.noreply.github.com>
-
- Jan 25, 2019
-
-
Brandon Lehmann authored
-
- Dec 06, 2018
-
-
psychocrypt authored
Since #2080 bittube2 is broken. - reintroduce special AES function for bittube2
-
- Dec 03, 2018
-
-
psychocrypt authored
NVIDIA uses clang as its device compiler, so the reciprocal optimization was disabled with #2104. - re-enable optimized reciprocal calculation
-
- Dec 02, 2018
-
-
psychocrypt authored
- fix broken compile: change `ULL` to `UL` because `UL` is defined as 64-bit - fix memory copy to shared memory via `vload8` (it somehow creates a wrong access)
-
- Nov 30, 2018
-
-
psychocrypt authored
Use an optimized reciprocal calculation without a lookup table for non-clang (ROCm) OpenCL.
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
-
- Nov 29, 2018
-
-
LPHuynh authored
-
- Nov 21, 2018
-
-
psychocrypt authored
Use `mul24` to speed up the scratchpad index calculation.
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
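For readers unfamiliar with `mul24`: the OpenCL built-in multiplies two integers using only their low 24 bits, which some GPUs execute faster than a full 32-bit multiply. The following is a host-side C sketch of that behaviour, not the kernel code from this commit. The `mul24_u32` and `scratchpad_offset` names, the 16-byte block size, and the index layout are assumptions for illustration; masking the operands is one possible choice, since OpenCL leaves the result implementation-defined when an operand does not fit into 24 bits.

```c
#include <stdint.h>

/* Host-side stand-in for OpenCL's unsigned mul24:
   only the low 24 bits of each operand take part in the multiply. */
static inline uint32_t mul24_u32(uint32_t a, uint32_t b)
{
    return (a & 0xFFFFFFu) * (b & 0xFFFFFFu);
}

/* Hypothetical index helper: byte offset of a 16-byte scratchpad block
   belonging to a given thread. The layout is an assumption. */
static inline uint32_t scratchpad_offset(uint32_t thread_id,
                                         uint32_t blocks_per_thread,
                                         uint32_t block)
{
    return (mul24_u32(thread_id, blocks_per_thread) + block) * 16u;
}
```

The shortcut is only safe while both factors stay below 2^24, which is why it suits small values such as thread ids and per-thread block counts.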
-
psychocrypt authored
Add new striding index where the memory is chunked by the size of the work group (worksize).
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
small optimization for non cryptonight_v8 algorithms
-
- Nov 20, 2018
-
-
SChernykh authored
- optimize division
-
SChernykh authored
optimize cryptonight_heavy diff
-
psychocrypt authored
- change a few 64-bit variables to 32-bit - provide type-qualified defines
-
- Nov 19, 2018
-
-
psychocrypt authored
- remove useless `clFinish` - avoid downloading the number of threads for skein & co. and always start as many threads as in all other kernels (terminate useless threads)
-
psychocrypt authored
Reduce the local memory footprint to increase the occupancy.
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
-
- Nov 16, 2018
-
-
psychocrypt authored
define shared memory in the outer scope
-
SChernykh authored
x-ref: https://github.com/xmrig/xmrig-amd/pull/192
-
SChernykh authored
- optimize kernel cn0 and cn2 - optimize fast int math - use more 32-bit variables
Co-authored-by: psychocrypt <psychocryptHPC@gmail.com>
-
- Nov 06, 2018
-
-
SChernykh authored
Optimize the division in cryptonight_heavy and cryptonight_haven. Import of https://github.com/xmrig/xmrig-amd/pull/185/commits/5d9b9334654df25cea7707f667990fd1577ed290
-
- Oct 16, 2018
-
-
psychocrypt authored
Fix the fix from #1945. The initial fix produces invalid results.
-
- Oct 15, 2018
-
-
psychocrypt authored
The AMD compiler for OpenCL shipped with the driver 14XX is broken and cannot compile xmr-stak since the Monero v8 changes were introduced. - work around a simple compare - add new device define `OPENCL_DRIVER_MAJOR`
-
- Oct 10, 2018
-
-
psychocrypt authored
In the current implementation the bit align uses a signed integer, which shifts in ones if the sign bit is set. - cast to unsigned integer before the bit shift
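The effect described above can be sketched in plain C. This is an illustration, not the kernel code from this commit: `shift_right_logical` and `bitalign_u32` are hypothetical helper names, and the bit-align semantics shown (take 32 bits out of the 64-bit value `hi:lo`, starting at bit `shift` of `lo`) are an assumption about what the kernel needs. Right-shifting a negative signed value typically replicates the sign bit, so converting to an unsigned type first is what guarantees zero fill.

```c
#include <stdint.h>

/* Logical right shift: converting to unsigned first guarantees that
   zeros, not copies of the sign bit, are shifted in from the left. */
static inline uint32_t shift_right_logical(int32_t value, unsigned shift)
{
    return (uint32_t)value >> (shift & 31u);
}

/* Hypothetical bit-align helper built only from unsigned operations:
   returns the low 32 bits of (hi:lo) >> shift. */
static inline uint32_t bitalign_u32(uint32_t hi, uint32_t lo, unsigned shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31u));
}
```

With these helpers, `shift_right_logical(-1, 28)` yields `0xF`, whereas a plain `-1 >> 28` on a signed type commonly yields `-1`.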
-
- Oct 05, 2018
-
-
psychocrypt authored
With ROCm we fought invalid shares for a long time. This is now solved with ROCm 1.9 and this tiny fix. It is not fully clear where a memory optimization kicks in and breaks the `Groestl` kernel if the variables `M` and `H` are not `volatile`. The performance will not change with this fix. The fix was tested with ROCm 1.9 on a VEGA64 and an RX570.
-
- Oct 04, 2018
-
-
Tony Butler authored
-
- Sep 30, 2018
-
-
psychocrypt authored
add cpu implementation for the final monero POW
-
- Sep 21, 2018
-
-
psychocrypt authored
- remove unused host function (relic from old refactoring) - remove unused OpenCL full div function
-
- Sep 19, 2018
-
-
psychocrypt authored
- fix code style issues - fix spelling issue - fix asm to support newer clang versions
-
psychocrypt authored
add option `unroll` for OpenCL to allow better tuning of the main POW kernel.
-
psychocrypt authored
Create a special pass for NVIDIA GPUs to load memory chunks first into the shared memory.
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
- implement cryptonight_v8 - update auto adjust to fit the special requirements of `cryptonight_v8` - add fast math integer implementation for `sqrt`, `reciprocal` and `division`
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
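To give an idea of what an integer-only `sqrt` looks like, here is a generic bit-by-bit square root in C. It is a textbook technique shown for illustration and is not the implementation added in this commit, which may use a different and faster scheme; the `isqrt64` name is hypothetical.

```c
#include <stdint.h>

/* Floor of the square root of n, computed bit by bit
   without any floating-point operation. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t result = 0;
    uint64_t bit = (uint64_t)1 << 62; /* highest power of four in 64 bits */

    /* Start at the highest power of four that does not exceed n. */
    while (bit > n)
        bit >>= 2;

    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)result;
}
```

Because it uses only shifts, additions and comparisons, a routine of this kind gives bit-identical results on every device, which matters for a proof of work where all miners must agree on the hash.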
-