- Nov 27, 2018
-
-
psychocrypt authored
If two threads use the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a batch of hashes. This way of optimizing the hash rate was first introduced by @SChernykh. This implementation is based on the implementation in xmrig but differs in the details. - introduce a new config option `interleave` - implement thread interleaving
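The interleaving idea in this commit message can be sketched roughly as follows. This is a minimal illustration, not the xmr-stak or xmrig code: the struct `InterleaveState`, the functions `record_round` and `delay_before_next_round`, and the choice of spacing round starts by half the average round time are all assumptions made for the example. It does not model the `interleave` config option.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state shared by two host threads that drive the same GPU.
struct InterleaveState {
    double avg_round_ms = 0.0;   // running average duration of one hash round
    double last_start_ms = 0.0;  // start time of the most recent round (any thread)
    std::uint64_t rounds = 0;    // number of rounds folded into the average
};

// Fold one measured round duration into the running average.
inline void record_round(InterleaveState& s, double round_ms) {
    assert(round_ms >= 0.0);
    s.rounds += 1;
    s.avg_round_ms += (round_ms - s.avg_round_ms) / static_cast<double>(s.rounds);
}

// Return how long (in ms) the calling thread should wait before starting
// its next round so that round starts are at least half an average round
// apart (assumed target). Also records the planned start time.
inline double delay_before_next_round(InterleaveState& s, double now_ms) {
    const double target_gap = 0.5 * s.avg_round_ms;
    const double since_last = now_ms - s.last_start_ms;
    const double delay = since_last < target_gap ? target_gap - since_last : 0.0;
    s.last_start_ms = now_ms + delay;
    return delay;
}
```

In a real miner the shared state would need a mutex or atomics, since both threads read and write it; the sketch leaves that out to keep the timing logic visible.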
-
- Nov 22, 2018
-
-
fireice-uk authored
OpenCL: optimize strided index 1
-
- Nov 21, 2018
-
-
fireice-uk authored
OpenCL: add strided_index 3
-
psychocrypt authored
Use `mul24` to speed up the scratchpad index calculation. Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
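As a rough illustration of what `mul24` does: the OpenCL built-in multiplies only the low 24 bits of its two 32-bit operands, which some GPUs execute faster than a full 32-bit multiply. The C++ sketch below emulates that behaviour on the host. The helper `scratchpad_base` and its parameters are hypothetical; the commit message does not say which index expression was changed.

```cpp
#include <cassert>
#include <cstdint>

// Host-side emulation of OpenCL's unsigned mul24: only the low 24 bits of
// each operand take part, and the low 32 bits of the product are returned.
inline std::uint32_t mul24_emulated(std::uint32_t x, std::uint32_t y) {
    return (x & 0x00FFFFFFu) * (y & 0x00FFFFFFu);
}

// Hypothetical index calculation: offset of a thread's scratchpad, counted
// in elements. Both operands must fit in 24 bits for mul24 to be exact.
inline std::uint32_t scratchpad_base(std::uint32_t thread_id,
                                     std::uint32_t elems_per_thread) {
    assert(thread_id < (1u << 24) && elems_per_thread < (1u << 24));
    return mul24_emulated(thread_id, elems_per_thread);
}
```

The substitution is only safe when both factors are known to stay below 2^24, which is why the sketch asserts the range before multiplying.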
-
psychocrypt authored
Add new striding index where the memory is chunked by the size of the work group (worksize). Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
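One way to picture a layout that is chunked by work group size is sketched below. The exact layout used by the commit is not given in the message, so everything here is an assumption: each work group owns one contiguous chunk of `worksize * elems_per_thread` elements, and inside that chunk the threads of the group are interleaved element by element. The function name `chunked_index` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Map (thread, element) to a position in one large buffer.
// Assumed layout: chunk per work group, threads interleaved inside the chunk.
inline std::size_t chunked_index(std::size_t global_id,
                                 std::size_t worksize,
                                 std::size_t elems_per_thread,
                                 std::size_t elem) {
    assert(worksize > 0 && elem < elems_per_thread);
    const std::size_t group = global_id / worksize;  // which chunk
    const std::size_t local = global_id % worksize;  // position inside the group
    return group * worksize * elems_per_thread       // start of the chunk
         + elem * worksize                           // row of this element
         + local;                                    // column of this thread
}
```

With this mapping, neighbouring threads of a group touch neighbouring addresses for the same element number, while different groups never share a chunk.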
-
fireice-uk authored
OpenCL: cnv8 optimization
-
fireice-uk authored
OpenCL: optimize cn-v8 div
-
fireice-uk authored
AMD: use more 32bit operations
-
fireice-uk authored
OpenCL reduce API overhead
-
psychocrypt authored
small optimization for non cryptonight_v8 algorithms
-
- Nov 20, 2018
-
-
fireice-uk authored
OpenCL: optimize cn-heavy div
-
SChernykh authored
- optimize division
-
fireice-uk authored
CUDA: reduce cn-v8 shared mem footprint
-
fireice-uk authored
OpenCL: reduce local mem footprint
-
fireice-uk authored
CUDA: optimize cn-v8 div
-
SChernykh authored
optimize cryptonight_heavy diff
-
psychocrypt authored
- change a few 64bit variables into 32bit. - provide type-qualified defines
-
fireice-uk authored
CUDA: optimize cn-heavy div
-
- Nov 19, 2018
-
-
psychocrypt authored
- remove useless `clFinish` - avoid downloading the number of threads for skein & co. and always start as many threads as in all other kernels (terminate unneeded threads)
-
psychocrypt authored
Reduce local memory footprint to increase the occupancy. Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
port optimizations from OpenCL. Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
Use only half of the AES matrix and compute the other half in place. This PR increases the possible occupancy.
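The reason part of the AES lookup matrix can be recomputed instead of stored is that the four round tables are byte rotations of one another. The sketch below shows that relation on the host. It is an illustration only: the little-endian packing `(2s, s, s, 3s)`, the helper names, and deriving all three remaining tables from table 0 are assumptions. The commit message does not state which half is kept in shared memory.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by 2 in GF(2^8) using the AES polynomial x^8 + x^4 + x^3 + x + 1.
inline std::uint8_t xtime(std::uint8_t b) {
    return static_cast<std::uint8_t>((b << 1) ^ ((b & 0x80u) ? 0x1Bu : 0x00u));
}

// Table-0 entry for an S-box output s, packed little-endian as (2s, s, s, 3s).
inline std::uint32_t t0_entry(std::uint8_t s) {
    const std::uint32_t f2 = xtime(s);
    const std::uint32_t f3 = f2 ^ s;
    return f2 | (std::uint32_t{s} << 8) | (std::uint32_t{s} << 16) | (f3 << 24);
}

// Rotate a 32-bit word left by n bits (n is reduced modulo 32).
inline std::uint32_t rotl32(std::uint32_t v, unsigned n) {
    n &= 31u;
    return n == 0 ? v : (v << n) | (v >> (32u - n));
}

// Entry of table 1, 2 or 3 computed from the table-0 entry instead of being
// read from memory. Table 0 maps to itself.
inline std::uint32_t derived_entry(std::uint32_t t0, unsigned table) {
    assert(table < 4);
    return rotl32(t0, 8u * table);
}
```

Trading a table read for a rotate shrinks the shared memory needed per block, which is what allows more blocks to be resident at once.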
-
psychocrypt authored
port OpenCL optimized division to CUDA. Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
-
- Nov 17, 2018
-
-
fireice-uk authored
change load order for backends
-
fireice-uk authored
Topic amd refactoring
-
psychocrypt authored
If CUDA is loaded before AMD but no CUDA is available, the embedded OpenCL code can end up empty. This is only an issue if the binary is built statically on a different system.
-
- Nov 16, 2018
-
-
psychocrypt authored
define shared memory in the outer scope
-
SChernykh authored
x-ref: https://github.com/xmrig/xmrig-amd/pull/192
-
SChernykh authored
- optimize kernel cn0 and cn2 - optimize fast int math - use more 32bit variables Co-authored-by:
psychocrypt <psychocryptHPC@gmail.com>
-
- Nov 14, 2018
-
-
fireice-uk authored
version increase to 2.6.0
-
psychocrypt authored
bump version for next release
-
- Nov 11, 2018
-
-
fireice-uk authored
AMD: speedup cryptonight_heavy division
-
- Nov 06, 2018
-
-
psychocrypt authored
(doc/compile) Add better CUDA SDK vs Driver information
-
SChernykh authored
optimize the division in cryptonight_heavy and cryptonight_haven; import of https://github.com/xmrig/xmrig-amd/pull/185/commits/5d9b9334654df25cea7707f667990fd1577ed290
-
- Oct 27, 2018
-
-
Tony Butler authored
-
- Oct 25, 2018
-
-
fireice-uk authored
update version to 2.5.2
-
fireice-uk authored
NVIDIA: fix wrong number of threads
-
psychocrypt authored
-
- Oct 24, 2018
-
-
psychocrypt authored
Update for upcoming Graft CNv2 fork
-
psychocrypt authored
In the CUDA backend for monero we always start twice as many threads as needed. Those threads are then removed after the AES matrix is copied to shared memory. Nevertheless, this is the result of a copy-paste bug. - start the correct number of threads for `monero`
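A table copy into shared memory does not depend on launching extra threads: with a strided loop, any number of threads covers the whole table. The sketch below is a serial C++ stand-in for that pattern, not the CUDA code from this commit. The function names and the use of `std::vector` in place of global and shared memory are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Work done by one simulated thread: copy every num_threads-th element,
// starting at its own thread index.
inline void copy_slice(const std::vector<std::uint32_t>& src,
                       std::vector<std::uint32_t>& dst,
                       std::size_t tid, std::size_t num_threads) {
    assert(num_threads > 0 && dst.size() == src.size());
    for (std::size_t i = tid; i < src.size(); i += num_threads) {
        dst[i] = src[i];
    }
}

// Run all simulated threads one after another and return the filled copy.
inline std::vector<std::uint32_t> cooperative_copy(
        const std::vector<std::uint32_t>& src, std::size_t num_threads) {
    std::vector<std::uint32_t> dst(src.size(), 0);
    for (std::size_t tid = 0; tid < num_threads; ++tid) {
        copy_slice(src, dst, tid, num_threads);
    }
    return dst;
}
```

The result is the same whether 128 or 256 threads take part, so the launch can use the thread count that the hashing itself needs.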
-