Commits · d8316f7d9245973ab2c8467d85da06a58d566131 · Recolic / azure-cloud-mining-script

Nov 27, 2018

psychocrypt authored Nov 27, 2018

If two threads use the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a batch of hashes.

This way of optimizing the hash rate was first introduced by @SChernykh. This implementation is based on the implementation in xmrig but differs in the details.

- introduce a new config option `interleave`
- implement thread interleaving

d8316f7d
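
The interleaving idea described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: `RoundTimer`, `interleaveDelayMs`, and the rule of spacing round starts evenly across one average round are assumptions made for illustration, and the sketch does not model how the `interleave` config option feeds into the calculation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the repository): running average of the
// time one hash round takes on a device.
struct RoundTimer
{
    double avgMs = 0.0;
    uint64_t rounds = 0;

    void add(double roundMs)
    {
        ++rounds;
        avgMs += (roundMs - avgMs) / static_cast<double>(rounds);
    }
};

// Hypothetical helper (not from the repository): how long a thread should
// wait before starting its next round so that the threads sharing one GPU
// start evenly spaced within one average round.
//   avgRoundMs        - average duration of one hash round
//   sinceOtherStartMs - time since the other thread started its round
//   threadsPerDevice  - number of threads sharing the device
double interleaveDelayMs(double avgRoundMs, double sinceOtherStartMs, int threadsPerDevice)
{
    assert(threadsPerDevice >= 1);
    if (threadsPerDevice < 2 || avgRoundMs <= 0.0)
        return 0.0; // nothing to interleave with
    const double targetOffsetMs = avgRoundMs / threadsPerDevice;
    return std::max(0.0, targetOffsetMs - sinceOtherStartMs);
}
```

Under these assumptions, two threads with an average round of 100 ms aim for a 50 ms offset: a thread that is ready 20 ms after the other one started would wait another 30 ms.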

Nov 22, 2018
- Merge pull request #2089 from psychocrypt/topic-OpenCLOptimizeStridedIndex1 · 76f0de7f
  fireice-uk authored Nov 22, 2018
```
OpenCl: optimize strided index 1
```
  76f0de7f
Nov 21, 2018
- Merge pull request #2088 from psychocrypt/topic-newStridedIndex · ff204b22
  fireice-uk authored Nov 21, 2018
```
OpenCL: add strided_index 3
```
  ff204b22
- OpenCl: optimize strided index 1 · 39fa7c62
  psychocrypt authored Nov 21, 2018
```
Use `mul24` to speed up the scratchpad index calculation.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
```
  39fa7c62
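
A host-side sketch of what the `mul24` commit above is about. `mul24` is an OpenCL built-in that multiplies two integers and is only well defined when both operands fit in 24 bits, which lets the GPU use a cheaper multiplier. The C++ below merely emulates it; `mul24_emulated`, `stridedIndex1`, and the interleaved layout (element index times thread count plus thread id) are assumptions for illustration, not the kernel code from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-in for OpenCL's mul24 on unsigned operands.
// OpenCL only defines the result when both operands fit in 24 bits,
// so the emulation checks that precondition.
inline uint32_t mul24_emulated(uint32_t a, uint32_t b)
{
    assert(a < (1u << 24) && b < (1u << 24));
    return a * b; // low 32 bits of the product
}

// Hypothetical interleaved layout: element `elem` of all threads is stored
// contiguously, so neighbouring threads touch neighbouring addresses.
inline uint32_t stridedIndex1(uint32_t elem, uint32_t tid, uint32_t threads)
{
    return mul24_emulated(elem, threads) + tid;
}
```

The trick only applies while both the element index and the thread count stay below 2^24, which is why the emulation asserts on its operands.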
- OpenCL: add strided_index 3 · 3c9442ce
  psychocrypt authored Nov 21, 2018
```
Add a new strided index where the memory is chunked by the size of the work group (worksize).

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
```
  3c9442ce
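
One possible reading of "chunked by the size of the work group" from the `strided_index 3` commit above, as a host-side sketch. The function name `stridedIndex3`, its parameters, and the exact layout (one contiguous chunk per work group, interleaved by local thread id inside the chunk) are assumptions for illustration and may differ from the kernel code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: every work group owns one contiguous chunk of
// worksize * elemsPerThread elements. Inside the chunk, element `elem` of
// all local threads is stored side by side.
inline uint64_t stridedIndex3(uint32_t globalId, uint32_t elem,
                              uint32_t worksize, uint32_t elemsPerThread)
{
    assert(worksize > 0 && elem < elemsPerThread);
    const uint64_t group = globalId / worksize; // which chunk
    const uint64_t local = globalId % worksize; // position inside the group
    const uint64_t groupBase = group * worksize * elemsPerThread;
    return groupBase + static_cast<uint64_t>(elem) * worksize + local;
}
```

With `worksize = 8` and four elements per thread, thread 9 belongs to group 1, whose chunk starts at offset 32, so its element 2 lands at 32 + 2 * 8 + 1 = 49.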
- Merge pull request #2087 from psychocrypt/topic-cn1Optimization · 11387f7c
  fireice-uk authored Nov 21, 2018
```
OpenCL: cnv8 optimization
```
  11387f7c
- Merge pull request #2086 from psychocrypt/topic-amdOptimizeCNv8Div · c6846a82
  fireice-uk authored Nov 21, 2018
```
OpenCl: optimize cn-v8 div
```
  c6846a82
- Merge pull request #2084 from psychocrypt/topic-amd32bit · b06747f9
  fireice-uk authored Nov 21, 2018
```
AMD: use more 32bit operations
```
  b06747f9
- Merge pull request #2081 from psychocrypt/topic-reduceAPIOverhead · 7b7d4492
  fireice-uk authored Nov 21, 2018
```
OpenCL reduce API overhead
```
  7b7d4492
- OpenCL: cn1 optimization · 33e5825c
  psychocrypt authored Nov 21, 2018
```
small optimization for non-cryptonight_v8 algorithms
```
  33e5825c
Nov 20, 2018
- Merge pull request #2085 from psychocrypt/topic-amdOptimizeDiv · 1b2b4d30
  fireice-uk authored Nov 20, 2018
```
OpenCL: optimize cn-heavy div
```
  1b2b4d30
- OpenCl: optimize cn-v8 div · bff5b000
  SChernykh authored Nov 20, 2018
```
- optimize division
```
  bff5b000
- Merge pull request #2078 from psychocrypt/topic-cudaReduceSharedMemFootprint · b7ffd6b9
  fireice-uk authored Nov 20, 2018
```
CUDA: reduce cn-v8 shared mem footprint
```
  b7ffd6b9
- Merge pull request #2080 from psychocrypt/topic-reduceSharedMemUsage · 26830090
  fireice-uk authored Nov 20, 2018
```
OpenCL: reduce local mem footprint
```
  26830090
- Merge pull request #2079 from psychocrypt/topic-cnv8OptimizeDiv · a7e30eb5
  fireice-uk authored Nov 20, 2018
```
CUDA: optimize cn-v8 div
```
  a7e30eb5
- OpenCL: optimize cn-heavy div · 9813e1c0
  SChernykh authored Nov 20, 2018
```
optimize cryptonight_heavy division
```
  9813e1c0
- AMD: use more 32bit operations · f40c54e3
  psychocrypt authored Nov 20, 2018
```
- change a few 64bit variables into 32bit
- provide type-qualified defines
```
  f40c54e3
- Merge pull request #2077 from psychocrypt/topic-optimizeCUDAHeavyDiv · 6a95f0bb
  fireice-uk authored Nov 20, 2018
```
CUDA: optimize cn-heavy div
```
  6a95f0bb
Nov 19, 2018

OpenCL reduce API overhead · 6c563c9d

psychocrypt authored Nov 19, 2018

- remove useless `clFinish` calls
- avoid downloading the number of threads for skein & co.; always start as many threads as in all other kernels (terminate useless threads)

6c563c9d

OpenCL: reduce local mem footprint · 6f283928

psychocrypt authored Nov 19, 2018



Reduce the local memory footprint to increase occupancy.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

6f283928
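
Why a smaller local memory footprint can raise occupancy, as a small arithmetic sketch. The helper name and the example sizes are illustrative assumptions. Real occupancy also depends on register usage and hardware limits on resident work groups, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: upper bound on the number of work groups that can be
// resident on one compute unit when local memory is the only limit.
inline uint32_t groupsLimitedByLocalMem(uint32_t localMemPerCuBytes,
                                        uint32_t localMemPerGroupBytes)
{
    assert(localMemPerGroupBytes > 0);
    return localMemPerCuBytes / localMemPerGroupBytes;
}
```

For an assumed 64 KiB of local memory per compute unit, a kernel using 4 KiB per work group allows at most 16 resident groups, while halving the footprint to 2 KiB raises the bound to 32.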

CUDA: optimize cn-v8 div · 4a7fde13

psychocrypt authored Nov 19, 2018



port optimizations from OpenCL.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

4a7fde13

CUDA: reduce cn-v8 shared mem footprint · ae8ba7f0

psychocrypt authored Nov 19, 2018

Use only half of the AES matrix and compute the other half in place.
This PR increases the possible occupancy.

ae8ba7f0

CUDA: optimize cn-heavy div · 0c1d805a

psychocrypt authored Nov 19, 2018



port the optimized OpenCL division to CUDA

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

0c1d805a

Nov 17, 2018
- Merge pull request #2071 from psychocrypt/topic-changeBackendLoadOrder · 447fef4b
  fireice-uk authored Nov 17, 2018
```
change load order for backends
```
  447fef4b
- Merge pull request #2070 from psychocrypt/topic-amdRefactoring · 1755f5e8
  fireice-uk authored Nov 17, 2018
```
Topic amd refactoring
```
  1755f5e8
- change load order for backends · cf959a1c
  psychocrypt authored Nov 17, 2018
```
If CUDA is loaded before AMD but no CUDA is available, it can happen that the embedded OpenCL code is empty.
This is only an issue if the binary is built statically on a different system.
```
  cf959a1c
Nov 16, 2018

fix ROCm compile · 18dbff68
psychocrypt authored Nov 17, 2018
```
define shared memory in the outer scope
```
18dbff68
optimize cn-heavy div · e6177f1c
SChernykh authored Nov 17, 2018
```
x-ref: https://github.com/xmrig/xmrig-amd/pull/192
```
e6177f1c

Optimize OpenCl · 28ef8e3d

SChernykh authored Nov 16, 2018



- optimize kernel cn0 and cn2
- optimize fast int math
- use more 32bit variables

Co-authored-by: psychocrypt <psychocryptHPC@gmail.com>

28ef8e3d

Nov 14, 2018
- Merge pull request #2063 from psychocrypt/topic-version2.6.0 · 76456d13
  fireice-uk authored Nov 14, 2018
```
version increase to 2.6.0
```
  76456d13
- version increase to 2.6.0 · 6ac129fe
  psychocrypt authored Nov 14, 2018
```
bump version for next release
```
  6ac129fe
Nov 11, 2018
- Merge pull request #2045 from psychocrypt/topic-speedupAMDCNHeavy · c1eac4ac
  fireice-uk authored Nov 11, 2018
```
AMD: speedup cryptonight_heavy division
```
  c1eac4ac
Nov 06, 2018

Merge pull request #1994 from Spudz76/dev-cudaDocs · e9f7b837
psychocrypt authored Nov 06, 2018
```
(doc/compile) Add better CUDA SDK vs Driver information
```
e9f7b837

AMD: speedup cryptonight_heavy division · bfb3243c

SChernykh authored Nov 06, 2018

optimize the division in cryptonight_heavy and cryptonight_haven

import of https://github.com/xmrig/xmrig-amd/pull/185/commits/5d9b9334654df25cea7707f667990fd1577ed290

bfb3243c

Oct 27, 2018
- (doc/compile) Add better CUDA SDK vs Driver information · 57e8615e
  Tony Butler authored Oct 21, 2018
  
  57e8615e
Oct 25, 2018
- Merge pull request #2010 from psychocrypt/topic-versionIncreaseTo2.5.2 · 07d2de33
  fireice-uk authored Oct 25, 2018
```
update version to 2.5.2
```
  07d2de33
- Merge pull request #2003 from psychocrypt/fix-cudaWrongNumberOfThreads · eefd057d
  fireice-uk authored Oct 25, 2018
```
NVIDIA: fix wrong number of threads
```
  eefd057d
- update version to 2.5.2 · c5b7c80b
  psychocrypt authored Oct 25, 2018
  
  c5b7c80b
Oct 24, 2018

Merge pull request #2001 from jagerman/graft-cnv2 · 4c64ffe1
psychocrypt authored Oct 24, 2018
```
Update for upcoming Graft CNv2 fork
```
4c64ffe1

NVIDIA: fix wrong number of threads · 954296ed

psychocrypt authored Oct 24, 2018

In the CUDA backend for monero we always start twice as many threads as needed.
Those threads are then removed after the AES matrix is copied to the shared memory.
Nevertheless, this is the result of a copy-paste bug.

- start correct number of threads for `monero`

954296ed