Commits · 307dda8377052d80bd54d4f52e67add9749f2077 · Recolic / azure-cloud-mining-script

Nov 30, 2018

OpenCL: comp mode optimization · 307dda83

psychocrypt authored 6 years ago

Disable compatibility mode if the intensity is a multiple of the worksize. In that case an enabled compatibility mode only slows down the miner.

307dda83
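
The rule from the commit message above can be sketched as a small predicate. This is only an illustration: the function name and parameters are hypothetical and not taken from the miner's source, and it assumes compatibility mode exists to guard work items that fall outside the requested intensity.

```cpp
#include <cstddef>

// Hypothetical helper (not the miner's actual code): decide whether the
// compatibility guard is still needed for a given launch configuration.
bool needs_comp_mode(std::size_t intensity, std::size_t worksize, bool user_enabled)
{
    // If intensity is an exact multiple of worksize, every work group is
    // completely filled, so there are no surplus work items to guard against.
    const bool is_multiple = (worksize != 0) && (intensity % worksize == 0);
    return user_enabled && !is_multiple;
}
```

With this sketch, `needs_comp_mode(1000, 8, true)` yields `false`, while an intensity of 1001 keeps the guard enabled.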

Nov 27, 2018

OpenCL: thread interleaving · d8316f7d

psychocrypt authored 6 years ago

If two threads use the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a batch of hashes.

This way of optimizing the hash rate was first introduced by @SChernykh. This implementation is based on the one in xmrig but differs in the details.

- introduce a new config option `interleave`
- implement thread interleaving

d8316f7d

Nov 21, 2018

OpenCL: optimize strided index 1 · 39fa7c62

psychocrypt authored 6 years ago


Use `mul24` to speed up the scratchpad index calculation.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

39fa7c62

OpenCL: add strided_index 3 · 3c9442ce

psychocrypt authored 6 years ago


Add new striding index where the memory is chunked by the size of the work group (worksize).

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

3c9442ce

OpenCL: cn1 optimization · 33e5825c
psychocrypt authored 6 years ago
```
small optimization for non cryptonight_v8 algorithms
```
33e5825c

Nov 20, 2018
- OpenCL: optimize cn-v8 div · bff5b000
  SChernykh authored 6 years ago
```
- optimize division
```
  bff5b000
- OpenCL: optimize cn-heavy div · 9813e1c0
  SChernykh authored 6 years ago
```
optimize cryptonight_heavy division
```
  9813e1c0
- AMD: use more 32bit operations · f40c54e3
  psychocrypt authored 6 years ago
```
- change a few 64bit variables into 32bit.
- provide type-qualified defines
```
  f40c54e3
Nov 19, 2018

OpenCL reduce API overhead · 6c563c9d

psychocrypt authored 6 years ago

- remove useless `clFinish`
- avoid downloading the number of threads for skein & co. and always start as many threads as in all other kernels (useless threads are terminated)

6c563c9d

OpenCL: reduce local mem footprint · 6f283928

psychocrypt authored 6 years ago


Reduce the local memory footprint to increase the occupancy.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

6f283928

CUDA: optimize cn-v8 div · 4a7fde13

psychocrypt authored 6 years ago


port optimizations from OpenCL.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

4a7fde13

CUDA: reduce cn-v8 shared mem footprint · ae8ba7f0

psychocrypt authored 6 years ago

Use only half of the AES matrix and compute the other half in place.
This PR increases the possible occupancy.

ae8ba7f0

CUDA: optimize cn-heavy div · 0c1d805a

psychocrypt authored 6 years ago


port the optimized OpenCL division to CUDA

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

0c1d805a

Nov 17, 2018

change load order for backends · cf959a1c

psychocrypt authored 6 years ago

If CUDA is loaded before AMD but no CUDA is available, it can happen that the embedded OpenCL code is empty.
This is only an issue if the binary is built statically on a different system.

cf959a1c

Nov 16, 2018

fix ROCm compile · 18dbff68
psychocrypt authored 6 years ago
```
define shared memory in the outer scope
```
18dbff68

optimize cn-heavy div · e6177f1c
SChernykh authored 6 years ago
```
x-ref: https://github.com/xmrig/xmrig-amd/pull/192
```
e6177f1c

Optimize OpenCL · 28ef8e3d

SChernykh authored 6 years ago


- optimize kernel cn0 and cn2
- optimize fast int math
- use more 32bit variables

Co-authored-by: psychocrypt <psychocryptHPC@gmail.com>

28ef8e3d

Nov 06, 2018

AMD: speedup cryptonight_heavy division · bfb3243c

SChernykh authored 6 years ago

optimize the division in cryptonight_heavy and cryptonight_haven

import of https://github.com/xmrig/xmrig-amd/pull/185/commits/5d9b9334654df25cea7707f667990fd1577ed290

bfb3243c

Oct 24, 2018

NVIDIA: fix wrong number of threads · 954296ed

psychocrypt authored 6 years ago

In the CUDA backend for monero we always start twice as many threads as needed.
Those threads are then removed after the AES matrix is copied to shared memory.
Nevertheless, this is the result of a copy-paste bug.

- start correct number of threads for `monero`

954296ed

Oct 16, 2018
- fix AMD driver 14 · 6fc6e3a5
  psychocrypt authored 6 years ago
```
Fix the fix from #1945. The initial fix produces invalid results.
```
  6fc6e3a5
- Remove dead code · 13e35074
  Hans Kristian Rosbach authored 6 years ago
  
  13e35074
- Add missing test for cryptonight_lite · d79a4e7b
  Hans Kristian Rosbach authored 6 years ago
  
  d79a4e7b
Oct 15, 2018

fix broken AMD OpenCL compile · 2a0d565b

psychocrypt authored 6 years ago

The AMD OpenCL compiler shipped with driver 14XX is broken
and cannot compile xmr-stak since the monero v8 changes were introduced.

- work around a simple comparison
- add new device define `OPENCL_DRIVER_MAJOR`

2a0d565b

Oct 11, 2018

NVIDIA: support for multiple CUDA libs · 732b0e41

psychocrypt authored 6 years ago

Allow shipping the miner with multiple CUDA backends that depend on different driver versions.
This makes it possible to support Turing/Volta and old Fermi GPUs within one release.

- add support to search for the first working CUDA backend
- add some more messages to support better debugging (if a user has some issues)

732b0e41
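
The "first working backend" search described in this commit could look roughly like the sketch below. The function name, the candidate library names, and the `try_load` callback are hypothetical stand-ins; the real miner loads shared libraries, which is abstracted away here so the example stays portable.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: walk the candidate backend libraries in order and
// return the first one the loader callback accepts. An empty string means
// that no candidate could be loaded.
std::string first_working_backend(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& try_load)
{
    for (const std::string& name : candidates) {
        if (try_load(name)) {
            return name;  // first backend that initializes successfully
        }
        // A real implementation would log why this candidate failed,
        // which is what makes user-side debugging easier.
    }
    return std::string();
}
```

Ordering the candidates from the newest to the oldest CUDA version would let recent GPUs pick the newest backend while older cards fall through to one their driver supports.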

Oct 10, 2018

NVIDIA: tweak `get_reciprocal` · b1504b36
SChernykh authored 6 years ago
```
- remove helper array to perform division
- tweak `get_reciprocal`
```
b1504b36

NVIDIA: rename config option `comp_mode` · bd4a4c94

psychocrypt authored 6 years ago

The name `comp_mode` is a badly chosen name for a memory load pattern.
Therefore I changed it to `mem_mode`, which also gives us the possibility
to add new modes later if needed.

- rename `comp_mode` to `mem_mode`
- fix documentation

bd4a4c94

fix right bitshift in `amd_bitalign` · b4387ac0

psychocrypt authored 6 years ago

In the current implementation the bit align uses a signed integer, which results in ones being shifted in
when the sign bit is set.

- cast to unsigned integer before using bitshift

b4387ac0
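
The effect of the cast can be illustrated in plain C++. The sketch below is not the code from this commit. It emulates a funnel shift with the semantics I understand `amd_bitalign` to have (concatenate two 32-bit words, shift right, keep the low 32 bits), and the helper names are made up for the example.

```cpp
#include <cstdint>

// Illustrative emulation: treat hi:lo as one 64-bit value, shift it right
// by (shift mod 32) and return the low 32 bits. All operands are unsigned,
// so zeros are shifted in from the left.
inline std::uint32_t bitalign_emulated(std::uint32_t hi, std::uint32_t lo, std::uint32_t shift)
{
    const std::uint64_t combined = (static_cast<std::uint64_t>(hi) << 32) | lo;
    return static_cast<std::uint32_t>(combined >> (shift & 31u));
}

// Logical right shift for a value that arrives as a signed integer:
// cast to unsigned first so a set sign bit is not replicated.
inline std::uint32_t logical_shift_right(std::int32_t value, std::uint32_t shift)
{
    return static_cast<std::uint32_t>(value) >> (shift & 31u);
}
```

Shifting the signed value directly would typically perform an arithmetic shift and fill the upper bits with copies of the sign bit, which is the behavior the commit removes.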

CUDA: fix invalid results · ed2168b4

psychocrypt authored 6 years ago

If `comp_mode` is false, the results on a Windows platform will be invalid.
The reason is that `ulong4` is 16 bytes on Windows and 32 bytes on Linux.

thx @xmrig for finding and solving the issue

fix #1873

ed2168b4
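
The size difference comes from `unsigned long` being 32 bits wide on 64-bit Windows and 64 bits wide on 64-bit Linux. The sketch below uses hypothetical struct names to show the pitfall and one portable alternative based on fixed-width types. It is not necessarily the fix applied in this commit.

```cpp
#include <cstdint>

// Stand-in for a four-component vector built from `unsigned long`.
// Its size follows the platform's data model: 16 bytes where
// `unsigned long` is 4 bytes, 32 bytes where it is 8 bytes.
struct ulong4_like {
    unsigned long x, y, z, w;
};

// Portable alternative: fixed-width members give the same layout everywhere.
struct u64x4 {
    std::uint64_t x, y, z, w;
};

static_assert(sizeof(u64x4) == 32, "four 64-bit lanes must occupy 32 bytes");
```

Code that assumes a 32-byte load through the platform-dependent type silently reads only half the data on Windows, which matches the invalid results described above.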

Oct 08, 2018

improve error message · 58b7c66c

psychocrypt authored 6 years ago

Add a suggestion to a common line which is shown in the event of a crash under Windows.

58b7c66c

CUDA: add compatibility mode · 594a5b4d
psychocrypt authored 6 years ago
```
Add compatibility mode for CUDA to avoid invalid shares.
```
594a5b4d

select hash function from function array · 801556f6
psychocrypt authored 6 years ago
```
Use an array instead of an if cascade to select the hashing function for CUDA.
```
801556f6
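
Replacing an if cascade with a table lookup can be sketched as follows. The enum values and hash functions are placeholders invented for this example and do not correspond to the miner's real algorithms or signatures.

```cpp
#include <array>
#include <cstddef>

// Placeholder algorithm ids; ALGO_COUNT doubles as the table size.
enum Algo : std::size_t { ALGO_A = 0, ALGO_B, ALGO_C, ALGO_COUNT };

using HashFn = int (*)(int);

// Dummy "hash" functions, only here to make the table observable.
int hash_a(int x) { return x + 1; }
int hash_b(int x) { return x * 2; }
int hash_c(int x) { return x - 3; }

// Look the function up by index instead of testing each algorithm in turn.
HashFn select_hash(Algo algo)
{
    static const std::array<HashFn, ALGO_COUNT> table = {{hash_a, hash_b, hash_c}};
    return algo < ALGO_COUNT ? table[algo] : nullptr;
}
```

Adding an algorithm then means adding one enum value and one table entry, and the bounds check returns `nullptr` for an unknown id instead of falling through a chain of conditions.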

compatibility and better messages · 9e592ec5

psychocrypt authored 6 years ago


- add more descriptive messages if memory allocation fails
- add gnu compiler flags: `noexecstack` to support systemd
- handle cases where memory allocation fails

Co-authored-by: Tony Butler <spudz76@gmail.com>

9e592ec5

CUDA: use volatile pointer · eb8376fa

psychocrypt authored 6 years ago

Use volatile pointer to be sure that the compiler is not caching the values.

eb8376fa

CPU: fix logical error · 53652d35
psychocrypt authored 6 years ago
```
Fix wrong warning about unknown ASM type
```
53652d35

Oct 07, 2018

fix crash with monero and strided_index · 1c0ef154

psychocrypt authored 6 years ago

Strided index 1 is not allowed for cryptonight_v8 and monero.
If the dev pool is set to monero and the user tuned their settings for
another currency, the miner will crash if strided index or memChunk does not
fit the requirements for mining monero.
This PR detects wrong configurations and sets strided index and memChunk to a valid
value, but only for cryptonight_v8. The user pool settings will only be changed if monero or
cryptonight_v8 is selected.

1c0ef154

OpenCL: fix definition range for unroll · 746037d8
psychocrypt authored 6 years ago
```
fix #1870

- remove zero from the valid definition range for the loop unroll option
```
746037d8

Oct 06, 2018
- Fix two new warnings within new code · 2370aeef
  Tony Butler authored 6 years ago
  
  2370aeef
Oct 05, 2018

fix invalid shares · 8e1e7447

psychocrypt authored 6 years ago

With rocm we fought with invalid shares for a very long time. This is now solved with rocm 1.9 and
this tiny fix.
It is not fully clear where a memory optimization kicks in and breaks the kernel `Groestl` if the variables `M` and `H` are not `volatile`.
The performance will not change with this fix.

The fix is tested with rocm 1.9 with a VEGA64 and a RX570

8e1e7447

CUDA: tune cryptonight_v8 · 99a12cb6

psychocrypt authored 6 years ago

Read memory in bigger chunks per thread to increase the used memory bandwidth.
For Kepler and Fermi GPUs, use the old autosuggestion instead of the new settings for cryptonight_v8.

99a12cb6

add cpu family and model detection · 21ce0385

psychocrypt authored 6 years ago


Helper functions to select the asm version based on the number of hashes used per thread and the family name of the CPU.

- use the new cpu type functions to fix the wrong AMD family detection in `autoAdjust.hpp`
- allow to set the asm version to `auto`
- rename asm option `intel` to `intel_avx`
- rename asm option `ryzen` to `amd_avx`

Co-authored-by: fireice-uk <fireice-uk@users.noreply.github.com>

21ce0385