Commits · ab19d370439f566666596a0b7d3ce9c8d9489ea7 · Recolic / azure-cloud-mining-script

Dec 03, 2018

OpenCL: enable cn_v8 optimization for NVIDIA · ab19d370

psychocrypt authored 6 years ago

NVIDIA uses clang as its device compiler, so the reciprocal optimization was disabled by #2104.

- re-enable optimized reciprocal calculation

ab19d370

Dec 02, 2018

OpenCl: fix NVIDIA · 1b27f0f3

psychocrypt authored 6 years ago

- fix broken compile: change `ULL` to `UL`, because `UL` is defined as 64-bit
- fix memory copy to shared memory via `vload8` (it somehow created a wrong access)

1b27f0f3

OpenCL: auto config two threads per GPU · e46226fa

psychocrypt authored 6 years ago

The auto config now generates two threads per GPU by default for AMD devices.

- subtract the 128 MiB safety margin only from the maximum available GPU memory, not from the memory available for a single alloc call
- extend the memory documentation in amd.txt

e46226fa

fix clamp implementation · b606304b
psychocrypt authored 6 years ago
```
Due to a wrong implementation clamp was not working.
```
b606304b
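
The commit message does not show the faulty code. As a minimal sketch of the expected behaviour only, a clamp limits a value to a closed range. The name `clamp_sketch` is hypothetical and not the project's function.

```cpp
// Hypothetical helper (not taken from the repository): limit `value` to [low, high].
// Assumes low <= high; the result is unspecified for an inverted range.
template <typename T>
constexpr T clamp_sketch(T value, T low, T high)
{
    // Below the range: return low. Above the range: return high. Otherwise unchanged.
    return value < low ? low : (high < value ? high : value);
}
```

For example, `clamp_sketch(5, 0, 3)` yields the upper bound, and a value already inside the range is returned unchanged.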

Nov 30, 2018

OpenCL: optimize reciprocal calculation · bc91088a

psychocrypt authored 6 years ago


For non-clang (ROCm) OpenCL, use an optimized reciprocal calculation without a lookup table.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

bc91088a

OpenCL: comp mode optimization · 307dda83

psychocrypt authored 6 years ago

Disable compatibility mode if intensity is a multiple of worksize. In that case, an enabled compatibility mode only slows down the miner.
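
The rule can be sketched as a small host-side check. The function name and parameters are hypothetical, chosen for illustration, and the real miner may structure this decision differently.

```cpp
#include <cstddef>

// Hypothetical helper: decide whether compatibility mode should stay enabled.
// `requested` is the value the user configured; `intensity` and `worksize`
// mirror the config options of the same names.
inline bool needs_comp_mode(std::size_t intensity, std::size_t worksize, bool requested)
{
    if (worksize == 0)
        return requested; // avoid division by zero; leave the setting untouched
    // When intensity is an exact multiple of worksize, the mode only costs time.
    return requested && (intensity % worksize != 0);
}
```

With `intensity = 1000` and `worksize = 8`, the mode would be switched off even if requested. With `intensity = 1001`, it would stay on.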

307dda83

Nov 29, 2018
- Added Cryptonight-Superfast · 053190bb
  LPHuynh authored 6 years ago
  
  053190bb
Nov 27, 2018

OpenCL: thread interleaving · d8316f7d

psychocrypt authored 6 years ago

If two threads use the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a batch of hashes.

This way of optimizing the hash rate was first introduced by @SChernykh. This implementation is based on the implementation in xmrig but differs in the details.

- introduce a new config option `interleave`
- implement thread interleaving
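
The idea can be illustrated with a rough sketch. It assumes the goal is to keep the round starts of the two threads a fixed fraction of the average round time apart. The function, its parameters, and the exact formula are assumptions for illustration and do not reproduce the code of this commit or of xmrig.

```cpp
// Hypothetical sketch: how long a thread should wait before starting its next
// hash round so that it starts `fraction * avgRoundMs` after the other thread.
// All times are in milliseconds on a shared clock.
inline double interleave_delay_ms(double nowMs,
                                  double otherLastStartMs,
                                  double avgRoundMs,
                                  double fraction)
{
    const double targetGap = avgRoundMs * fraction;   // desired offset between starts
    const double elapsed   = nowMs - otherLastStartMs; // time since the other thread started
    const double wait      = targetGap - elapsed;
    return wait > 0.0 ? wait : 0.0;                    // never wait a negative time
}
```

If the other thread started 10 ms ago, the average round takes 40 ms, and the fraction is 0.5, the sketch asks the current thread to wait another 10 ms.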

d8316f7d

Nov 21, 2018

OpenCl: optimize strided index 1 · 39fa7c62

psychocrypt authored 6 years ago


Use `mul24` to speedup the scratchpad index calculation.
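
OpenCL's `mul24` multiplies two integers using only their low 24 bits, which some GPUs execute faster than a full 32-bit multiply. The host-side model below only illustrates the contract for unsigned operands. `mul24_sketch` is a hypothetical name, and the index formula used by the kernel is not shown in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of unsigned mul24: for operands that fit in
// 24 bits the result equals an ordinary 32-bit multiplication (low 32 bits).
// Outside that range OpenCL leaves the result implementation-defined, so the
// sketch rejects such inputs instead of guessing.
inline std::uint32_t mul24_sketch(std::uint32_t x, std::uint32_t y)
{
    assert(x < (1u << 24) && y < (1u << 24));
    return x * y; // unsigned arithmetic wraps modulo 2^32
}
```

This substitution is only safe while both factors of the index calculation stay below 2^24.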

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

39fa7c62

OpenCL: add strided_index 3 · 3c9442ce

psychocrypt authored 6 years ago


Add new striding index where the memory is chunked by the size of the work group (worksize).

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

3c9442ce

OpenCL: cn1 optimization · 33e5825c
psychocrypt authored 6 years ago
```
small optimization for non cryptonight_v8 algorithms
```
33e5825c

Nov 20, 2018
- OpenCl: optimize cn-v8 div · bff5b000
  SChernykh authored 6 years ago
```
- optimize division
```
  bff5b000
- OpenCL: optimize cn-heavy div · 9813e1c0
  SChernykh authored 6 years ago
```
optimize cryptonight_heavy diff
```
  9813e1c0
- AMD: use more 32bit operations · f40c54e3
  psychocrypt authored 6 years ago
```
- change a few 64-bit variables to 32-bit
- provide type-qualified defines
```
  f40c54e3
Nov 19, 2018

OpenCL reduce API overhead · 6c563c9d

psychocrypt authored 6 years ago

- remove useless `clFinish`
- avoid downloading the number of threads for skein & co. and always start as many threads as in all other kernels (useless threads are terminated)

6c563c9d

OpenCL: reduce local mem footprint · 6f283928

psychocrypt authored 6 years ago


Reduce the local memory footprint to increase the occupancy.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

6f283928

CUDA: optimize cn-v8 div · 4a7fde13

psychocrypt authored 6 years ago


port optimizations from OpenCL.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

4a7fde13

CUDA: reduce cn-v8 shared mem footprint · ae8ba7f0

psychocrypt authored 6 years ago

Use only half of the AES matrix and compute the other half in place.
This PR increases the possible occupancy.
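
One reason this can work is that the four AES lookup tables are byte rotations of one another, so an entry of one table can be derived from another with a 32-bit rotate. The sketch below shows only that derivation. The helper names are hypothetical, and which tables the commit keeps in shared memory is not stated in the message.

```cpp
#include <cstdint>

// Rotate a 32-bit word left by `r` bits (r is reduced modulo 32).
inline std::uint32_t rotl32(std::uint32_t v, unsigned r)
{
    r &= 31u;
    return (v << r) | (v >> ((32u - r) & 31u));
}

// Hypothetical helper: derive the entry of the table that is `k` byte
// positions away from a stored table entry, instead of loading it from memory.
inline std::uint32_t derive_table_entry(std::uint32_t storedEntry, unsigned k)
{
    return rotl32(storedEntry, 8u * (k & 3u));
}
```

Trading a memory load for a rotate shrinks the shared memory needed per block, which is what allows the higher occupancy.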

ae8ba7f0

CUDA: optimize cn-heavy div · 0c1d805a

psychocrypt authored 6 years ago


port OpenCl optimized division to CUDA

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

0c1d805a

Nov 17, 2018

change load order for backends · cf959a1c

psychocrypt authored 6 years ago

If CUDA is loaded before AMD but no CUDA is available, the embedded OpenCL code can end up empty.
This is only an issue if the binary is built statically on a different system.

cf959a1c

Nov 16, 2018

fix ROCm compile · 18dbff68
psychocrypt authored 6 years ago
```
define shared memory in the outer scope
```
18dbff68
optimize cn-heavy div · e6177f1c
SChernykh authored 6 years ago
```
x-ref: https://github.com/xmrig/xmrig-amd/pull/192
```
e6177f1c

Optimize OpenCl · 28ef8e3d

SChernykh authored 6 years ago


- optimize kernel cn0 and cn2
- optimize vast int math
- use more 32bit variables

Co-authored-by: psychocrypt <psychocryptHPC@gmail.com>

28ef8e3d

Nov 06, 2018

AMD: speedup cryptonight_heavy division · bfb3243c

SChernykh authored 6 years ago

optimize the division in cryptonight_heavy and cryptonight_haven

import of https://github.com/xmrig/xmrig-amd/pull/185/commits/5d9b9334654df25cea7707f667990fd1577ed290

bfb3243c

Oct 24, 2018

NVIDIA: fix wrong number of threads · 954296ed

psychocrypt authored 6 years ago

In the CUDA backend for monero, we always start twice as many threads as needed.
Those threads are then removed after the AES matrix is copied to shared memory.
Nevertheless, this is the result of a copy-paste bug.

- start correct number of threads for `monero`

954296ed

Oct 16, 2018
- fix AMD driver 14 · 6fc6e3a5
  psychocrypt authored 6 years ago
```
Fix the fix from #1945. The initial fix produces invalid results.
```
  6fc6e3a5
- Remove dead code · 13e35074
  Hans Kristian Rosbach authored 6 years ago
  
  13e35074
- Add missing test for cryptonight_lite · d79a4e7b
  Hans Kristian Rosbach authored 6 years ago
  
  d79a4e7b
Oct 15, 2018

fix broken AMD OpenCL compile · 2a0d565b

psychocrypt authored 6 years ago

The AMD OpenCL compiler shipped with driver 14XX is broken
and cannot compile xmr-stak since the monero v8 changes were introduced.

- workaround a simple compare.
- add new device define `OPENCL_DRIVER_MAJOR`

2a0d565b

Oct 11, 2018

NVIDIA: support for multiple CUDA libs · 732b0e41

psychocrypt authored 6 years ago

Allow shipping the miner with multiple CUDA backends that depend on different driver versions.
This makes it possible to support Turing/Volta and old Fermi GPUs within one release.

- add support to search for the first working CUDA backend
- add some more messages to support better debugging (if a user has some issues)
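
The search for the first working backend can be sketched as a loop over candidates. The candidate names and the `tryLoad` callback are hypothetical. The real miner loads shared libraries, which is replaced by a callback here so the sketch stays self-contained.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: try each backend name in order and return the first one
// for which `tryLoad` reports success. Returns an empty string if none works.
inline std::string first_working_backend(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& tryLoad)
{
    for (const auto& name : candidates)
    {
        if (tryLoad(name))
            return name; // stop at the first backend that loads
        // A real implementation would log the failure here to help debugging.
    }
    return std::string();
}
```

Ordering the candidates from newest to oldest driver requirement would let recent GPUs pick the newest backend while older GPUs fall through to a compatible one. That ordering is an assumption, not something the commit message states.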

732b0e41

Oct 10, 2018

NVIDIA: tweak `get_reciprocal` · b1504b36
SChernykh authored 6 years ago
```
- remove helper array to perform division
- tweak `get_reciprocal`
```
b1504b36

NVIDIA: rename config option `comp_mode` · bd4a4c94

psychocrypt authored 6 years ago

The name `comp_mode` is a badly chosen name for a memory load pattern.
Therefore I changed it to `mem_mode`, which also gives us the possibility
to add new modes later if needed.

- rename `comp_mode` to `mem_mode`
- fix documentation

bd4a4c94

fix right bitshift in `amd_bitalign` · b4387ac0

psychocrypt authored 6 years ago

In the current implementation, the bit align uses a signed integer, so the right shift pulls in
ones when the sign bit is set.

- cast to unsigned integer before using bitshift
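
The effect of the cast can be shown with a host-side sketch. Both functions are hypothetical models written for illustration and are not the kernel code touched by this commit.

```cpp
#include <cstdint>

// Right shift after casting to unsigned: zeros are shifted in from the left,
// even if the sign bit of `v` is set. (A right shift on the signed value would
// typically copy the sign bit instead, which is the bug described above.)
inline std::uint32_t shift_right_unsigned(std::int32_t v, unsigned s)
{
    return static_cast<std::uint32_t>(v) >> (s & 31u);
}

// Hypothetical model of a bit align: concatenate hi:lo into 64 bits, shift
// right by (shift & 31), and keep the low 32 bits. All operands are unsigned.
inline std::uint32_t bitalign_sketch(std::uint32_t hi, std::uint32_t lo, std::uint32_t shift)
{
    const std::uint64_t wide = (static_cast<std::uint64_t>(hi) << 32) | lo;
    return static_cast<std::uint32_t>(wide >> (shift & 31u));
}
```

For the input `-1` shifted by 28, the unsigned variant returns `0xF` rather than a word full of ones.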

b4387ac0

CUDA: fix invalid results · ed2168b4

psychocrypt authored 6 years ago

If `comp_mode` is false, the results on a Windows platform will be invalid.
The reason is that `ulong4` is 16 bytes on Windows and 32 bytes on Linux.

thx @xmrig for finding and solving the issue

fix #1873
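
The size difference comes from `unsigned long`, which is 32 bits wide on 64-bit Windows and 64 bits wide on 64-bit Linux. The structs below are hypothetical host-side stand-ins that show one common way to avoid this class of bug with fixed-width types. The commit itself may have solved it differently.

```cpp
#include <cstdint>

// Platform-dependent layout: four `unsigned long` members occupy 16 bytes
// where `unsigned long` is 32-bit and 32 bytes where it is 64-bit.
struct NativeUlong4
{
    unsigned long x, y, z, w;
};

// Fixed-width layout: always four 64-bit members, independent of the platform.
struct FixedUlong4
{
    std::uint64_t x, y, z, w;
};

static_assert(sizeof(FixedUlong4) == 32, "four 64-bit members must occupy 32 bytes");
```

Code that loads or stores a whole vector at once assumes a fixed size, so using the fixed-width variant keeps the access pattern identical on both platforms.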

ed2168b4

Oct 08, 2018

improve error message · 58b7c66c

psychocrypt authored 6 years ago

Add a suggestion to a common line that is shown in the event of a crash under Windows.

58b7c66c

CUDA: add compatibility mode · 594a5b4d
psychocrypt authored 6 years ago
```
Add compatibility mode for CUDA to avoid invalid shares.
```
594a5b4d
select hash function from function array · 801556f6
psychocrypt authored 6 years ago
```
Use an array instead of an if cascade to select the hashing function for CUDA.
```
801556f6
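
The pattern can be sketched in plain C++ as follows. The algorithm ids and the placeholder hash functions are hypothetical. They only stand in for the real CUDA kernels, which are not shown in the commit message.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical algorithm ids; `Count` marks the number of entries.
enum class Algo : std::size_t { Lite = 0, Monero = 1, Heavy = 2, Count = 3 };

using HashFn = std::uint32_t (*)(std::uint32_t);

// Placeholder "hash" functions so that the selection can be observed.
inline std::uint32_t hash_lite(std::uint32_t x)   { return x + 1u; }
inline std::uint32_t hash_monero(std::uint32_t x) { return x * 2u; }
inline std::uint32_t hash_heavy(std::uint32_t x)  { return x ^ 0xFFu; }

// Select the function by index instead of walking an if/else-if chain.
// The order of the table must match the order of the enum values.
inline HashFn select_hash(Algo a)
{
    static constexpr std::array<HashFn, static_cast<std::size_t>(Algo::Count)> table = {
        &hash_lite, &hash_monero, &hash_heavy};
    const auto idx = static_cast<std::size_t>(a);
    return idx < table.size() ? table[idx] : nullptr; // nullptr for unknown ids
}
```

Adding an algorithm then means adding one enum value and one table entry, rather than another branch in the cascade.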

compatibility and better messages · 9e592ec5

psychocrypt authored 6 years ago


- add more descriptive messages if memory allocation fails
- add gnu compiler flags: `noexecstack` to support systemd
- handle cases where memory allocation fails

Co-authored-by: Tony Butler <spudz76@gmail.com>

9e592ec5

CUDA: use volatile pointer · eb8376fa

psychocrypt authored 6 years ago

Use volatile pointer to be sure that the compiler is not caching the values.

eb8376fa

CPU: fix logical error · 53652d35
psychocrypt authored 6 years ago
```
Fix wrong warning about unknown ASM type
```
53652d35