Commits · a39ee0886cf613b70490164c4f33b066230709bc · Recolic / azure-cloud-mining-script

Dec 29, 2018

OpenCL: allow more than two algorithms · a39ee088

psychocrypt authored 6 years ago

In the current implementation, the POW algorithm in the dev pool section of a
currency is not taken into account during binary creation.
This PR changes that behavior and allows creating binaries for more than two POW algorithms.

a39ee088

Dec 06, 2018

fix bittube2 · e01eebc2

psychocrypt authored 6 years ago

Since #2080 bittube2 is broken.

- reintroduce special AES function for bittube2

e01eebc2

Dec 03, 2018

OpenCL: enable cn_v8 optimization for NVIDIA · ab19d370

psychocrypt authored 6 years ago

NVIDIA uses clang as its device compiler, so the reciprocal optimization was disabled by #2104.

- re-enable optimized reciprocal calculation

ab19d370

Dec 02, 2018

OpenCL: auto tuning option · af87b408

psychocrypt authored 6 years ago

Add an option to brute force intensity settings and lock in at the intensity with the highest hashrate.

- update the documentation of the `interleave` option to mention the side effect with `auto-tune`
- disable `interleave` auto adjustment if `auto-tune` is enabled
- jconf: add `auto-tune` as optional option
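
The brute-force search described above can be sketched on the host side as follows. This is a minimal illustration, not the code from this commit: `auto_tune` and the `measure` callback are hypothetical names, and `measure` stands in for running the miner at a given intensity and reporting its hashrate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical brute-force tuner: try every intensity in [lo, hi] with the
// given step and return the one with the highest measured hashrate.
inline std::size_t auto_tune(std::size_t lo, std::size_t hi, std::size_t step,
                             const std::function<double(std::size_t)>& measure)
{
    assert(step > 0 && lo <= hi);
    std::size_t best = lo;
    double best_rate = measure(lo);
    for (std::size_t i = lo + step; i <= hi; i += step) {
        const double rate = measure(i);
        if (rate > best_rate) {  // keep the first intensity that reaches the peak
            best_rate = rate;
            best = i;
        }
    }
    return best;
}
```

Once the search finishes, the miner would keep running at the returned intensity, which is the "lock in" step mentioned in the commit message.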

af87b408

OpenCl: fix NVIDIA · 1b27f0f3

psychocrypt authored 6 years ago

- fix broken compile: replace the `ULL` suffix with `UL`, because `UL` is defined as 64-bit
- fix memory copy to shared memory via vload8 (it somehow creates a wrong access)

1b27f0f3

OpenCL: auto config two threads per GPU · e46226fa

psychocrypt authored 6 years ago

The auto config now generates two threads per GPU by default for AMD devices.

- subtract the 128 MiB safety margin only from the maximum available GPU memory, not from the memory available for a single alloc call
- extend the memory documentation in amd.txt
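
One way to read the memory rule above is sketched below. All names are hypothetical and the formula is an assumption based only on the commit message: the 128 MiB margin is taken from the total GPU memory, while the per-allocation limit is used unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-thread memory budget. The 128 MiB safety margin is
// subtracted from the total GPU memory only; the limit for a single
// allocation is applied as reported.
inline std::uint64_t per_thread_budget(std::uint64_t total_bytes,
                                       std::uint64_t max_alloc_bytes,
                                       unsigned threads)
{
    assert(threads > 0);
    const std::uint64_t safety = 128ull * 1024 * 1024;
    const std::uint64_t usable = total_bytes > safety ? total_bytes - safety : 0;
    return std::min(max_alloc_bytes, usable / threads);
}
```

With 8192 MiB of total memory, a 4096 MiB allocation limit, and two threads, this sketch yields 4032 MiB per thread.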

e46226fa

fix clamp implementation · b606304b
psychocrypt authored 6 years ago
```
Due to a wrong implementation clamp was not working.
```
b606304b

Nov 30, 2018

OpenCL: optimize reciprocal calculation · bc91088a

psychocrypt authored 6 years ago


For non-clang (ROCm) OpenCL, use an optimized reciprocal calculation without a lookup table.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

bc91088a

OpenCL: comp mode optimization · 307dda83

psychocrypt authored 6 years ago

Disable compatibility mode if intensity is a multiple of worksize. In that case, an enabled compatibility mode only slows down the miner.
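
The check can be sketched as below, assuming that compatibility mode exists to guard a partially filled last work group. `needs_comp_mode` is a hypothetical helper, not a function from the miner.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the guard is only useful when intensity is not a
// multiple of worksize; otherwise it is switched off even if requested.
inline bool needs_comp_mode(std::size_t intensity, std::size_t worksize, bool requested)
{
    assert(worksize > 0);
    return requested && (intensity % worksize != 0);
}
```

For example, with a worksize of 8, an intensity of 1000 needs no guard, while an intensity of 1001 does.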

307dda83

Nov 29, 2018
- Added Cryptonight-Superfast · 053190bb
  LPHuynh authored 6 years ago
  
  053190bb
Nov 27, 2018

OpenCL: thread interleaving · d8316f7d

psychocrypt authored 6 years ago

If two threads are using the same GPU device the start time of each hash round is optimized based on the average time needed to calculate a bunch of hashes.

This way of optimizing the hash rate was first introduced by @SChernykh. This implementation is based on the implementation in xmrig but differs in the details.

- introduce a new config option `interleave`
- implement thread interleaving
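
A heavily simplified sketch of the idea follows. It assumes the goal is to space the round starts of the threads on one GPU evenly across the average round time; the actual adjustment logic in the miner is more involved, and `interleave_delay_ms` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: how long a thread should wait before starting its next
// round so that round starts on one GPU are avg_round_ms / threads apart.
inline double interleave_delay_ms(double avg_round_ms, unsigned threads,
                                  double since_last_start_ms)
{
    assert(threads > 0);
    const double target_gap = avg_round_ms / threads;
    return since_last_start_ms >= target_gap ? 0.0
                                             : target_gap - since_last_start_ms;
}
```

With two threads and an average round time of 100 ms, a thread that wants to start 30 ms after the other one would wait another 20 ms in this model.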

d8316f7d

Nov 21, 2018

OpenCl: optimize strided index 1 · 39fa7c62

psychocrypt authored 6 years ago


Use `mul24` to speed up the scratchpad index calculation.
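
For readers unfamiliar with `mul24`, here is a host-side model of the unsigned variant. It assumes both operands fit into 24 bits, which is the range in which OpenCL defines the result; the masking of larger operands is a choice made for this model only. `mul24_model` and `scratchpad_offset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of an unsigned 24-bit multiply: only the low 24 bits of each
// operand take part, and the product is truncated to 32 bits.
inline std::uint32_t mul24_model(std::uint32_t x, std::uint32_t y)
{
    const std::uint64_t product =
        static_cast<std::uint64_t>(x & 0xFFFFFFu) * (y & 0xFFFFFFu);
    return static_cast<std::uint32_t>(product);
}

// Hypothetical index computation: a thread index times a small stride.
inline std::uint32_t scratchpad_offset(std::uint32_t thread_idx, std::uint32_t stride)
{
    assert(thread_idx <= 0xFFFFFFu && stride <= 0xFFFFFFu);
    return mul24_model(thread_idx, stride);
}
```

On GPUs with a fast 24-bit multiplier, the device-side `mul24` can be cheaper than a full 32-bit multiply, which is why it suits small index calculations.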

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

39fa7c62

OpenCL: add strided_index 3 · 3c9442ce

psychocrypt authored 6 years ago


Add new striding index where the memory is chunked by the size of the work group (worksize).

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

3c9442ce

OpenCL: cn1 optimization · 33e5825c
psychocrypt authored 6 years ago
```
small optimization for non cryptonight_v8 algorithms
```
33e5825c

Nov 20, 2018
- OpenCl: optimize cn-v8 div · bff5b000
  SChernykh authored 6 years ago
```
- optimize division
```
  bff5b000
- OpenCL: optimize cn-heavy div · 9813e1c0
  SChernykh authored 6 years ago
```
optimize cryptonight_heavy div
```
  9813e1c0
- AMD: use more 32bit operations · f40c54e3
  psychocrypt authored 6 years ago
```
- change a few 64-bit variables into 32-bit
- provide type-qualified defines
```
  f40c54e3
Nov 19, 2018

OpenCL reduce API overhead · 6c563c9d

psychocrypt authored 6 years ago

- remove useless `clFinish`
- avoid downloading the number of threads for skein & co. and always start as many threads as in all other kernels (terminate useless threads)

6c563c9d

OpenCL: reduce local mem footprint · 6f283928

psychocrypt authored 6 years ago


Reduce the local memory footprint to increase occupancy.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

6f283928

Nov 16, 2018

fix ROCm compile · 18dbff68
psychocrypt authored 6 years ago
```
define shared memory in the outer scope
```
18dbff68

optimize cn-heavy div · e6177f1c
SChernykh authored 6 years ago
```
x-ref: https://github.com/xmrig/xmrig-amd/pull/192
```
e6177f1c

Optimize OpenCl · 28ef8e3d

SChernykh authored 6 years ago


- optimize kernel cn0 and cn2
- optimize fast int math
- use more 32bit variables

Co-authored-by: psychocrypt <psychocryptHPC@gmail.com>

28ef8e3d

Nov 06, 2018

AMD: speedup cryptonight_heavy division · bfb3243c

SChernykh authored 6 years ago

optimize the division in cryptonight_heavy and cryptonight_haven

import of https://github.com/xmrig/xmrig-amd/pull/185/commits/5d9b9334654df25cea7707f667990fd1577ed290

bfb3243c

Oct 16, 2018
- fix AMD driver 14 · 6fc6e3a5
  psychocrypt authored 6 years ago
```
Fix the fix from #1945. The initial fix produces invalid results.
```
  6fc6e3a5
- Remove dead code · 13e35074
  Hans Kristian Rosbach authored 6 years ago
  
  13e35074
Oct 15, 2018

fix broken AMD OpenCL compile · 2a0d565b

psychocrypt authored 6 years ago

The AMD OpenCL compiler shipped with driver 14XX is broken
and cannot compile xmr-stak since the monero v8 changes were introduced.

- work around a simple compare
- add new device define `OPENCL_DRIVER_MAJOR`

2a0d565b

Oct 10, 2018

fix right bitshift in `amd_bitalign` · b4387ac0

psychocrypt authored 6 years ago

In the current implementation, the bit align uses a signed integer, so the right shift
pulls in ones whenever the sign bit is set.

- cast to unsigned integer before using bitshift
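
The effect can be reproduced on the host. The sketch below assumes a 32-bit formulation of the bit align, combining a high and a low word with two shifts; the function names are hypothetical and this is not the OpenCL code from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit bit align over the pair hi:lo, valid for 1 <= shift <= 31.

// Signed operand: `lo >> shift` is an arithmetic shift on mainstream compilers
// (guaranteed since C++20), so a set sign bit is copied into the vacated upper
// bits and overwrites the bits that should come from hi.
inline std::uint32_t bitalign_signed(std::uint32_t hi, std::int32_t lo, unsigned shift)
{
    assert(shift >= 1 && shift <= 31);
    return (hi << (32u - shift)) | static_cast<std::uint32_t>(lo >> shift);
}

// Fixed variant: cast to unsigned first, so the shift fills with zeros.
inline std::uint32_t bitalign_unsigned(std::uint32_t hi, std::int32_t lo, unsigned shift)
{
    assert(shift >= 1 && shift <= 31);
    return (hi << (32u - shift)) | (static_cast<std::uint32_t>(lo) >> shift);
}
```

With `hi = 0x1`, a low word whose only set bit is the sign bit, and a shift of 4, the unsigned variant gives `0x18000000`, while the signed variant gives `0xF8000000` on compilers that shift arithmetically.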

b4387ac0

Oct 07, 2018

fix crash with monero and strided_index · 1c0ef154

psychocrypt authored 6 years ago

Strided index 1 is not allowed for cryptonight_v8 and monero.
If the dev pool is set to monero and the user has tuned their settings for
another currency, the miner will crash if strided index or memChunk does not
fit the requirements for mining monero.
This PR detects wrong configurations and sets strided index and memChunk to a valid
value, but only for cryptonight_v8. The user pool settings are only changed if monero or
cryptonight_v8 is selected.

1c0ef154

Oct 06, 2018
- Fix two new warnings within new code · 2370aeef
  Tony Butler authored 6 years ago
  
  2370aeef
Oct 05, 2018

fix invalid shares · 8e1e7447

psychocrypt authored 6 years ago

We fought with invalid shares on rocm for a long time. This is now solved with rocm 1.9 and
this tiny fix.
It is not fully clear where a memory optimization kicks in and breaks the `Groestl` kernel if the variables `M` and `H` are not `volatile`.
The performance will not change with this fix.

The fix is tested with rocm 1.9 with a VEGA64 and a RX570

8e1e7447

Oct 04, 2018
- whitespace trims · 17e0b06e
  Tony Butler authored 6 years ago
  
  17e0b06e
Sep 30, 2018
- add cryptonight_v8 tweak 2.2 · cac26b96
  psychocrypt authored 6 years ago
```
add cpu implementation for the final monero POW
```
  cac26b96
Sep 21, 2018

AMD: remove unused functions · fce822e5

psychocrypt authored 6 years ago

- remove unused host function (relic from old refactoring)
- remove unused OpenCL full div function

fce822e5

Sep 19, 2018

asm, style and spelling fixes · 1692c543

psychocrypt authored 6 years ago

- fix code style issues
- fix spelling issue
- fix asm to support newer clang versions

1692c543

AMD: add unroll option · 28f41a6e

psychocrypt authored 6 years ago

Add the option `unroll` for OpenCL to allow better tuning of the main POW kernel.

28f41a6e

OpenCL: optimize NVIDIA pass · df1a4200

psychocrypt authored 6 years ago


Create a special pass for NVIDIA GPUs to load memory chunks first into the shared memory.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

df1a4200

OpenCl: cryptonight_v8 · 5608f8df

psychocrypt authored 6 years ago


- implement cryptonight_v8
- update auto adjust to fit the special requirements of `cryptonight_v8`
- add fast integer math implementations for `sqrt`, `reciprocal` and `division`

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>

5608f8df

OpenCL: avoid out of memory access · 16da9886

psychocrypt authored 6 years ago

During the initialization of the compile parameters for OpenCL, the
fixed-size buffer could be too small. To avoid this we now use `std::string`.
Using `std::string` is not a problem because this part of the code is not performance critical.
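
A minimal sketch of the approach: the option string grows as needed, so a long parameter list cannot overflow the way a fixed-size `char` buffer could. `build_options` and the define names shown are illustrative assumptions, not taken from this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical option builder for the OpenCL compiler invocation.
// std::string reallocates on demand, so no length has to be guessed up front.
inline std::string build_options(std::size_t worksize, std::size_t unroll,
                                 int strided_index)
{
    assert(worksize > 0);
    std::string options;
    options += " -DWORKSIZE=" + std::to_string(worksize);
    options += " -DCN_UNROLL=" + std::to_string(unroll);
    options += " -DSTRIDED_INDEX=" + std::to_string(strided_index);
    return options;
}
```

The resulting string can be handed to the OpenCL build call via `options.c_str()`.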

16da9886

fix nicehash `invalid results` · 77160cf1

psychocrypt authored 6 years ago

If the first bit of the nonce is `1` (which happens very often with a nicehash pool),
then some OpenCL implementations may handle the 64-bit representation of the 32-bit
nonce on the device side as a signed integer.
During a right bitshift, wrong ones are then pulled from the upper part of the 64-bit
nonce representation into the 32-bit part of the nonce.
As a result, the computed share is invalid.

- explicitly cast the nonce on the device to `uint` to avoid any side effects
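
The failure mode can be modelled on the host. The sketch assumes that the buggy path widens the 32-bit nonce as a signed value; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed buggy path: widening as signed sign-extends a set top bit into the
// upper 32 bits, and the right shift then moves those ones into the low half.
inline std::uint32_t shifted_nonce_signed(std::uint32_t nonce, unsigned shift)
{
    assert(shift < 64);
    const std::int64_t wide = static_cast<std::int32_t>(nonce);  // sign extension
    return static_cast<std::uint32_t>(wide >> shift);
}

// Fixed path: widening as unsigned keeps the upper 32 bits zero.
inline std::uint32_t shifted_nonce_unsigned(std::uint32_t nonce, unsigned shift)
{
    assert(shift < 64);
    const std::uint64_t wide = nonce;                            // zero extension
    return static_cast<std::uint32_t>(wide >> shift);
}
```

For the nonce `0x80000001` and a shift of 8, the unsigned path yields `0x00800000`, whereas the signed path yields `0xFF800000` on compilers that shift arithmetically.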

77160cf1

Sep 17, 2018

avoid OpenCL binary mismatch · 2742ef09

psychocrypt authored 6 years ago

Avoid using an OpenCL binary from the cache if the driver or xmr-stak version has changed.
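
One common way to achieve this is to derive the cache key from everything that influences the binary. The sketch below is an assumption about the general technique, not the code from this commit: `cache_key` is a hypothetical name, and FNV-1a is used only because it is simple and gives the same result across program runs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash; deterministic across runs, unlike std::hash guarantees.
inline std::uint64_t fnv1a64(const std::string& data)
{
    std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ull;                   // FNV prime
    }
    return hash;
}

// Hypothetical cache key: a change to the kernel source, the build options,
// the driver version, or the miner version produces a different key, so a
// stale binary is not picked up.
inline std::string cache_key(const std::string& source, const std::string& options,
                             const std::string& driver_version,
                             const std::string& miner_version)
{
    assert(!miner_version.empty());
    return std::to_string(fnv1a64(source + '\n' + options + '\n' +
                                  driver_version + '\n' + miner_version));
}
```

The key could then be used as the file name of the cached binary, so a driver or miner update simply misses the cache and triggers a rebuild.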

2742ef09