  1. Mar 10, 2019
    • fix opencl and cuda cryptonight_r caching · 3ebf66a3
      psychocrypt authored
      The original implementation always released the cache and created a new
      kernel; this can lead to performance issues and may cause crashes.
    • fix masari · be2144d6
      psychocrypt authored
      Since Masari increased the block size, the miner crashed each time it
      connected to a Masari pool.
      
      This PR extends the possible size of a block to 128 bytes and updates the
      kernel.
  2. Mar 07, 2019
    • Support of CryptoNight v8 ReverseWaltz · 2d9087c7
      EDDragonWolf authored
      rebased version of #2261
      
      Added support for CryptoNight v8 ReverseWaltz (named cryptonight_v8_reversewaltz here): equal to CryptoNight v8 but with 3/4 of its iterations and a reversed shuffle operation.

      We plan to use CryptoNight v8 ReverseWaltz as the new PoW algorithm for Graft (graft-project/GraftNetwork#234).
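      A back-of-the-envelope sketch of the "3/4 of the iterations" relation only; the v8 iteration count 0x80000 is an assumption, not taken from this commit, and the reversed shuffle itself is not shown:

      ```cpp
      #include <cstdint>

      // assumed CryptoNight v8 iteration count (an assumption, not stated in this commit)
      constexpr uint32_t CN_V8_ITER = 0x80000;

      // ReverseWaltz runs 3/4 of the v8 iterations
      constexpr uint32_t CN_V8_RWZ_ITER = CN_V8_ITER / 4 * 3;

      static_assert(CN_V8_RWZ_ITER == 0x60000, "3/4 of 0x80000 is 0x60000");
      ```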
    • fix CUDA compile · cd36058e
      psychocrypt authored
      - fix linker issues with CUDA8
      - fix device selection
  3. Feb 07, 2019
    • remove cn_turtle as native POW · 1033dc28
      psychocrypt authored
      cryptonight_turtle is just cryptonight_v8 with different scratchpad,
      iteration, and mask values. We now use the new mechanism (see the
      refactoring commit below) to describe such derived POWs.
    • refactor POW definition · 3426e185
      psychocrypt authored
      A POW is now defined by a function `f` and three degrees of freedom: `f(iteration, scratchpad, mask)`.
      `f` is the base algorithm, e.g. `cryptonight` or `cryptonight_gpu`.
      An easy-to-parse syntax for writing down the full POW definition is: `cryptonight_gpu:0x0000c000:0x00200000:0x001fffc0`

      This change makes it very easy to integrate the new trend of varying the
      number of iterations or the scratchpad size without modifying the full
      code.
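      A minimal sketch of how such a definition string could be parsed; the `PowDefinition` struct and `parsePow` helper are hypothetical names for illustration, not the xmr-stak API:

      ```cpp
      #include <cstdint>
      #include <sstream>
      #include <stdexcept>
      #include <string>

      // hypothetical container for one POW definition (illustrative, not xmr-stak code)
      struct PowDefinition
      {
          std::string base;     // base algorithm `f`, e.g. "cryptonight" or "cryptonight_gpu"
          uint32_t iterations;  // main loop iterations
          uint32_t scratchpad;  // scratchpad size in bytes
          uint32_t mask;        // memory access mask
      };

      // parse "base:iterations:scratchpad:mask", e.g.
      // "cryptonight_gpu:0x0000c000:0x00200000:0x001fffc0"
      PowDefinition parsePow(const std::string& s)
      {
          std::istringstream in(s);
          std::string base, iter, mem, mask;
          if(!std::getline(in, base, ':') || !std::getline(in, iter, ':') ||
             !std::getline(in, mem, ':') || !std::getline(in, mask, ':'))
              throw std::runtime_error("malformed POW definition: " + s);
          // std::stoul with base 0 accepts the 0x... hex notation used above
          return PowDefinition{base,
                               static_cast<uint32_t>(std::stoul(iter, nullptr, 0)),
                               static_cast<uint32_t>(std::stoul(mem, nullptr, 0)),
                               static_cast<uint32_t>(std::stoul(mask, nullptr, 0))};
      }
      ```

      Under this scheme, a derived POW such as cryptonight_turtle from the previous commit is just a different (iteration, scratchpad, mask) triple on top of an existing base algorithm.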
  4. Dec 29, 2018
    • improve POW algorithm selection · 758dbfb1
      psychocrypt authored
      - add helper method `GetAllAlgorithms()` to get all active POW algorithms
      - select the max scratchpad memory size based on the dev pool and user
        algorithms (see the sketch below)
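      A hypothetical sketch of the second point, reusing the `PowDefinition` sketch from the refactoring commit above: the backend allocates one scratchpad large enough for every POW it may have to switch to.

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <vector>

      // take the maximum scratchpad size over all active algorithms
      // (user pool + dev pool); names are illustrative, not xmr-stak code
      size_t maxScratchpadSize(const std::vector<PowDefinition>& userAlgos,
                               const std::vector<PowDefinition>& devAlgos)
      {
          size_t maxMem = 0;
          for(const auto* algos : {&userAlgos, &devAlgos})
              for(const auto& algo : *algos)
                  maxMem = std::max(maxMem, static_cast<size_t>(algo.scratchpad));
          return maxMem;
      }
      ```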
  5. Oct 24, 2018
    • NVIDIA: fix wrong number of threads · 954296ed
      psychocrypt authored
      In the CUDA backend for monero we always started twice as many threads as needed.
      Those threads were then removed after the AES matrix was copied to shared memory.
      Nevertheless, this was the result of a copy-paste bug (see the sketch below).
      
      - start correct number of threads for `monero`
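      An illustrative CUDA sketch of the described pattern, not the actual xmr-stak kernel: all launched threads cooperatively fill the shared AES table, then the surplus half exits; the fix is to launch the correct thread count in the first place.

      ```cpp
      #include <cstdint>

      __device__ uint32_t d_aes_table[256];  // assumed device-side AES lookup table

      __global__ void cn_core(uint32_t threads)  // `threads`: hashes per block (illustrative)
      {
          __shared__ uint32_t s_aes[256];

          // every launched thread helps to copy the AES table into shared memory
          for(uint32_t i = threadIdx.x; i < 256; i += blockDim.x)
              s_aes[i] = d_aes_table[i];
          __syncthreads();

          // with the old launch bounds (2 * threads) half of the threads died
          // right here; after the fix blockDim.x == threads and nothing is wasted
          if(threadIdx.x >= threads)
              return;

          // ... actual hashing work using s_aes ...
      }
      ```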
  6. Oct 10, 2018
    • NVIDIA: tweak `get_reciprocal` · b1504b36
      SChernykh authored
      - remove helper array to perform division
      - tweak `get_reciprocal`
    • NVIDIA: rename config option `comp_mode` · bd4a4c94
      psychocrypt authored
      The name `comp_mode` for a memory load pattern is a badly chosen name.
      Therefore I changed it to `mem_mode`, which also gives us the possibility
      to add new modes later if needed.
      
      - rename `comp_mode` to `mem_mode`
      - fix documentation
    • CUDA: fix invalid results · ed2168b4
      psychocrypt authored
      If `comp_mode` is false, the results on a Windows platform will be invalid.
      The reason is that `ulong4` is 16 bytes on Windows and 32 bytes on Linux.
      
      thx @xmrig for finding and solving the issue
      
      fix #1873
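      A small host-side sketch that makes the size mismatch visible (requires the CUDA headers for the vector types): `ulong4` is built from `unsigned long`, which is 4 bytes on Windows (LLP64) but 8 bytes on Linux (LP64), while `ulonglong4` is 32 bytes on both.

      ```cpp
      #include <cstdio>
      #include <vector_types.h>  // CUDA vector types (ulong4, ulonglong4)

      int main()
      {
          // ulong4 = 4 x `unsigned long`: 16 bytes on Windows, 32 bytes on Linux
          std::printf("sizeof(ulong4)     = %zu\n", sizeof(ulong4));
          // ulonglong4 = 4 x `unsigned long long`: 32 bytes on both platforms
          std::printf("sizeof(ulonglong4) = %zu\n", sizeof(ulonglong4));
          return 0;
      }
      ```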
  7. Oct 05, 2018
    • CUDA: tune cryptonight_v8 · 99a12cb6
      psychocrypt authored
      Read memory in bigger chunks per thread to increase the utilized memory bandwidth.
      For Kepler and Fermi GPUs, use the old autosuggestion instead of the new settings for cryptonight_v8.
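      An illustrative sketch of the first point, not the actual kernel: one 128-bit vectorized load replaces four separate 32-bit loads, so each thread moves a bigger chunk per memory transaction.

      ```cpp
      #include <cstdint>

      __global__ void load_wide(const uint4* __restrict__ scratchpad, uint32_t* out, uint32_t n)
      {
          const uint32_t idx = blockIdx.x * blockDim.x + threadIdx.x;
          if(idx >= n)
              return;

          // one 128-bit load instead of four separate 32-bit loads
          const uint4 chunk = scratchpad[idx];
          out[idx] = chunk.x ^ chunk.y ^ chunk.z ^ chunk.w;  // placeholder use of the data
      }
      ```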
  8. Sep 30, 2018
    • cuda: implement cryptonight_v8 · 5db405c2
      psychocrypt authored
      - introduce a new scheme where two threads work together on one hash (see the sketch below)
      - update the autoadjustment
      - fix a mistake where shared memory was shrunk for GPUs < sm_70
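      An illustrative sketch of the pairing scheme; names and state layout are assumptions, not the actual xmr-stak kernel. Even/odd lanes of a pair each own half of the state and fetch the partner's half with a register shuffle instead of going through shared memory.

      ```cpp
      #include <cstdint>

      // assumes blockDim.x is a multiple of the warp size so the full mask is valid
      __global__ void cn_v8_pair(uint64_t* state)
      {
          const uint32_t tid  = blockIdx.x * blockDim.x + threadIdx.x;
          const uint32_t sub  = tid & 1;   // which half of the state this lane owns
          const uint32_t hash = tid >> 1;  // two consecutive lanes share one hash

          uint64_t my_half = state[hash * 2 + sub];

          // fetch the partner lane's half via a register shuffle; a xor-shuffle
          // by 1 reaches the neighbouring lane of the pair inside the warp
          const uint64_t other_half = __shfl_xor_sync(0xFFFFFFFFu, my_half, 1);

          // ... both halves are now in registers for the round function ...
          state[hash * 2 + sub] = my_half ^ other_half;  // placeholder round
      }
      ```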
  9. Sep 19, 2018
    • NVIDIA: optimize v8 · fd27561b
      psychocrypt authored
      - fix that shared memory for the fast division is always used even when an algorithm does not need it
      - optimize the fast division algorithm
      - store `division_result` (64-bit) per thread instead of shuffling it around and storing it as 32-bit
    • NVIDIA: optimize div and sqrt · 659918f2
      psychocrypt authored
      - use optimized div and sqrt
      - reduce memory footprint