- Feb 10, 2019
psychocrypt authored
Combine the shared memory for a hash within one struct. Reduce the shared memory footprint per hash by 64 bytes.
-
psychocrypt authored
- rename variables like `b` and `bb` to something with a little more meaning.
-
- Feb 07, 2019
psychocrypt authored
`cryptonight_turtle` is just `cryptonight_v8` with different scratchpad, iteration, and mask values. We now use the new mechanism to describe such derived POWs.
-
psychocrypt authored
A POW is now defined by a function `f` and three degrees of freedom: `f(iteration, scratchpad, mask)`. `f` is the base algorithm, e.g. `cryptonight` or `cryptonight_gpu`. An easy-to-parse syntax for writing down the full POW definition is: `cryptonight_gpu:0x0000c000:0x00200000:0x001fffc0`. This change makes it very easy to support the new trend of varying the number of iterations or the scratchpad size without modifying the whole code.
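A minimal C++ sketch of parsing this syntax, assuming the three numeric fields appear in the order of `f(iteration, scratchpad, mask)` as in the example string. `PowDefinition` and `parse_pow` are hypothetical names for illustration, not the xmr-stak implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of "base:iterations:scratchpad:mask".
struct PowDefinition {
    std::string base;     // base algorithm f, e.g. "cryptonight_gpu"
    uint32_t iterations;  // number of main-loop iterations
    uint32_t scratchpad;  // scratchpad size in bytes
    uint32_t mask;        // mask applied to scratchpad addresses
};

// Parse e.g. "cryptonight_gpu:0x0000c000:0x00200000:0x001fffc0".
// Returns std::nullopt unless there are exactly four fields and the
// last three are valid unsigned 32-bit numbers (hex with 0x, or decimal).
std::optional<PowDefinition> parse_pow(const std::string& text) {
    std::vector<std::string> fields;
    std::stringstream ss(text);
    std::string field;
    while (std::getline(ss, field, ':')) fields.push_back(field);
    if (fields.size() != 4 || fields[0].empty()) return std::nullopt;

    uint32_t values[3];
    for (int i = 0; i < 3; ++i) {
        const std::string& f = fields[i + 1];
        if (f.empty() || f[0] == '-') return std::nullopt;  // stoul would accept "-1"
        try {
            std::size_t used = 0;
            const unsigned long v = std::stoul(f, &used, 0);  // base 0: "0x" means hex
            if (used != f.size() || v > UINT32_MAX) return std::nullopt;
            values[i] = static_cast<uint32_t>(v);
        } catch (const std::exception&) {  // invalid_argument or out_of_range
            return std::nullopt;
        }
    }
    return PowDefinition{fields[0], values[0], values[1], values[2]};
}
```

For the example string this yields 0xC000 = 49152 iterations, a 2 MiB (0x200000 byte) scratchpad, and the mask 0x1FFFC0 = 0x200000 - 64, which keeps every masked address 64-byte aligned and inside the scratchpad.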
-
- Feb 04, 2019
psychocrypt authored
Remove `static constexpr` within the global kernel. This is not supported by all CUDA versions.
-
- Feb 01, 2019
psychocrypt authored
optimize the autosuggestion algorithm for `cryptonight_gpu`
-
psychocrypt authored
- use precomputed indices within the loop
- `cn_explode_gpu`: use all threads to load the state
-
- Jan 30, 2019
psychocrypt authored
- fix race condition during shared memory access
- optimize memory access
-
fireice-uk authored
Co-authored-by: psychocrypt <psychocryptHPC@gmail.com>
Co-authored-by: fireice-uk <fireice-uk@users.noreply.github.com>
-
- Jan 25, 2019
Brandon Lehmann authored
-
- Dec 29, 2018
psychocrypt authored
- add helper method `GetAllAlgorithms()` to get all active POW algorithms
- select the max scratchpad memory size based on the dev pool and user algorithms
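The size selection can be sketched as taking the maximum over both algorithm lists, so that one allocation is large enough for every algorithm that may be mined. `AlgoInfo` and `max_scratchpad` are hypothetical; this does not reproduce the signature of `GetAllAlgorithms()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical description of one configured POW algorithm.
struct AlgoInfo {
    std::string name;
    std::size_t scratchpad_bytes;
};

// Largest scratchpad required by any user or dev pool algorithm, so a
// single per-thread allocation of this size fits all of them.
std::size_t max_scratchpad(const std::vector<AlgoInfo>& user_algos,
                           const std::vector<AlgoInfo>& dev_pool_algos) {
    std::size_t result = 0;
    for (const std::vector<AlgoInfo>* list : {&user_algos, &dev_pool_algos})
        for (const AlgoInfo& algo : *list)
            result = std::max(result, algo.scratchpad_bytes);
    return result;
}
```

Including the dev pool algorithms matters because the miner may switch to them at runtime even if the user never configured them.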
-
- Nov 29, 2018
LPHuynh authored
-
- Nov 19, 2018
psychocrypt authored
port optimizations from OpenCL.
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
Use only half of the AES matrix and compute the other half on the fly. This PR increases the possible occupancy.
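Why half of the AES matrix suffices: AES T-table implementations use four 256-entry tables, and each one is a byte rotation of the first. The C++ sketch below assumes T0 and T1 are the stored half and entries are packed little-endian, then rebuilds T2 and T3 with a 16-bit rotation. It takes the S-box output `s` directly instead of the table index `x`; since the S-box is a bijection, checking all 256 values of `s` covers every entry. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by 2 in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1.
uint8_t xtime(uint8_t b) {
    return static_cast<uint8_t>((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}

uint32_t rotl32(uint32_t v, unsigned n) {
    return (v << n) | (v >> ((32 - n) & 31));
}

// Entry of T-table k (k = 0..3) for S-box output s, packed little-endian.
// Byte j holds c[(j - k) mod 4] * s with MixColumns coefficients {2, 1, 1, 3}.
uint32_t t_entry(uint8_t s, unsigned k) {
    const uint8_t s2 = xtime(s);
    const uint8_t s3 = static_cast<uint8_t>(s2 ^ s);
    const uint8_t col[4] = {s2, s, s, s3};
    uint32_t word = 0;
    for (unsigned j = 0; j < 4; ++j)
        word |= static_cast<uint32_t>(col[(j + 4 - k) % 4]) << (8 * j);
    return word;
}

// Only T0 and T1 are "stored"; T2 = rotl(T0, 16) and T3 = rotl(T1, 16).
uint32_t t_entry_from_half(uint8_t s, unsigned k) {
    const uint32_t stored = t_entry(s, k & 1);
    return (k & 2) ? rotl32(stored, 16) : stored;
}
```

Keeping two tables instead of four halves the table storage from 4 KiB to 2 KiB (256 entries of 4 bytes per table), at the cost of one rotate per lookup, which is cheap on GPUs.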
-
psychocrypt authored
port the optimized OpenCL division to CUDA
Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
-
- Oct 24, 2018
psychocrypt authored
In the CUDA backend for monero we always start twice as many threads as needed. The surplus threads are removed after the AES matrix is copied to shared memory. Nevertheless, this is the result of a copy-paste bug.
- start the correct number of threads for `monero`
-
- Oct 11, 2018
psychocrypt authored
Allow shipping the miner with multiple CUDA backends that depend on different driver versions. This makes it possible to support Turing/Volta and old Fermi GPUs within one release.
- add support to search for the first working CUDA backend
- add some more messages to make debugging easier (if a user has issues)
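The search for the first working backend can be sketched as an ordered fallback loop. In this C++ sketch, `first_working_backend`, the candidate names, and the `try_load` callback (standing in for loading a backend library and checking the driver) are all hypothetical; the sketch only shows the control flow and the per-attempt log messages.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Try candidate backends in order of preference (e.g. newest CUDA version
// first) and return the name of the first one that loads successfully.
std::optional<std::string> first_working_backend(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& try_load,
    std::vector<std::string>* log = nullptr) {
    for (const std::string& name : candidates) {
        if (try_load(name)) {
            if (log) log->push_back("using backend " + name);
            return name;
        }
        // Record each failure so a user report shows what was attempted.
        if (log) log->push_back("backend " + name + " is not usable, trying next");
    }
    return std::nullopt;  // no backend works on this system
}
```

Ordering the candidates from newest to oldest lets a system with a recent driver pick the backend that supports Turing/Volta, while an older system falls through to a backend that still supports Fermi.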
-
- Oct 10, 2018
SChernykh authored
- remove helper array to perform division
- tweak `get_reciprocal`
-
psychocrypt authored
`comp_mode` was a badly chosen name for a memory load pattern. Therefore I changed it to `mem_mode`, which also gives us the possibility to add new modes later if needed.
- rename `comp_mode` to `mem_mode`
- fix documentation
-
psychocrypt authored
If `comp_mode` is false, the results on Windows are invalid. The reason is that `ulong4` is 16 bytes on Windows and 32 bytes on Linux, because `unsigned long` is 32 bit on Windows and 64 bit on Linux. Thanks to @xmrig for finding and solving the issue.
fix #1873
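The size difference comes from the platform data models (LLP64 on Windows, LP64 on Linux). This host-side C++ sketch uses stand-in structs with the same element types as CUDA's `ulong4` and `ulonglong4` (layout only, no alignment attributes); it is an illustration of the pitfall, not the actual fix from the commit.

```cpp
#include <cassert>
#include <cstddef>

// Same element type as CUDA's ulong4: width depends on the platform.
struct ulong4_like { unsigned long x, y, z, w; };

// Same element type as CUDA's ulonglong4: 64-bit elements everywhere.
struct ulonglong4_like { unsigned long long x, y, z, w; };

// A 32-byte memory load pattern can rely on this on every platform.
static_assert(sizeof(ulonglong4_like) == 32, "unsigned long long is 64 bit");

// 16 on Windows (32-bit unsigned long), 32 on Linux (64-bit unsigned long).
constexpr std::size_t platform_ulong4_size() { return sizeof(ulong4_like); }
```

Code that assumes four 64-bit words per `ulong4` therefore reads only half of the intended data on Windows, which is consistent with the invalid results described in this entry.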
-
- Oct 08, 2018
psychocrypt authored
Add a suggestion to a common line that is shown in the event of a crash under Windows.
-
psychocrypt authored
Add compatibility mode for CUDA to avoid invalid shares.
-
psychocrypt authored
Use an array instead of an if cascade to select the hashing function for CUDA.
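The idea of replacing an if cascade with an array can be sketched as a table of function pointers indexed by the algorithm ID. The enum values, function names, and string return values in this C++ sketch are hypothetical stand-ins; in a real backend each entry would launch the specialized hashing kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical algorithm IDs; the real backend has its own list.
enum class Algo : std::size_t { cryptonight = 0, cryptonight_v8, cryptonight_gpu, count };

// Stand-in hash functions returning their name instead of hashing.
std::string hash_cn(uint32_t)     { return "cryptonight"; }
std::string hash_cn_v8(uint32_t)  { return "cryptonight_v8"; }
std::string hash_cn_gpu(uint32_t) { return "cryptonight_gpu"; }

using HashFn = std::string (*)(uint32_t nonce);

// Entries must be in the same order as the enum values.
constexpr std::array<HashFn, static_cast<std::size_t>(Algo::count)> hash_table = {
    hash_cn, hash_cn_v8, hash_cn_gpu};

// One indexed lookup (checked by assert in debug builds) instead of a
// chain of comparisons.
HashFn select_hash(Algo algo) {
    const auto idx = static_cast<std::size_t>(algo);
    assert(idx < hash_table.size());
    return hash_table[idx];
}
```

Adding an algorithm then means adding one enum value and one table entry, rather than another branch in the cascade.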
-
psychocrypt authored
Use a volatile pointer to make sure that the compiler does not cache the values.
-
- Oct 05, 2018
psychocrypt authored
Read memory in bigger chunks per thread to increase the used memory bandwidth. For Kepler and Fermi GPUs, use the old autosuggestion instead of the new settings for cryptonight_v8.
-
- Oct 04, 2018
Tony Butler authored
-
Tony Butler authored
-
- Oct 01, 2018
psychocrypt authored
I disabled a few algorithms for faster compilation and forgot to re-enable them.
-
psychocrypt authored
`uint` is unknown on Windows, therefore switch to the better type `uint32_t`.
-
- Sep 30, 2018
psychocrypt authored
- introduce a new scheme where two threads work together on one hash
- update autoadjustment
- fix a mistake where shared memory was shrunk for GPUs < sm_70
-
psychocrypt authored
-
- Sep 22, 2018
Tony Butler authored
-
- Sep 21, 2018
psychocrypt authored
Avoid branch divergence
-
- Sep 19, 2018
psychocrypt authored
- fix that the shared memory for fast div is always used, even if an algorithm does not use it
- optimize the fast div algorithm
- store `division_result` (64 bit) per thread instead of shuffling it around and storing it as 32 bit
-
psychocrypt authored
- use optimized div and sqrt
- reduce memory footprint
-
SChernykh authored
Add fast versions of div and sqrt for the CUDA backend
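These entries do not describe how the fast division works. As an illustration of the general idea only, here is a reciprocal-based 64-by-32-bit division in C++: the reciprocal is computed once per divisor, and each division becomes a high multiply plus a short correction. This is a generic technique swapped in for illustration, not SChernykh's implementation, and all names are hypothetical. In a GPU kernel the reciprocal itself could be approximated more cheaply than with the exact division used here for clarity.

```cpp
#include <cassert>
#include <cstdint>

// High 64 bits of the 128-bit product a * b, using 32-bit limbs so the
// sketch does not depend on a 128-bit integer type.
uint64_t mulhi64(uint64_t a, uint64_t b) {
    const uint64_t a_lo = static_cast<uint32_t>(a), a_hi = a >> 32;
    const uint64_t b_lo = static_cast<uint32_t>(b), b_hi = b >> 32;
    const uint64_t lo_lo = a_lo * b_lo;
    const uint64_t hi_lo = a_hi * b_lo;
    const uint64_t lo_hi = a_lo * b_hi;
    const uint64_t hi_hi = a_hi * b_hi;
    // Middle column sum; it cannot overflow 64 bits.
    const uint64_t cross = (lo_lo >> 32) + static_cast<uint32_t>(hi_lo) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

// Reciprocal of a 32-bit divisor, computed once and reused.
uint64_t reciprocal(uint32_t d) {
    assert(d != 0);
    return UINT64_MAX / d;
}

// The estimate mulhi64(n, r) never exceeds n / d and is at most 2 too
// small, so at most two correction steps give the exact result.
void div_by_reciprocal(uint64_t n, uint32_t d, uint64_t r,
                       uint64_t& quotient, uint64_t& remainder) {
    uint64_t q = mulhi64(n, r);
    uint64_t rem = n - q * d;  // no underflow because q * d <= n
    while (rem >= d) {
        ++q;
        rem -= d;
    }
    quotient = q;
    remainder = rem;
}
```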
-
psychocrypt authored
- use shared memory to exchange
-
psychocrypt authored
implement `cryptonight_v8`
-
- Sep 13, 2018
psychocrypt authored
xmr-stak has several implementations for multi hash per thread. This results in 3 independent implementations, so each time the algorithm must be changed, the risk of introducing errors is very high.
- unify the different cryptonight CPU implementations
- simplify the function selection array used to find the specialized cryptonight implementation
- add an intermediate pointer to access the large state (similar to the old multi hash implementation)
As a side effect, this change increases the speed of the single and multi hash algorithms.
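One common way to unify single and multi hash code paths is a template over the number of hashes per thread. The C++ sketch below assumes that approach; `State`, `round_step` (a toy mixing step, NOT cryptonight), and `hash_n` are hypothetical, and the pointer array is only similar in spirit to the intermediate pointer mentioned in this entry.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy per-hash state; the real cryptonight state is far larger.
struct State {
    uint64_t a, b;
};

// Toy mixing step (NOT cryptonight), standing in for one loop iteration.
inline void round_step(State& s) {
    s.a = (s.a ^ (s.b << 13)) * 0x9E3779B97F4A7C15ull;
    s.b += s.a >> 7;
}

// One implementation for N hashes per thread: single, double, triple, ...
// variants are instantiations of the same code, and the inner loop over
// `h` has a compile-time trip count the compiler can unroll.
template <std::size_t N>
void hash_n(std::array<State, N>& states, int iterations) {
    // Intermediate pointers to each hash's state.
    State* ctx[N];
    for (std::size_t h = 0; h < N; ++h) ctx[h] = &states[h];

    for (int i = 0; i < iterations; ++i)
        for (std::size_t h = 0; h < N; ++h)
            round_step(*ctx[h]);
}
```

Because every variant shares `round_step`, a change to the algorithm is made in one place, and interleaving independent hashes gives the CPU more instruction-level parallelism.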
-
- Aug 08, 2018
Tony Butler authored
-