- Nov 30, 2018
-
-
psychocrypt authored
Disable compatibility mode if intensity is a multiple of worksize. In that case, enabling compatibility mode only slows down the miner.
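
The condition can be sketched as a small host-side check. The function name `needsCompatibilityMode` and its parameters are hypothetical, not xmr-stak identifiers, and the sketch assumes that compatibility mode is only a guard for work items beyond `intensity`:

```cpp
#include <cstddef>

// Hypothetical helper: returns true when the guard is still needed.
// ASSUMPTION: the guard only matters when intensity is not a whole
// number of work groups.
bool needsCompatibilityMode(std::size_t intensity, std::size_t worksize)
{
    if (worksize == 0)
        return true; // keep the guard for a degenerate configuration
    return intensity % worksize != 0;
}
```

With `intensity = 1024` and `worksize = 8` the guard can be dropped; with `intensity = 1000` and `worksize = 16` it stays enabled.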
-
- Nov 27, 2018
-
-
psychocrypt authored
If two threads are using the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a bunch of hashes. This way of optimizing the hash rate was first introduced by @SChernykh. This implementation is based on the implementation in xmrig but differs in the details. - introduce a new config option `interleave` - implement thread interleaving
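
A minimal sketch of the timing idea, assuming exactly two threads per GPU and a target offset of half a round. This is neither the xmr-stak nor the xmrig implementation, it does not model the `interleave` option itself, and the name `interleaveDelayMs` is hypothetical:

```cpp
#include <algorithm>

// Hypothetical helper: how long (in ms) a thread should wait before it starts
// its next hash round so that it begins about half a round after the other
// thread on the same GPU.
//   avgRoundMs        - running average duration of one hash round
//   sinceOtherStartMs - time elapsed since the other thread started its round
double interleaveDelayMs(double avgRoundMs, double sinceOtherStartMs)
{
    // ASSUMPTION: with two threads the ideal start offset is half a round.
    const double targetOffsetMs = avgRoundMs * 0.5;
    return std::max(0.0, targetOffsetMs - sinceOtherStartMs);
}
```

If the other thread started 20 ms ago and a round takes 100 ms on average, the thread waits 30 ms. If the other thread started 70 ms ago, it starts immediately.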
-
- Nov 21, 2018
-
-
psychocrypt authored
Use `mul24` to speed up the scratchpad index calculation. Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
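
For readers unfamiliar with `mul24`: the OpenCL built-in multiplies only the low 24 bits of its operands, which can be cheaper than a full 32-bit multiply on some GPUs. The host-side emulation below is a sketch for illustration. For operands that do not fit into 24 bits OpenCL leaves the result implementation-defined, so the masking shown here is just one possible behaviour, and the function names are hypothetical:

```cpp
#include <cstdint>

// Emulates an unsigned 24-bit multiply: only the low 24 bits of each operand
// take part in the multiplication and the result is truncated to 32 bits.
std::uint32_t mul24Emulated(std::uint32_t x, std::uint32_t y)
{
    return (x & 0xFFFFFFu) * (y & 0xFFFFFFu);
}

// Plain 32-bit multiply for comparison.
std::uint32_t mul32(std::uint32_t x, std::uint32_t y)
{
    return x * y;
}
```

Both functions agree as long as the operands stay below 2^24, which is the case the optimization relies on.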
-
psychocrypt authored
Add new striding index where the memory is chunked by the size of the work group (worksize). Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
small optimization for non cryptonight_v8 algorithms
-
- Nov 20, 2018
-
-
SChernykh authored
- optimize division
-
SChernykh authored
optimize cryptonight_heavy diff
-
psychocrypt authored
- change a few 64bit variables into 32bit. - provide type-qualified defines
-
- Nov 19, 2018
-
-
psychocrypt authored
- remove useless `clFinish` - avoid downloading the number of threads for skein & co. and always start as many threads as in all other kernels (terminate useless threads)
-
psychocrypt authored
Reduce local memory footprint to increase the occupancy. Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
-
- Nov 16, 2018
-
-
psychocrypt authored
define shared memory in the outer scope
-
SChernykh authored
x-ref: https://github.com/xmrig/xmrig-amd/pull/192
-
SChernykh authored
- optimize kernel cn0 and cn2 - optimize fast int math - use more 32bit variables Co-authored-by:
psychocrypt <psychocryptHPC@gmail.com>
-
- Nov 06, 2018
-
-
SChernykh authored
optimize the division in cryptonight_heavy and cryptonight_haven import of https://github.com/xmrig/xmrig-amd/pull/185/commits/5d9b9334654df25cea7707f667990fd1577ed290
-
- Oct 16, 2018
-
-
psychocrypt authored
Fix the fix from #1945. The initial fix produces invalid results.
-
Hans Kristian Rosbach authored
-
- Oct 15, 2018
-
-
psychocrypt authored
The AMD compiler for OpenCL shipped with the driver 14XX is broken and cannot compile xmr-stak since the monero v8 changes were introduced. - work around a simple compare. - add new device define `OPENCL_DRIVER_MAJOR`
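
How such a define could be derived on the host can be sketched as follows. The helper names, the example version string and the division by 100 (so that a 14XX driver maps to 14) are assumptions for illustration; only the define name `OPENCL_DRIVER_MAJOR` comes from the commit message:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: extracts the leading integer of a driver version
// string such as "1445.5 (VM)". Returns 0 if there are no leading digits.
int parseDriverNumber(const std::string& driverVersion)
{
    int number = 0;
    int digits = 0;
    for (char c : driverVersion)
    {
        // Stop at the first non-digit; cap the length to avoid overflow.
        if (!std::isdigit(static_cast<unsigned char>(c)) || digits == 9)
            break;
        number = number * 10 + (c - '0');
        ++digits;
    }
    return number;
}

// Builds the option that would be appended to the OpenCL compile options.
// ASSUMPTION: the major version is the leading number divided by 100.
std::string driverMajorDefine(const std::string& driverVersion)
{
    return " -DOPENCL_DRIVER_MAJOR=" + std::to_string(parseDriverNumber(driverVersion) / 100);
}
```

Kernel code could then select the workaround with a preprocessor check on `OPENCL_DRIVER_MAJOR`.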
-
- Oct 10, 2018
-
-
psychocrypt authored
In the current implementation the bit align uses a signed integer, which pulls in ones if the sign bit is set. - cast to unsigned integer before using the bit shift
-
- Oct 08, 2018
-
-
psychocrypt authored
- add more descriptive messages if memory allocation fails - add gnu compiler flags: `noexecstack` to support systemd - handle cases where memory allocation fails Co-authored-by:
Tony Butler <spudz76@gmail.com>
-
- Oct 07, 2018
-
-
psychocrypt authored
Strided index 1 is not allowed for cryptonight_v8 and monero. If the dev pool is set to monero and the user tuned their settings for another currency, the miner will crash if strided index or memChunk does not meet the requirements for mining monero. This PR detects wrong configurations and sets strided index and memChunk to a valid value, but only for cryptonight_v8. The user pool settings will only be changed if monero or cryptonight_v8 is selected.
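
The kind of check described above might look like the sketch below. The struct, its field names and the function are hypothetical, and the replacement values (strided index 2, memChunk of at least 2) are assumptions chosen for illustration, not values stated in the commit message:

```cpp
// Hypothetical subset of the per-GPU settings; field names are illustrative.
struct GpuSettings
{
    int stridedIndex;
    int memChunk;
};

// Returns true if the settings had to be changed.
// ASSUMPTION: for cryptonight_v8, strided index 1 is replaced by 2 and
// memChunk is raised to at least 2.
bool fixSettingsForV8(GpuSettings& s, bool isCryptonightV8)
{
    if (!isCryptonightV8)
        return false; // other algorithms keep the user's tuning untouched

    bool changed = false;
    if (s.stridedIndex == 1)
    {
        s.stridedIndex = 2;
        changed = true;
    }
    if (s.stridedIndex == 2 && s.memChunk < 2)
    {
        s.memChunk = 2;
        changed = true;
    }
    return changed;
}
```

Applying the correction only when the v8 algorithm is active keeps a configuration tuned for another currency unchanged while that currency is mined.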
-
psychocrypt authored
fix #1870 - remove zero from the valid definition range for the loop unroll option
-
- Oct 06, 2018
-
-
Tony Butler authored
-
- Oct 05, 2018
-
-
psychocrypt authored
With rocm we fought with invalid shares for a long time. This is now solved with rocm 1.9 and this tiny fix. It is not fully clear where a memory optimization kicks in and breaks the kernel `Groestl` if the variables `M` and `H` are not `volatile`. The performance will not change with this fix. The fix was tested with rocm 1.9 on a VEGA64 and an RX570.
-
- Oct 04, 2018
-
-
Tony Butler authored
-
- Oct 03, 2018
-
-
psychocrypt authored
- introduce monero oct 2018 fork as currency `monero` - remove monero7 - change all dev pools that mine monero7 to handle the fork to monero - if the dev pool cannot handle the fork to monero, the currency is fixed to `monero` (we can only handle 2 different currencies for user and dev pool) - remove guards that prevent using the currency `monero`
-
- Sep 30, 2018
-
-
psychocrypt authored
add cpu implementation for the final monero POW
-
- Sep 22, 2018
-
-
Tony Butler authored
-
- Sep 21, 2018
-
-
psychocrypt authored
- remove unused host function (relic of an old refactoring) - remove unused OpenCL full div function
-
- Sep 19, 2018
-
-
psychocrypt authored
- use optimized div and sqrt - reduce memory footprint
-
psychocrypt authored
- fix code style issues - fix spelling issue - fix asm to support newer clang versions
-
psychocrypt authored
- reintroduce monero7 until the POW is final - update docs (add cryptonight_v8)
-
psychocrypt authored
add option `unroll` for OpenCL to allow better tuning of the main POW kernel.
-
psychocrypt authored
Create a special pass for NVIDIA GPUs to load memory chunks first into the shared memory. Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
- implement cryptonight_v8 - update auto adjust to fit the special requirements of `cryptonight_v8` - add fast math integer implementation for `sqrt`, `reciprocal` and `division` Co-authored-by:
SChernykh <sergey.v.chernykh@gmail.com>
-
psychocrypt authored
During the initialization of the compile parameters for OpenCL, the fixed-size buffer could be too small. To avoid this we are now using `std::string`. Using `std::string` is not a problem because this part of the code is not performance critical.
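
A sketch of the pattern, assuming the options are simple `-D` defines. The function name and the define names are illustrative placeholders, not a list of xmr-stak's actual options:

```cpp
#include <cstddef>
#include <string>

// Builds a compiler option string of arbitrary length. std::string grows as
// needed, so no fixed-size buffer can overflow or truncate the options.
std::string buildCompileOptions(std::size_t iterations, std::size_t worksize, int stridedIndex)
{
    std::string options;
    options += " -DITERATIONS=" + std::to_string(iterations);
    options += " -DWORKSIZE=" + std::to_string(worksize);
    options += " -DSTRIDED_INDEX=" + std::to_string(stridedIndex);
    return options;
}
```

The resulting string can be handed to the OpenCL program build call through `options.c_str()`.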
-
psychocrypt authored
If the first bit of the nonce is `1` (this happens very often if we use a nicehash pool), then some OpenCL implementations may handle the 64bit representation of the 32bit nonce on the device side as a signed integer. During a right bit shift we then pull wrong ones from the upper part of the 64bit nonce representation into the 32bit part of the nonce. The result is that the computed share is invalid. - explicitly cast the nonce on the device to `uint` to avoid any side effects
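
The effect can be reproduced on the host. The sketch below is not the kernel code; the function names are hypothetical and the shift width is arbitrary. It only shows how widening a 32-bit value as signed fills the upper half with ones, which a right shift then moves into the result:

```cpp
#include <cstdint>

// Faulty variant: the nonce is widened as a signed value, so the upper
// 32 bits become ones whenever the top bit of the nonce is set.
std::uint32_t shiftNonceSigned(std::int32_t nonce, unsigned shift)
{
    const std::uint64_t wide = static_cast<std::uint64_t>(static_cast<std::int64_t>(nonce));
    return static_cast<std::uint32_t>(wide >> (shift & 31u));
}

// Fixed variant: the nonce is treated as unsigned before widening, so the
// upper 32 bits are zero and only zeros are shifted in.
std::uint32_t shiftNonceUnsigned(std::uint32_t nonce, unsigned shift)
{
    const std::uint64_t wide = nonce;
    return static_cast<std::uint32_t>(wide >> (shift & 31u));
}
```

For the bit pattern `0x80000001` and a shift of 8, the unsigned variant yields `0x00800000` while the signed variant yields `0xFF800000`.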
-
- Sep 17, 2018
-
-
psychocrypt authored
Avoid using an OpenCL binary from the cache if the driver or xmr-stak version has changed.
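
One common way to get this behaviour is to make both versions part of the cache key, so that a changed version simply misses the cache. The sketch below assumes that approach. The function name, the parameters and the separator are hypothetical, and hashing the key into a file name is left out:

```cpp
#include <string>

// Hypothetical cache key: everything that must match before a cached binary
// may be reused is concatenated into one string.
std::string binaryCacheKey(const std::string& kernelSource,
                           const std::string& compileOptions,
                           const std::string& driverVersion,
                           const std::string& minerVersion)
{
    // The separator keeps adjacent fields from running together,
    // e.g. ("ab", "c") versus ("a", "bc").
    const char sep = '\x1f';
    return kernelSource + sep + compileOptions + sep + driverVersion + sep + minerVersion;
}
```

Two calls that differ only in the driver version or the miner version produce different keys, so the stale binary is never looked up.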
-
- Sep 16, 2018
-
-
psychocrypt authored
Fix a copy-paste mistake: the type of the variable `memChunk` is not tested.
-
- Sep 13, 2018
-
-
psychocrypt authored
xmr-stak has several implementations for multi hash per thread. This results in 3 independent implementations. Each time the algorithm must be changed, the risk of introducing errors is very high. - unify the different cryptonight CPU implementations - simplify the function selection array to find the specialized cryptonight implementation - add an intermediate pointer to access the large state (similar to the old multi hash implementation) As a side effect, this change increases the speed of the single and multi hash algorithm.
-
- Jul 17, 2018
-
-
psychocrypt authored
OpenCL 1.2 does not allow the subscript operator on built-in vector types. fix: use `.sX` to access vector components
-