NVIDIA: optimze v8
- fix that shared memory for fast div is always used even if an algorithm is not using it - optimize fast div algo - store `division_result` (64_bit) per thread instead of shuffle around and store it as 32bit
Please register or sign in to comment