NVIDIA: fix sumokoin
sumokoin is broken if `bfactor >= 5` is used (the default on Windows).
sumokoin for `sm_20` is broken due to the missing extern shared memory.

- call the phase3 kernel two times if sumokoin is enabled
- create extern shared memory for the phase3 kernel