CUDA: reduce cn-v8 shared mem footprint
Keep only half of the AES matrix in shared memory and compute the other half in place. The smaller shared memory footprint increases the possible occupancy.
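The idea can be sketched as follows. This is a minimal illustration and not the code from this PR: it assumes the "AES matrix" is the usual set of four 256-entry T-tables in little-endian layout, where T1, T2 and T3 are byte rotations of T0. Under that assumption only T0 and T2 need to live in shared memory (2 KiB instead of 4 KiB), and T1 and T3 are recovered with an 8-bit rotate at lookup time. The names `half_table_demo`, `build_t0` and `rotl32` are hypothetical, and the kernel takes all four bytes from one input word, which is simpler than a real AES round.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Rotate left; only called with n = 8, 16 or 24 here.
__host__ __device__ inline uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

static inline uint8_t rotl8(uint8_t v, unsigned n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

// Build the AES S-box, then T0[i] = {2*s, s, s, 3*s} packed little-endian.
static void build_t0(uint32_t t0[256])
{
    uint8_t sbox[256];
    uint8_t p = 1, q = 1;
    do {
        p = p ^ (uint8_t)(p << 1) ^ ((p & 0x80) ? 0x1B : 0); // p *= 3 in GF(2^8)
        q ^= q << 1; q ^= q << 2; q ^= q << 4;               // q /= 3 in GF(2^8)
        if (q & 0x80) q ^= 0x09;
        uint8_t x = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4);
        sbox[p] = x ^ 0x63;
    } while (p != 1);
    sbox[0] = 0x63;
    for (int i = 0; i < 256; ++i) {
        uint8_t s  = sbox[i];
        uint8_t s2 = (uint8_t)(s << 1) ^ ((s & 0x80) ? 0x1B : 0);
        uint8_t s3 = s2 ^ s;
        t0[i] = (uint32_t)s2 | ((uint32_t)s << 8) | ((uint32_t)s << 16) | ((uint32_t)s3 << 24);
    }
}

__global__ void half_table_demo(const uint32_t* t0_global, const uint32_t* in,
                                uint32_t* out, int n)
{
    // Assumption: store T0 and T2 only (512 words) instead of T0..T3 (1024 words).
    __shared__ uint32_t t[512];
    for (int i = threadIdx.x; i < 256; i += blockDim.x) {
        uint32_t v = t0_global[i];
        t[i]       = v;              // T0
        t[256 + i] = rotl32(v, 16);  // T2
    }
    __syncthreads();

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n) {
        uint32_t x = in[gid];
        // T1 and T3 are computed in place from the stored half.
        out[gid] = t[x & 0xff]
                 ^ rotl32(t[(x >> 8) & 0xff], 8)         // T1 = rotl(T0, 8)
                 ^ t[256 + ((x >> 16) & 0xff)]
                 ^ rotl32(t[256 + (x >> 24)], 8);        // T3 = rotl(T2, 8)
    }
}

int main()
{
    const int n = 1024;
    uint32_t t0[256];
    build_t0(t0);
    std::vector<uint32_t> in(n), out(n);
    for (int i = 0; i < n; ++i) in[i] = (uint32_t)i * 2654435761u;

    uint32_t *d_t0, *d_in, *d_out;
    cudaMalloc(&d_t0, sizeof(t0));
    cudaMalloc(&d_in, n * sizeof(uint32_t));
    cudaMalloc(&d_out, n * sizeof(uint32_t));
    cudaMemcpy(d_t0, t0, sizeof(t0), cudaMemcpyHostToDevice);
    cudaMemcpy(d_in, in.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    half_table_demo<<<(n + 127) / 128, 128>>>(d_t0, d_in, d_out, n);
    cudaMemcpy(out.data(), d_out, n * sizeof(uint32_t), cudaMemcpyDeviceToHost);

    // Host reference that behaves like four full tables.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        uint32_t x = in[i];
        uint32_t ref = t0[x & 0xff] ^ rotl32(t0[(x >> 8) & 0xff], 8)
                     ^ rotl32(t0[(x >> 16) & 0xff], 16) ^ rotl32(t0[x >> 24], 24);
        if (ref != out[i]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);
    cudaFree(d_t0); cudaFree(d_in); cudaFree(d_out);
    return mismatches ? 1 : 0;
}
```

The trade-off in this sketch is two extra rotate instructions per lookup group in exchange for half the shared memory per block. Whether that raises occupancy in practice depends on whether shared memory, rather than registers or block size, is the limiting resource on the target GPU.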