Originally planned to add a CPU version of BFS. Merging it now that CPU debugging is finished and the results are correct.