There are several ways to do it. One is to use a sort; there are some examples in the CUDA and DirectX SDKs.
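To make the sort idea concrete, here is a minimal CPU sketch in Python of one common variant: give each particle a cell id, sort by that id, then record where each cell starts and ends in the sorted order. This is not taken from the SDK samples. The names `build_sorted_grid` and `neighbours_sorted` and the 2D square grid are assumptions for illustration, and on the GPU the `sorted` call would be a parallel sort.

```python
from math import ceil, floor

def cell_coords(p, cell_size, grid_dim):
    # Clamp so that points on or past the border still land in a valid cell.
    cx = min(max(int(floor(p[0] / cell_size)), 0), grid_dim - 1)
    cy = min(max(int(floor(p[1] / cell_size)), 0), grid_dim - 1)
    return cx, cy

def build_sorted_grid(points, cell_size, grid_dim):
    """Sort point indices by cell id and record each cell's [start, end) range."""
    ids = []
    for p in points:
        cx, cy = cell_coords(p, cell_size, grid_dim)
        ids.append(cy * grid_dim + cx)
    order = sorted(range(len(points)), key=lambda i: ids[i])  # the sort step
    cell_start = [0] * (grid_dim * grid_dim)
    cell_end = [0] * (grid_dim * grid_dim)   # empty cells keep start == end
    for rank, i in enumerate(order):
        c = ids[i]
        if rank == 0 or ids[order[rank - 1]] != c:
            cell_start[c] = rank             # first particle of this cell
        cell_end[c] = rank + 1
    return order, cell_start, cell_end

def neighbours_sorted(points, grid, cell_size, grid_dim, query, radius):
    """Return indices of points within `radius` of `query`."""
    order, cell_start, cell_end = grid
    reach = int(ceil(radius / cell_size))    # how many cells the radius spans
    qx, qy = cell_coords(query, cell_size, grid_dim)
    found = []
    for cy in range(max(qy - reach, 0), min(qy + reach, grid_dim - 1) + 1):
        for cx in range(max(qx - reach, 0), min(qx + reach, grid_dim - 1) + 1):
            c = cy * grid_dim + cx
            for rank in range(cell_start[c], cell_end[c]):
                i = order[rank]
                dx = points[i][0] - query[0]
                dy = points[i][1] - query[1]
                if dx * dx + dy * dy <= radius * radius:
                    found.append(i)
    return sorted(found)
```

After the sort, all particles of a cell sit next to each other, so a lookup is just a contiguous range read per cell.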
You can also build a grid data structure, using either a linked list (fast to build, slow to look up) or a histopyramid (a bit slower to build, but much faster to look up, especially since you can sample larger cells in one go).
Explains histopyramid principles. In your case it’s a bit different, but it shows the concept.
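As a rough illustration of the concept, here is a minimal CPU sketch in Python, assuming a square 2D grid of per-cell particle counts whose side is a power of two. The names `build_histopyramid` and `find_kth` are made up for this example. On the GPU the levels would typically live in something like the mip levels of a texture or buffer.

```python
def build_histopyramid(base):
    """base: square 2D list of counts, side a power of two.
    Returns the levels from the base up to the 1x1 top (the total count)."""
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        # Each coarser cell stores the sum of its 2x2 children.
        levels.append([
            [prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
             + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return levels

def find_kth(levels, k):
    """Walk down from the top to the base cell holding the k-th element
    (0-based). Returns (x, y, index of the element inside that cell)."""
    if not 0 <= k < levels[-1][0][0]:
        raise IndexError("k is outside the total element count")
    x = y = 0
    for level in reversed(levels[:-1]):
        x *= 2
        y *= 2
        for dy, dx in ((0, 0), (0, 1), (1, 0), (1, 1)):
            n = level[y + dy][x + dx]
            if k < n:          # the k-th element is inside this child
                x += dx
                y += dy
                break
            k -= n             # skip this child's elements
    return x, y, k
```

Reading one entry of a coarser level gives the particle count of a whole block of cells, which is what "sampling larger cells in one go" amounts to here: `levels[1][y][x]` covers a 2x2 block of base cells and `levels[2][y][x]` a 4x4 block.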
Depending on what you need to do with the neighbours, each technique can have advantages.
For simple forms of “connect all” (with a fairly small radius), a linked list will do just fine.
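A minimal CPU sketch of the linked list grid for that kind of "connect all", in Python. It assumes a 2D square grid with cell size equal to the radius and points that lie inside the grid. The names `build_linked_grid` and `connect_all` are made up for this example. On the GPU the two-line push in the build loop would be an atomic exchange on the head buffer.

```python
from math import floor

def cell_of(p, cell_size, grid_dim):
    # Clamp so that points on the border still land in a valid cell.
    cx = min(max(int(floor(p[0] / cell_size)), 0), grid_dim - 1)
    cy = min(max(int(floor(p[1] / cell_size)), 0), grid_dim - 1)
    return cx, cy

def build_linked_grid(points, cell_size, grid_dim):
    """head[c] is the last particle pushed into cell c (-1 if empty),
    nxt[i] is the particle pushed into the same cell before i."""
    head = [-1] * (grid_dim * grid_dim)
    nxt = [-1] * len(points)
    for i, p in enumerate(points):
        cx, cy = cell_of(p, cell_size, grid_dim)
        c = cy * grid_dim + cx
        nxt[i] = head[c]   # link to the previous head of this cell
        head[c] = i        # this particle becomes the new head
    return head, nxt

def connect_all(points, radius, grid_dim):
    """Return every pair (i, j), i < j, closer than `radius`."""
    cell_size = radius     # so the 3x3 block around a cell is enough
    head, nxt = build_linked_grid(points, cell_size, grid_dim)
    pairs = []
    for i, (px, py) in enumerate(points):
        cx, cy = cell_of((px, py), cell_size, grid_dim)
        for ny in range(max(cy - 1, 0), min(cy + 1, grid_dim - 1) + 1):
            for nx in range(max(cx - 1, 0), min(cx + 1, grid_dim - 1) + 1):
                j = head[ny * grid_dim + nx]
                while j != -1:             # walk the cell's chain
                    if i < j:
                        dx = points[j][0] - px
                        dy = points[j][1] - py
                        if dx * dx + dy * dy <= radius * radius:
                            pairs.append((i, j))
                    j = nxt[j]
    return sorted(pairs)
```

The build is a single pass with one write per particle, which is why it is cheap. The lookup has to chase the chain one particle at a time, which is the slow part.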
If you need a lot of neighbour lookups (swarms/SPH), a histopyramid will give you a better tradeoff despite the slower build.