you’ll have a hard time comparing all the pixels.
gpu is good if you’re doing parallel operations on all the pixels, independent of what the neighbouring pixel is up to.
as soon as you want data from a pixel further away than the immediate neighbour, it’s going to be a pain.
generally you’re limited in how many loop iterations you can do in a shader, and loops have quite an impact on performance too.
all gpu stuff (cuda included) is parallel stream based
meaning lots of parallel calculations are great (apply the same thing everywhere)
but lots of sequential work is not (going through items 1 by 1, e.g. a search)
that’s not to say there aren’t ways of making this work.
i’d start with rendering the vertices into some kind of compressed output 2D map, 1 pixel each, with the colour value representing the original ID.
To compress whilst not overwriting each other, use the three colour channels (3x redundancy), plus an offset within a 3x3 interleave block based on pixel ID (3x3 = 9x redundancy). together that’s 27x redundancy = max 26 possible connections per vertex
Then each vertex only needs a 3x3 pixel lookup in compressed space + a 9 px lookup in the original xyz space to find possible connections
Each connection is then written to the output map as a unidirectional connection, storing the target ID as a float value (overwrites are possible here as well, max 2? connections per output pixel in this compressed space)
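a rough sketch of the slot arithmetic in plain python (the layout and names are my own illustration, not actual shader code): 3 colour channels times 9 positions in a 3x3 interleave block gives the 27 slots mentioned above.

```python
# sketch of the 27-slot redundancy counting described above:
# 3 colour channels x 9 positions in a 3x3 interleave block = 27 slots.
# reserving one slot for the vertex's own ID would leave 26 connection slots.

def slot_for(connection_index):
    """Map connection index 0..26 to (channel, dx, dy) inside the 3x3 block."""
    channel = connection_index % 3            # r, g or b
    interleave = connection_index // 3        # 0..8
    dx, dy = interleave % 3, interleave // 3  # position inside the 3x3 block
    return channel, dx, dy

slots = {slot_for(i) for i in range(27)}
print(len(slots))      # 27 distinct slots
print(len(slots) - 1)  # 26 usable connections if one slot holds the vertex's own ID
```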
then another pass to move the connections into input xyz space in a vertex shader
then another pass to draw the lines in a geometry(?) shader… there’ll be a way around that as well (simply include all the lines in the mesh to start with)
yeah, an octree or other binning approximation would be the way to go on the cpu
if you can get a multi-core octree implementation with a threshold cutoff (no connections beyond radius R), you can probably get decent fps up to maybe 10,000 points (if you can get the results onto the gpu quickly enough)
best acceleration structure for it would certainly be a spatial hash; it’s quite simple to build, and region-based queries are simple too.
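a minimal cpu-side sketch of that spatial hash (plain python, all names are mine): bin the points into a dict keyed by integer cell coordinates, then a radius query only has to scan the neighbouring cells instead of every point.

```python
from collections import defaultdict
import math

def build_spatial_hash(points, cell_size):
    """Bin 3D points into a dict keyed by integer cell coordinates."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        grid[key].append(i)
    return grid

def neighbours_within(points, grid, cell_size, i, radius):
    """Indices of points within `radius` of point i (excluding i itself)."""
    px, py, pz = points[i]
    cx, cy, cz = int(px // cell_size), int(py // cell_size), int(pz // cell_size)
    r_cells = int(math.ceil(radius / cell_size))  # how many cells the radius spans
    out = []
    for dx in range(-r_cells, r_cells + 1):
        for dy in range(-r_cells, r_cells + 1):
            for dz in range(-r_cells, r_cells + 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), []):
                    if j == i:
                        continue
                    qx, qy, qz = points[j]
                    if (px - qx) ** 2 + (py - qy) ** 2 + (pz - qz) ** 2 <= radius * radius:
                        out.append(j)
    return out

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
grid = build_spatial_hash(points, cell_size=2.0)
print(neighbours_within(points, grid, 2.0, 0, radius=1.5))  # [1]
```

picking a cell size close to the cutoff radius keeps each query down to a 3x3x3 block of cells, which is the whole point of the cutoff.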
10,000 samples have a worst case of 49+ million connections, so you need to be careful with the threshold cutoff; that also makes a brute force gpu implementation quite heavy (since you’d need to process all those 49m tests).
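the 49+ million figure is just the count of unordered pairs for n = 10,000:

```python
# worst case: every point tested against every other point exactly once.
n = 10000
pairs = n * (n - 1) // 2
print(pairs)  # 49995000 -> the "49+ million" worst case above
```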
Just out of interest how many objects do you plan to have?