ConnectAll gpu


I want to replicate connect all connectall in a shader.

Is there a reason this wouldn’t be possible?

I’m thinking - Dottore’s gpu particle method of passing xyz position as rgb texture into shader

for loop in shader to build an array, for each pixel, paired with every other pixel, using uv texture offset. (will be a large array! too large?)

check distance between all pairs

Draw line between each pair, depending on threshold of distance.

you’ll have a hard time comparing all the pixels.
gpu is good if you do parallel operations on all the pixels, independent of what the neighbourpixel is up to.
as soon you want to have data of a pixel further away than just the neighbour, it’s going to be a pain.
generally you are limited on how many loops you can do in a shader and loops have quite an impact on performance too.

but correct me if i’m wrong

cuda?.. no text …

all gpu stuff (cuda inc) is parallel stream based
meaning lots of parallel calculations are great (apply same thing everywhere)
but lots of sequential is not (go through 1 by 1, e.g. search)

that’s not to say there aren’t ways of this working.
i’d start with rendering the vertices into some kind of compressed output 2D map as 1 pixel each, colour values represent the original ID.
To compress whilst not overwriting each other, use different colour channels (3 times redundancy), offset in interleaves based on pixel ID (3x3 = 9 times redundnacy). together you’ve got 27 times redundancy = max 26 possible connections per vertex

Then each vertex only need a 3x3 pixel grid lookup in compressed space + 9 px lookup in original xyz space to find possible connections

Each connection is then defined in the output map as unidirectional connections as target ID in float values (possible overwrites here as well, max 2? connections per output pixel in this compressed space)

then another pass to move connections into input XYZ space in vertex shader

then another pass to draw lines in a geometry(?) shader… there’ll be a way around that as well (simply input all the linex in the mesh to start with)

Thanks for the advice elliot (and woei). I understand about 60% of what you’re saying :)

Think I’m going to leave the gpu version alone for the moment and work on an optimised cpu version. (patched initially)

I used the proximity dynamic subpatch from here boubles-proximity-dynamic to make an optimised 2d connect all.

In researching how to do something similar in 3d I encountered quadtrees and octrees and am currently attempting a 2d quadtree version before trying a 3d octree version.

The main obstacle so far is how to check the ‘contents’ of neighbouring quadtree quads. I’ll post a progress patch soon. Any tips welcome.

Do you know QuadTree (2d) and OctTree (3d) / vux-plugins-ntrees?

Hi Bjoern, yep I saw those. Work well but I’m trying to figure out how to work out which cells are neighbour cells, and which ‘particle’ groups are within those cells

Checking x/y/z ranges works fine to see if a particle is in a cell, though maybe there is a more efficient way.

The big problem is finding cell neighbours

Dunno how performant it will be within your system, but “NearestNeighbour (3d)” ? There’s also Vmath.Dis if you’re writing a dynamic plugin.


a friend of mine did a force directed graph plugin for me.
he uses a simple grid approach to find the neighbouring cells. works pretty good and fast.

the plug is still for interface v1 but maybe it helps you?

Don’t think that’ll help in this case. The problem is I need to find neighbouring quads to a selected quad, and there may be more than quad along each edge as they vary in size.

Have a look here -

yeah oct-tree or other binning approximation would be the way to go on cpu
if you can get a multi-core oct-tree implementation with threshold cutoff (no connections beyond radius R) then you can probably get decent fps up to maybe 10,000 (if you can get that into gpu quickly enough)

Found this csharp octree class

Has search within radius functionality built in!

I’ll see what I can do with it. May take me a while :)

What’s involved in making a plugin like this would be, multi threaded?

Can you point me towards some digestible documentation? Some before/after (ie single/multithreaded)code would be great.

And can you do multithreading satisfactorily in a dynamic plugin?



best acceleration structure for it would certainly be a spatial hash, it’s quite simple to build and region based queries are also simple.

10000 samples has a worst case scenario of 49+ million connections, so need to be careful with the threshold cutoff, that also makes a brute force gpu implementation quite heavy (since you need to process those 49m tests).

Just out of interest how many objects do you plan to have?

As many as I can get away with, but >2000 would be great

Nice link. I like the method of checking which cell lies under each corner of a bounding box round the object, to determine which cells are neighbours.

This only works with the grid way of dividing space rather than the using a tree. How much more efficient do you think it would be using trees (in 2d and 3d) rather than equally dividing the space?

And did you see that octree code I linked to? How’d it look?