ForceDirected (Graph) problems w/ large graphs

I’m using tonfilm’s ForceDirected (Graph) to visualize a rather big graph. Running a current test with only a fraction of the data, there’re 2x 899 IDs fed into _To ID_s and From ID of ForceDirected (Graph).

The bug:
Testing a smaller set of data is all fine and dandy, but the current set of data fails to be computed by ForceDirected (Graph): A spread of 2541 values returned by From XYZ and To XYZ is all NAN.00/MaxFloat, lacking their values, the interconnections fail to be drawn. Still the text labels are where they should be. My Dual Quad-Xeon 3.0 GHz with 16GB RAM, W7, b29.2 slows down to 1 FPS. Urks.

The first 100 slices of the 2541 values spread returned by From XYZ:

The assumed final set of data will be 10 - 25x larger, I wonder if this will ever work out. What’s the current status on using vvvv on a multicore architecture? Would be DX11 beneficial in terms of performance?

Your thoughts and comments are appreciated.

  • patrick

Kind of bumping this thread with a suggestion:
Constantly calculating attraction & repulsion forces for large data sets might be the underlying problem. Usually you set and forget the attraction/repulsion values, allowing to switch off calculations and save FPS. Like S+H’ing this.

Any opinions?

but i think if you have MaxFloats in there, you have some troubles on the input
about dx, you can prolly port it to compute, sure you will have significant fps improvement. To be real:
yes you can batch, and it will be 10 times faster then 2.5k drawcalls

Think I’m getting closer to what causes the Max.Float problem. Computing large data sets is heavy on the CPU (rendering all those texts is hard on the GPU too) as long as the graph hasn’t reached a more or less stable state, as long as the vertiex positions are dynamic due to attraction and repulsion forces not having settled (yet). Currently I’d need to set specific XYZW values which instantly render Max.Floats, when large data sets are applied, but significantly smaller data sets compute just fine. @readme already significantly improved the performance of my patch, but IMHO the core problem remains within the node itself.

It’d be best to generate/compute all to their final positions. This too would speed up finding values for good looking relative distances of clusters, branches etc. Maybe this is contradictory to a graph based on spring, /attraction and repulsion forces, but ATM it’s barely usable this way.

Regarding your last paragraph: it won’t get significantly faster. If it’s CPU bound now, calculating x steps in a single frame won’t be any faster than calculating x steps over x frames. And that layout algorithm is iterative, so there is no “shortcut” to the final position. Maybe you have to find and possibly implement another graph layout algorithm - there are quite some.

True that, Felix, but I just made it this far, not without help.

There’re many nice algorithms out there. A linear-linear model (attraction and repulsion proportional to distance between nodes), force-directed edge bundling algorithms (which looks very nice btw.), large scale graph optimized force-directed algorithms like the 2000 algorithm of Gajer et al, the Walshaw multilevel algorithm or Quigley and Eades and whatnot … alas, I’m unable to write such a plugin (implementing a different algorithm). Are you?

I’m aware this ain’t no hit a bang and-hey presto-there’s my graph at all. I can’t help the impression though, that I should rather use Gephi for the display part of my work (the parser built in vvvv runs fine by now), albeit this means dumping the installation-relevant parts of it like 3D visualization & the interactive part completely. This would render parts of the original concept useless. OTOH, I really need to get past the programming part, at last.

Still I’d propose efforts to make shaping large graphs easier, however that may be achieved.

Worth having a look at:

Shouldn’t be too hard implementing those in a plugin.

@tonfilm)) already implemented quickgraph (and I think ((user:patrick is already using it?)


Yes, also true, Björn.

Currently all output values are S+H’ed, enabling a freshly started patch to actually display the graph, but, alas, in a very early stage which hasn’t much to do with the expected layout. Not S+H’ed, we’re almost instantly to Max.Float. Dividing the input values (to avoid Max.Floats) and multiplying the output values won’t help too, the whole graph is just too ‘hot’. Letting the patch sit with the output S+H’ed for as long as 8hrs to have all values cool down bzw. iterate until reaching a cooler stage of the calculations didn’t help either: the moment you resample we’re Max.Float again. Or maybe there are no calculations going on while the output values are S+H’ed, resulting in a deadlock. Gähn.

A brief research for large scale optimized algorithms: (as seen above, but the latter seem more promising)

ForceAtlas2, a rather new algorithm which has recently been added to Gephi (a preview and comparison) (a multithreaded repository, java-based) (as above but single threaded)

OpenOrd (preview) (this code also stems from Gephi) (another repo, poss. the same code)

ForceAtlas2 seems to be the most fit of all for large scale networks/graphs. I’m not sure if porting from Java is a sensible approach, I couldn’t find a C based code for those. If you can think of competitors or even better algorithms, please post 'em here.

wow, patrick.

thanks for those nice explanations! this ForceAtlas2 thing seems to be really nice, and - in connection with these java sources - it might be possible to work that into the Graphlib.

one stupid question, just for the records:
what does Max.Float exactly mean? what can lead to a Max.Float?

Gerne sebl. (Wann machste denn an den JSON Knoten weiter? Sind die schon Contribution-reif?)

ForceAtlas2 could be a killer node, even more so when being multithreaded! I suppose Max.Float is being returned when a calculation renders values that are out of vvvv’s floating point range.

Just noticed that setting Speed to 0.1 avoids Max.Float for now, but it’ll take a loooong time for the graph to cool down (you’ll need to set S+H to 1 for the CPU heavy calculations to run). Once you try different values for the Constants, you likely produce Max.Floats again, forcing you to restart the patch. This way it’s a time hogging trial & error approach to get somewhere near a good looking graph :(

Regarding mutlithreading, testing a similarily sized data set (1538 nodes, 8032 edges) with the Gephi implementation of ForceAtlas2 (multithreaded), calculations are done on four of eight cores. Mind you, that’s bloody quick.

While porting the Java based ForceAtlas2 algorithm to vvvv would be favourable, it needs permission/support (probably) of the original author and someone to tackle this. Another approach could be parsing files (.gexf/.gml/.graphml/.json all transport the same, but differently ;-) ) made in Gephi and visualizing the computed 2D graph in vvvv (3D), skipping the computation altogether apart from generating the Z values.

Concluding this, parsing data generated by Gephi has the bonus of editing data in place as well leaving the CPU heavy part in a multithreaded environment, alas with the drawback of no 3D in the first place. A vvvv only approach, implementing ForceAtlas2, might be more flexible and easier on providing the data to compute, as you need to take care of separate Node and Edges tables in Gephi (which I consider to be more tedious to work with).

Comments, thoughts?