Vvvv instances communication

dottore · January 28, 2014, 8:46pm

Hello,

Some months ago I started experimenting with multiple vvvv instances, and I found there is a lot of untapped computational power if you build your application in a proper way:

Most of the time you have big modules (engines) that evaluate complex algorithms and just output simple data.
Allocating them to different threads (vvvv instances assigned to different CPU cores) really opens up new possibilities in terms of CPU usage.
From a sad 15% CPU usage I can jump to 100% if the patches are well structured, using all 8 CPU cores instead of a single one.

The most annoying part is data communication between vvvv instances:
I’m wondering if there’s a better way than UDP to send data between two instances.
Sending raw data to localhost is quite fast, but I’m thinking of something more practical, which requires no organization of data, like we do with S and R nodes.
Also, if you want to send the same spread of values to two different instances, they can’t receive UDP on the same port, so you need to send twice, keeping track of and organizing all the incoming and outgoing ports for every instance.
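To illustrate the bookkeeping described above, here is a minimal sketch in Python (not vvvv) of sending one spread to several local instances over UDP. The function names and the float64 layout are assumptions made for this example only.

```python
import socket
import struct


def pack_spread(values):
    """Pack a list of floats as little-endian float64 bytes."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_spread(payload):
    """Inverse of pack_spread: 8 bytes per value."""
    count = len(payload) // 8
    return list(struct.unpack(f"<{count}d", payload))


def send_to_all(sock, values, ports, host="127.0.0.1"):
    """Send the same spread once per receiving port.

    Every receiving instance needs its own port, so the sender
    has to know the full list of ports (hypothetical bookkeeping).
    """
    payload = pack_spread(values)
    for port in ports:
        sock.sendto(payload, (host, port))
```

The payload is packed once, but it is still sent once per receiver, and the port list has to be maintained by hand.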

Shared Memory.
From what I understand, this is the most direct way to give the instances access to the same resource.
I found shared memory nodes to share Textures (both dx9 and dx11), DShow9, Windows.
Nothing for raw or values.

Wouldn’t it be cool to have an AsSharedMemory (Raw) node? Or even a SharedMemory (Value) node directly…

Apart from these practical things, I find the multi-vvvv approach extremely interesting.
What do you devvvvs think about it? Maybe improving communication and facilitating software development across multiple instances could be one solution to the single-thread bottleneck?

I imagine a scenario where you can place a SharedMemory (raw) in your module, tag it with a string (like for Send nodes).

Then internally all the sharedMemory tags (with the associated pointers info) are sent to all the instances.

In another instance my GetSharedMemory (raw) node allows me to choose from available tags, or simply I just put the tag I need (like we do with string2enum in R nodes).

simple as R S routines…
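The tagged shared memory idea can be sketched outside vvvv with Python's `multiprocessing.shared_memory` module. The `publish` and `fetch` helpers are hypothetical names for this example; they only show that a plain string tag is enough for another process to find the block.

```python
import struct
from multiprocessing import shared_memory


def publish(tag, values):
    """Create a shared memory block named `tag` and write the values into it.

    Returns the block; the caller must keep it alive and later call
    close() and unlink() on it.
    """
    payload = struct.pack(f"<{len(values)}d", *values)
    block = shared_memory.SharedMemory(name=tag, create=True, size=len(payload))
    block.buf[:len(payload)] = payload
    return block


def fetch(tag, count):
    """Attach to the block named `tag` and read `count` float64 values.

    The reader must know `count`: the OS may round the block size up,
    so the block size alone does not tell how many values are valid.
    """
    block = shared_memory.SharedMemory(name=tag)
    try:
        return list(struct.unpack_from(f"<{count}d", block.buf, 0))
    finally:
        block.close()
```

No string conversion is involved, and any number of readers can attach to the same tag. The sketch does not cover how tags would be announced to the other instances.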

am I missing something fundamental? :)

cheers

tekcor · January 28, 2014, 10:40pm

hey, very interesting. I thought the same a few times when I worked with multiple instances.

Maybe totally utopian, but a crazy idea would be a hotkey that lets you assign nodes and modules to be run by one of the cores.

for example ctrl+c+1 vs ctrl+c+2. I mean, the hotkeys are probably the minor problem in this scenario :D but the basic idea would be using the different cores within one instance, which is of course not a new idea. the development team surely knows what difficulties this brings.

regarding more realizable stuff, you could make S and R modules that have a central data management module. all the UDP routing can work fully automatically. you just need to specify a port range. that would totally make sense and would be kind of easy to make.
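A rough sketch of what such a central data management module could do, written in Python rather than as a patch. `PortRegistry` and its methods are hypothetical names; the only input it needs is the port range.

```python
class PortRegistry:
    """Hand out UDP ports from a fixed range, one per (tag, instance) pair."""

    def __init__(self, first_port, last_port):
        self._free = list(range(first_port, last_port + 1))
        self._by_subscription = {}

    def subscribe(self, tag, instance_id):
        """Return the port on which `instance_id` receives `tag`.

        The same pair always gets the same port back.
        """
        key = (tag, instance_id)
        if key not in self._by_subscription:
            if not self._free:
                raise RuntimeError("port range exhausted")
            self._by_subscription[key] = self._free.pop(0)
        return self._by_subscription[key]

    def ports_for(self, tag):
        """All ports a sender has to address for `tag`."""
        return sorted(
            port for (t, _), port in self._by_subscription.items() if t == tag
        )
```

An S module would ask `ports_for("velocity")` before sending, and each R module would call `subscribe("velocity", its_id)` once, so nobody has to type port numbers by hand.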

about shared memory for values I don’t know anything. but I wouldn’t want to mess with the indexing by hand here. if you could write a spread to memory, assign an index to the spread and recall it in another application, fine, but you would still need to program something that assigns human-readable and human-writable string ids to it.

sounds reasonable at least to make this one way or another…

elektromeier · January 28, 2014, 11:54pm

SharedMemory (Windows) lets you share strings… wrapping it in a module which converts to values is not an option?

guest · January 29, 2014, 12:07am

Some implementation of this might be what’s called for -

dottore · January 29, 2014, 12:25am

@elektromeier:
value-to-string conversion and back is definitely too heavy. not an option :)
I’m talking about sending thousands of values.

@guest:
I’m not a programmer and I don’t know if that topic is relevant in this scenario. Let’s see if someone can shed some light on it.

bjoern · January 29, 2014, 12:32am

I somehow like the idea of “self-propagating shared memories”.
There are some tools like AccessChk that can enumerate shared memories (sections) system-wide. Unfortunately I didn’t find any code that does the same.

Patched a “little proof of concept” relying on AccessChk.
Adapt bat file to your needs.
Run it as admin (necessary for AccessChk).

SharedMemories.7z (90.8 kB)

sebl · January 29, 2014, 12:55am

@bjoern verynice! and it’s even sorted alphabetically!

vux · January 29, 2014, 12:57am

Assigning subpatches to cores is something I guess many users have always craved, and to some extent, if there weren’t that many “polluter nodes”, this could be much easier (I didn’t say easy ;)

Funnily, R/S are of that kind: they are in total contradiction with a modular approach, except in a few rare cases, e.g. R/S in the same patch, or as pure globals, e.g. application invariants.

Across patches they generally create more mess than they solve (the FrameDelay series closely follows in creating junk, but that’s another story), and they should never be used for modules (since your module becomes unusable without the corresponding S node, it’s not really a module anymore).

Next you have the issue of rendering (since for example a resource update MUST be on the rendering thread). In that case it’s a pain (you can double buffer a resource, but doing this in an automatic way is rather hard, and you can easily run into issues with some buffer uploads).

In the case where you have a patch which doesn’t have any render node (e.g. dx), just plain “in frame” computation, then threading is rather trivial (it requires some GUI work of course to assist the user).

And about multi-instance, I feel it adds quite some logistics (it can have some really interesting benefits in some cases I admit, for example when you have your own patch that you start yourself before your gig and don’t mind copy/pasting a couple of handles). UDP over localhost is really not that bad for this (as is TCP).

If the amount of data is rather big, shared memory is more efficient (a raw version would be pretty handy actually).

My two cents.

Noir · January 29, 2014, 9:59am

IMHO one of the most interesting posts ever

tonfilm · January 30, 2014, 8:48pm

bjoern’s tool is a very good solution already.
just dynamic size management is missing… should be a quick one. who is up for it?

guest · January 31, 2014, 3:19am

nice tool, but like the sharedMemory value the limit is a spread of 72 values per send/receive!

dottore · January 31, 2014, 5:42pm

Hey,
I tried bjoern’s modules and they’re nice; we just need to put a size label in the shared memory name.
I’m not really confident with SharedMemory nodes and I haven’t really figured out how the memory works.
You can set a “size in bytes” pin and I suppose it must match on the receiver.
Anyway, it seems there’s a limit on size (as guest is saying);
I couldn’t go higher than 2048 bytes.
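As an alternative to putting the size in the name, one possible approach (an assumption, not how the existing nodes work) is a fixed-capacity block whose first bytes store the current slice count. A minimal Python sketch; `buf` can be any writable buffer, such as a `bytearray` or the `buf` of a shared memory block:

```python
import struct

# Hypothetical layout: 4-byte unsigned slice count, then float64 values.
HEADER = struct.Struct("<I")


def write_spread(buf, values):
    """Write the slice count and the values into a fixed-capacity buffer."""
    needed = HEADER.size + 8 * len(values)
    if needed > len(buf):
        raise ValueError(f"spread needs {needed} bytes, buffer has {len(buf)}")
    HEADER.pack_into(buf, 0, len(values))
    struct.pack_into(f"<{len(values)}d", buf, HEADER.size, *values)


def read_spread(buf):
    """Read the slice count first, then exactly that many values."""
    (count,) = HEADER.unpack_from(buf, 0)
    return list(struct.unpack_from(f"<{count}d", buf, HEADER.size))
```

With this layout the name and the allocated size never change; only the header does. A 2048-byte block would hold up to 255 float64 values. The sketch ignores the case where a reader looks at the block while the writer is halfway through an update.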

Apart from this size limit, I think a SharedMemory (Raw) node would be nice and faster.

yo

Elias · January 31, 2014, 9:47pm

hm, what about some new nodes around ZeroMQ using the official .net implementation clrzmq.
they support various communication patterns, including the so called (and in this context very interesting)
pipeline pattern - used to distribute tasks among so called workers and collecting the results afterwards in the task collector. a distributor, a worker and a collector would just be a patch in our case, with pull/push nodes acting as its in-/outputs. say we’d have those pull and push nodes, where the pull node blocks until data is available and the push node doesn’t block, the whole distributed patch would even run in sync.
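To make the shape of the pipeline pattern concrete, here is a sketch that uses only Python's standard `queue` and `threading` modules instead of ZeroMQ itself. With ZeroMQ the two queues would be PUSH/PULL sockets between processes; `run_pipeline` is a hypothetical name for this example.

```python
import queue
import threading


def run_pipeline(tasks, work, worker_count=3):
    """Distribute tasks to workers and collect the results in task order."""
    tasks = list(tasks)
    task_queue = queue.Queue()    # distributor -> workers
    result_queue = queue.Queue()  # workers -> collector
    stop = object()               # sentinel telling a worker to quit

    def worker():
        while True:
            item = task_queue.get()  # "pull": blocks until data is available
            if item is stop:
                break
            index, payload = item
            # "push": the queue is unbounded, so put() does not block
            result_queue.put((index, work(payload)))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    # Distributor: hand out the tasks, then one stop signal per worker.
    for index, payload in enumerate(tasks):
        task_queue.put((index, payload))
    for _ in threads:
        task_queue.put(stop)

    # Collector: blocks until every result has arrived.
    results = [None] * len(tasks)
    for _ in tasks:
        index, value = result_queue.get()
        results[index] = value

    for thread in threads:
        thread.join()
    return results
```

Because the collector blocks until all results are in, the whole pipeline stays in sync, which matches the behaviour described for blocking pull nodes.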

sebl · January 31, 2014, 11:20pm

ZeroMQ sounds great. there’s also this https://github.com/smakhtin/VVVV.Nodes.ZMQ
(never tried it out)

dottore · February 2, 2014, 11:41am

The pipeline pattern seems quite interesting indeed. It could open some doors in distributed computing.
At this point it would be nice to think of these nodes not only in a single-computer scenario. What if this approach also included boygrouping?
You could really get a small render farm from a few network-connected PCs.

4 PCs
8 cores each
potentially 32 running vvvv processes.
This is an extreme scenario and it would be quite tricky to distribute all the tasks properly. Anyway, it gives an idea of the amount of power we could get out of it.
I’m not very much into boygrouping, but from what I understand it goes one way: it distributes tasks to render machines.
one input (server)
several outputs (projections)
This distributed computing would easily allow the opposite approach:
several inputs (of course managed from the server)
one output (server or a different machine…or many boygrouped clients…:)
Each vvvv instance would need to be labeled not just with an IP address but also with a thread ID. Maybe the thread ID could be assigned automatically…

OK, I’m getting carried away… coming back to the main topic:
When talking about realtime stuff, it’s important to get the data to be rendered as fast as possible.
This means no waiting times.
The server needs to use data coming from other instances each frame, even if the worker instance has not finished processing.
In this scenario there’s no need to sync data.
e.g.

I have a particle system.
I move some complex interaction evaluation to a different vvvv instance. This instance does the heavy math and returns just a velocity vector.
If that instance can’t complete the process at the same FPS as the main server, the server will use the latest processed data it received (let’s say an S+H approach).
We could also split complex evaluations into blocks:
take the particle system: we have a spread of 1 million slices (just an example). You could split this huge spread into blocks, give each to a different vvvv instance, then collect them in the server and render.
Again, you don’t need to wait and sync instances to the same framerate. The faster they can evaluate, the better. Who cares if not all the blocks are updated at the same time (in some scenarios).
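The two ideas above (splitting a spread into blocks and holding the latest result per block) can be sketched in a few lines of Python. `split_into_blocks` and `LatestResults` are hypothetical names; the transport between instances is left out.

```python
def split_into_blocks(spread, block_count):
    """Split a spread into `block_count` contiguous blocks of near-equal size."""
    size, extra = divmod(len(spread), block_count)
    blocks, start = [], 0
    for i in range(block_count):
        # The first `extra` blocks get one additional slice.
        end = start + size + (1 if i < extra else 0)
        blocks.append(spread[start:end])
        start = end
    return blocks


class LatestResults:
    """Sample-and-hold per block: keep the last result each worker delivered."""

    def __init__(self, initial_blocks):
        self._blocks = [list(block) for block in initial_blocks]

    def update(self, block_index, values):
        """Called whenever a worker finishes a block, at its own frame rate."""
        self._blocks[block_index] = list(values)

    def collect(self):
        """Called by the server every frame; never waits for any worker."""
        return [value for block in self._blocks for value in block]
```

The server calls `collect()` each frame and gets a full spread back, where blocks from slow workers are simply one or more frames old.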

In the end,
it would be nice to have these new nodes that send and receive data between instances (and possibly between boygrouped PCs) and support both synced and unsynced data sharing.

sorry for the epic post…

bjoern · February 2, 2014, 3:54pm

Made some tests with dynamic size management / labeling.
Unfortunately things get really unstable, at least when size and thus labels change constantly.
Also performance degrades a lot – suppose that’s related to opening/closing the handles all the time?

elektromeier · February 3, 2014, 12:07pm

reading dottore’s last post made me think that boygrouping actually already offers a very convenient method to share data between different vvvv processes.

it’s actually possible to have several inputs with boygroup too, by setting up a multiboygroup which can have several servers.

Now the problem is of course that boygroup works over the network and not on the same machine, which led me to a stupid and probably not very practical idea.

what about setting up virtual machines, each one assigned to its own core and each one with its own virtual network adapter, then starting the servers on the virtual machines and one client on the real machine for rendering…

maybe there’s a way to modify the boygroup functionality to work across several cores without having virtual machines or a network???

bjoern · February 3, 2014, 3:59pm

Actually you can run a boygroup on one machine.

velcrome · February 3, 2014, 7:34pm

Zach Lieberman was recommending ZeroMQ to us @ccl recently. seems a sensible recommendation too

elektromeier · February 3, 2014, 8:39pm

ok? without a VM? how do you address the clients/servers without having individual IPs?