We are currently working on an installation and are experiencing some performance issues. We thought some of you might have experienced something similar and have some ideas or suggestions.
The situation is as follows: we have 4 different patches/visualizations.
On their own they all run at a constant 50-60 fps.
For our installation we need to switch between them at a fixed interval. Therefore we combine them in one patch and use a timeline to activate/deactivate them and stop evaluating them.
Even though only one of the patches is active and evaluated at a time, the framerate drops to about 25-30 fps.
We investigated, starting with a blank patch, and found that a single visualization on its own runs fine.
After dragging a second visualization/patch next to it and connecting it to the rendered group, the framerate briefly drops significantly, as expected.
Surprisingly, disconnecting it afterwards does not restore the previous framerate.
Even deleting the second visualization patch completely does not bring us back to 50-60 fps.
So we assume there are some leftover processes running in the background that we are not aware of.
Has anybody experienced something similar?
Is there a preferred way of switching between different patches while making sure that the deactivated ones use as little computing power/memory as possible?
Any help, tips, ideas that help us debug are highly appreciated.
Here is some technical background information:
We are running vvvv 32.4 64-bit, DX11 pack
2 visualizations use instance noodles.
Intel i7, Windows 10 Prof.
32 GB RAM
2x GeForce 1080 Ti with 2x Dual Link DVI
It’s a little late. So just off the top of my head:
Have you got a Renderer TTY in your patch?
If not, create one and see if it shows any errors.
Also enable the exception dialog to see if one pops up that doesn't get logged in the TTY.
Enable Debug Timing mode and see how many ticks the scene subpatches are consuming. The count should decrease significantly when the patches are not evaluated. If it doesn't decrease that much, dig deeper and look for the subpatch with the largest number on it. Continue until you find the culprit.
What seems to be the bottleneck, CPU or GPU? What does the perfmeter say?
Did you have a look at the Task Manager? How is the processor load?
Make sure any powersaving settings for your CPU are disabled.
Power Control Panel > Select Power Plan > High performance.
And/or boot into BIOS and try disabling CPU C-States.
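If you want to script the machine setup, the same power-plan switch can be done from an elevated command prompt. This is a sketch assuming the stock Windows "High performance" plan with its default GUID:

```shell
# Activate the built-in "High performance" power plan
# (8c5e7fda-... is the default GUID Windows ships for it).
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

# Confirm which plan is now active
powercfg /getactivescheme
```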
Use a tool like GPU-Z to determine the performance (clock speed etc.) of your graphics card.
Does the clock speed fluctuate?
Go to your Nvidia Control Panel > Manage 3D Settings > Global Settings tab
and set Power Management Mode to “Prefer Maximum Performance”.
edit/add2: Also, in GPU-Z have a look at the memory utilization.
Do you have more than one mainloop node in your subpatches / scenes?
Do the subpatches/scenes contain any (hidden) renderers or preview nodes that are by any chance on a different monitor/device than the scene that is currently active?
Have you tried with one Card disabled (using the device manager) or even better removed completely?
Is vsync enabled and does the behaviour change with the setting on/off?
If nothing else helps and your scenes are really (CPU) performance hogs:
Run every scene in its own vvvv instance (four, which is nice because your CPU has four cores; make sure to look up “SetProcessAffinityMask”).
Share the visual outputs as textures with a fifth instance that contains the final renderer(s) and solely does the scene switching/blending. The fifth instance can of course still maintain some sort of logic that disables stuff (via shared memory) in the other 4 instances / scenes.
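To make the affinity idea concrete, here is a minimal sketch (plain Python, not vvvv) of the bitmask arithmetic behind SetProcessAffinityMask, assuming the hypothetical layout above: four scene instances pinned to cores 0-3 and a fifth mixer instance allowed on all four cores.

```python
# Sketch (plain Python, not vvvv): the bitmask arithmetic behind
# "set process affinity mask". Bit n set means core n may be used.
def affinity_mask(cores):
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

# Hypothetical layout: one scene instance per core 0-3 ...
scene_masks = [affinity_mask([core]) for core in range(4)]
# ... and the fifth (mixer) instance may use all four cores.
mixer_mask = affinity_mask(range(4))

print([hex(m) for m in scene_masks])  # ['0x1', '0x2', '0x4', '0x8']
print(hex(mixer_mask))                # 0xf
```

These hex values are what you would pass as the affinity mask for each process.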
Thanks for this extensive list.
One important thing to mention is that we currently use 4 outputs per card, each at 1280x720.
(originally at 1920x1080), arranged side by side in the display settings.
We're spreading one texture across 4 screens per graphics card, so we hoped that this would separate the rendering calculations internally.
Have you got a Renderer TTY in your patch?
Sometimes a division by zero pops up when a video texture input is not yet available, but there are no permanent errors.
I put all other patches to sleep by disabling and then non-evaluating them, so they rest at 300 ticks compared to 5000 when running.
Power settings “high performance” are activated
No, what does this do? Will google …
We're using this. I checked the load; it was spread 60 to 30 on the 2nd card. Will check for fluctuation.
more than one mainloop node: no
hidden previews / renderers: no
With only one card dedicated to the running vvvv instance (from the Nvidia settings tab), the GPU load goes up to 99% on this card. I tried to use both cards from two separate instances of vvvv, each with its own dedicated card: that does not work.
vsync is off
I tried running the patches separately in their own instances, but the performance settled in the same range once two of the 4 visualisations were open, even when minimizing one of them (so rendering only to a small texture) or “putting it to sleep”.
The weird part is this:
Each patch runs at 60 fps on its own. "Surprisingly, disconnecting it afterwards does not help to recover the previous framerate. Even deleting the second visualization patch completely does not help to get back to 50-60fps."
OK. I'd really go for disabling one of the cards via the device manager, or removing it completely.
In my experience multi-GPU setups are a real PITA, hard to set up / debug, and generally not advisable.
To simulate the same performance requirements with one card, “just” attach two 4K Monitors.
If this solves your initial perf problems you can go for two MultiStream Transport splitters or, more professional (and more expensive), two Datapath Fx4s.
Be aware that non-evaluating still keeps loaded things in memory. Even though it might give you quite a framedrop, it's worth trying to disconnect the rendergraph and set spreadcounts to zero before disabling the patch. At least that would give you a hint whether it's memory swapping that kills your framerate.
In case your GPU memory is quite loaded but not maxed out: did you check how many resources you are (re)pushing to it each frame, instead of just keeping them in GPU memory and reusing them? (Null indirect drawing instead of direct instancing, fixed spreadcounts on dynamic buffers, …)
Another thing worth trying: check the raw fps of the idle patches, CPU and GPU together. Set the Mainloop to raw mode and the fps to 600 or higher, set the renderer's V-Sync off and enable what I guess is called 'Do Not Wait' (the DX9 equivalent to immediate mode). Knowing the idle and active cycle time per patch, you know whether you are already at a general performance limit. You can go further and measure that individually for CPU and GPU via the Timing node (PrepareGraph vs. Render+Present time).
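The budget reasoning above can be sketched numerically. With hypothetical per-patch cycle times (the numbers below are assumptions, not measurements), you can check whether one active scene plus three sleeping scenes still fits into a 60 fps frame:

```python
# Hypothetical numbers (ms per frame) standing in for real measurements
# from the Timing node -- these are assumptions, not measured values.
active_ms = 12.0            # the one active scene
idle_ms = [1.0, 1.5, 0.8]   # the three sleeping scenes

budget_ms = 1000.0 / 60     # ~16.67 ms available per frame at 60 fps
total_ms = active_ms + sum(idle_ms)

print(f"total {total_ms:.1f} ms of {budget_ms:.2f} ms budget")
print("fits 60 fps" if total_ms <= budget_ms else "over budget")  # fits 60 fps
```

If the total exceeds the budget even with the idle scenes near zero, the active scene alone is already at a general performance limit.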
Even though your overall pixel amount isn't high and it shouldn't be a bottleneck on the fillrate: how's performance if you just use 2 outputs on the computer (still calculating the whole scene)? Or 2 computers with one card each, if it's not too hard to convert and sync?
With 4 separate instances, did you try running two rendergraphs on the second card and sharing them back? Kind of a weird way of balancing the GPU load… but it might work if the instances don't have to be in sync.
I will try out your suggestions and report back. The spreadcount zeroing and using 2 splitters on one card are the lowest-hanging fruits to test.
OK, so I have managed to gain about 4-5 fps by doing the following when patches go to sleep, before I disable all shaders and groups:
reducing spreadcount in two of four visualisations using instance noodles instancer and splines
set particle system buffer from 8192 to 128 particles
Then I removed cards with the device manager and simulated double the final output by attaching four screens:
3440x1440 + 2560x1440 + 2560x1440 + 1920x1080 = 14,400,000 pixels,
instead of 1280x720 x 8 = 7,372,800 pixels.
Result: an almost solid 60 Hz.
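A quick sanity check of the pixel-count comparison above:

```python
# Sanity check of the pixel counts above.
simulated = 3440*1440 + 2560*1440 + 2560*1440 + 1920*1080
installation = 1280*720 * 8

print(simulated)                  # 14400000
print(installation)               # 7372800
print(simulated / installation)   # 1.953125, i.e. almost double
```

So the four test screens push roughly twice the pixels of the actual installation, which makes the solid 60 Hz a comfortable margin.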
Looks like this (one card + splitter*) is definitely the way to go.
@dimix yep, we share a 512x512 texture from another vvvv x86 instance (which uses OpenCV) to the x64 instance.
As far as I remember, the framerate did not change significantly with or without the camera tracking/texture sharing.
Hey everybody. After some days of patching we managed to get it from 27 frames to a steady 40, some patches even to 60 now. There was not one single solution; rather, a lot of the small tips helped to constantly improve it. What I can definitely say, and what is an important learning: two cards are not necessarily better than one, maybe even worse. So if you have the option, don't need the full resolution of 2 cards, and it's more about the number of outputs, go with one card and a splitter instead. Also, SetProcessAffinityMask helped to gain some more frames by separating the visuals from the rest.