Slow FPS, frame drops when cycling through multiple patches

Hi, everybody.
We are currently working on an installation and are experiencing some performance issues. We thought some of you might have experienced something similar and have some ideas or suggestions.

Here is the situation: we have 4 different patches/visualizations.
On their own they all run at a constant 50-60 fps.
For our installation we need to switch between them at a fixed interval, so we combine them in one patch and use a timeline to activate/deactivate them and to stop evaluating them.
Even though only one of the patches is active and evaluated at a time, the framerate drops to about 25-30 fps.

We tried to investigate and started with a blank patch: running one visualization on its own works fine.
After dragging a second visualization/patch next to it and connecting it to the rendered group, the framerate briefly drops significantly, as expected.
Surprisingly, disconnecting it afterwards does not recover the previous framerate.
Even deleting the second visualization patch completely does not bring us back to 50-60 fps.

So we assume there are some leftover processes running in the background that we are not aware of.
Has anybody experienced something similar?
Is there a preferred way of switching between different patches while making sure that the deactivated ones use as little computing power/memory as possible?

Any help, tips, ideas that help us debug are highly appreciated.

Here is some technical background information:
We are running vvvv32.4 64-bit with the DX11 pack.
2 visualizations use InstanceNoodles.

Hardware:
Intel i7, Windows 10 Pro
32 GB RAM
2x GeForce 1080 Ti with 2x Dual Link DVI
SSD

Thanks a lot
Best
Cedric

It’s a little late. So just off the top of my head:

Have you got a Renderer (TTY) in your patch?
If not, create one and see if it shows any errors.
Also enable the exception dialog to see if one pops up that doesn’t get logged in the TTY.


edit/add1:
Enable Debug Timing mode and see how many ticks the scene subpatches are consuming. The amount should decrease significantly when the patches are not evaluated. If it doesn’t decrease that much, dig deeper and look for the subpatch with the largest number on it. Continue until you find the culprit.

Also have a look at :
https://vvvv.org/documentation/debugging

What seems to be the bottleneck, CPU or GPU? What does the perfmeter say?

Did you have a look at the Task Manager? How is the processor load?
Make sure any power-saving settings for your CPU are disabled.

Power Control Panel > Select Power Plan > High performance.

And/or boot into BIOS and try disabling CPU C-States.
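
For an installation machine that should come up in a defined state after every reboot, the power plan switch can also be done programmatically instead of through the Control Panel. A minimal Win32 sketch (assuming the built-in “High performance” scheme is still present on the system):

```cpp
// Minimal sketch: activate the built-in "High performance" power plan on startup.
// Assumes the default Windows power schemes have not been removed.
#include <initguid.h>   // so the power-scheme GUIDs are defined, not just declared
#include <windows.h>
#include <powrprof.h>
#include <cstdio>

#pragma comment(lib, "PowrProf.lib")

int main()
{
    // GUID_MIN_POWER_SAVINGS is the GUID of the "High performance" scheme.
    DWORD result = PowerSetActiveScheme(NULL, &GUID_MIN_POWER_SAVINGS);
    if (result != ERROR_SUCCESS)
    {
        std::printf("PowerSetActiveScheme failed: %lu\n", result);
        return 1;
    }
    std::printf("High performance power plan activated.\n");
    return 0;
}
```

The same switch can also be made from a startup script with the powercfg command-line tool.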

Use a tool like GPU-Z to check the performance (clock speed etc.) of your graphics card.
Does the clock speed fluctuate?
Go to the Nvidia Control Panel > Manage 3D Settings > Global Settings tab
and set Power Management Mode to “Prefer Maximum Performance”.


edit/add2: Also in GPU-Z have a look at the memory utilization.

Do you have more than one mainloop node in your subpatches / scenes?

Do the subpatches/scenes contain any (hidden) renderers or preview nodes that are by any chance on a different monitor/ device than the scene that is currently active?

Have you tried with one card disabled (using the Device Manager), or even better, removed completely?


edit/add3:
Is vsync enabled and does the behaviour change with the setting on/off?

If nothing else helps and your scenes are really (CPU) performance hogs:
Run every scene in its own vvvv instance (four, which is nice because your CPU has four cores;
make sure to look up “SetProcessAffinityMask”).
Share the visual outputs as textures with a fifth instance that contains the final renderer(s) and solely does the scene switching/blending. The fifth instance can of course still maintain some sort of logic that disables stuff (via shared memory) in the other 4 instances/scenes.
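
Not vvvv-specific, but for a picture of what “sharing the visual outputs as textures” means on the DX11 level: the producing instance creates its texture with the shared flag and hands out a handle, and the consuming instance opens that handle on its own device. A rough sketch (error handling trimmed, and the handle exchange between the processes, e.g. via shared memory, is assumed to happen elsewhere):

```cpp
// Minimal sketch of DX11 texture sharing between two devices/processes,
// roughly what shared-texture nodes do internally. Error handling is minimal.
#include <d3d11.h>
#include <dxgi.h>

#pragma comment(lib, "d3d11.lib")

// Producer side: create a shareable texture and obtain a handle for it.
HANDLE CreateSharedTexture(ID3D11Device* device, ID3D11Texture2D** outTex)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = 1280;
    desc.Height           = 720;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    desc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags        = D3D11_RESOURCE_MISC_SHARED;   // make it shareable

    if (FAILED(device->CreateTexture2D(&desc, nullptr, outTex)))
        return nullptr;

    IDXGIResource* dxgiRes = nullptr;
    HANDLE sharedHandle = nullptr;
    if (SUCCEEDED((*outTex)->QueryInterface(__uuidof(IDXGIResource), (void**)&dxgiRes)))
    {
        dxgiRes->GetSharedHandle(&sharedHandle);   // pass this handle to the other process
        dxgiRes->Release();
    }
    return sharedHandle;
}

// Consumer side: open the handle on a different device (e.g. the compositing instance).
ID3D11Texture2D* OpenSharedTexture(ID3D11Device* device, HANDLE sharedHandle)
{
    ID3D11Texture2D* tex = nullptr;
    device->OpenSharedResource(sharedHandle, __uuidof(ID3D11Texture2D), (void**)&tex);
    return tex;
}
```

The important part is that only the handle crosses the process boundary; the texture itself stays in GPU memory.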

Hey Björn,

thanks for this extensive list.
One important thing to mention is that we currently use 4 outputs per card, each at 1280x720
(originally at 1920x1080), arranged side by side in the display settings.
We’re spreading one texture across 4 screens per graphics card, so we hoped this would separate the rendering calculations internally.

  1. Have you got a Renderer TTY in your patch?
    Yep. Sometimes a division by zero pops up when a video texture input is not yet available, but no permanent errors.

  2. Bottleneck
    = GPU
    I put all the other patches to sleep by disabling and then non-evaluating them, so they rest at 300 ticks compared to 5000 when running.

  3. Power settings: “high performance” is activated.

  4. CPU C-States
    No, what does this do? Will google …

  5. GPU-Z
    We’re using this. I checked the load, it was spread 60 to 30 on the 2nd card. Will check fluctuation.

  6. More than one mainloop node: no

  7. Hidden previews / renderers: no

  8. Disabling cards
    With only one card dedicated to the running vvvv instance (from the Nvidia settings tab) the GPU load goes up to 99% on this card. I tried to use both cards from two separate instances of vvvv, each with their own dedicated card: does not work.

  9. VSync is off.

  10. Sharing textures
    I tried running the patches separately in their own instances, but the performance settled in the same range once two of the 4 visualizations were open, even when minimizing one of them (so it only renders to a small texture) or “putting it to sleep”.

The weird part is this:
Each patch runs at 60 fps on its own.
"Surprisingly, disconnecting it afterwards does not help to recover the previous framerate.
Even deleting the second visualization patch completely does not help to get back to 50-60fps."

hey there,
there is no beta32.4. which version are you running?
34.2?

OK. I’d really go for disabling one of the cards via the Device Manager or removing it completely.
In my experience multi-GPU setups are a real pita, hard to set up / debug and generally not advisable.
To simulate the same performance requirements with one card, “just” attach two 4K monitors.
If this solves your initial perf problems you can go for two MultiStream Transport splitters or, more professional (more expensive), two Datapath Fx4.

Also have a look at this somehow related thread:
https://discourse.vvvv.org/t/dx11-multiscreen-fullscreen-without-eyefinity-surround

Thanks Bjoern, it’s actually not only related but the same issue, because Dominik helped us define the setup a while ago. Still having problems though.

Be aware that non-evaluating still keeps loaded things in memory. Even though it might give you quite a framedrop, it’s worth trying to disconnect the rendergraph and set the spreadcounts to zero before disabling a patch. At least it would give you a hint whether it’s memory swapping that kills your framerate.

  • In case your GPU memory is quite full but not maxed out: did you check how many resources you are (re)pushing to it each frame instead of just keeping them in memory there and reusing them? (null indirect drawing instead of direct instancing, fixed spreadcounts on dynamic buffers, …)

  • Another thing worth trying is checking the raw fps of the idle patches, CPU and GPU together: set the Mainloop to raw mode and the fps to 600 or higher, turn the renderer’s vsync off and enable what I guess is called ‘Do Not Wait’ (the DX9 equivalent to immediate mode). Knowing the idle and active cycle time per patch, you know whether you are already at a general performance limit. You can go further and do that individually for CPU and GPU via the Timing node (prepare graph vs. render + present time); a rough sketch of what this cycle-time measurement boils down to follows after this list.

  • Even though your overall pixel amount isn’t high and it shouldn’t be a bottleneck on the fillrate: how’s performance if you just use 2 outputs on the computer (still calculating the whole scene)? Or 2 computers with one card each, if it’s not too hard to convert and sync?

  • With 4 separate instances, did you try running two rendergraphs on the second card and sharing them back? Kind of a weird way of balancing the GPU load… but it might work if the instances don’t have to be in sync.
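
As mentioned in the raw-fps point above, the idea is simply to run uncapped and look at the cycle time per frame. Outside of vvvv that measurement boils down to something like the following sketch (CPU-side wall-clock only; per-stage GPU timings would need the Timing node or GPU queries):

```cpp
// Minimal sketch: uncapped loop that logs the average cycle time,
// i.e. what measuring "raw fps" amounts to on the CPU side.
#include <windows.h>
#include <cstdio>

int main()
{
    LARGE_INTEGER freq, prev, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&prev);

    double elapsedMs = 0.0;
    int frames = 0;

    for (;;)
    {
        // ... evaluate and render one frame here, no vsync, no waiting ...

        QueryPerformanceCounter(&now);
        elapsedMs += 1000.0 * double(now.QuadPart - prev.QuadPart) / double(freq.QuadPart);
        prev = now;

        if (++frames == 600)   // report roughly every 600 frames
        {
            std::printf("avg cycle time: %.2f ms (~%.0f fps)\n",
                        elapsedMs / frames, 1000.0 * frames / elapsedMs);
            elapsedMs = 0.0;
            frames = 0;
        }
    }
}
```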

sorry, yes, it’s vvvv_45beta34.2_x64

Are you using a shared texture in the patch?

@woei @bjoern thanks for your help!

I will try out your suggestions and report back. The spreadcount zeroing & using 2 splitters on one card are the lowest-hanging fruit to test.


EDIT

Ok, so I have managed to gain about 4-5 fps by doing the following when patches go to sleep, before I disable all shaders and groups:

  • reducing the spreadcounts in the two of four visualizations that use the InstanceNoodles instancer and splines
  • setting the particle system buffer from 8192 down to 128 particles

Then I removed cards with the Device Manager and simulated double the final output by attaching four screens:
3440x1440 + 2560x1440 + 2560x1440 + 1920x1080 = 14,400,000 pixels
instead of 8 x 1280x720 = 7,372,800 pixels

result: almost solid 60Hz
:)

Looks like this (one card + splitter*) is definitely the way to go.

Thanks for the quick replies!

*just came out May 2016:
http://www.datapath.de/multi-display-products/datapath-fx4

616 MP/s
8K max


EDIT II

@id144 SetProcessAffinityMask worked well.
I separated the OSC comms and a few other things from the main patch
= an increase of 3-4 fps

@dimix yep, we share a 512x512 texture from another vvvv x86 instance (which uses OpenCV) into the x64 instance.
As far as I remember the framerate did not change significantly with or without the camera tracking / texture sharing.

Well… you can try to run /dx9 and emulate the camera input in the same patch.
I had something similar with 34.2 with a dx9 texture, but was able to repatch it in 31.2, which solved the problem.

We did this project at Expo2015 http://dotdotdot.it/en/portfolio/future-food-district/
It’s 6x1920x1200@60fps

From my previous experience with optimizing that project:

  • avoid large spreads, use compute shaders instead
  • get the most powerful single-core CPU, maybe even overclock it with care (i7-6700)
  • check if disabling hyper-threading helps
  • don’t assume multiple cards will magically split the render task
  • write custom plugins for the parts which handle a lot of spreads
  • debug timings
  • run separate instances and use shared memory values and shared textures
  • use different framerate settings for the separate instances; some parts of the project need to run at 60fps (visuals), some are OK with 5fps (data processing)
  • change the mainloop fps dynamically according to what is needed
  • avoid render targets (rendering into textures) as much as possible
  • if using render targets, check if you can get away with a less precise texture format (e.g. R8G8B8A8 instead of R32G32B32A32); see the sketch after this list
  • compare x86 and x64 performance, often x86 will give you a few extra fps. If the pack you need is x64 only, try to recompile it for x86
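
To put the render-target format point above into numbers: R32G32B32A32 float is 16 bytes per pixel while R8G8B8A8 is 4, so every intermediate texture costs a quarter of the memory and bandwidth. A quick back-of-the-envelope check (the 1920x1080 size is just an example):

```cpp
// Minimal sketch: memory footprint of one render target in the two formats
// mentioned above (DXGI_FORMAT_R32G32B32A32_FLOAT vs. DXGI_FORMAT_R8G8B8A8_UNORM).
#include <cstdio>

static double TargetMegabytes(unsigned width, unsigned height, unsigned bytesPerPixel)
{
    return double(width) * height * bytesPerPixel / (1024.0 * 1024.0);
}

int main()
{
    // 4 channels * 4 bytes vs. 4 channels * 1 byte
    std::printf("1920x1080 @ R32G32B32A32: %.1f MB\n", TargetMegabytes(1920, 1080, 16));
    std::printf("1920x1080 @ R8G8B8A8:     %.1f MB\n", TargetMegabytes(1920, 1080, 4));
    return 0;
}
```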

Useful topic about optimization tricks
thanks

One more trick: use SetProcessAffinityMask to dedicate two separate cores to the FPS-sensitive process and let all the other processes use the other cores.
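
For reference, a minimal sketch of the Win32 call behind that trick. A real launcher would open the target process with OpenProcess and set the mask on it; for brevity this pins the calling process itself, and the two-core mask 0x3 is just an example:

```cpp
// Minimal sketch: restrict the current process to CPU cores 0 and 1,
// so other processes can be kept off those cores (and vice versa).
#include <windows.h>
#include <cstdio>

int main()
{
    DWORD_PTR mask = 0x3;   // bits 0 and 1 -> cores 0 and 1

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask))
    {
        std::printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    std::printf("Process pinned to cores 0 and 1.\n");
    return 0;
}
```

The start /affinity switch of cmd does the same thing without any code.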

I’ve also seen the behavior described by @Cedric on VR projects realized with DX11 lately. I hope @vux has an idea whether it could have something to do with DX11 resource handling.

Hey everybody. After some days of patching we managed to get it from 27 frames to a steady 40, some patches even run at 60 now. There was not one single solution that fixed it, but a lot of the small tips helped to improve it step by step. What I can definitely say, and what is an important learning: two cards are not necessarily better than one, maybe even worse. So if you have the option and don’t need the full resolution of 2 cards and it’s more about the number of outputs, go with one card and a splitter instead. Also, SetProcessAffinityMask helped to gain some more frames by separating the visuals from the rest.

Thanks again!

Also, if you haven’t already: once the patch is finished, go into the middle-click menu and disable Update View… that also gives some frames.

Also check all nodes that might block the mainloop; these are mainly renderers (enable ‘Do Not Wait’ for DX11 or set Present to immediately for DX9) and video textures (Wait For Frame = 0).
