VRAM usage

Can someone please explain when VRAM becomes a bottleneck, in theory and in vvvv practice?
case 1: 4 triplehead extenders on a 5-head GPU + a mapping pass for each display before fullscreen
case 2: heavy passes like glow/Metallica/antialiasing
case 3: more resolution


Also, in what way can SLI add to GPU performance in vvvv?

Hi, as far as my experience goes, vvvv will only load the GPU marked with Set as Main Display. All the other GPUs will be in slave mode, just streaming data from one to another. So doing heavy post-processing, MRT and whatever, you can expect only the main GPU to go crazy.
In my experience, SLI badly reduces performance with vvvv.

Well, as a simple answer: all of the cases above ;)

Simply put, any new resource costs memory: more resolution = more memory, and more post-processing = more memory. All of those are obviously also linked, so at some point you reach a level where you don’t have enough VRAM anymore. It’s not easy to calculate, so you have to use some decent test setups and profile, then decide what the limit is depending on your setup. It’s really not possible to do it for you ;)
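To get a rough feel for the numbers, here is a back-of-the-envelope sketch (resolutions and pass count are made up for illustration; real drivers add alignment, mip chains and internal copies on top):

```python
# Rough VRAM footprint of uncompressed RGBA8 render targets.
# Illustrative numbers only; actual driver overhead comes on top.

def texture_bytes(width, height, bytes_per_pixel=4):
    """Size of a single uncompressed 2D texture, no mips."""
    return width * height * bytes_per_pixel

# One 1080p target vs. one 4K target:
hd = texture_bytes(1920, 1080)    # ~7.9 MB
uhd = texture_bytes(3840, 2160)   # ~31.6 MB

# A post chain (glow etc.) that keeps, say, 6 intermediate targets:
post_passes = 6
print(hd * post_passes / 2**20)   # ~47 MB at 1080p
print(uhd * post_passes / 2**20)  # ~190 MB at 4K
```

Going from 1080p to 4K quadruples every full-resolution target, which is why resolution and post-processing costs multiply rather than just add up.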

Some things to take into consideration memory-wise (speaking DX11 here):

Switch (Node) and disabling a Group will block the render path; if a resource has not been used in a frame it is generally freed (unless you use “Keep In Memory”).
So you can swap resource allocations that way when using multiple scenes (but you might get a glitch when swapping scenes as resources will have to reload, so again, a trade-off you have to test).

Many intermediate resources are pooled, so changing the main resolution will not destroy them. Info (DX11) has a Clear Unlocked input bang, so you can trigger it from time to time (ideally right after changing the main resolution).

Small resources cost more than many people think, due to the way GPUs allocate memory, so 100 quads will cost more than the 1 kilobyte of memory per quad that you’d expect.
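A quick sketch of why: GPUs hand out memory in fixed-size pages, so each allocation rounds up. The 64 KB page size here is an assumption for illustration; the real granularity is driver and hardware specific:

```python
# Why many tiny resources cost more than their raw size: each
# allocation is rounded up to whole pages. 64 KB is an assumed
# granularity for this sketch, not a guaranteed value.

PAGE = 64 * 1024

def allocated_bytes(raw_size, page=PAGE):
    """Round a single allocation up to whole pages."""
    pages = (raw_size + page - 1) // page
    return pages * page

quad = 1024  # ~1 KB of vertex data per quad

raw = 100 * quad                      # 100 KB you'd expect
actual = 100 * allocated_bytes(quad)  # 6400 KB if each quad allocates alone
print(raw // 1024, actual // 1024)    # prints: 100 6400
```

Under this assumption, 100 separately allocated 1 KB quads can eat 64 times their raw size, which is the usual argument for batching small geometry into one buffer.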

Dynamic resources and input textures (from Kinect/video/and so on) also cost more than people think; generally expect 2 to 5 times the memory footprint of a single texture.
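When budgeting, it can help to bake that multiplier in up front. A tiny helper, using the 2-5x range from above (the 3x default and the resolution are just illustrative):

```python
# Dynamic input textures (video, Kinect, ...) keep staging/ring copies
# on top of the GPU texture, so budget a multiplier. The 2-5x range is
# from the post above; the exact overhead is driver dependent.

def dynamic_budget(width, height, bpp=4, overhead=3):
    """Budgeted footprint of one dynamic texture stream."""
    return width * height * bpp * overhead

# A 1080p video input, budgeted at 3x:
print(dynamic_budget(1920, 1080) / 2**20)  # ~23.7 MB instead of ~7.9
```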

All shader inputs obviously also accumulate, as they are stored in constant buffers; the same rules as for small and dynamic resources apply there too.
Constant buffers can actually accumulate even more than many people think, so calling the same shader 500 times will cost much more memory than calling it 5 times.

DX11 doesn’t free memory right away, since your CPU and GPU run concurrently.
That means, for example, that switching the filename on a FileTexture will not free the old texture immediately, so for some time you’ll have both textures in memory at once (generally it will be released at the end of the frame, but it can be later, or in some cases even within the same frame).

And of course there are all the parts the card needs for itself (shaders, pipeline states, intermediate memory…), which also accumulate and count in the end.

In general, on my projects I try to keep a spare half gigabyte of VRAM as a buffer (assuming a 3-4 GB card) before the main setup, which also leaves room for all the “last minute content” without already being at the limit.

One very useful thing to do (if you can) is to use block compression (BC1 to BC7) for any texture resource (make sure to enable “no mips” on FileTexture in that case). That gives you faster loads, and generally can also increase performance if you already have a lot of memory pressure (you trade a few instructions for much less bandwidth).
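The savings are easy to quantify, since BC formats encode fixed 4x4 pixel blocks: BC1 uses 8 bytes per block (0.5 byte/pixel), BC3/BC7 use 16 bytes per block (1 byte/pixel), versus 4 bytes/pixel for uncompressed RGBA8. A quick comparison (the 2048x2048 size is just an example):

```python
# Block compression savings: BC formats encode 4x4 pixel blocks.
# BC1 = 8 bytes/block (0.5 B/px), BC3/BC7 = 16 bytes/block (1 B/px),
# vs 4 bytes/pixel for uncompressed RGBA8.

def bc_bytes(width, height, bytes_per_block):
    """Size of a block-compressed texture (dimensions multiple of 4)."""
    blocks = (width // 4) * (height // 4)
    return blocks * bytes_per_block

w, h = 2048, 2048
rgba8 = w * h * 4           # 16 MB uncompressed
bc1 = bc_bytes(w, h, 8)     # 2 MB  (8:1 ratio)
bc7 = bc_bytes(w, h, 16)    # 4 MB  (4:1 ratio)
print(rgba8 // 2**20, bc1 // 2**20, bc7 // 2**20)  # prints: 16 2 4
```

So even BC7, the highest-quality format, cuts the footprint (and the bandwidth per sample) to a quarter.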

Also, if you use dynamic buffers that don’t change, make sure you set Apply to 0 (that will also save you CPU time along the way). It seems obvious, but it’s easily forgotten.


For a command line conversion tool:

About SLI: there are use cases where you get gains and cases where you get losses, so there’s no single answer either.

Mostly you’d want to limit the memory transfer between the cards, since that’s the main bottleneck; the ideal scenario is to eventually have the cards doing alternate frame rendering.

So if you use RWBuffers (or do anything that depends on the previous frame) this scenario is obviously not possible anymore, and you will likely lose performance.
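A toy model of why that dependency hurts: under alternate frame rendering, even frames go to one GPU and odd frames to the other, so anything written last frame lives on the *other* card and must first be copied over the bridge. This is a sketch of the scheduling logic, not a driver simulation:

```python
# Toy AFR model: frames alternate between two GPUs. A resource that a
# frame reads back from the previous frame always sits on the other
# GPU, forcing an inter-card transfer every single frame.

def afr_gpu(frame):
    """Which GPU renders this frame under alternate frame rendering."""
    return frame % 2

def needs_transfer(frame, reads_previous_frame):
    """Does this frame need data copied from the other card?"""
    if not reads_previous_frame:
        return False
    return afr_gpu(frame) != afr_gpu(frame - 1)  # always True under AFR

# Independent frames: no transfers, both GPUs run in parallel.
print(any(needs_transfer(f, False) for f in range(1, 8)))  # False
# Feedback (e.g. an RWBuffer reused next frame): a copy every frame.
print(all(needs_transfer(f, True) for f in range(1, 8)))   # True
```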

Only a small number of games support SLI, for a simple reason: it’s hard to do properly. It requires much more work than buying 2 cards and clicking “SLI” (or Crossfire). So you might want to start getting deep into the pipeline (and for proper support you often need to rely on vendor-specific features).

Hope that helps :)

vux docet

Thanks for the extended info/guide!