Beta DX11 VS Gamma Stride GPU instancing performance

Out of curiosity, I tried to compare the two like this:

Same shader for both, simple instancing with only one color.

I believe the two patches are the same, apart from whatever is going on under the hood (which might be quite different).

The first comparison is with both editors side by side. As the Gamma editor uses Skia, its GPU usage is higher (by a lot!).

And finally, after exporting to .exe, Gamma/Stride is still 400% higher, but memory is half.

PS. Hey @vux, you’ve earned a 400% Legendary Badge :)

To compare the code paths more equally, make sure you use RenderWindow instead of SceneWindow and set the instancing count to a very high number, at least 10k, and make sure all matrices change in every frame.

And yes, you have to export the gamma patch to avoid other overhead.

You can also look into the Stride profiler to see how much time the actual draw call is using. And I believe in beta dx11 there is a query for the draw call time too.

Ok
10000 instances with position changing every frame.



the beta Renderer has no depth buffer set, you need to enable that in the inspector. and the pixel shader seems to take more time than the vertex shader in this simple patch, so window size matters a lot.

if I set them to the same size on my 4k screen, they both use exactly the same GPU resources with 100k objects, about 7% on my RTX 3070 mobile.

every other outcome would be quite strange since they both use DX11 as the backend, have the same shader, and do the same draw calls.

the only differences are on the CPU, where beta is faster as it doesn’t allocate memory for the large spreads in every frame. if you use an allocation-free approach in gamma, the CPU utilization is the same too, about 12% on my laptop with 100k objects…

what would this look like in gamma?

I guess: don’t use spreads with large spread counts, and avoid output splicers in loops.
Gave it a try.
InstancingTest_b.vl (74.2 KB)

yes, quite similar to björn’s patch. mine looks like this:

Unfortunately, output splicers and the spread generator nodes both allocate new memory when animated. you won’t notice with lower spread counts, but at 100k it is quite heavy, causing gen2 garbage collections every few frames, which block the main loop.

it also uses the Translation node, which does no trigonometry for rotations and no matrix multiplications.

the patch above avoids both and re-uses the memory of a MutableList which doesn’t get downsized by the .NET runtime when you clear it.
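For anyone reading along without the patch: the idea of re-using the list's memory can be sketched in text form roughly like this. This is only an illustration in Python, not the VL patch or the .NET implementation. `ReusableBuffer` and its members are hypothetical names, and Python still allocates the position tuples themselves, so the sketch only shows the count/capacity bookkeeping.

```python
class ReusableBuffer:
    """Hypothetical growable buffer that keeps its backing storage when cleared."""

    def __init__(self):
        self._storage = []  # backing storage, grows but is never shrunk
        self._count = 0     # number of valid items in the current frame

    @property
    def count(self):
        return self._count

    @property
    def capacity(self):
        return len(self._storage)

    def add(self, item):
        if self._count < len(self._storage):
            # Overwrite a slot left over from an earlier frame.
            self._storage[self._count] = item
        else:
            # Grow only when the existing storage is too small.
            self._storage.append(item)
        self._count += 1

    def clear(self):
        # Only the count is reset; the storage stays as it is.
        self._count = 0

    def __iter__(self):
        # Yield only the valid items, ignoring stale slots.
        for i in range(self._count):
            yield self._storage[i]


# Simulate a few frames with 1000 instances each.
buffer = ReusableBuffer()
capacities = []
for frame in range(3):
    buffer.clear()
    for i in range(1000):
        buffer.add((i * 0.1, frame * 0.5, 0.0))  # made-up positions
    capacities.append(buffer.capacity)
```

The storage grows once in the first frame and is only overwritten afterwards, so the steady state needs no new backing memory per frame.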


Ah, I didn’t even think about connecting the Builder directly to the DynamicBuffer…

Does this happen with SpreadBuilder?

Instancing_b2.7z (15.6 KB)

No, it should also keep the last size as backing storage. Clearing it will just set the count to 0.