ReadBack performance - what's causing this?

I was investigating what causes a significant performance hit on a ReadBack, one I didn't expect even allowing for the nature of a ReadBack.

So I compared it to Pipet (CS), whose ReadBack sits around 400 ticks, while mine was at 8000.
Same stride for the backbuffer… I even pasted the Pipet code into my shader and it stayed high.

Eventually I discovered that it changed once I swapped out my input buffers for a simple DynamicBuffer.
As soon as I connect one of the other buffers back to my shader, performance drops again, even though the executing code never accesses that buffer.

And it's not one of those cases where the tick count just shows up on a different node instead, because I can actually see the performance drop…

It's technically not really "performance" you are measuring in this case, but the time the CPU has to wait for the GPU.
So depending on what else the GPU is doing, this can vary dramatically, even within an unchanged patch.
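To illustrate why the number is mostly CPU wait time, here is a toy Python model (not vvvv or DirectX code; the command names and costs are made up). It assumes the GPU works through its queue strictly in order, so a readback queued behind more GPU work stalls the CPU longer, even though the readback itself costs exactly the same.

```python
from dataclasses import dataclass


@dataclass
class GpuCommand:
    name: str
    cost_us: int  # hypothetical GPU execution time in microseconds


def readback_wait_us(queue: list[GpuCommand], cpu_request_us: int) -> int:
    """Toy model of a blocking readback.

    The GPU starts at t=0 and executes the queued commands strictly in order;
    the readback is the last command in the queue. The CPU asks for the data
    at cpu_request_us and stalls until the readback has finished on the GPU.
    """
    gpu_finish_us = sum(cmd.cost_us for cmd in queue)
    return max(0, gpu_finish_us - cpu_request_us)


readback = GpuCommand("readback", 50)

# Same shader and same readback, but a different amount of other work queued ahead of it.
light_frame = [GpuCommand("my_shader", 100), readback]
heavy_frame = [GpuCommand("upstream_sim", 5000), GpuCommand("my_shader", 100), readback]

print(readback_wait_us(light_frame, cpu_request_us=20))  # 130
print(readback_wait_us(heavy_frame, cpu_request_us=20))  # 5130
```

In this model the readback's "cost" is dominated by whatever happens to be queued before it, which matches the point that the figure can change a lot without the patch changing.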

Let me rephrase and describe the case that confuses me the most here:

I have a buffer input to the shader, declared via StructuredBuffer Foo;
I don't even access that buffer in my code; I may just write the thread ID to the backbuffer.
If I connect nothing, or a DynamicBuffer with 1k elements, everything is fine: ReadBack stays as low as ~400 ticks and the CPU doesn't wait much.
If I connect one of the other buffers in my patch to the same input, and still don't access it, the CPU waits about 10 times as long.

I think this is a complexity problem. Most likely there are events on the GPU, like render started, render in progress, render finished, etc. If you think about it globally, everything in DX11 is a resource, and each pass produces a resource (some kind of memory stream). You can only read a resource back once that stream has been released, so the readback waits until whatever is blocking the stream finishes; more streams means more waiting time. Not sure that's correct, but that's my feeling. It depends on the implementation, but since the end user has no control over resource management, I'm fairly sure it waits until all threads have finished and basically reads back after the end of the frame.


Yeah, since the upstream shader that outputs those buffers is pretty complex (and linked to even more shaders), this was my feeling as well, somehow…
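The upstream-dependency idea can be sketched as a small toy Python model. The pass names and costs are hypothetical, and it assumes (this is not confirmed in the thread) that binding a buffer to the shader input is enough to make the readback wait for the whole chain of passes producing that buffer, even if the shader never reads it.

```python
def work_before_readback(inputs: dict[str, list[str]], cost_us: dict[str, int], target: str) -> int:
    """Sum the cost of `target` plus every pass it transitively depends on.

    Each pass is counted once, even if several downstream passes read its output.
    Toy assumption: a pass must be finished before the readback as soon as its
    output is bound, whether or not the shader code actually reads it.
    """
    seen: set[str] = set()
    stack = [target]
    total = 0
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        total += cost_us[node]
        stack.extend(inputs.get(node, []))
    return total


# Hypothetical costs in microseconds.
cost_us = {"my_shader": 40, "dynamic_buffer": 10, "complex_cs": 3000, "sim_a": 2500, "sim_b": 2500}

# Foo fed by a small DynamicBuffer: only the shader itself and the upload.
small = {"my_shader": ["dynamic_buffer"]}
# Foo fed by the upstream buffer: the whole chain behind it comes along.
upstream = {"my_shader": ["complex_cs"], "complex_cs": ["sim_a", "sim_b"], "sim_b": ["sim_a"]}

print(work_before_readback(small, cost_us, "my_shader"))     # 50
print(work_before_readback(upstream, cost_us, "my_shader"))  # 8040
```

The shader's own cost is identical in both cases; only the connected input changes how much GPU work sits in front of the readback.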