I am using an append buffer in a compute shader, and the amount of data being appended depends on processing the incoming image. I have “Reset Counter” and “Appendable” set on the Renderer, but the count of the data output sticks at the highest level seen.
I am using DX11.BufferReadbackDynamic to get the data from the Renderer, which is what is reporting the data count.
It is my understanding that setting “Reset Counter” true should reset buffer count to the “Reset Counter Value” each frame, which I have at the default of zero - is that not the case?
However, I noticed if the “Element Count” pin is even just disconnected/reconnected (so the value does not change) the output count resets to zero and starts sticking to the highest value seen again.
I tried a hack workaround to toggle “Element Count” by one each frame to force a count reset, only to find no data output at all, a result apparently of an “Element Count” bug described elsewhere.
Poked around a bit more, and what seems to be happening is that the buffer is cleared initially, and not afterward. The actual element count is always the number specified into the Renderer. The count I was seeing getting stuck at the high point was the number of elements that had actually been written to at some point.
The Appends in the compute shader start at the beginning of the buffer each frame, which is cool, but how on earth to know how many appends were actually done each frame?
Append buffers are fixed size, so the real (or maximum) buffer size is always the one specified in the Element Count pin.
Append buffers hold an internal counter which is then used for processing only “valid elements”.
To retrieve that number, use CopyCounter + ReadBack (DX11Buffer Raw); reading back with a size of 16 will give you how many elements got appended.
Changing the element count every frame is a pretty bad idea (it should not really change at all for your whole application lifetime, except at design time of course), since you recreate the resource every frame, which will at some point lead to memory fragmentation and potential slowdowns.
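To make the difference between buffer size and append counter concrete, here is a minimal CPU-side toy model in Python. It is not the DX11 pack's API: the class and method names are made up for illustration, and dropping appends past capacity is a simplification. It only shows why resetting the counter does not clear old elements, so counting "written" elements sticks at the high-water mark while the counter itself tracks the current frame.

```python
class AppendBufferModel:
    """Toy model of an append buffer (hypothetical names, not a GPU API)."""

    def __init__(self, element_count):
        # Storage is allocated once; its size is the Element Count.
        self.storage = [0] * element_count
        # The append counter lives outside the storage
        # (in D3D11 it belongs to the UnorderedAccessView).
        self.counter = 0

    def begin_frame(self, reset_counter_value=0):
        # "Reset Counter" only rewinds the counter; old data stays in storage.
        self.counter = reset_counter_value

    def append(self, value):
        # Simplification: appends past the fixed capacity are dropped.
        if self.counter < len(self.storage):
            self.storage[self.counter] = value
            self.counter += 1

    def nonzero_elements(self):
        # What you see if you read back the whole buffer and count written slots.
        return sum(1 for v in self.storage if v != 0)


buf = AppendBufferModel(element_count=8)

buf.begin_frame()
for v in (5, 6, 7, 8, 9):
    buf.append(v)
frame1 = (buf.counter, buf.nonzero_elements())

buf.begin_frame()
for v in (1, 2):
    buf.append(v)
frame2 = (buf.counter, buf.nonzero_elements())
```

In this model the second frame appends only two elements, yet five slots still hold data from the first frame, which matches the "stuck at the highest value seen" symptom described above.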
I’m a tad confused. I hooked up a CopyCount node, and then a ReadBack (Raw) node, but the ReadBack node has no output pins! What am I missing?
Is there a way to access that counter inside a dynamic plugin? I checked the available members of an IDX11RWStructureBuffer and only see ElementCount, which is why I thought it must be storing the count. Should another buffer data type be used? I tried some names to no avail (IDX11AppendStructuredBuffer, etc.).
I am using B32, and the latest DX11 pack. Thanks vux!
you need to tell the readback node what kind of structures it should output via inspector.
like “float3, float4, int” … depends on what you’re calculating in your shader above. (and don’t forget to set a proper stride on the upstream Renderer.)
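As a rough illustration of why the layout has to be given by hand, the Python sketch below decodes the same 16 raw bytes under different layouts. The `LAYOUTS` table and `decode` helper are hypothetical and only mimic the idea of the format string; the sketch also assumes little-endian data with the counter in the first four bytes.

```python
import struct

# Hypothetical mapping from a layout name to a little-endian struct format.
LAYOUTS = {
    "int": "<i",
    "uint": "<I",
    "float": "<f",
    "float3": "<3f",
    "float4": "<4f",
}


def decode(raw, layout):
    """Interpret raw bytes as a list of elements of the given layout."""
    fmt = LAYOUTS[layout]
    stride = struct.calcsize(fmt)
    # Ignore trailing bytes that do not fill a whole element.
    usable = len(raw) - len(raw) % stride
    return [struct.unpack_from(fmt, raw, off) for off in range(0, usable, stride)]


# 16 bytes as they might come back from the counter readback (counter = 3).
raw = struct.pack("<4I", 3, 0, 0, 0)

as_int = decode(raw, "int")        # four 1-tuples, the first holds the counter
as_float4 = decode(raw, "float4")  # one 4-tuple, same bits read as floats
counter = as_int[0][0]
```

The bytes never change, only their interpretation: read as "int" the first element is the counter, read as "float4" the same bits turn into a meaningless tiny float.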
Thanks sebl, that was pretty non-obvious! I saw the “format” hidden field, but it is not a pull-down and there is no help patch. Putting in “int” did the trick!
So there must be a way of getting this directly in a dynamic plugin, though. Looking at the source for CopyCounter is revealing, and certainly not just a simple structure reference.
But this works for now, thanks all!
Indeed, readback will get a help patch in the next release, but basically you need to set the layout: since the buffer is just a raw GPU stream, there’s no way to know which type of data it contains.
To get it in a dynamic plugin you do exactly the same as in CopyCounter (since the counter is not stored in the buffer, but in the UnorderedAccessView).
The important command is CopyStructureCount.
Then you need one gpu buffer (for copy), and one staging buffer (to copy back to cpu).
Once you copied your counter in your gpu buffer, call CopyResource from gpu buffer to staging buffer.
Finally you need to call staging.MapForRead, to get read access to the staging buffer (don’t forget to unmap when you’re done!).
Please note that calling MapForRead will flush and stall the GPU, so it can have some impact on performance.
Out of interest why do you need to get counter back to cpu?
Thanks vux for the detailed explanation. Duly noted about the GPU stall - is that also happening with CopyCounter?
I need this as the shader can have a variable size output. I’m converting a dynamic plugin I made for taking a depth camera depth image, converting the depth to XYZ, applying a camera transform to it to get world-relative data, then applying bounding boxes for the interaction areas (with per-BBox sub sampling). This allows me to dynamically focus on areas of interest (such as moving hands) and easily combine point clouds from multiple cameras.
I’m seeing conservatively a 10x speed up moving this code to the GPU. I’ll roll this into the Kinect nodes when I get it all working. Oh, and as I still use Primesense cameras, my next move is to convert the OpenNI nodes to DX11 to save the texture conversion, which appears to introduce a frame lag.
I am writing some indices within a compute shader into an AppendStructuredBuffer.
Now I want to access these indices in a further compute shader. At the moment I am using the CopyCounter/ReadBack solution you mentioned above to get the count of appended indices. Then I use this count to set up the Dispatcher.
Unfortunately this is very slow because of the readback.
Is there a way to access the AppendStructuredBuffer without dynamically setting the Dispatcher Threads? Do I have to utilize DispatchIndirect? And if yes - how?
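For reference, D3D11's DispatchIndirect reads its arguments as three 32-bit unsigned integers (thread group counts for X, Y and Z) from a GPU buffer, so the count never has to travel back to the CPU. The Python sketch below only shows the arithmetic and byte layout of those arguments; it is not a DX11 pack patch or plugin. The helper names and the group size of 64 are assumptions standing in for a hypothetical `[numthreads(64, 1, 1)]` shader, and on the GPU the same division would typically run in a tiny one-thread compute pass fed by CopyStructureCount.

```python
import struct


def thread_groups(element_count, threads_per_group):
    """Ceiling division: enough groups to cover every appended element."""
    return (element_count + threads_per_group - 1) // threads_per_group


def indirect_args(element_count, threads_per_group=64):
    """Pack (groupsX, 1, 1) as three little-endian uint32 values."""
    groups_x = thread_groups(element_count, threads_per_group)
    return struct.pack("<3I", groups_x, 1, 1)


# Example: 1000 appended indices, 64 threads per group (assumed value).
args = indirect_args(1000, 64)
groups = struct.unpack("<3I", args)
```

Because the last group is usually only partly filled, the consuming shader would still need to compare its thread ID against the actual count and return early for the surplus threads.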