Starting with vvvv for a Kinect2 project - need some advice

Hi there,

I am totally new to vvvv (well, I’ve played with it for about the last six weeks) and want to create a small private project using a Kinect2. Maybe at some point I can make it professional and public, but for now it’s very experimental, and I want to see if I can make it in vvvv and don’t need to jump to Unity instead, as vvvv looked more promising.

I have a programming background, and 3D and compositing (eyeon/BMD Fusion) are familiar territory as well. My project already works perfectly in Fusion, just not in realtime. Doing it in realtime would be fantastic. So the proof of concept is done.

I managed to get live depth data out of the Kinect2 and manipulated the image in realtime, but I am stuck on some problems for which I haven’t found a real answer so far. Tutorials and the like didn’t get me as far as I needed, so I think I now need to ask the pros here. They’re fairly simple things, but maybe I haven’t yet got the concept of vvvv right.

The Kinect2 nodes output only DX11. I wonder how I can convert the DX11 depth for instance into a grayscale image, which I can then manipulate using e.g. a noise field or similar?

I haven’t understood how I can transform from a DX11 texture to a 2D image for further processing. Is that explained somewhere? Sample patches would be most interesting, as this is the way I’ve learned everything so far…

I managed to apply a noise in DX11 instead, but I was not able to scale the noise in 2D at all, and I want it to be bigger, to reduce its detail.

And how can I mirror the Kinect2 output to match the real world? I tried scaling with -1 on the x axis, but then I get a black screen… Which node is the right one? How?

Is there a way to make a freeze frame from the depth channel, so I can subtract the live image from the frozen depth channel for further processing? That would be important for me.

Can I save a picture or record a video to disk?

How can I get a value of a pixel/depth so that I can modify the image e.g. based on the measured depth?

I found that I can directly modify fx in c# and that helps for some things I need, but coming from Fusion compositing, I am stuck on how to transform from 3D DX11 space into 2D screen space with reliable sizing, such as 1920x1080 images etc.

I also stumbled over various demo patches in tutorials which do not seem to work as they should in the 50beta35 x64 version I use. Should I go back to an older version then?

Is there any documentation on the DX11 stuff beyond this page:


Looking forward to some insights from you!


hello and welcome,

haven’t worked with kinect for a while, but a 2D DX11 texture is exactly what you want to have. there is a huge library of texture fx nodes in the DX11 pack that can help you manipulate the texture at lightning speed on the GPU. open the node browser and type “dx11 fx” to get the list. also here is an intro video on how to write your own:

to apply noise for example, you would create a second texture input and add it to the first input. to adjust the strength of the noise you can create a float input to scale the noise before you add it. you can of course also clone an existing one and modify it.
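a minimal sketch of such a texture fx could look like this (pin, sampler and texture names here are my assumptions, not the actual DX11 pack template):

```hlsl
Texture2D tex0;              // source texture (e.g. the depth image)
Texture2D tex1;              // noise texture, the second input
float NoiseStrength = 0.1;   // float pin to scale the noise before adding

SamplerState s0 : IMMUTABLE
{
    Filter = MIN_MAG_MIP_LINEAR;
    AddressU = Clamp;
    AddressV = Clamp;
};

float4 PS(float4 p : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 src   = tex0.Sample(s0, uv);
    float4 noise = tex1.Sample(s0, uv);
    // centre the noise around zero, scale it, then add it to the source
    return src + (noise - 0.5) * NoiseStrength;
}
```

to make the noise “bigger” (less detail), you can also scale the uv before sampling the noise texture, e.g. `tex1.Sample(s0, uv * 0.25)`.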

also this might contain quite some helpful stuff:


For recording and playback of the Kinect for Windows device stream you may want to use this set of tools

Or work with Kinect Studio, which is part of the Kinect 2.0 SDK. It saves you a lot of time.


Thanks for the links. I’ll have a look into this today!

So am I understanding this right:
You would not try to leave the DX11 “world” and convert to a standard 2D image/texture, as this is slower and the available effects should be the same?
If so - makes sense… :-)
If not, could you drop a line?

Thanks, I’ve stumbled over this last night too, just after I wrote my questions here.

I wasn’t aware of these tools before; I thought it was just a driver.

I was able to record a stream and play it back in the tools. Am I getting this right: with this I don’t even need the Kinect2 connected and can “emulate” it?

Cool for working on the project “on the go”… on the bus or train.

I hope it’s fine to ask further questions if I’m not getting this right…

Exactly, you don’t need the Kinect, and with playback from Kinect Studio all applications that use the Kinect will behave as if the sensor were attached. The application is not even able to tell whether the stream is arriving from a physical device or a playback.
XEF playback is much more than playback of the frames.
Just get the VVVV.DX11.Nodes.Kinect2.dll as described in the help patch of the Playback node; the current compiled distribution of the DX11 pack works only with a Kinect physically attached.


Hi There,

working right now on my project.

Playing/recording Kinect2 data works fine and helps a lot!
It seems I need to write some custom functions. When I modify existing ones, they are recognized immediately after saving, but when I save them under my own name they don’t show up. I can’t find the link to where this is explained. I saw it once but failed to note it :(

As I only want to use the true depth from the Kinect2 I am fighting these issues mainly:

  • rotating/mirroring the image
  • scaling
  • the depth shows contours, while I would prefer what I suspect is RAW Depth. However, RAW Depth only gives a pure black image. Is there a way to really see that RAW Depth, say as a grayscale image?
  • Is there any way to convert from DX11 to a normal 2D image?
  • Where do I find the OpenCV installer?

I know, a lot of simple questions for the pros, but I can’t find a solution right now.

Hoping for comment!


most likely you need to change Sampler state from linear to point
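in the shader, that change would look roughly like this (a sketch; the actual names in the lineardepth shader may differ):

```hlsl
// Point (nearest-neighbour) sampling avoids interpolating between
// neighbouring depth values, which would invent depths that don't exist
// at object edges.
SamplerState pointSampler : IMMUTABLE
{
    Filter = MIN_MAG_MIP_POINT;   // instead of MIN_MAG_MIP_LINEAR
    AddressU = Clamp;
    AddressV = Clamp;
};

Texture2D tex0;  // the depth texture

float4 PS(float4 p : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // sample the depth without filtering
    return tex0.Sample(pointSampler, uv);
}
```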

lineardepth TextureFX

you can write to dds sequence if you have ssd it works realtime

How and where do I need to change that sampler state?

I have the Kinect2 node feeding into the Depth or RAW Depth node. I can’t see where I can change this?

I have tried the lineardepth TextureFX too. How does it work?
I can’t see the resulting image change at all when fed into a Preview, regardless of which parameter I change.

Is outputting to DDS the ONLY way to get from DX11 to a layer that I can process with OpenCV effects for instance?

What is the actual difference between the Kinect2 Depth and RAW Depth nodes?

Right now, the Depth only gives me a contoured Z-buffer and RAW Depth appears to be fully black.

What I want/need is the true depth image seen by the sensor WITHOUT the processed contours, so like a Z-buffer only. Then I want a filter to adjust the black and white points and the gamma between them, pretty much the “Levels” tool in Photoshop.
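Expressed as shader math, what I’m after would be roughly this per pixel (just my sketch, not an existing node):

```hlsl
// Levels, as in Photoshop: black point, white point, gamma (my sketch)
float Black = 0.0;
float White = 1.0;
float Gamma = 1.0;

float level(float d)
{
    // remap [Black..White] to [0..1], then apply gamma
    float x = saturate((d - Black) / (White - Black));
    return pow(x, 1.0 / Gamma);
}
```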

How can I do this?

alright so:

the sampler state is part of the shader code; you can see it inside of the shader

RAW Depth uses the special format R16_Uint. It’s not gonna work with any regular shader, because instead of Sample you have to use the Load function in the shader. Therefore that format cannot be displayed directly, only converted.
Depth, on the other hand, outputs the format R32_Float, and I suspect it is in the 0 - 1 range, so to make it visible, lineardepth should have DepthThreshold Min 0, Max 1.
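Loading from an R16_Uint texture looks roughly like this (a sketch, names assumed; the Kinect2 millimetre range used for normalisation is an assumption too):

```hlsl
Texture2D<uint> rawDepth;  // R16_Uint cannot be sampled, only loaded

float4 PS(float4 p : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    uint w, h;
    rawDepth.GetDimensions(w, h);
    // Load takes integer pixel coordinates (x, y, mip), no filtering
    uint d = rawDepth.Load(int3(uv * float2(w, h), 0));
    // Kinect2 depth is in millimetres, roughly 500-4500; normalise to 0..1
    float g = saturate((d - 500.0) / 4000.0);
    return float4(g, g, g, 1);
}
```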

I suspect there is no easy way for this: since OpenCV processing is on the CPU and a texture is a GPU resource, this can only be accomplished by streaming GPU memory back to the CPU. Maybe shared memory with Syphon can help here, or you can also read back the data, but it’s slow… You might look into the image pack contribution; there might be something like AsTexture DX11 to OpenCV.

Also, basically nobody has a clue what the “OpenCV effects” you are talking about are; a few screenshots of your project would probably be worth a thousand words…

Also, the depth is contoured; it’s called occlusion. There is a repair-depth contribution you can try…

here a little demo with RAW Depth and plain Depth (432.8 KB)


Thanks for the help!

I still get only pure black when using RAW Depth. From your sample I can see a more B&W depth image which seems to have higher contrast and less gradient detail. However, it also shows the contours of the objects.

I wonder if the ToF camera shouldn’t see a simple grayscale image (when we interpret depth as grey) without any contours. The problem I have is as follows: the contours are black, so they seem to be at a different depth. I want to subtract the ToF depth of an empty room from the ToF depth of a filled room to extract the “fill” as the difference. The contours ruin the result completely, especially as they’re pretty noisy and jumpy. If I fight them with blur, it’s totally useless.

Any idea if there is a way to get this done?

well, there is no accurate solution for the holes in the depth image…
this one is probably the first…

then you can also try the trick where the renderer’s clear pin is off and black pixels of the texture are discarded in the pixel shader, so it kind of keeps whatever was last inside of the black area…
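the discard trick could be sketched like this (assumed names; it only works with the renderer’s clear turned off, so discarded pixels keep last frame’s value):

```hlsl
Texture2D tex0;  // the depth texture with black holes

SamplerState s0 : IMMUTABLE
{
    Filter = MIN_MAG_MIP_POINT;
    AddressU = Clamp;
    AddressV = Clamp;
};

float4 PS(float4 p : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 c = tex0.Sample(s0, uv);
    // treat (near-)black as a hole: don't write anything, so the pixel
    // keeps whatever the renderer held there from the previous frame
    if (c.r < 0.001) discard;
    return c;
}
```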



thanks for that link, that is pretty close to what I need (while I still wonder why the sensor does not show exactly such footage “out of the box”).

Now I need to figure out how to find the lowest/highest depth value in the image, or in an image I calculate from it. I thought I could use the DepthPipet, but it gives me three values instead of one: two fixed ones and a third that jumps around. I just want the min or max value over all pixels, i.e. the closest and the farthest points.

How can I do that?

Well, DepthPipet gives you world coordinates of positions from a depth image, and unless you feed it the correct view-projection matrix of the Kinect sensor (which I’m pretty sure you don’t), the result you are receiving is incorrect.

For that you have to write a compute shader or a pixel shader which will analyze all the pixels and find the brightest and darkest. Then you can read the result back…

I’ll try to make you an example…
Do you want the min/max of the current frame, or the min/max over time (e.g. taking the previous frame’s min/max into account)?
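the reduction could be sketched like this (names assumed, not the actual contribution; the result buffer has to be reset to {65535, 0} each frame from the CPU side):

```hlsl
// Min/max depth reduction via atomics. Atomics only work on integers,
// so the float depth (0..1) is quantised to 16 bits first.
Texture2D<float> depthTex;
RWStructuredBuffer<uint> result;  // result[0] = min, result[1] = max

[numthreads(8, 8, 1)]
void CS(uint3 id : SV_DispatchThreadID)
{
    uint w, h;
    depthTex.GetDimensions(w, h);
    if (id.x >= w || id.y >= h) return;   // guard partial thread groups

    float d = depthTex[id.xy];
    if (d <= 0) return;                   // skip holes / invalid pixels

    uint di = (uint)(d * 65535.0);        // quantise so atomics can compare
    InterlockedMin(result[0], di);
    InterlockedMax(result[1], di);
}
```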


I need the nearest and/or the farthest pixel in the current frame. If the shader has two outputs with a single number each, that would be easiest, so I could directly use them as input for an expression, like performing another effect depending on the distance.

My goal is to capture the “empty” room once (like a frame grab) and calculate the difference to the live frame, which is not exactly “subtract” but more a “compare”. When the compared values are within a close tolerance, they show the background, so the result should be “empty”, while pixels that differ strongly will output the new pixel (usually closer to the Kinect2, as it’s a new object, so potentially the max value of old and new pixel would do in that case).

The result should be the new objects (not necessarily persons, but anything that changed). I may also output a more or less black and white silhouette showing whether a pixel is old background or new foreground. The silhouette I get from the Kinect2 directly is basically OK, but way too late, as it needs a few seconds to detect e.g. a human at all. Further, it’s limited to humans; it won’t detect a dog or chair or box or anything else that changed.

Then I want to apply this distance measure to the resulting difference image to see how close the new objects are to the Kinect2, to apply a depth-dependent effect to the entire image.

So I need a shader with three inputs: two depth images (the old capture and the live frame), and a tolerance value.
As a result it should output the image with only the new objects, or the live image plus an alpha channel, or only an alpha channel. I guess anything would do in the long run, as I could then use other shaders to move further. The shader should also output the values of the nearest and farthest points of the new object (so with the mask applied for this calculation).

That’s my goal!

Hi, I did try to hack up something quick, but it doesn’t use parallelism properly and takes like 90000 ms per frame, which isn’t good.
I need some more time to sort that out, probably the middle of next week or so. Here it is anyway, maybe you can still use it before I can fix it properly… (218.7 KB)



I will give it a try later today. Besides making it work in parallel (which would be really perfect), it may help to add a stepping value, such as checking only every n-th pixel on the x and y axes, so we skip a lot of calculations. I believe that objects won’t be as small as a single pixel in most cases and that the depth won’t vary too much between neighboring pixels. However, this may result in more noise than tracking really all pixels in parallel. I guess to make a good comparison I will slightly blur the images before processing, to overcome a bit of the edge noise.

The filter trick to get rid of the black contours works pretty well, as a side note. Without it, this would never work at all.

Thanks a lot!!!

Looking forward to what you come up with. I think a lot of Kinect2 users will love this, as I can imagine that this is often useful.

CS_MinMaxDepth (DX11.Texture 2d).zip (220.3 KB)

that should do it in real time



I am still fighting the issue of combining the “empty” depth freeze with the live depth.

Here is what I want to do:

Input1 = frozen empty-room depth (Kinect2 recorded)
Input2 = live depth (Kinect2)
Tolerance = 3%
for each pixel:
  if (Input1 > (Input2 - Tolerance/2)) and (Input1 < (Input2 + Tolerance/2)) then
    // pixel is within tolerance of the freeze frame, so it’s background
    set pixel depth to maximum
  else
    // it’s foreground
    set pixel depth to value of Input2 (live depth of Kinect2)
  end if

I tried to modify e.g. the DX11 Mix texture effect, but can’t get it right; I always get syntax errors.
With the above working, you get the live moving objects separated from the frozen background and extracted for further processing.

Help highly appreciated!

Use Writer (DX11.Texture) to write a few frames to DDS, one empty, one with objects; post a zip with your patch and shaders and I’ll take a further look using your params. From what you wrote, I think you need a simple lerp… Mix is a bit overcomplicated for your case… just clone the TFX template and do:
float4 c0 = tex0.Sample(s0, texcd);
float4 c1 = tex1.Sample(s0, texcd);
float4 col = lerp(c0, c1, tex2.Sample(s0, texcd));
return col;
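If instead you want the hard tolerance comparison from your pseudocode rather than a lerp, a sketch could look like this (names are assumptions: tex0 = frozen empty-room depth, tex1 = live depth, depth in the 0..1 range):

```hlsl
Texture2D tex0;  // frozen empty-room depth
Texture2D tex1;  // live depth
float Tolerance = 0.03;  // 3%, as a pin

SamplerState s0 : IMMUTABLE
{
    Filter = MIN_MAG_MIP_POINT;   // point sampling, no interpolated depths
    AddressU = Clamp;
    AddressV = Clamp;
};

float4 PS(float4 p : SV_Position, float2 texcd : TEXCOORD0) : SV_Target
{
    float bg   = tex0.Sample(s0, texcd).r;
    float live = tex1.Sample(s0, texcd).r;
    // within tolerance of the freeze frame -> background, push to max depth
    if (abs(live - bg) < Tolerance * 0.5)
        return float4(1, 1, 1, 1);
    // otherwise keep the live depth as foreground
    return float4(live, live, live, 1);
}
```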
