I am looking for a way to deploy my PyTorch models in Gamma. I did some tests using Python and cv2, but it seems my image pre- and post-processing kills my CPU while the GPU is near idle. So before optimizing all that myself, I wanted to ask if someone already has some kind of blueprint for that.
So basically I have a model, weights and my config for annotations and stuff.
All those CPU tensor transforms could be done as shaders, so I guess I need some kind of C# node that takes a Stride texture, feeds it into my model, reads out the result and outputs a texture.
Did anyone of you guys already play around with this idea? I am looking at you @Hadasi :)
I am using DeepNetV3 with different ResNet Backbones for segmentation atm.
Firstly, there isn’t a way to go from texture to model without copying to the CPU.
Even so, I was thinking about your question this morning, and there are a couple of things we need to do to pre-process your camera input. The first issue is that the data has to be laid out in the way the model expects, otherwise it can’t process it properly. Whether the frames come from OpenCV’s video in or MediaFoundation’s video in, the layout is wrong, but OpenCV is less wrong.
The model would expect a 2x2 image to arrive as RRRRGGGGBBBB but from OpenCV it arrives as RGBRGBRGBRGB.
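To make the two layouts concrete, here is a minimal NumPy sketch of the reordering on the CPU side. The function name `hwc_to_chw` is made up for illustration, and the pixel values 1, 2, 3 just stand in for R, G and B. Note that OpenCV normally delivers BGR rather than RGB, so a channel swap may be needed on top of this.

```python
import numpy as np

def hwc_to_chw(image: np.ndarray) -> np.ndarray:
    """Reorder an interleaved (H, W, C) image into planar (C, H, W)."""
    # transpose only changes the strides; ascontiguousarray actually
    # rewrites the memory so the bytes are in planar order.
    return np.ascontiguousarray(image.transpose(2, 0, 1))

# A 2x2 image where every pixel is (R=1, G=2, B=3).
interleaved = np.tile(np.array([1, 2, 3], dtype=np.uint8), (2, 2, 1))
planar = hwc_to_chw(interleaved)

flat_in = interleaved.ravel().tolist()   # 1,2,3,1,2,3,...  (RGBRGB...)
flat_out = planar.ravel().tolist()       # 1,1,1,1,2,2,2,2,3,3,3,3 (RRRRGGGGBBBB)
```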
From a texture it’s a little worse, because I haven’t found a way to change the texture format to anything other than RGBA or RFloat, so we get RGBARGBARGBARGBA. Maybe with a compute shader we could remove the alpha channel to read back RGB, but how do we transpose the data into the correct layout? I don’t know if we can do that in a compute shader, but we can use a small ONNX model to do it. It isn’t how I do it at the moment (I wanted to be able to sanity check the output), but this way we can keep the process on the GPU and pass the result on to your actual model as an OnnxValue.
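For reference, this is roughly what the RGBA case looks like once the data has been read back to the CPU. It is only a sketch: the function name is hypothetical, and the float32 NCHW output scaled to 0..1 is an assumption, since the exact normalization depends on how the model was trained.

```python
import numpy as np

def rgba_to_model_input(rgba: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 4) uint8 RGBA readback into a (1, 3, H, W) float32 batch."""
    rgb = rgba[..., :3]                       # drop the alpha channel
    chw = rgb.transpose(2, 0, 1)              # RGBRGB... -> RRR...GGG...BBB...
    scaled = chw.astype(np.float32) / 255.0   # assumed 0..1 scaling
    return np.ascontiguousarray(scaled[np.newaxis, ...])  # add batch dimension

# A 2x2 opaque pure-red texture as a stand-in for a real readback.
rgba = np.zeros((2, 2, 4), dtype=np.uint8)
rgba[..., 0] = 255
rgba[..., 3] = 255
batch = rgba_to_model_input(rgba)
```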
The Onnx runner does the processing and returns a tensor or an array. From the Onnx runner you’ll get a tensor of shape 4,28,28. For me this is the most expensive part of the process: the data getting copied to the CPU. If I remember correctly, you can think of the tensor like a spread of spreads, so taking one index along the first dimension gives you a 28x28 slice. I think you can read this tensor data as a mutable array. That can easily be made into an OpenCV image, and probably a Texture too.
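As a rough illustration of the slicing, here is how one 28x28 plane could be pulled out of a 4,28,28 array and turned into an 8-bit image that OpenCV would accept. The helper name and the min/max stretch to 0..255 are assumptions made for the sake of the example; the random-free test data is just a ramp.

```python
import numpy as np

def channel_to_image(tensor: np.ndarray, index: int) -> np.ndarray:
    """Take one (H, W) plane from a (C, H, W) tensor and stretch it to uint8."""
    plane = tensor[index]                     # first dimension selects the plane
    lo, hi = float(plane.min()), float(plane.max())
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    # Round before casting so the maximum really lands on 255.
    return np.rint((plane - lo) * scale).astype(np.uint8)

# Stand-in for the model output: a ramp with shape (4, 28, 28).
output = np.arange(4 * 28 * 28, dtype=np.float32).reshape(4, 28, 28)
image = channel_to_image(output, 0)
```

The resulting `image` is a plain single-channel uint8 array, which is the same thing cv2 uses for grayscale images.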
I spent quite a lot of time getting it to work with DirectML to avoid depending on Nvidia’s CUDA, so any GPU (even Intel’s) should be able to accelerate the process, but it does require a little bit of a hack for vvvv to see the correct DLL.
All of this is a bit moot, because all that work is on a hard drive that I don’t have to hand at the moment.
I am flexible regarding input since I want to use a camera. IDK why I was that focused on texture in my last post. Processing from RGBRGBRGBRGB to RRRRGGGGBBBB, I believe, could be done with an RWByteAddressBuffer in CS. Readback to CPU with frame-delay would be OK. I can live with that.
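The index arithmetic for that repacking is simple enough to sketch. This is a plain Python emulation, not shader code: each (x, y) iteration stands in for what one compute shader thread would do, and all function names are made up for illustration.

```python
def interleaved_index(x, y, c, width, channels=3):
    """Offset of channel c of pixel (x, y) in an RGBRGB... buffer."""
    return (y * width + x) * channels + c

def planar_index(x, y, c, width, height):
    """Offset of channel c of pixel (x, y) in an RRR...GGG...BBB... buffer."""
    return c * width * height + y * width + x

def repack(src, width, height, channels=3):
    """Copy an interleaved buffer into a planar one, one pixel at a time."""
    dst = [0] * (width * height * channels)
    for y in range(height):
        for x in range(width):          # one compute shader thread per (x, y)
            for c in range(channels):
                dst[planar_index(x, y, c, width, height)] = \
                    src[interleaved_index(x, y, c, width, channels)]
    return dst

# The 2x2 example from earlier in the thread.
result = "".join(repack(list("RGBRGBRGBRGB"), 2, 2))
```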
I would really like to avoid that last part: reading back from the GPU, processing on the CPU and then uploading again. OK, I believe if I want to do something like that, I would have to implement DirectX functionality in my node. That’s a tad too hardcore for me.
For offsetting channels like that, you would prolly use a vertex shader: a grid with a point for each pixel, render mode set to point, antialiasing set to off, and then you would move the UVs. But you’re gonna get RGBA on the output unless you write a consumer for a custom format… (no clue how to do that)