VL.MediaPipe

joreg · December 18, 2023, 12:31am

I was curious to learn what this MediaPipe thing is that everyone is talking about and stumbled upon mediapipe-touchdesigner by Dom Scott and Torin Blankensmith. The way they implemented this for TouchDesigner made it possible to use it the same way in vvvv. So full credits to them!

To see what’s possible, watch their intro video:

Status: It runs the models and receives their results as JSON. For Face, FaceLandmark and Pose there is a node to configure the model and receive an XElement back holding all the info. For the remaining models, similar nodes still need to be built. Then the big question is how best to return the data (instead of XElement) for each model so it can be accessed most conveniently. Any thoughts, anyone?
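One option for the "how to return the data" question is to parse the JSON once into small typed records rather than handing out a generic tree. The sketch below shows the idea in Python. The payload shape (`detections`, `score`, `keypoints`) and the names `Keypoint`, `FaceDetection` and `parse_faces` are assumptions made for illustration; they are not the actual message format or API of VL.MediaPipe.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Keypoint:
    x: float
    y: float


@dataclass(frozen=True)
class FaceDetection:
    score: float
    keypoints: List[Keypoint]


def parse_faces(message: str) -> List[FaceDetection]:
    """Parse one JSON message (hypothetical shape) into typed detections."""
    data = json.loads(message)
    faces = []
    for det in data.get("detections", []):
        # Missing fields fall back to empty/zero values instead of raising.
        kps = [Keypoint(float(k["x"]), float(k["y"]))
               for k in det.get("keypoints", [])]
        faces.append(FaceDetection(float(det.get("score", 0.0)), kps))
    return faces
```

With typed records, downstream code reads `face.keypoints[0].x` directly instead of querying element names on every frame.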

Get the NuGet: VL.MediaPipe

Hadasi · December 18, 2023, 9:39am

Not compatible without the unreleased master of Stride

kikohs · December 18, 2023, 11:54am

I made some Python code that computes the usual joint angles, each defined by a triplet of keypoints.

Based on COCO keypoints, something like this:

    associations = [
        (5, 7, 9),   # Left elbow
        (6, 8, 10),  # Right elbow
        (11, 13, 15),  # Left knee
        (12, 14, 16),  # Right knee
        (5, 11, 13),  # Left hip
        (6, 12, 14),  # Right hip
        (3, 5, 7),    # Left shoulder
        (4, 6, 8),    # Right shoulder
        (5, 0, 6),    # Neck
        (13, 15, 16),  # Left ankle
        (14, 16, 15)  # Right ankle
    ]

Per frame you would get these values.
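A minimal sketch of how such a triplet could be turned into an angle, assuming 2D keypoints indexed as in COCO and the angle measured at the middle keypoint of each triplet. The names `ASSOCIATIONS`, `joint_angle` and `angles_for_frame` are made up for this example and are not taken from the code mentioned above.

```python
import math

# A subset of the triplets (a, b, c); the angle is measured at keypoint b.
ASSOCIATIONS = {
    "left_elbow": (5, 7, 9),
    "right_elbow": (6, 8, 10),
    "left_knee": (11, 13, 15),
    "right_knee": (12, 14, 16),
}


def joint_angle(a, b, c):
    """Angle in degrees at b between the rays b->a and b->c, or None if degenerate."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None  # two keypoints coincide, the angle is undefined
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def angles_for_frame(keypoints):
    """keypoints: sequence of (x, y) indexed by COCO keypoint id."""
    return {name: joint_angle(keypoints[a], keypoints[b], keypoints[c])
            for name, (a, b, c) in ASSOCIATIONS.items()}
```

Calling `angles_for_frame` once per frame yields one value per named joint, with `None` where a keypoint pair collapses to the same position.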

In Cables I have another custom OP that takes a dictionary of metrics that are computed on the fly per frame or every couple of frames (centroid, distance, angle, etc.). It looks like this.

Maybe this is inspiring for vvvv.

const DEF_METRICS = `
[
    {
        "joint": "leftHip",
        "computations": ["pos", "distance"]
    },
    {
        "joints": ["leftHip", "leftKnee", "leftAnkle"],
        "computations": ["angle"]
    },
    {
        "joints": ["leftShoulder", "rightShoulder", "leftHip", "rightHip"],
        "computations": ["distance"]
    },
    {
        "joint": "rightKnee",
        "computations": ["distance"]
    },
    {
        "joints": ["rightHip", "rightKnee", "rightAnkle"],
        "computations": ["angle"]
    },
    {
        "joints": ["leftHip", "rightHip"],
        "computations": ["rotation"]
    }
]
`;
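A rough Python sketch of how a metrics list like this could be evaluated per frame. It only covers computations whose meaning is unambiguous ("pos", "angle", plus "centroid" from the list in the text); "distance" and "rotation" are skipped because the thread does not say what they are measured against. The `evaluate` function and the `pose` dictionary layout are assumptions for illustration, not the Cables OP.

```python
import json
import math

METRICS = json.loads("""
[
    {"joint": "leftHip", "computations": ["pos"]},
    {"joints": ["leftHip", "leftKnee", "leftAnkle"], "computations": ["angle"]},
    {"joints": ["leftHip", "rightHip"], "computations": ["centroid"]}
]
""")


def _angle(a, b, c):
    """Angle in degrees at b, or None if two points coincide."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None
    cos_t = max(-1.0, min(1.0, (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)))
    return math.degrees(math.acos(cos_t))


def _centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Hypothetical dispatch table: computation name -> function over a list of points.
COMPUTATIONS = {
    "pos": lambda pts: pts[0],
    "angle": lambda pts: _angle(*pts),
    "centroid": _centroid,
}


def evaluate(metrics, pose):
    """pose: dict mapping joint name -> (x, y). Returns {(joints, computation): value}."""
    results = {}
    for entry in metrics:
        names = entry.get("joints") or [entry["joint"]]
        points = [pose[name] for name in names]
        for comp in entry["computations"]:
            fn = COMPUTATIONS.get(comp)
            if fn is None:
                continue  # unknown or ambiguous computation: skip it
            results[("+".join(names), comp)] = fn(points)
    return results
```

Keeping the metric definitions as data means new measurements can be added without touching the per-frame loop.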

bjoern · December 18, 2023, 12:18pm

What about: Universal Skeleton Bible

joreg · December 18, 2023, 12:53pm

please try again now. there is now a 0.0.5-preview available.

schlonzo · December 18, 2023, 2:54pm

unable to select my connected camera
alpha 0.0.5 preview

joreg · December 18, 2023, 3:51pm

@schlonzo hm… the possible cameras in this case are reported by the web-app that is running under the hood. so the question would be why it reports that no cameras are available. VL.MediaPipe 0.0.5-alpha relies on VL.CEF.* >=0.5.3 and installs it. please double-check that it is installed and make sure any older versions are removed from your system.

schlonzo · December 18, 2023, 3:59pm

seems these two VL.CEF Versions were installed today.

joreg · December 18, 2023, 9:21pm

that looks good. version 0.0.6-alpha has a few debug messages added that hopefully help us shed some light on this. please do the following:

run vvvv latest preview
press ctrl+F2 to open the debug windows, then switch to the Log
there set the Severity pulldown to “Debug”
right-click the Debug Filter toggle to solo it (so we only see messages of type debug)
then open the MediaPipe help patch
you should now get a bunch of messages. please send a screenshot of those.

motzi · December 19, 2023, 12:56pm

works here with 0.0.6 and it looks really useful!

one thing i noticed: the webbrowser running this is clearly being executed on the iGPU of my laptop (telling this from the load displayed in the task manager). any hint how to make it run on the dedicated GPU instead?
(performance seems to be ok even on the internal one but i’m just curious)…

joreg · December 19, 2023, 1:18pm

please try this: VL.CEF/README.md at master · vvvv/VL.CEF · GitHub
EDIT: and please use upcoming 0.0.7-alpha with it.

bjoern · December 19, 2023, 1:32pm

If tracking isn’t working with your webcam you can try SpoutCam which works for me.

The camera image that comes from CEF / WebBrowser is somewhat scaled and translated. Thus the landmark positions don’t match the image. They match the “original” camera texture.

The tracking result is about 4-5 frames delayed compared to the original camera feed; idk if this is caused by SpoutCam.
Extracting the positions from XElement is quite costly.


–
The attached patch might not work out of the box because ApplicationPath is used inside MediaPipe to set the Content Base Directory. You can copy & paste the stuff into the original help patch.

HowTo Use MediaPipe - Edit.vl (56.6 KB)

bjoern · December 20, 2023, 9:25am

It’s because the WebBrowser / ToStrideRenderer scales with the window so if the window doesn’t have the same size as the video they don’t match.
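To illustrate the mismatch: if landmarks are normalized to the video frame (x and y in 0..1) but the video is drawn into a window with a different aspect ratio, the landmark has to be mapped into the rectangle the video actually occupies. The sketch below assumes an aspect-preserving, centered fit; how the WebBrowser / ToStrideRenderer really scales is not stated in the thread, and the function names are made up for this example.

```python
def fit_rect(video_w, video_h, win_w, win_h):
    """Rectangle (x, y, w, h) of a video fitted into a window, aspect preserved and centered."""
    scale = min(win_w / video_w, win_h / video_h)
    w, h = video_w * scale, video_h * scale
    return ((win_w - w) / 2, (win_h - h) / 2, w, h)


def landmark_to_window(x, y, video_size, window_size):
    """Map a normalized landmark (0..1 in video space) to window pixel coordinates."""
    ox, oy, w, h = fit_rect(*video_size, *window_size)
    return (ox + x * w, oy + y * h)
```

When the window has the same aspect ratio as the video, the offsets are zero and the mapping reduces to a plain multiplication by the window size.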

joreg · December 20, 2023, 10:22am

such issues should mostly be fixed with 0.0.10-alpha.

good point, this is also fixed with 0.0.10-alpha.

amir · December 20, 2023, 6:42pm

On 0.0.10-alpha and gamma 5.3-0414 I don’t see the image texture anymore.
RenderWindow is only gray!

The Timing is running and I even see the score in the FaceDetector’s XElement changing…

The VL.CEF.Stride version should be 5.3, right?

joreg · December 20, 2023, 7:21pm

indeed i changed a default behavior here: if this happens you can now either follow the troubleshooting instructions i posted to motzi’s question above, or enable the hidden “…Shared Texture…” input on the MediaPipe node.

joreg · December 20, 2023, 7:33pm

update: 0.0.11-alpha comes with improved parsing: Face, FaceLandmark and Pose nodes now have properly typed outputs. still not perfect, but should be usable. thoughts?

bjoern · December 21, 2023, 10:08am

There are some typos like sometimes using camel case and sometimes not.
Getting the positions of the FaceLandmarks for example is way cheaper now than before with XElement, but still I think there should be Reactive versions of the nodes to be able to do that “extraction” off the mainloop.
The nodes should check for valid inputs, otherwise one needs to have IsAssigned / If combos all over the place.

joreg · December 21, 2023, 11:45am

update 0.0.12-alpha:

fixes many properties to camelcase
FaceLandmarksDetector now returns a spread of landmarks directly
@schlonzo @karistouf this provides a workaround if you have the problem with no webcams showing up in the enum: the MediaPipe now has a hidden “Websocket Port” input which you can change to a free port.
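If it is unclear which port is free, a small script outside vvvv can ask the operating system for one. This is a generic Python sketch using only the standard library; `find_free_port` and `is_port_free` are helper names made up for this example and have nothing to do with the MediaPipe node itself.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Let the OS pick an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def is_port_free(port, host="127.0.0.1"):
    """Return True if the given TCP port can currently be bound on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    print(find_free_port())
```

Note that the port is only known to be free at the moment of the check; another process could still grab it before it is used.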

karistouf · December 22, 2023, 8:52am

Hum, spout now, no camera, no hidden socket. is it working with gamma 5.2?