VL.MediaPipe

I was curious to learn what this MediaPipe thing is that everyone is talking about and stumbled upon mediapipe-touchdesigner by Dom Scott and Torin Blankensmith. The way they implemented this for TouchDesigner made it possible to use it the same way in vvvv. So full credits to them!

To see what’s possible, watch their intro video:

Status: It runs the models and receives their results as JSON. For Face, FaceLandmark and Pose there is a node to configure the model and receive an XElement back holding all the info. For the remaining models, similar nodes still need to be built. Then the big question is how to best return the data (instead of XElement) for each model so it can be accessed most conveniently. Any thoughts, anyone?

Get the NuGet: VL.MediaPipe

Not compatible without the unreleased master of Stride

I wrote some Python code that computes the usual joint angles, each defined by a triplet of keypoints.

Based on the COCO keypoints, something like this:

    associations = [
        (5, 7, 9),   # Left elbow
        (6, 8, 10),  # Right elbow
        (11, 13, 15),  # Left knee
        (12, 14, 16),  # Right knee
        (5, 11, 13),  # Left hip
        (6, 12, 14),  # Right hip
        (3, 5, 7),    # Left shoulder
        (4, 6, 8),    # Right shoulder
        (5, 0, 6),    # Neck
        (13, 15, 16),  # Left ankle
        (14, 16, 15)  # Right ankle
    ]

Per frame you would get these values.
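
For anyone who wants to try the triplet idea, here is a minimal sketch of how the angle per triplet could be computed. It is not the original code: the names `joint_angle` and `frame_angles`, the 2D `(x, y)` keypoint format and the reduced set of triplets are assumptions made for illustration.

```python
import math

# COCO keypoint index triplets (a, b, c); the angle is measured at the middle point b.
# Only a subset of the list above, names are hypothetical.
ASSOCIATIONS = {
    "left_elbow": (5, 7, 9),
    "right_elbow": (6, 8, 10),
    "left_knee": (11, 13, 15),
    "right_knee": (12, 14, 16),
}


def joint_angle(a, b, c):
    """Angle in degrees at b between the rays b->a and b->c (2D points)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        return None  # degenerate case: two keypoints coincide
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding errors
    return math.degrees(math.acos(cos_t))


def frame_angles(keypoints, associations=ASSOCIATIONS):
    """Compute one angle per named triplet for a single frame of keypoints."""
    return {
        name: joint_angle(keypoints[i], keypoints[j], keypoints[k])
        for name, (i, j, k) in associations.items()
    }
```

Calling `frame_angles(keypoints)` once per frame, with `keypoints` being a list of 17 `(x, y)` tuples in COCO order, would give a dictionary of angles. Low-confidence keypoints are not handled here.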

In Cables I have another custom op that takes a dictionary of metrics that are computed on the fly per frame or every couple of frames (centroid, distance, angle, etc.). It looks like this.

Maybe this is inspiring for vvvv.

const DEF_METRICS = `
[
    {
        "joint": "leftHip",
        "computations": ["pos", "distance"]
    },
    {
        "joints": ["leftHip", "leftKnee", "leftAnkle"],
        "computations": ["angle"]
    },
    {
        "joints": ["leftShoulder", "rightShoulder", "leftHip", "rightHip"],
        "computations": ["distance"]
    },
    {
        "joint": "rightKnee",
        "computations": ["distance"]
    },
    {
        "joints": ["rightHip", "rightKnee", "rightAnkle"],
        "computations": ["angle"]
    },
    {
        "joints": ["leftHip", "rightHip"],
        "computations": ["rotation"]
    }
]
`;
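
To make the idea a bit more concrete, here is a rough Python sketch of how such a metrics list could be evaluated per frame. This is not the Cables op: the function `evaluate`, the `COMPUTATIONS` table and the exact meaning of each computation are assumptions. In particular, "distance" is only handled for exactly two joints and "rotation" is not implemented, because the post does not define them; anything unknown simply yields `None`.

```python
import json
import math

# Shortened, hypothetical config in the same shape as the one above.
METRICS_JSON = """
[
    {"joint": "leftHip", "computations": ["pos"]},
    {"joints": ["leftHip", "leftKnee", "leftAnkle"], "computations": ["angle"]},
    {"joints": ["leftHip", "rightHip"], "computations": ["distance", "centroid"]}
]
"""


def _pos(pts):
    return pts[0] if len(pts) == 1 else None


def _centroid(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def _distance(pts):
    # Assumption: distance is only defined between exactly two joints.
    return math.dist(pts[0], pts[1]) if len(pts) == 2 else None


def _angle(pts):
    # Assumption: angle at the middle joint of exactly three joints, in degrees.
    if len(pts) != 3:
        return None
    a, b, c = pts
    ang = math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    ang = abs(math.degrees(ang)) % 360.0
    return 360.0 - ang if ang > 180.0 else ang


COMPUTATIONS = {"pos": _pos, "centroid": _centroid, "distance": _distance, "angle": _angle}


def evaluate(metrics, joints):
    """Return (joint names, computation, value) tuples for one frame."""
    results = []
    for metric in metrics:
        names = metric.get("joints") or [metric["joint"]]
        pts = [joints[name] for name in names]
        for comp in metric["computations"]:
            fn = COMPUTATIONS.get(comp)
            results.append(("+".join(names), comp, fn(pts) if fn else None))
    return results
```

Per frame you would call `evaluate(json.loads(METRICS_JSON), joints)` with `joints` being a dictionary from joint name to `(x, y)`. Keeping the computations in a lookup table means a new metric is just one more function.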

What about: Universal Skeleton Bible

please try again now. there is now a 0.0.5-preview available.

image

unable to select my connected camera
alpha 0.0.5 preview

@schlonzo hm… the possible cameras in this case are reported by the web-app that is running under the hood. so the question would be why it reports that no cameras are available. VL.MediaPipe 0.0.5-alpha relies on VL.CEF.* >=0.5.3 and installs it. please double-check that it is installed and make sure any older versions are removed from your system.

image

seems these two VL.CEF Versions were installed today.

that looks good. version 0.0.6-alpha has a few debug messages added that hopefully help us shed some light on this. please do the following:

  • run vvvv latest preview
  • press ctrl+F2 to open the debug windows, then switch to the Log
  • there set the Severity pulldown to “Debug”
  • rightclick the Debug Filter toggle to solo it (so we only see messages of type debug)
  • then open the Mediapipe helppatch
    you should now get a bunch of messages. please send a screenshot of those.

works here with 0.0.6 and it looks really useful!

one thing i noticed: the webbrowser running this is clearly being executed on the iGPU of my laptop (telling this from the load displayed in the task manager). any hint how to make it run on the dedicated GPU instead?
(performance seems to be ok even on the internal one but i’m just curious)…

please try this: VL.CEF/README.md at master · vvvv/VL.CEF · GitHub
EDIT: and please use upcoming 0.0.7-alpha with it.

  • If tracking isn’t working with your webcam you can try SpoutCam which works for me.

  • The camera image that comes from CEF / WebBrowser is somewhat scaled and translated. Thus the landmark positions don’t match the image. They match the “original” camera texture.

  • The tracking result is about 4-5 frames delayed compared to the original camera feed. I don’t know whether this is caused by SpoutCam.

  • Extracting the positions from XElement is quite costly.


The attached patch might not work out of the box because ApplicationPath is used inside MediaPipe to set the Content Base Directory. You can copy & paste the stuff into the original help patch.

HowTo Use MediaPipe - Edit.vl (56.6 KB)

It’s because the WebBrowser / ToStrideRenderer scales with the window, so if the window doesn’t have the same size as the video, the landmarks and the image don’t match.

such issues should mostly be fixed with 0.0.10-alpha.

good point, this is also fixed with 0.0.10-alpha.

On 0.0.10-alpha and gamma 5.3-0414 I don’t see the image texture anymore.
RenderWindow is only gray!

The timing is running and I even see the FaceDetector’s score changing in the XElement…

The VL.CEF.Stride version should be 5.3 right?

indeed i changed a default behavior here: if this happens you can now either follow the troubleshooting instructions i posted in reply to motzi’s question above, or enable the hidden “…Shared Texture…” input on the MediaPipe node.

update: 0.0.11-alpha comes with improved parsing: Face, FaceLandmark and Pose nodes now have properly typed outputs. still not perfect, but should be usable. thoughts?

There are some naming inconsistencies, like sometimes using camel case and sometimes not.
Getting the positions of the FaceLandmarks, for example, is way cheaper now than before with XElement, but I still think there should be Reactive versions of the nodes to be able to do that “extraction” off the main loop.
The nodes should check for valid inputs, otherwise one needs to have IsAssigned / If combos all over the place.

image

update 0.0.12-alpha:

  • fixes many properties to camelcase
  • FaceLandmarksDetector now returns a spread of landmarks directly
  • @schlonzo @karistouf this provides a workaround if you have the problem with no webcams showing up in the enum: the MediaPipe node now has a hidden “Websocket Port” input which you can change to a free port.

Hmm, using Spout now: no camera, no hidden socket. Is it working with gamma 5.2?