Azure Kinect - depth masking and body tracking issues

So after a long wait it has arrived, but I am already experiencing frustration doing simple depth masking.

My Kinect Azure TOP settings are:
Camera FPS: 15 FPS
Color Resolution: 1920 x 1080 (16:9)
Depth Mode: Wide FOV - Unbinned (1024x1024)
CPU Body Tracking: Off
Image: Color
Align Image to Other Camera: On
Sync Image to Body Tracking: Off
Mirror Image: On

I then connect a Kinect Azure Select TOP with Image set to Player Index.
With the v2 sensor I am able to easily use the Player Index image to create the mask I need for depth masking (background substitution or green-screen-type effects).

But the Player Index image with the Azure has HUGE amounts of latency, which makes it unusable as a mask source.
Surely there must be a solution, otherwise the ‘Align Image to Other Camera’ switch is useless!
I haven’t installed the official Body Tracking SDK, so perhaps that is part of the issue?

On the subject of body or skeletal tracking, two questions:

  1. Are the performance and accuracy of body tracking determined in part by which NVIDIA GPU you have?

  2. Since it is ML based, can we expect continued improvements in speed and accuracy through firmware updates?

Thanks for any feedback.
azure_kinect_depthMasking.toe (4.5 KB)

I also have a Kinect Azure and I am quite happy with it. I just think that, unlike the v2, it is not built for games but for professional use. I use it for masking, but I use the point cloud image with a GLSL TOP, so I can focus on what interests me (using a distance threshold and elimination of static points), and the result is quite good with very low latency. I use “Align Image…” if I need to align the color image with the point cloud, but usually I use it as a mask to project onto a moving body.

I tried your file and had the same results until I switched to 2x2 Binned mode at 15fps, on my main media setup (1080ti).

flytrap> 2x2 binned mode just results in a lower-resolution 512x512 mask, no?

jacqueshoepffne> It's really not clear to me how you can pull an accurate mask using just the point cloud image.
If it's a single tracked user in an empty room, possibly.
I see a ramp going from white to magenta, blue and black (closest).

What happens if you are trying to mask more than 1 person?
That was the advantage of using the Player Index image in v2.

The advantage is that it is a representation of the body in 3D, so you can tell bodies apart by their position and, most interestingly, you can have a model of the body and place the virtual camera where the real one is, not only in front.
I made a project in a church with the v2. There is some lag, but I use it anyway, and I can use the depth to give a different color to each person.
Here is the link with some images of the prototype. I am working on a better version with 3 Azure Kinects and less lag.

Thanks for sharing. Really love the effect starting at around 2:10 where the dancers are tossing balloons to each other. I need to look into accessing pointcloud data via glsl. Clearly there is more happening than just a depth image mapped to a color ramp.

Yes, there is a lot of info. I mainly use three things:

  1. A Cache TOP holding the point cloud of the space without dancers, so I can subtract all the points that don’t change and isolate the dancers.
  2. A depth min and depth max to isolate anything at a given distance from the Kinect.
  3. An algorithm to separate each form.

Unfortunately, I have no Kinect and no dancers here at the moment, so I can only show you how it’s made.
Everything is handled inside the pixel shader. I input 2 textures, the cache and the live Kinect image, and I output color and depth.
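
For anyone who wants to experiment with the idea outside a shader, here is a rough Python sketch of the three steps above. It is not the GLSL from the project. The function names, the `tolerance` value, the use of metres, and "0 means no depth reading" are all assumptions for illustration, and the blob separation uses plain 4-connected flood fill, which may well differ from the algorithm actually used.

```python
from collections import deque

def foreground_mask(depth, background, depth_min, depth_max, tolerance=0.05):
    """Return a 0/1 mask: 1 where a pixel differs from the cached
    empty-room frame and lies inside the [depth_min, depth_max] window."""
    mask = []
    for live_row, bg_row in zip(depth, background):
        row = []
        for live, bg in zip(live_row, bg_row):
            valid = live > 0.0  # assumption: 0 means "no depth reading"
            moved = bg <= 0.0 or abs(live - bg) > tolerance
            in_range = depth_min <= live <= depth_max
            row.append(1 if (valid and moved and in_range) else 0)
        mask.append(row)
    return mask

def label_blobs(mask):
    """Give each 4-connected group of mask pixels its own label (1, 2, ...).
    Returns (labels, blob_count). A stand-in for 'separate each form'."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    blob_count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or labels[y][x] != 0:
                continue
            blob_count += 1
            labels[y][x] = blob_count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and labels[ny][nx] == 0:
                        labels[ny][nx] = blob_count
                        queue.append((ny, nx))
    return labels, blob_count

# Tiny example: a wall 3 m away, with something standing at about 1.5 m.
background = [[3.0, 3.0, 3.0], [3.0, 3.0, 3.0]]
live = [[3.0, 1.5, 3.02], [0.0, 1.6, 2.0]]
mask = foreground_mask(live, background, depth_min=1.0, depth_max=1.8)
labels, count = label_blobs(mask)
```

In a GLSL TOP the first function would be a few lines per pixel. The labelling step is harder to do in a single shader pass, which is one reason to treat this only as a way to reason about the logic.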


Be sure that CPU Body Tracking is disabled in the Kinect Azure TOP. That can make things very laggy. In general I found that the skeletal tracking is laggier than that of the v2, but way more reliable (it handles 90° angles, still tracks you when you stand with your back to it, and picks up users faster and more reliably).

About masking: try using a Lookup TOP with a ramp on the depth image (instead of the RGB image). I found this to be a better way of masking than the Player Index, as you can define a nice falloff and so on.
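
To make the falloff idea concrete, here is a small Python sketch of what a ramp lookup does to a single depth value. It only mimics the idea of mapping depth through a ramp and is not how the Lookup TOP works internally. `ramp_mask`, `near`, `far` and `falloff` are made-up names, depth is assumed to be in metres, and 0 is assumed to mean no reading.

```python
def smoothstep(edge0, edge1, x):
    """Smooth 0..1 transition between edge0 and edge1 (same curve as GLSL smoothstep)."""
    t = (x - edge0) / (edge1 - edge0)
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def ramp_mask(depth, near, far, falloff):
    """Mask value in 0..1: fully on between near and far,
    fading out smoothly over `falloff` metres on each side.
    `falloff` must be greater than 0."""
    if depth <= 0.0:  # assumption: 0 means "no depth reading"
        return 0.0
    rise = smoothstep(near - falloff, near, depth)
    fall = 1.0 - smoothstep(far, far + falloff, depth)
    return rise * fall

# Keep everything between 1 m and 2 m, with a 20 cm soft edge.
inside = ramp_mask(1.5, near=1.0, far=2.0, falloff=0.2)
edge = ramp_mask(2.1, near=1.0, far=2.0, falloff=0.2)
outside = ramp_mask(3.0, near=1.0, far=2.0, falloff=0.2)
```

Here `inside` is fully on, `outside` is fully off, and `edge` lands about halfway, which is the soft border you do not get from a hard-edged index image.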

If you want to mix the body tracking or player index data with the camera images (color, IR or depth), make sure to turn on ‘Sync Image to Body Tracking’. This will make sure the camera image displayed matches the frame that the skeleton data (and player index) was taken from.

This will introduce some lag to your color image stream, since it takes a few frames for the Kinect to identify the skeletons, but it will make it much easier to use the player index as a mask.

For reference, the TOP stores both the current video frame as well as the body tracking video frame, so you can have a real-time stream as well as one synced to the skeleton data if you need it.
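
As a toy model of why the synced image lags, the Python sketch below keeps a short history of frames and returns either the newest one or the one whose timestamp matches a body tracking result. This is only an illustration of the concept, not the TOP's actual implementation. `FrameSync`, its methods and the integer timestamps are hypothetical.

```python
from collections import deque

class FrameSync:
    """Keeps the last few camera frames so a late-arriving body tracking
    result can be paired with the frame it was computed from."""

    def __init__(self, capacity=8):
        # Oldest frames fall off automatically once capacity is reached.
        self._frames = deque(maxlen=capacity)

    def push(self, timestamp, image):
        self._frames.append((timestamp, image))

    def latest(self):
        """Real-time stream: the newest frame, or None if empty."""
        return self._frames[-1][1] if self._frames else None

    def synced(self, body_timestamp):
        """Frame matching the body tracking result, or None if it has
        already been dropped from the history."""
        for timestamp, image in reversed(self._frames):
            if timestamp == body_timestamp:
                return image
        return None

sync = FrameSync(capacity=4)
for t in range(4):
    sync.push(t, f"frame{t}")

# Body tracking has only just finished processing frame 1.
realtime_image = sync.latest()
mask_aligned_image = sync.synced(1)
```

The gap between `realtime_image` and `mask_aligned_image` is the lag you trade for a mask that lines up with the picture.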

Thanks. Would the results be better than normalizing the depth image? Could you possibly share a .toe?

Thanks Rob. Unfortunately, in addition to the huge latency, one of the problems with Player Index is the imperfect body silhouette. It is simply unusable for any serious masking!
The only real solution is to start with the depth image and somehow extract the proper mask from it (lookups, normalizing, thresholds, etc.). Still working on a useful solution.

The MS team really dropped the ball with this device, IMHO. They should have included the legacy body tracking functionality from the v2 and let the end user decide which one they want.