Gaze direction with Kinect Azure

Hi,
I am asking here since I have not dabbled much with the Azure Kinect in TD. Basically, I’m trying to combine 4 Azure Kinects around a room in order to track several people at once from different directions. The goal is to infer each person’s gaze direction from their head position and rotation data, then cast a ray from the head position towards the walls to work out the area a person is looking at at a particular time. I am also planning to check for intersections between gaze rays (if there is more than one person). Mainly I want the rough world-space coordinates (area) of where each person is looking in the room, without using eye trackers.
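To be concrete, the wall lookup I have in mind is basically a ray-plane intersection per wall. Here is a rough numpy sketch of what I mean (the wall positions and normals are made-up placeholders for an example room):

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_point, plane_normal):
    """Return the world-space point where a gaze ray hits a wall plane, or None."""
    direction = direction / np.linalg.norm(direction)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-6:
        return None                      # ray is parallel to the wall
    t = np.dot(plane_normal, plane_point - origin) / denom
    if t < 0:
        return None                      # wall is behind the viewer
    return origin + t * direction

# Placeholder walls for a 6 m x 4 m room: (point on plane, inward-facing normal)
walls = [
    (np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0])),   # back wall
    (np.array([0.0, 0.0,  2.0]), np.array([0.0, 0.0, -1.0])),  # front wall
    (np.array([-3.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])),   # left wall
    (np.array([ 3.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])),  # right wall
]

def gaze_hit(head_pos, gaze_dir):
    """Closest wall hit for one person's gaze ray, or None if nothing is hit."""
    hits = [p for p in (intersect_ray_plane(head_pos, gaze_dir, pt, n) for pt, n in walls)
            if p is not None]
    if not hits:
        return None
    return min(hits, key=lambda p: np.linalg.norm(p - head_pos))
```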

Before I go deeper into the problem, I was wondering what the best approach would be? Do I even need that many Kinect Azures? If so, would you recommend merging them into one view using the Azure Pointcloud Merger component from Derivative and then working on that - or should I work on each Kinect separately?

The idea is to extract the quaternion for the head joint and then convert it to a direction vector. From there, use the line tool to visualise the vector. My further question then is - is my approach overcomplicating things? Am I missing some out-of-the-box implementation that I can use? Do I even need the depth information for a problem like this, or would just using the colour camera with, say, MediaPipe head detection be enough?
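For the quaternion-to-vector step, what I mean is rotating the head joint’s local forward axis by the head quaternion. A rough sketch of the maths (I’m assuming an (x, y, z, w) channel order and +Z as the forward axis; the actual conventions of the Kinect Azure data may differ):

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q given as (x, y, z, w)."""
    x, y, z, w = q
    u = np.array([x, y, z])
    uv = np.cross(u, v)
    # standard quaternion rotation: v' = v + 2*w*(u x v) + 2*(u x (u x v))
    return v + 2.0 * (w * uv + np.cross(u, uv))

def gaze_direction(head_quat, forward=np.array([0.0, 0.0, 1.0])):
    """World-space head-forward direction (the forward axis choice is an assumption)."""
    q = np.asarray(head_quat, dtype=float)
    q = q / np.linalg.norm(q)            # guard against non-normalised input
    d = quat_rotate(q, forward)
    return d / np.linalg.norm(d)
```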

Other people may have different suggestions, but regarding the Kinect approach: this should work in general, but note that there isn’t really a good way of automatically matching up users between multiple Kinect Azure cameras. All of the body tracking is done separately, so user 1 on one camera might be user 2 on a different camera. I’ve seen users merge the point clouds together and then do blob detection on the results, but this won’t work if you need skeleton data. I’m not sure if there’s a good way to infer direction from just a point cloud blob.
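To illustrate the blob idea outside of TD: one way is to project the merged cloud onto the floor plane as an occupancy image and label the connected blobs, for example with OpenCV. This is just a rough sketch with made-up grid sizes and a y-up assumption:

```python
import cv2
import numpy as np

def top_down_blobs(points, cell=0.05, min_area=40):
    """Project a merged point cloud (N x 3, metres, y-up assumed) onto the floor
    plane and return one world-space centroid per detected blob (person)."""
    xz = points[:, [0, 2]]                       # drop height, keep floor-plane coords
    mins = xz.min(axis=0)
    grid = ((xz - mins) / cell).astype(int)      # quantise to occupancy cells
    h, w = grid[:, 1].max() + 1, grid[:, 0].max() + 1
    occ = np.zeros((h, w), dtype=np.uint8)
    occ[grid[:, 1], grid[:, 0]] = 255

    # close small gaps, then label connected regions
    occ = cv2.morphologyEx(occ, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(occ)

    people = []
    for i in range(1, n):                        # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            cx, cy = centroids[i]
            people.append(mins + np.array([cx, cy]) * cell)  # back to metres
    return people
```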

Hi Rob,

Thanks for the info! I have decided to work with blob tracking rather than the skeletal data. I have been trying to find the best way to stitch the two Azure Kinect outputs so that I have an FOV that captures the entire space (the Azures are placed at opposite ends of a wall, both pointing outwards). Do you have any suggestions on how best to achieve this? I guess it should then be possible to use that stitched image for blob tracking, right? That way player 1 stays player 1 across the entire space, irrespective of which area is covered by which camera.
Thanks!

We’ve experimented with automated methods for aligning cameras, but I don’t think it is really necessary for the level of accuracy blob detection needs. If the cameras are in fixed positions, most users just manually adjust the transforms until the point clouds line up sufficiently.

The easiest way to do this is to place a couple of objects in your scene that are visible to both cameras, then connect each Kinect Azure TOP to a Point Transform TOP and then into the pointMerge component from the palette. You can manually adjust the translate/rotate parameters in the Point Transform TOP until the points of the reference objects match well enough.
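For reference, what that manual adjustment amounts to mathematically is a rigid rotate-plus-translate on the second camera’s points, so you can sanity-check your values outside of TD with something like this (the XYZ rotation order here is an assumption, not necessarily what the TOP uses):

```python
import numpy as np

def manual_align(points, rx, ry, rz, tx, ty, tz):
    """Apply the same kind of rotate-then-translate you dial in manually
    (angles in degrees; XYZ rotation order is assumed here)."""
    a, b, c = np.radians([rx, ry, rz])
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    return points @ R.T + np.array([tx, ty, tz])

# e.g. rotate camera 2's cloud 180 degrees around Y and shift it 4 m across the room
# aligned = manual_align(cam2_points, 0, 180, 0, 0, 0, 4.0)
```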

The two clouds will never align perfectly because there is always some level of distortion from the camera lens, but it should hopefully be good enough to run the blob detection on.

If you do want to try automated alignment, I know users have worked with OpenCV algorithms like in this case: Multi Camera Alignment using opencv help
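As a rough starting point for the automated route (not the exact method from that thread): if you can pick a handful of matching reference points out of each cloud, OpenCV’s estimateAffine3D will estimate a transform between them. Note that it fits a general affine rather than a strictly rigid transform, so treat the result as an initial guess to refine by hand:

```python
import cv2
import numpy as np

# a few corresponding reference points picked from each cloud (placeholder values, in metres)
cam2_pts = np.array([[0.1, 1.2, 2.0], [0.8, 1.1, 2.3],
                     [1.5, 0.2, 1.9], [0.4, 0.3, 2.6]], dtype=np.float32)
cam1_pts = np.array([[2.9, 1.2, 1.1], [2.2, 1.1, 0.8],
                     [1.6, 0.2, 1.3], [2.6, 0.3, 0.5]], dtype=np.float32)

retval, affine, inliers = cv2.estimateAffine3D(cam2_pts, cam1_pts)

# affine is a 3x4 [R | t] matrix; apply it to bring camera 2's cloud into camera 1's space
cam2_aligned = cam2_pts @ affine[:, :3].T + affine[:, 3]
```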