In TouchDesigner, how can you project real-world 3D geometry onto the 2D screen using OpenCV's solvePnP camera pose estimation results?

I’m following a methodology that uses OpenCV’s solvePnP method to estimate a camera pose from correspondences between 2D image points and 3D world points.

You can read a great explanation of how this works in this guide:
learnopencv.com/head-pose-e … -and-dlib/

A full code example from that guide for doing this:

import cv2
import numpy as np
 
# Read Image
im = cv2.imread("headPose.jpg")
size = im.shape
     
# 2D image points. If you change the image, you need to change this vector
image_points = np.array([
                            (359, 391),     # Nose tip
                            (399, 561),     # Chin
                            (337, 297),     # Left eye left corner
                            (513, 301),     # Right eye right corner
                            (345, 465),     # Left Mouth corner
                            (453, 469)      # Right mouth corner
                        ], dtype="double")
 
# 3D model points.
model_points = np.array([
                            (0.0, 0.0, 0.0),             # Nose tip
                            (0.0, -330.0, -65.0),        # Chin
                            (-225.0, 170.0, -135.0),     # Left eye left corner
                            (225.0, 170.0, -135.0),      # Right eye right corner
                            (-150.0, -150.0, -125.0),    # Left Mouth corner
                            (150.0, -150.0, -125.0)      # Right mouth corner
                         
                        ])
 
 
# Camera internals
 
focal_length = size[1]
center = (size[1]/2, size[0]/2)
camera_matrix = np.array(
                         [[focal_length, 0, center[0]],
                         [0, focal_length, center[1]],
                         [0, 0, 1]], dtype = "double"
                         )
 
print "Camera Matrix :\n {0}".format(camera_matrix)
 
dist_coeffs = np.zeros((4,1)) # Assuming no lens distortion
(success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
 
print "Rotation Vector:\n {0}".format(rotation_vector)
print "Translation Vector:\n {0}".format(translation_vector)
 
 
# Project a 3D point (0, 0, 1000.0) onto the image plane.
# We use this to draw a line sticking out of the nose
 
 
(nose_end_point2D, jacobian) = cv2.projectPoints(np.array([(0.0, 0.0, 1000.0)]), rotation_vector, translation_vector, camera_matrix, dist_coeffs)
 
for p in image_points:
    cv2.circle(im, (int(p[0]), int(p[1])), 3, (0,0,255), -1)
 
 
p1 = ( int(image_points[0][0]), int(image_points[0][1]))
p2 = ( int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]))
 
cv2.line(im, p1, p2, (255,0,0), 2)
 
# Display image
cv2.imshow("Output", im)
cv2.waitKey(0)

Diving into this code, here is how you get the camera matrix, rotation vector, and translation vector.

To get the camera matrix, you do:

camera_matrix = np.array(
[[focal_length, 0, center[0]],
[0, focal_length, center[1]],
[0, 0, 1]], dtype = "double"
)

Where focal_length is assumed to be the width of the image in pixels, and center is the center in pixels of the image based on the resolution.

In the case of a 1080p image, it would look like:

camera_matrix = np.array([[1920, 0, 1920/2], [0, 1920, 1080/2], [0, 0, 1]], dtype="double")

The rotation and translation vectors are obtained by running:

(success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

Where model_points are the real-world 3D points, and image_points are the corresponding 2D points, in pixels, on the screen.
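
For a non-face setup (say, calibrating a camera against a room or a stage), the correspondences would look something like this hypothetical sketch, where the 3D points are measured positions in the scene and the 2D points are picked by hand in the camera image (all numbers are placeholders, not from the guide):

import numpy as np

# Hypothetical example: four measured corners of a 1 m x 1 m marker in the
# scene (meters), and where those corners appear in a 1920x1080 camera frame
# (pixels). All values below are placeholders for illustration only.
model_points = np.array([(0.0, 0.0, 0.0),
                         (1.0, 0.0, 0.0),
                         (1.0, 1.0, 0.0),
                         (0.0, 1.0, 0.0)], dtype="double")

image_points = np.array([( 642.0, 588.0),
                         (1210.0, 575.0),
                         (1189.0, 153.0),
                         ( 668.0, 162.0)], dtype="double")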

Once you have these values, you can project world points onto screen coordinates, as this code sample does by calling:

(nose_end_point2D, jacobian) = cv2.projectPoints(np.array([(0.0, 0.0, 1000.0)]), rotation_vector, translation_vector, camera_matrix, dist_coeffs)
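
With zero distortion coefficients, projectPoints is just the pinhole camera model. Here is a minimal sketch (not from the guide) of the equivalent math, using the variable names from the snippet above:

import cv2
import numpy as np

def project_point(point3d, rotation_vector, translation_vector, camera_matrix):
    # Rotate/translate the 3D point from model space into camera space,
    # then apply the intrinsics and dehomogenize to get pixel coordinates.
    R, _ = cv2.Rodrigues(rotation_vector)              # 3x3 rotation matrix
    x_cam = (R @ np.asarray(point3d, dtype=float).reshape(3)
             + translation_vector.reshape(3))
    x_img = camera_matrix @ x_cam                      # homogeneous pixel coords
    return x_img[:2] / x_img[2]                        # (u, v) in pixels

# project_point((0.0, 0.0, 1000.0), rotation_vector, translation_vector,
#               camera_matrix) should match nose_end_point2D above.

These are the two steps a TouchDesigner camera setup would have to reproduce: a world-to-camera (view) transform followed by a projection.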

Once you have the camera pose rotation and translation vectors, as well as an intrinsic matrix, how can you use these to project 3D world geometry onto the screen using the TouchDesigner Camera COMP, similarly to how cv2.projectPoints does it for individual points?

In my opinion, using pixel-related values for the focal length is a bad idea. I think it’s better to know the true horizontal FOV of your camera and place the camera on the z-axis such that its view on the XY plane is 1 meter wide. The distance you place it at on the z-axis should then be added to the translation result you get from solvePnP. The XY center would be zero, not width/2. You’ll have to account for aspect ratio in a few places and use 3D model points in units that make sense for meters.
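
A rough sketch of that idea, assuming a measured horizontal FOV (the 60° below is a placeholder) and the 1-meter-wide view convention described above:

import math
import numpy as np

h_fov_deg = 60.0    # placeholder: the camera's true horizontal FOV in degrees
view_width = 1.0    # convention: the view is 1 meter wide on the XY plane

# Distance from the XY plane at which a frustum with this FOV is 1 m wide
cam_z = (view_width / 2.0) / math.tan(math.radians(h_fov_deg) / 2.0)

# Translation from solvePnP (placeholder values, in meters); the camera's
# z offset is then added to it as suggested above
tvec = np.array([0.12, -0.05, 2.3])
tvec_adjusted = tvec + np.array([0.0, 0.0, cam_z])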

Hey,

To use the intrinsics returned by OpenCV in TouchDesigner, you need to do some conversions.
You can have a look at the extensions of camSchnappr in the palette or the Extension of the Kinect Calibration routine here: derivative.ca/Forum/viewtopi … 22&t=12895

The function in question is called returnIntrinsics()

Best
Markus

Ok, so I’ve finally figured it out!

I discovered some amazing code from the research paper Soccer On Your Tabletop. In its file camera.py, there is a bit of code that converts OpenCV calibration results to OpenGL:

In the below code, A is the intrinsic matrix (using pixels), R is the 3x3 Rodrigues matrix, T is the translation vector, h is the screen height and w is the screen width.

import numpy as np

def opencv_to_opengl(A, R, T, h, w, near=1, far=1000):

    fx, fy, cx, cy = A[0, 0], A[1, 1], A[0, 2], A[1, 2]

    F = np.array([[fx/cx, 0, 0, 0],
                  [0, fy/cy, 0, 0],
                  [0, 0, -(far + near) / (far - near), -2 * far * near / (far - near)],
                  [0, 0, -1, 0]])

    projection_matrix = F.T

    deg = 180
    t=deg*np.pi/180.
    Rz = np.array([[np.cos(t), -np.sin(t), 0],
                    [np.sin(t), np.cos(t), 0],
                    [0,0,1]])

    Ry = np.array([[np.cos(t), 0, np.sin(t)],
                    [0,1,0],
                    [-np.sin(t), 0, np.cos(t)]])

    R_gl = Rz.dot(Ry.dot(R))

    view_matrix=np.zeros((4,4))
    view_matrix[0:3,0:3]=R_gl.T
    view_matrix[0][3] = 0.0
    view_matrix[1][3] = 0.0
    view_matrix[2][3] = 0.0
    view_matrix[3][0] = T[0]
    # also invert Y and Z of translation
    view_matrix[3][1] = -T[1]
    view_matrix[3][2] = -T[2]
    view_matrix[3][3] = 1.0

    return np.array(view_matrix).astype(np.float32), np.array(projection_matrix).astype(np.float32)

After looking at this code, I realized that in OpenCV, Y and Z are inverted compared to the standard computer graphics (OpenGL) coordinate system. See this Stack Overflow discussion.

[Image: OpenCV coordinate system, from that Stack Overflow question]
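
For reference, the flip that opencv_to_opengl performs (the 180° Rz and Ry rotations plus the negated Y/Z translation) boils down to multiplying by diag(1, -1, -1). A minimal illustration, with R_cv and t_cv standing in for the solvePnP results:

import numpy as np

# OpenCV camera axes: +x right, +y down, +z forward (into the scene).
# OpenGL camera axes: +x right, +y up,   +z backward (out of the screen).
# Rz(180°) @ Ry(180°), as used in opencv_to_opengl, equals this flip matrix:
flip_yz = np.diag([1.0, -1.0, -1.0])

R_cv = np.eye(3)                   # placeholder: 3x3 rotation from cv2.Rodrigues
t_cv = np.array([0.0, 0.0, 2.0])   # placeholder: translation from solvePnP

R_gl = flip_yz @ R_cv              # rotation in the OpenGL camera convention
t_gl = flip_yz @ t_cv              # Y and Z of the translation are negated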

So, using the opencv_to_opengl code above, which takes care of flipping Y and Z, I build the modelViewMatrix and apply it as the Xform Matrix/CHOP/DAT on the geometry, to transform the geometry from world space to camera space:


In the above code, R_in is a CHOP containing the Rodrigues rotation matrix, and T_in is a CHOP containing the translation vector. Both are obtained from OpenCV’s solvePnP method.

Note that to get the Rodrigues matrix, you need to take the rvec from solvePnP and pass it to cv2.Rodrigues. The code to get R and T can be seen here:

_, rvec, tvec = cv2.solvePnP(points3d, points2d, intrinsic_matrix,
                             distortion_coefficients)
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix from the rotation vector
T = tvec.flatten()           # translation as a flat (tx, ty, tz) vector

Then, using the code from opencv_to_opengl above, I build the OpenGL projection matrix and set that as the custom projection matrix parameter on the Camera COMP:

In the code above, the toOpenGLProjectionMatrix Base COMP takes the intrinsic matrix and the screen resolution as inputs and builds the openGlProjectionMatrix, which is the output of the Base.
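
To make the wiring concrete, here is a rough sketch (not the exact network from the screenshots) of feeding the solvePnP results through opencv_to_opengl and writing the two matrices into Table DATs. The operator names and the 1920x1080 resolution are assumptions for illustration:

# Sketch only: runs inside TouchDesigner (e.g. in a Text DAT executed once).
# Assumes intrinsic_matrix, R and T from the snippets above, 1920x1080 frames.
view_matrix, projection_matrix = opencv_to_opengl(intrinsic_matrix, R, T,
                                                  h=1080, w=1920)

# Write each 4x4 matrix into a Table DAT:
#  - 'view_matrix_out' feeds the geometry's Xform Matrix/CHOP/DAT parameter
#  - 'proj_matrix_out' feeds the camera's custom projection matrix parameter
for name, matrix in (('view_matrix_out', view_matrix),
                     ('proj_matrix_out', projection_matrix)):
    table = op(name)               # hypothetical Table DAT names
    table.clear()
    for row in matrix:
        table.appendRow([float(value) for value in row])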

I can then draw this on top of the original image and see that the geometry is projected from the 3D world onto the 2D screen correctly, based on the calibrated camera parameters.
