Extracting rectangles from a numpyArray video and putting them into another video feed

I have code that detects faces in a video feed and draws rectangles around them.
Instead of drawing rectangles around the faces, I want to extract the faces and create a new video feed that is essentially a grid or “list” of the faces. Whenever there is a face detected, a new rectangle should show up in the new video feed with the contents of the detected face.

So you start with one video feed and end up with another feed that displays only the detected faces.

Here is my script that properly displays rectangles on the detected faces:

import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier('//Users//noah//Desktop//TouchDesigner//haarcascades//haarcascade_frontalface_default.xml')

def onCook(scriptOp):
    full_frame = op('videodevin1').numpyArray(delayed=True)
    lowres_frame = op('null2').numpyArray(delayed=True)
    # frame = scriptOp.inputs[0].numpyArray(delayed=True)
    gray = cv2.cvtColor(lowres_frame, cv2.COLOR_RGB2GRAY)
    gray = (gray * 255).astype(np.uint8)
    
    # detection ran on the low-res frame, so scale the rectangles back up
    # (assumes lowres_frame is 1/4 the resolution of full_frame)
    faces = face_cascade.detectMultiScale(gray, 1.03, 1) * 4

    for (x, y, w, h) in faces:
        cv2.rectangle(full_frame, (x, y), (x+w, y+h), (255, 0, 255), 2)

    scriptOp.copyNumpyArray(full_frame)

    return

I think I have to use NumPy array slicing to extract the faces, but slicing into the 3-dimensional array (height × width × channels) that a video frame becomes is still a mystery to me.

How can I achieve this?
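For what it's worth, the slicing itself is short: a frame from numpyArray() is a (height, width, channels) array, so a face crop is just a row/column slice. A minimal sketch outside of TD (the frame here is a zero-filled stand-in, not an actual operator):

```python
import numpy as np

# stand-in for a frame from numpyArray(): (height, width, channels)
frame = np.zeros((720, 1280, 4), dtype=np.float32)

# one (x, y, w, h) rectangle as returned by detectMultiScale
x, y, w, h = 300, 100, 200, 150

# rows come first (y), then columns (x); the channel axis is kept automatically
face = frame[y:y+h, x:x+w]
print(face.shape)  # (150, 200, 4)
```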

Hi @nerk,

the for loop that currently draws the rectangles already contains the information you need to extract the faces: x, y, w, and h.

So instead of using a Script TOP, you could use a Script DAT or CHOP to extract the size and coordinates of the detected faces. This data could then drive either an array of Crop TOPs, or instanced rectangles whose UV coordinates and scaling you calculate from the position and size of each detected face.

How do you imagine the list or grid looking? Should all faces be displayed in same-size rectangles?

Cheers
Markus

Hello again, Markus @snaut ! Thanks for finding me again :slight_smile:

So instead of using a Script TOP, you could use a Script DAT or CHOP to extract the size and coordinates of the detected faces. This data could then drive either an array of Crop TOPs, or instanced rectangles whose UV coordinates and scaling you calculate from the position and size of each detected face.

I am very glad you mention this approach, because it is the new plan of attack I have come up with too. The problem is that I am still very new and scratching my head over how to implement it. I have been relying on my rudimentary knowledge of TouchDesigner and Python so far, combined with referencing forums and using AI to help write code.

Allow me to bring you up to speed with my progress. Here is my toe if you want to take a look.
005.opencv-face.27.toe (5.0 KB)

Here is my current code for the script that detects the faces. It includes commented-out lines from when I was still trying to do the crops in the code.

import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier('//Users//noah//Desktop//TouchDesigner//haarcascades//haarcascade_frontalface_default.xml')

def onCook(scriptOp):
    full_frame = op('videodevin1').numpyArray(delayed=True)
    lowres_frame = op('fit1').numpyArray(delayed=True)
    # frame = scriptOp.inputs[0].numpyArray(delayed=True)
    gray = cv2.cvtColor(lowres_frame, cv2.COLOR_RGB2GRAY)
    gray = (gray * 255).astype(np.uint8)
    
    # detection ran on the low-res frame, so scale the rectangles back up
    faces = face_cascade.detectMultiScale(gray, 1.2, 1) * 4
    scriptOp.store('faces', faces)

    # face_images = []

    for (x, y, w, h) in faces:
        cv2.rectangle(full_frame, (x, y), (x+w, y+h), (255, 0, 255), 2)        
        face = full_frame[y:y+h, x:x+w]
        # face_images.append(face)

    # Create a grid of detected faces
    # rows = 2  # Number of rows in the grid
    # cols = (len(face_images) + 1) // 2  # Number of columns in the grid
    # grid = np.zeros_like(full_frame)
    
    # for i, face in enumerate(face_images):
    #     row = i // cols
    #     col = i % cols
    #     grid[row*face.shape[0]:(row+1)*face.shape[0], col*face.shape[1]:(col+1)*face.shape[1]] = face

    scriptOp.copyNumpyArray(full_frame)

    return

The gist of it is that I get the faces and then store them in the Script TOP’s storage.
Then I extract those using a Script CHOP like so:

def onCook(scriptOp):
    scriptOp.clear()

    faces = op('script1').fetch('faces')

    x, y, w, h = scriptOp.appendChan('x'), scriptOp.appendChan('y'), scriptOp.appendChan('w'), scriptOp.appendChan('h')

    scriptOp.numSamples = len(faces)

    for i in range(len(faces)):
        x[i], y[i], w[i], h[i] = faces[i][0], faces[i][1], faces[i][2], faces[i][3]
    
    return

Here is my TD node setup:

So you can see I am extracting the x y w h from the faces and cropping to the rectangle that data creates.

  1. I noticed that the frontalface default haar cascade, when it does correctly detect my face, seems to mostly capture my forehead, eyes, and nose… rarely my mouth. Any idea why that is, or how to improve it? Or is there maybe an issue with how I am implementing or drawing the rectangles?

Are there better face detection models I could use? This one seems very strict and obviously doesn't detect profiles, etc. Interestingly enough, the profile-face haar cascade doesn't detect my profile at all! lol

The haar cascade captures a lot of false positives (things that aren't faces), but that is because I have it set to very lenient settings; at some of the recommended values it was rarely detecting my face.

  2. I am using a Crop TOP to crop the video to what I imagine is the first face detected. I could imagine setting up a bunch of Crop TOPs and cropping each to the values from a different row of the data array (is that right?). One question with this approach is how we know how many faces are detected, and therefore how many crops to show on screen at a time. When no face is detected, the crop still happens and just shows what I think is a single pixel - this isn't great.

  3. I am not yet sure exactly how I want the end result to look. I am contemplating a couple of ideas for now; I think I like the second one below best right now.

  • a grid of faces that always exactly matches the number of faces detected, all shown at the same size. With one face, it fills as much of the screen as it can in a single square; with two, they sit side by side at equal size; three make a row of three; and so on. Past a certain point it starts a second row and adjusts the layout to fit however many faces are detected.

  • boxes of the faces pop up on screen at random spots, maybe with random rotations. The faces may all be the same size, or the size at which they were detected. When a face is no longer detected, its last detected frame likely stays on screen while newly detected faces keep popping up and eventually cover the old ones, so it becomes a collage of all the faces detected. This video seems helpful for this approach: https://www.youtube.com/watch?v=kNCnyUaMZSg
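For the first idea, the mapping from face count to grid shape is just a bit of arithmetic. A sketch of one possible layout rule (the function name and the aspect-ratio heuristic are my own, not anything TD provides):

```python
import math

def grid_dims(n, aspect=16 / 9):
    """One way to pick rows/cols for n equal-size tiles filling a frame
    of the given aspect ratio: columns grow with sqrt(n * aspect), so a
    wide frame favours wide rows before starting a new one."""
    if n == 0:
        return 0, 0
    cols = max(1, round(math.sqrt(n * aspect)))
    rows = math.ceil(n / cols)
    return rows, cols
```

With the defaults this gives one face a single full square, two faces one row of two, and starts a second row as the count grows.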

If you made it this far with me - THANK YOU THANK YOU THANK YOU

progress update!
I am now able to extract my different crops after injecting the data into a table DAT!

this will be helpful I think…

I am now certain (I had a feeling before) that each face does not keep the same index in the table as when it was first detected: the face on line 0 in frame 1 may not be the same face on line 0 in frame 2.
This is a problem.

another project update @snaut !

005.opencv-face.43.toe (16.7 KB)

I am now cropping into a larger array of Crop TOPs, and I have a sorted version of the faces in a table.
I am sorting by the x value from faces.
This is an attempt at getting each face to stay in the same Crop TOP for longer; it helps a little bit.
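Sorting by x is one heuristic; another that may hold faces in their slots longer is matching each new detection to the nearest previous-frame centroid. A rough greedy sketch in plain Python (the function name and the max_dist threshold in pixels are my own guesses, to be tuned):

```python
def match_faces(prev, curr, max_dist=100):
    """Reorder the current frame's (x, y, w, h) detections so each one
    keeps the slot of the previous-frame face whose centre it is nearest.
    Unmatched old slots get (-1, -1, -1, -1); new faces append at the end."""
    out = [None] * len(prev)
    used = set()
    prev_centers = [(x + w / 2, y + h / 2) for x, y, w, h in prev]
    for rect in curr:
        cx, cy = rect[0] + rect[2] / 2, rect[1] + rect[3] / 2
        best, best_d = None, max_dist
        for i, (px, py) in enumerate(prev_centers):
            d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
            if i not in used and d < best_d:
                best, best_d = i, d
        if best is None:
            out.append(rect)  # a brand-new face gets a fresh slot at the end
        else:
            out[best] = rect
            used.add(best)
    return [r if r is not None else (-1, -1, -1, -1) for r in out]
```

Fed with the previous and current detectMultiScale results each cook, this keeps slot order stable as long as faces move less than max_dist between frames.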

Hi @nerk,

getting a stable output will in large part also require re-detecting the same face across frames. A somewhat more involved example like this one may give better results (and perhaps better performance):
https://docs.opencv.org/4.x/d0/dd4/tutorial_dnn_face.html

Otherwise, you could do various things to optimize a bit more:

  • reduce performance cost by moving the whole calculation into a Script CHOP and outputting channels for x, y, w, and h. A script could look something like this (including custom parameters to tweak performance a bit):
# me - this DAT
# scriptOp - the OP which is cooking

import cv2
import numpy as np

# press 'Setup Parameters' in the OP to call this function to re-create the parameters.
def onSetupParameters(scriptOp):
	page = scriptOp.appendCustomPage('Face Detection')
	p = page.appendTOP('Top', label='TOP')
	p = page.appendFile('Cascade', label='Cascade File')
	p = page.appendFloat('Scale', label='Scale Factor')
	p.normMin = 1.001
	p.normMax = 2
	p.clampMin = True
	p.default = 1.1
	p = page.appendInt('Neighbors', label='Min Neighbors')
	p.normMin = 0
	p.normMax = 10
	p.default = 1
	return


# get the node this DAT is docked to
# this way we can get to the custom file parameter above
dockedScript = me.dock
face_cascade = cv2.CascadeClassifier(dockedScript.par.Cascade.eval())

# called whenever custom pulse parameter is pushed
def onPulse(par):
	return

def onCook(scriptOp):
	scriptOp.clear()
	inputFrame = scriptOp.par.Top.eval().numpyArray(delayed=True)
	
	gray = cv2.cvtColor(inputFrame, cv2.COLOR_RGB2GRAY)
	gray = (gray * 255).astype(np.uint8)

	# detect faces
	faces = face_cascade.detectMultiScale(gray, scaleFactor=scriptOp.par.Scale.eval(), minNeighbors=scriptOp.par.Neighbors.eval())

	# setup CHOP
	facesDetected = len(faces)
	scriptOp.numSamples = facesDetected
	x = scriptOp.appendChan('x')
	y = scriptOp.appendChan('y')
	w = scriptOp.appendChan('w')
	h = scriptOp.appendChan('h')
	a = scriptOp.appendChan('active')
	
	# transpose the array with detected faces
	if len(faces):
		faces = faces.transpose()
	
		# now we get an array of 4 arrays
		# this is easy to assign to channel values
		x.vals = faces[0]
		y.vals = faces[1]
		w.vals = faces[2]
		h.vals = faces[3]
		a.vals = [facesDetected] * facesDetected  # repeat so the list length matches numSamples

	return

From here you could either convert to a DAT and use replication to create the required number of Crop TOPs (check out the Replicator COMP), or drive instancing to do a preview as well as cut the faces out into squares.

For the full instancing approach you would have to calculate the UV offset and scale from the position and size channels returned by openCV. This can be done nicely in CHOPs as well…
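As an illustration of that UV math in plain Python (the function name and frame dimensions are illustrative; the vertical flip is there because OpenCV's y axis runs top-down while texture v coordinates typically run bottom-up):

```python
def crop_to_uv(x, y, w, h, frame_w, frame_h):
    """Convert a pixel-space rectangle (x, y, w, h) into the 0-1 UV
    offset and scale an instancing setup would use to sample just
    that region of the source texture."""
    u_scale = w / frame_w
    v_scale = h / frame_h
    u_off = x / frame_w
    v_off = 1.0 - (y + h) / frame_h  # flip: pixel y grows down, v grows up
    return u_off, v_off, u_scale, v_scale
```

The whole frame maps to offset (0, 0) and scale (1, 1); the same expressions could live directly in CHOP channels.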

Best
Markus

@snaut thank you!

I spent the evening figuring out how to recreate a similar effect using the script CHOP and Replicator and I did it!
File attached if you want to see my progress :slight_smile:

In the coming days I will try to figure out your instancing method, as well as the DNN thing - that looks like a game changer for me but also a lot to dive into.

I’ve also been talking with someone who has given me tips on where to go to best teach myself other TouchDesigner + Python stuff. This is the beginning of an exciting journey for me.

005.opencv-face-v2.24.toe (5.9 KB)


hey @snaut I implemented a solution using instancing, and then decided I wanted to proceed with the replicator method and improved that, and I am well off to the races on finishing this project!

I am trying to implement the DNN method using the info here: OpenCV: DNN-based Face Detection And Recognition

I am getting an error:
module 'cv2' has no attribute 'FaceDetectorYN'

Was this deprecated? Having a hard time finding matching documentation.

Hi @nerk,

the version of openCV in our current official release is 4.5.2, and this feature is only available in versions 4.5.4+. You can use our Experimental Builds though, which ship with openCV 4.7.0.

Cheers
Markus

Great, thanks @snaut

Still having trouble getting it to work. Do you know of a tutorial or example that has it working in TD?

Hey @nerk,

no - not aware of anybody trying to implement it. Can you describe what the issues are?

Cheers
Markus

@snaut I have started a new thread to continue this: Adapting opencv facial recognition script from haarcascade to DNN

Thanks for your continued attention!