Hello again, Markus @snaut! Thanks for finding me again.
So instead of using a Script TOP, you could use a Script DAT or CHOP to extract the size and coordinates of the detected faces - this then could be used either with an array of Crop TOPs or perhaps rectangles being instanced and you calculate the UV coordinates and scaling from the position and size of the detected faces.
I am very glad you mention this approach, because it is the new plan of attack I have come up with too. The problem is, I am still very new and am scratching my head as to how to implement it. I have been working from my rudimentary knowledge of TD and Python so far, combined with forum posts and some AI assistance for the code.
Allow me to bring you up to speed on my progress. Here is my .toe if you want to take a look.
005.opencv-face.27.toe (5.0 KB)
Here is my current code for the script that detects the faces. It includes commented-out lines from when I was still trying to do the crops in the code.
```
import cv2
import numpy as np

# Load the cascade once at module level so it isn't re-read on every cook
face_cascade = cv2.CascadeClassifier('/Users/noah/Desktop/TouchDesigner/haarcascades/haarcascade_frontalface_default.xml')

def onCook(scriptOp):
    full_frame = op('videodevin1').numpyArray(delayed=True)
    lowres_frame = op('fit1').numpyArray(delayed=True)
    # frame = scriptOp.inputs[0].numpyArray(delayed=True)

    # TOPs hand back float pixels; OpenCV's cascades want 8-bit grayscale
    gray = cv2.cvtColor(lowres_frame, cv2.COLOR_RGB2GRAY)
    gray = (gray * 255).astype(np.uint8)

    # Detect on the low-res frame, then scale the boxes back up
    # (the * 4 assumes fit1 is exactly quarter resolution)
    faces = face_cascade.detectMultiScale(gray, 1.2, 1) * 4
    scriptOp.store('faces', faces)

    # face_images = []
    for (x, y, w, h) in faces:
        cv2.rectangle(full_frame, (x, y), (x+w, y+h), (255, 0, 255), 2)
        face = full_frame[y:y+h, x:x+w]
        # face_images.append(face)

    # Create a grid of detected faces
    # rows = 2  # Number of rows in the grid
    # cols = (len(face_images) + 1) // 2  # Number of columns in the grid
    # grid = np.zeros_like(full_frame)
    # for i, face in enumerate(face_images):
    #     row = i // cols
    #     col = i % cols
    #     grid[row*face.shape[0]:(row+1)*face.shape[0], col*face.shape[1]:(col+1)*face.shape[1]] = face

    scriptOp.copyNumpyArray(full_frame)
    return
```
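One tweak I'd consider before the store (just a sketch, using the same names as above): because the boxes are detected at quarter resolution and multiplied by 4, they can overshoot the edges of the full-res frame, which later feeds out-of-range values to anything reading the stored rectangles. Clamping keeps them inside the frame; this would replace the `scriptOp.store('faces', faces)` line:

```
# Sketch: clamp the scaled boxes so downstream crops stay in range
img_h, img_w = full_frame.shape[:2]
clamped = []
for (x, y, w, h) in faces:
    x = max(0, min(int(x), img_w - 1))
    y = max(0, min(int(y), img_h - 1))
    clamped.append((x, y, min(int(w), img_w - x), min(int(h), img_h - y)))
scriptOp.store('faces', clamped)
```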
The gist of it is that I get the faces and then store them in the Script TOP’s storage.
Then I extract those using a Script CHOP like so:
```
def onCook(scriptOp):
    scriptOp.clear()

    # Fall back to an empty list in case script1 hasn't stored anything yet
    faces = op('script1').fetch('faces', [])

    x = scriptOp.appendChan('x')
    y = scriptOp.appendChan('y')
    w = scriptOp.appendChan('w')
    h = scriptOp.appendChan('h')
    scriptOp.numSamples = len(faces)

    # One sample per detected face
    for i, (fx, fy, fw, fh) in enumerate(faces):
        x[i], y[i], w[i], h[i] = fx, fy, fw, fh
    return
```
Here is my TD node setup:
So you can see I am extracting the x, y, w, and h of each face and cropping to the rectangle that data describes.
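In case it helps make the wiring concrete, here is the Crop TOP side as I understand it (a sketch, assuming the Script CHOP is named script2, the crop parameters' units are set to Pixels, and videodevin1 is the full-res source). One wrinkle: OpenCV counts y down from the top of the image while TOPs measure up from the bottom, so if the crop lands on the wrong part of the frame vertically, flip y like this:

```
# Hypothetical parameter expressions on the Crop TOP (units: Pixels)
Crop Left:    op('script2')['x'][0]
Crop Right:   op('script2')['x'][0] + op('script2')['w'][0]
Crop Top:     op('videodevin1').height - op('script2')['y'][0]
Crop Bottom:  op('videodevin1').height - (op('script2')['y'][0] + op('script2')['h'][0])
```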
- I noticed that the default frontalface Haar cascade, when it does correctly detect my face, seems to be detecting mostly my forehead, eyes, and nose… rarely my mouth. Any idea why that is, or how to improve it? Or is there maybe an issue with how I am implementing or drawing the rectangles?
Are there better face detection models I could use? This one seems very strict and obviously doesn’t detect profiles, etc. Interestingly enough, the profile face haarcascade doesn’t detect my profile at all! lol
The haarcascade captures a lot of false positives (not faces), but that is because I have it set to very loose settings; when the settings are at some of the recommended values, it rarely detects my face.
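I suspect the tight boxes around forehead/eyes/nose are just how this particular cascade was trained, so padding each detected box by something like 20-30% before cropping might compensate. For reference, the two bare numbers in my detectMultiScale call above are scaleFactor and minNeighbors, and minNeighbors=1 accepts almost anything. A stricter call would look something like this (just a sketch; the values are guesses to experiment from):

```
# Sketch: stricter settings to cut false positives (values are guesses)
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,   # smaller scale steps: slower, but fewer missed faces
    minNeighbors=4,    # require more overlapping hits before accepting a face
    minSize=(24, 24),  # ignore tiny detections (sizes are in low-res pixels)
) * 4  # same quarter-res scale-up as before
```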
-
I am using a Crop node to crop the video to what I imagine is the first face or result that is detected. I could imagine setting up a bunch of Crop nodes and cropping each to the same channels at a different sample index in the data (is that right?). I think an issue or question with this approach is: how do we know how many faces are detected, and how many of the crops to show on screen at a time? When no face is detected, the crop still happens and it just shows what I think is a single pixel - this isn’t great.
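One idea I have for the "how many faces" question (a sketch, again assuming the Script CHOP is named script2): the CHOP's sample count is the face count, so op('script2').numSamples could gate each crop branch. For example, if each branch ends in a Level TOP, branch n's Opacity parameter expression could be:

```
# Hypothetical Opacity expression on crop branch n's Level TOP
# (write each branch's index in place of n)
1 if n < op('script2').numSamples else 0
```

That would also take care of the single-pixel crop, since every branch would fade out when zero faces are detected.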
-
I am not yet sure exactly how I want the end result to look. I am contemplating a couple of ideas for now; I think I like the second idea below the best right now.
-
A grid of faces that always exactly matches the number of faces detected, all shown at the same size: with 1 face, it sits alone in a square that fills as much of the screen as it can; with 2 faces, they sit side by side at equal size; with 3, a row of 3; etc. Once it gets past a certain point, it starts a second row and adjusts the display as needed to fit however many faces are detected.
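The layout math for that grid is easy to sketch as a helper (hypothetical function, not wired to any node yet): keep faces in one row until a cap, then wrap and size the square cells to fit:

```
import math

def grid_layout(num_faces, screen_w, screen_h, max_per_row=4):
    """Return (cols, rows, cell_size) for a grid of square face cells."""
    if num_faces == 0:
        return 0, 0, 0.0
    cols = min(num_faces, max_per_row)            # one row until the cap
    rows = math.ceil(num_faces / cols)            # then wrap to extra rows
    cell = min(screen_w / cols, screen_h / rows)  # biggest square that fits
    return cols, rows, cell
```

With max_per_row=4: 1 face fills the screen as one big square, 3 faces make a row of 3, and 5 faces wrap into a second row, matching the behavior described above.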
-
Boxes of the faces pop up on screen at random spots and maybe random rotations. The faces may all be the same size, or they may be the size at which they are detected. When a face is no longer detected, the last detected frame likely stays on screen while newly detected faces keep popping up and eventually cover the old ones; it just becomes a collage of all the faces detected. This video seems helpful for this approach: https://www.youtube.com/watch?v=kNCnyUaMZSg
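The part the code would need to handle is remembering faces after they disappear. A sketch of one way (hypothetical: a Table DAT named 'collage' that accumulates a placement row per captured face, which an instanced or replicated TOP chain could then read):

```
import random

def add_face_to_collage(face_index):
    """Append a random placement for a newly captured face.
    Assumes a Table DAT named 'collage' with columns: face, x, y, rot."""
    table = op('collage')
    table.appendRow([
        face_index,
        round(random.uniform(0.1, 0.9), 3),  # normalized x position
        round(random.uniform(0.1, 0.9), 3),  # normalized y position
        round(random.uniform(-25, 25), 1),   # rotation in degrees
    ])
```

Since new rows land at the bottom of the table, drawing in row order naturally stacks newer faces on top of older ones, which is exactly the covering-over behavior I described.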
If you made it this far with me - THANK YOU THANK YOU THANK YOU