Hi there,
I’m doing a project which involves looking at large numbers of video clips with audio where people speak a single word.
I would like to find the points where they start and stop speaking, and record them in a table to be used to trim the clips to those sections during playback.
So far I have been using Logic CHOPs to detect when the voice volume goes over a certain threshold, with some lag so that multi-syllable words aren’t split up. This gives me a great realtime analysis of the audio, but I need to reduce it to two numbers that I can record in a table next to the filename of the clip: the frame index where the volume first goes over the threshold, and the frame index where it returns below the threshold.
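To make the goal concrete, here is a rough sketch in plain Python of the kind of reduction described above. It assumes the per-frame volume of a clip is already available as a list of numbers; the function name `find_speech_bounds`, the `hold_frames` parameter (standing in for the lag) and the sample values are all made up for illustration and are not part of any existing tool.

```python
def find_speech_bounds(levels, threshold, hold_frames=0):
    """Return (start, end) frame indices for the first spoken section.

    levels      -- per-frame volume values for one clip (assumed input)
    threshold   -- volume above which a frame counts as speech
    hold_frames -- hypothetical stand-in for the lag: dips below the
                   threshold lasting this many frames or fewer are bridged

    start is the first frame above the threshold; end is the first frame
    after the last above-threshold frame of that section (it equals
    len(levels) if the clip ends while still above the threshold).
    Returns None if the threshold is never exceeded.
    """
    start = None
    last_above = None
    for i, level in enumerate(levels):
        if level > threshold:
            if start is None:
                start = i
            last_above = i
        elif start is not None and i - last_above > hold_frames:
            # The dip has lasted longer than the allowed hold: stop here.
            break
    if start is None:
        return None
    return start, last_above + 1


# Made-up volume data with a short dip at frame 4 (between two syllables).
levels = [0.0, 0.1, 0.6, 0.7, 0.2, 0.8, 0.1, 0.0, 0.0, 0.0]
bounds = find_speech_bounds(levels, threshold=0.5, hold_frames=2)

# One table row per clip: filename, start frame, end frame.
# "clip_001.mov" is a placeholder filename.
row = ("clip_001.mov",) + bounds
```

Run once per clip, this would give one row per file that could then be used to trim playback to the spoken section.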
Is there a method for doing this?
Many thanks