How to Change Visuals Based on Specific Words Spoken Using Live Audio and Python?

Hi everyone,

I’m working on a project in TouchDesigner where I want to create visuals that change dynamically based on specific words being spoken in real time. My goal is to process live audio input (e.g., from a microphone), transcribe it into text, and then trigger specific visuals depending on the detected words or topics.
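For the last step, the kind of keyword-to-visual mapping I have in mind would look roughly like this (just a sketch; the Switch TOP name and the keyword list are placeholders, and the transcript string would come from whatever speech-to-text step ends up working):

```python
# Minimal keyword -> visual mapping sketch for TouchDesigner.
# Assumes a Switch TOP named 'switch1' whose inputs are the different visual branches.

KEYWORD_TO_INDEX = {
    'ocean': 1,
    'fire': 2,
    'forest': 3,
}

def trigger_visual(transcript):
    """Scan the latest transcript and switch visuals when a keyword appears."""
    text = transcript.lower()
    for word, index in KEYWORD_TO_INDEX.items():
        if word in text:
            op('switch1').par.index = index
            break
```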

What I’d Like to Know:

  • Has anyone successfully implemented a similar setup in TouchDesigner, where visuals react to spoken words or topics?
  • Are there better ways to manage live audio chunks in TouchDesigner (without using a Record CHOP) for external APIs like Whisper?
  • Any tips on optimizing the transcription and word detection pipeline to minimize latency?

I’d love to hear your thoughts or approaches to solving this problem! Any examples or advice would be greatly appreciated.

Thank you in advance!

The speech-to-text is probably best handled with an AI model in TouchDesigner, like Whisper, which you mentioned. For that, have a look at Torin’s Whisper plugin here: https://youtu.be/34s2p9gzWhs?si=r0HDfySpwvP6CGIJ
In that video he saves the audio out to .wav files (Audio File Out CHOP) and then feeds them to Whisper, instead of using a Record CHOP.
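Once those .wav chunks exist, transcribing them with the openai-whisper Python package is only a few lines. A rough sketch (the model size, file path, and fp16 flag are assumptions to tune for your setup):

```python
# Rough sketch: transcribe one saved audio chunk with the openai-whisper package
# (pip install openai-whisper), then hand the text to your keyword matching.
import whisper

model = whisper.load_model('base')  # smaller models trade accuracy for lower latency

def transcribe_chunk(wav_path):
    """Transcribe a short .wav chunk and return the recognized text."""
    result = model.transcribe(wav_path, fp16=False)  # fp16=False avoids a warning on CPU
    return result['text']

# Hypothetical path for a chunk written out by the Audio File Out CHOP:
text = transcribe_chunk('chunks/chunk_0001.wav')
print(text)
```

Keeping the chunks short (a few seconds) and sticking to one of the smaller Whisper models are the main levers for keeping latency down.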