For instance, I want to run an image captioning model from Hugging Face within TouchDesigner.
Normally:
1. I would set up a Python environment
python -m venv myenv
myenv\Scripts\activate
2. Install the required libraries
pip install transformers
pip install torch
pip install torchvision
pip install Pillow
3. Run a Python script
import torch
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTokenizer
from PIL import Image
# Load the pre-trained model and tokenizer
model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
feature_extractor = ViTFeatureExtractor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
# Function to generate captions
def generate_caption(image_path):
    # Load and preprocess the image
    image = Image.open(image_path)
    if image.mode != "RGB":
        image = image.convert(mode="RGB")
    pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
    # Generate the caption
    with torch.no_grad():
        output_ids = model.generate(pixel_values)
    caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return caption
image_path = "image/image.jpg"
caption = generate_caption(image_path)
print(f"Caption: {caption}")
Now, if I want to run this script directly from within TD:
- How can I make sure the required libraries are installed and linked so the model/Python script can run? Should I somehow load the Python environment in the script? Is there a recommended workflow? (A sketch of what I imagine is below.)
- Additionally, how can I pass an input to and return an output from the Python script/DAT operator (image, string, float…)? (See the second sketch below.)
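For the first question, what I imagine (untested; the path below is just an example from my machine, and I assume the venv has to be built with the same Python version TD ships with) is appending the venv's site-packages folder to sys.path at startup, e.g. in an Execute DAT:

# Execute DAT onStart() callback (untested sketch).
# VENV_SITE_PACKAGES is a hypothetical path; it should point at a venv
# whose Python version matches TouchDesigner's built-in interpreter.
import sys

VENV_SITE_PACKAGES = r"C:\path\to\myenv\Lib\site-packages"

def onStart():
    # Make the venv's packages importable from TD's built-in Python
    if VENV_SITE_PACKAGES not in sys.path:
        sys.path.append(VENV_SITE_PACKAGES)
    return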
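For the second question, here is roughly what I picture for images and strings (again untested; caption_image and caption_table are hypothetical names): a Script TOP that reads its input as a NumPy array, captions it, and writes the string into a Table DAT. I guess a Script CHOP would work the same way for floats.

# Script TOP callbacks (untested sketch).
# caption_image() would wrap the model call from the script above but
# accept a PIL image instead of a file path; caption_table is a Table DAT.
import numpy as np
from PIL import Image

def onCook(scriptOp):
    # Read the input TOP's pixels (float32 RGBA in 0..1)
    arr = scriptOp.inputs[0].numpyArray()

    # Convert to an 8-bit RGB PIL image for the model
    img = Image.fromarray((arr[:, :, :3] * 255).astype(np.uint8))

    # Generate the caption (hypothetical helper)
    caption = caption_image(img)

    # Return the string via a Table DAT, pass the image through unchanged
    op('caption_table').clear()
    op('caption_table').appendRow([caption])
    scriptOp.copyNumpyArray(arr)
    return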
Thank you for your help!