I’m working on a TouchDesigner patch that can automatically scrape image files from a specified webpage (for example, a subreddit) and make them available for use in the patch.
Based on my understanding, I would:
Use a Web Client DAT to connect to the page
Use a Table DAT to store the image URLs
Use a Web Render TOP to render the images
However, I’m unsure how to handle the steps between fetching the website URL and actually scraping the image files.
I have a decent grasp of Python, but I’m struggling to apply that knowledge within the TouchDesigner environment. Any guidance on this specific workflow would be greatly appreciated! Additionally, if anyone has resources for learning how to interact with Web APIs using Python (in TouchDesigner or otherwise), I’d love to check them out.
The Web Client DAT has a Callbacks DAT attached, in which the onResponse() function receives, among other things, the website content (in the data argument).
You can now either write a parser inside this onResponse function that places all the image URLs into a Table DAT, or use an Extension for a slightly cleaner code structure.
A parser included with the default Python distribution is HTMLParser, for simple HTML parsing:
You usually create your own class inheriting from HTMLParser and override methods like handle_starttag to parse to your needs. For image tags this might look like this:
from html.parser import HTMLParser

class ImageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src":
                    self.image_urls.append(value)

def onResponse(webClientDAT, statusCode, headerDict, data, id):
    # get the encoding from the Content-Type header, defaulting to UTF-8
    encoding = headerDict.get('content-type', '')
    encoding = encoding.split('charset=')[-1] if 'charset=' in encoding else 'utf-8'
    # data is byte data, so decode it to a string
    html = data.decode(encoding)
    # parse the html to get the image urls
    parser = ImageParser()
    parser.feed(html)
    # write the image urls to a table
    table = op('image_urls')
    table.clear()
    for url in parser.image_urls:
        table.appendRow(url)
    return
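You can try the parser in a regular Python session before wiring it into the callback. A minimal sketch (the sample HTML here is made up for the test):

```python
from html.parser import HTMLParser

class ImageParser(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""
    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src":
                    self.image_urls.append(value)

# feed it a small HTML snippet and inspect the result
html = '<html><body><img src="https://example.com/a.jpg"><p>text</p><img src="/b.png" alt="b"></body></html>'
parser = ImageParser()
parser.feed(html)
print(parser.image_urls)  # ['https://example.com/a.jpg', '/b.png']
```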
Once the URLs are in the Table DAT, you can use a Replicator COMP to create as many copies of a Movie File In TOP as necessary and load the images that way: the Movie File In TOP does accept URLs as sources. Each replicant's File parameter would then have to reference a row of the Table DAT holding your URLs.
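One common way to wire that up is a parameter expression on the master Movie File In TOP that indexes the table by the replicant's trailing digits. This is only a sketch: the table name 'image_urls' and the assumption that replicants end in a 1-based digit both depend on your setup.

```python
# Expression for the Movie File In TOP's File parameter.
# me.digits is the trailing number of the replicant's name;
# subtract 1 if your table rows are 0-indexed.
op('image_urls')[me.digits - 1, 0]
```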
This should work for most pages, but you might encounter some issues that have to be taken care of, such as relative URLs…
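Relative URLs can be resolved against the page you fetched with urllib.parse.urljoin from the standard library. A small sketch (the base URL here is a placeholder for whatever page you actually requested):

```python
from urllib.parse import urljoin

# the URL of the page you fetched with the Web Client DAT
base_url = 'https://example.com/gallery/'

# mix of root-relative, page-relative, and already-absolute URLs
image_urls = ['/b.png', 'thumb/c.jpg', 'https://cdn.example.com/a.jpg']

# urljoin leaves absolute URLs untouched and resolves the rest
absolute = [urljoin(base_url, u) for u in image_urls]
print(absolute)
# ['https://example.com/b.png', 'https://example.com/gallery/thumb/c.jpg', 'https://cdn.example.com/a.jpg']
```

You could run this over parser.image_urls before writing the rows into the Table DAT.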
Make sure to check out the OP Snippets for the mentioned operators for more examples on how these can be used.
Just as a side note: whether this works is highly dependent on how the website is rendered. Some websites only load their actual content via JavaScript, which will not work with the Web Client DAT, as it only fetches the initial HTML. In that case it will only return a simple page telling you that the site requires JavaScript to be enabled.
You are then almost out of luck and might need to resort to packages like Puppeteer or Scrapy, which is more involved than using the Web Client DAT.
Also, you could look at the XML DAT for parsing. Check the OP Snippets for examples.