I’m trying to display Chinese characters from a *.txt file (any other method is welcome too).
I have set the Text TOP to unicode and zh-tw, and tried saving the *.txt file with ANSI, Unicode, UTF-8, and Unicode big-endian encodings. However, none of them display properly. Does anyone know how to deal with this, or have any suggestions?
My files are attached; any Chinese characters you use should be fine.
I’ve taken a look at your .txt and .toe files and I think that the issue comes down to an encoding problem. It looks like the characters that are not rendering properly from weekday.txt are being decoded as characters from the Latin-1 Supplement Unicode block.
Here’s how you can see this yourself:
If you open and print the file weekday.txt in python:
with open('weekday.txt','r') as f:
    raw = f.read()
print(raw)
Your output should be:
To get more information about the improperly rendered characters, we can re-encode the string with the ‘unicode_escape’ codec.
print(raw.encode('unicode_escape'))
This will give the output:
In this string we can see a few things going on.
ASCII characters are rendering properly (as in ‘Monday’)
‘\t’ represents an escaped tab character
A bunch of other escape sequences such as ‘\xac’
The escapes in this third group are the ones causing you trouble. Searching for ‘\xac’ leads to charbase.com/00ac-unicode-not-sign: it is the NOT SIGN character (¬), not part of a Chinese character. Even with the proper font settings, these characters won’t display as the Chinese characters you want because they don’t point to the Chinese characters you want!
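To make the mechanism concrete, here is a minimal sketch of how multi-byte Chinese text ends up as Latin-1 escapes. It assumes a UTF-8 source and a Latin-1 reader purely for illustration; I don't know which encodings were actually involved for weekday.txt, and the sample character is my own choice.

```python
# Sketch only: the encodings and the sample character are assumptions,
# not taken from weekday.txt.
text = "一"  # U+4E00, an arbitrary sample character

utf8_bytes = text.encode("utf-8")       # three bytes: e4 b8 80
misread = utf8_bytes.decode("latin-1")  # each byte becomes its own character

# The escapes now point at Latin-1 characters, not at the Chinese one.
print(misread.encode("unicode_escape"))  # b'\\xe4\\xb8\\x80'

# Reversing the wrong decode and using the right codec recovers the text.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)  # True
```

The bytes in the file are never damaged here; only the interpretation is wrong, which is why decoding with the correct codec fixes it.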
I have some things going on this evening, but I’ll be back to try and get you a working example when I can.
Okay, here’s a working example. My disclaimer here is that encoding and unicode are not my forte so ymmv. I’m having a hard time wrapping my head around how encoding works in TouchDesigner’s FileIn and Text DATs - anyone with the knowledge able to shed some light here? I don’t seem to have any control over the file read operations/what kind of encoding is used.
Attached is a file, test.txt, that includes the first line of Chinese characters from this news article: news.xinhuanet.com/politics/2016 … 271629.htm. Important to note is that I saved the text file with UTF-8 encoding. I use Sublime Text as my editor and this is accomplished easily with [File > Save with Encoding > UTF-8] (should be a similar process with any text editor).
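If you're not sure whether your editor really saved the file as UTF-8, a quick check along these lines might help. The helper name is_valid_utf8 is my own, and the temporary files exist only so that the sketch runs on its own; in practice you would pass the path to your own text file.

```python
import os
import tempfile

def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Demo with two throwaway files: one saved as UTF-8, one as UTF-16.
results = {}
for codec in ("utf-8", "utf-16"):
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write("中文".encode(codec))
    results[codec] = is_valid_utf8(path)
    os.remove(path)

print(results)  # {'utf-8': True, 'utf-16': False}
```

Note that a True result only means the bytes are valid UTF-8; a plain ASCII file passes too.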
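This Python function opens a UTF-8 encoded text file and returns a string of Unicode escapes.
def file_to_unicode_escapes(filename):
    # Decode the file as UTF-8 so each Chinese character is read correctly.
    with open(filename,'r',encoding='utf8') as f:
        raw = f.read()
    # Turn non-ASCII characters into \uXXXX escapes, then convert the bytes back to str.
    return raw.encode('unicode_escape').decode()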
It seems like it encodes/decodes the text a redundant number of times, but I couldn’t get it to work successfully any other way - anyone have some advice? Regardless - this will work if you pass the string to the text parameter of a Text TOP:
Note that for this to work as desired you need to wrap the unicode escapes in an additional set of quotation marks and you need to set the expression parameter for the string to be evaluated as a unicode string.
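Outside of TouchDesigner, the effect of the extra quotation marks can be sketched in plain Python. This only illustrates the idea: ast.literal_eval stands in for whatever evaluation TouchDesigner performs on the expression, which I haven't verified, and the sample text is my own.

```python
import ast

text = "中文"  # sample text, not from the attached files

# Same conversion as in file_to_unicode_escapes: ASCII-only escapes.
escaped = text.encode("unicode_escape").decode("ascii")
print(escaped)  # \u4e2d\u6587

# Wrapping the escapes in quotes makes them a Python string literal;
# evaluating that literal turns the escapes back into real characters.
# Caveat: this simple wrapping breaks if the text itself contains a ' character.
quoted = "'" + escaped + "'"
restored = ast.literal_eval(quoted)
print(restored == text)  # True
```

Without the surrounding quotes the escapes are just backslashes and letters; it is the evaluation of the quoted literal that produces the characters.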
If the font that you’re using supports Chinese characters, this should work. I had success with Google’s Noto Sans CJK SC font (released under the OFL license).
I’ve zipped up the font file, example text file, and a working .toe. Let me know if this works out for you! chinese_characters_td.zip (11.5 MB)
Thanks a lot for your super thorough explanation and Python coding technique! I would not have been able to achieve this in such a short time. It works in general, even with other fonts!
How TouchDesigner handles the FileIn DAT and Text TOP is still somewhat of a mystery. I still can’t display the characters properly in table form or by reference from other OPs, but that can easily be overcome through other methods since you made the biggest progress^^!
I will get back to you if there is any significant progress.
No problem. I had the best luck by avoiding loading the text file into a DAT in TouchDesigner; I couldn’t figure out any way to get it working that way. I suppose the condensed version of my post is: populate your Text TOP’s text parameter with unicode escape characters via Python. TouchDesigner (at least on my machine, it could be a local issue) doesn’t seem to like handling your characters, and leaving them as unicode escapes prevents TD from altering the string before the font can render it.
Ah I also should have posted some of my references. If you want to learn more about encoding check out these very helpful links:
TouchDesigner only supports unicode in the Text TOP and Text SOP right now, which is why loading your file into a DAT isn’t working. Here’s the page on unicode: