Using ONNX Runtime in C++ CUDA plugin

Hi everybody

I am trying to incorporate ONNX Runtime in a C++ CUDA plugin, but haven’t had much luck yet. Whenever I build it and try to load it, I get the “Failed to load the .dll” error. I am not sure how to see what the exact error is, though, as my build goes through fine. When I don’t include the ONNX Runtime dependency the .dll can be loaded, so I am pretty sure the fishy part is there, I just can’t say what exactly.

I am using VS 2019 and CUDA 10.1 (with the experimental 2020.41040 build) and the DirectML ONNX Runtime 1.5.1 NuGet build on Windows 10. Is there any way to see more concrete information in TouchDesigner as to why the .dll can’t be loaded? Or has anybody been successful with this before?

First time posting here so not sure if you guys need more information, so just tell me if you do.


Hey MaxHuerli,

Welcome to the forum!
You can use a tool like Dependencies to scan your compiled .dll when it is in your TD plugin folder, to check if all dependencies such as external .dll files can be found.
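
If you would rather see the raw Windows error, a small test program can try to load the plugin itself and print what the OS reports. This is only a minimal sketch: `describe_load_failure` is a made-up helper name, the path is whatever your plugin .dll is called, and the search path for dependent DLLs in a standalone test program may differ from what TouchDesigner uses, so treat the result as a hint rather than proof.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical helper: tries to load the DLL at `path`.
// Returns an empty string on success, otherwise a description of the failure.
std::string describe_load_failure(const std::string& path) {
    assert(!path.empty());
#ifdef _WIN32
    HMODULE module = LoadLibraryA(path.c_str());
    if (module != nullptr) {
        FreeLibrary(module);  // loaded fine, nothing to report
        return std::string();
    }
    // Capture the error code right away, before any other API call resets it.
    const DWORD code = GetLastError();
    char* text = nullptr;
    const DWORD length = FormatMessageA(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
            FORMAT_MESSAGE_IGNORE_INSERTS,
        nullptr, code, 0, reinterpret_cast<LPSTR>(&text), 0, nullptr);
    std::string result = "error " + std::to_string(code) + ": ";
    result += (length != 0 && text != nullptr) ? std::string(text, length)
                                               : std::string("unknown");
    if (text != nullptr) {
        LocalFree(text);  // buffer was allocated by FormatMessageA
    }
    return result;
#else
    // Assumption: this sketch is only meaningful on Windows.
    return "LoadLibrary is only available on Windows: " + path;
#endif
}
```

Error 126 ("The specified module could not be found") is reported both when the .dll itself is missing and when one of its dependencies cannot be found, which is why a dependency scanner is still the more informative tool.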

Is it possible it is trying to load the wrong version of the onnxruntime.dll? TouchDesigner currently ships with the CUDA 10.0 version of the library for use with the Kinect Azure.

So I finally had the time to come back to my side project (only took forever). It turned out to be missing DirectX dependencies, which caused the DirectML NuGet package to fail to load. I have now swapped to using WinML instead of raw ONNX Runtime, and I hope that will lead to more success. I am quite excited to see if I can include some real-time neural network inference in TouchDesigner.

The experimental 2020.41040 build uses CUDA 10.1 if I remember correctly, so that wasn’t it. It was just a missing DirectX dependency.

Thank you for the tool recommendation, helped me a ton to find the source of the error.

Hi @maxhuerli,
Did you succeed in using WinML with C++ CUDA?

I moved away from it, as the interop between CUDA and DirectML was too complicated for me. Other people also discouraged me from it, as the interop can be slow. So staying in CUDA land seems smarter.

onnxruntime-gpu works fine from Python in TouchDesigner, but I was wondering about using it in a C++ OP so that inference runs separately.
Maybe a good option, which I have not tried, is the experimental TD build with Python 3.9 and shared memory…
How do you run inference?

Did you try WinML with a C++ OP and CPU memory?

One good point about WinML: it is possible to chain many models without GPU-CPU transfers between the models in the chain.

I am not fully sure I understand you, so correct me if I am not giving you the answers you wanted. I am trying to avoid any kind of Python, so I don’t have any experience there. Also, I don’t want to use any CPU memory, as transferring memory is quite expensive. I understand that it is easier, as APIs generally support that use case, but copying memory from GPU to CPU and then back to GPU just to use specific APIs is pretty wasteful, even if it is only done at the start of a model chain. Of course you can try doing it that way, but I am pretty sure it will be too slow.
Personally, I tried converting the provided cudaArray to an instance of ID3D12Resource so that I could create an instance of ITensorNative to run the inference with, but I did not succeed in doing so.
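
To put rough numbers on the GPU-to-CPU round trip mentioned above, here is a back-of-the-envelope sketch. The function names are made up for illustration, and the bandwidth you pass in is an assumption (something around 12e9 bytes/s is a plausible sustained figure for PCIe 3.0 x16, but I have not measured it). It also ignores synchronization stalls, which can matter more than the raw copy time.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a texture with the given dimensions and pixel layout.
std::uint64_t texture_bytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t channels,
                            std::uint64_t bytes_per_channel) {
    return width * height * channels * bytes_per_channel;
}

// Estimated time in milliseconds to copy `bytes` from GPU to CPU and back,
// given an ASSUMED sustained bandwidth in bytes per second.
double round_trip_ms(std::uint64_t bytes, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    const double one_way_seconds =
        static_cast<double>(bytes) / bytes_per_second;
    return 2.0 * one_way_seconds * 1000.0;  // download + upload, in ms
}
```

For a 1920x1080 RGBA 32-bit float texture, `texture_bytes(1920, 1080, 4, 4)` gives 33,177,600 bytes, and `round_trip_ms` with an assumed 12e9 bytes/s comes out at about 5.5 ms, roughly a third of the 16.7 ms frame budget at 60 fps.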

Yeah, I understand.

Maybe you already know this, but David Braun made a CUDA TOP with PyTorch. Maybe that can help, if your model exists in PyTorch and can be exported to run under LibTorch.


Yeah, David’s project was what originally motivated me to do this :) I am not sure what the overhead is when using PyTorch, but in general it’s not as optimized for inference as ORT as far as I know, although I saw they now even have TensorRT support, so maybe it’s even comparable when running quantized models now. I haven’t run any benchmarks comparing them, so this is only speculation from my side.

Ah, I had another code base in mind where he actually used the Python API. I am not a huge fan of LibTorch, but mainly because I am more used to ORT. I also feel the plugin is over-engineered, but maybe I am just not seeing some stuff he encountered, and the complexity of using OpenCV instead of just TouchDesigner nodes for pre-processing is worth it. But definitely interesting!