Multi-GPU or multi-node? 11-screen installation

Hi, I am not a TD user but have been playing around with it (I usually just work with OF). I am researching a large project and am wondering whether Touch and my proposed hardware are a viable starting point. The system will have 11 HD outputs, and the concept is for these 11 screens to each capture a viewport from a larger dynamic 3D environment. The environment will have animated particles and videos that move in space (one per screen, but during a changeover there will be 22 videos playing as the scenes cross over).
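One way to think about "11 screens capturing a viewport from a larger environment" is to take one wide virtual camera and slice its near plane into 11 asymmetric (off-axis) frustums, one per screen. The sketch below assumes a flat row of identical 16:9 screens with no bezel compensation; `split_frustum` and `Frustum` are hypothetical names, and the numbers are glFrustum-style left/right/bottom/top bounds, not a TouchDesigner API.

```python
import math
from dataclasses import dataclass


@dataclass
class Frustum:
    left: float
    right: float
    bottom: float
    top: float
    near: float


def split_frustum(num_screens, total_hfov_deg, aspect_per_screen, near=0.1):
    """Slice one wide frustum into equal side-by-side off-axis frustums."""
    # Half-width of the whole near-plane window covered by all screens.
    half_w = near * math.tan(math.radians(total_hfov_deg) / 2)
    slice_w = 2 * half_w / num_screens
    # Every screen shares the same vertical extent, set by its own aspect ratio.
    half_h = slice_w / aspect_per_screen / 2
    frusta = []
    for i in range(num_screens):
        left = -half_w + i * slice_w
        frusta.append(Frustum(left, left + slice_w, -half_h, half_h, near))
    return frusta
```

For example, `split_frustum(11, 120.0, 16 / 9)` gives 11 frustums whose outer edges meet the full 120° window (an arbitrary example value), with the middle screen (index 5) symmetric around the view axis.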

I have been looking at the documentation on GPU affinity and am thinking about using a large custom machine with a high-core-count Threadripper and 4 Quadro GPUs (it would be great if this could work with RTX cards!) to get the outputs I need. In this scenario I imagine running 4 instances of TD, duplicating and synchronising the 3D particles on each GPU, but only loading the movie files needed for the outputs attached to the card selected for that instance.
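The 4-instance plan boils down to partitioning the 11 outputs across the GPUs and having each instance open only the clips for its own screens. A minimal sketch, assuming contiguous screen groups and a hypothetical `playlist` that maps a screen index to the clips it needs; none of these function names come from TouchDesigner.

```python
def assign_outputs(num_outputs, num_gpus):
    """Split output indices into contiguous groups, one group per GPU / TD instance."""
    base, extra = divmod(num_outputs, num_gpus)
    groups, start = [], 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional output each.
        size = base + (1 if gpu < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def streams_for_instance(group, crossfading):
    """Movie streams one instance must decode: one per screen, two during a changeover."""
    return len(group) * (2 if crossfading else 1)


def movies_for_instance(group, playlist):
    """Only the clips mapped to this instance's screens (playlist: screen -> list of clips)."""
    return [clip for screen in group for clip in playlist.get(screen, [])]
```

With 11 outputs on 4 GPUs this yields groups of 3, 3, 3 and 2 screens, and 22 decoded streams in total while every screen is mid-changeover, so the busiest instance would handle 6 streams at that moment.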

Having not deployed a large TD system before, I wonder whether this is a logical way to go for maintenance and ease of installation, or whether I am better off just using smaller separate machines?

I could also just try to optimise and use the biggest single GPU I can find (the 3090 seemingly gives better performance than any single Quadro card) and use Datapath x4 splitters to keep everything on a single GPU. If there is enough power this seems like the smoothest solution, as there is no multi-GPU mess. But this depends on how CPU commands are queued or threaded for video playback…
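For the single-GPU route, the arithmetic is: ceil(11 / 4) = 3 splitters, each fed by one GPU output carrying several HD tiles, with each screen being a crop of one output. The sketch assumes each splitter accepts a single canvas of 4 HD tiles side by side (7680x1080); whether a particular splitter accepts that resolution, or needs a 2x2 arrangement (3840x2160) instead, has to be checked against its own spec. `splitter_layout` is a hypothetical helper.

```python
import math

TILE_W, TILE_H = 1920, 1080  # one HD screen


def splitter_layout(num_screens, tiles_per_splitter=4):
    """Map each screen to a GPU output and a crop rectangle (x, y, w, h) on that output.

    Assumes each splitter takes one canvas of `tiles_per_splitter` HD tiles side by side.
    """
    num_splitters = math.ceil(num_screens / tiles_per_splitter)
    canvas = (TILE_W * tiles_per_splitter, TILE_H)
    layout = []
    for screen in range(num_screens):
        gpu_output, tile = divmod(screen, tiles_per_splitter)
        layout.append({
            "screen": screen,
            "gpu_output": gpu_output,
            "crop": (tile * TILE_W, 0, TILE_W, TILE_H),
        })
    return num_splitters, canvas, layout
```

Under these assumptions, screens 0-3 land on the first output, 4-7 on the second and 8-10 on the third, leaving one tile of the last canvas unused.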

Lastly, I wanted to know about internal threading, particularly when playing back NotchLC-encoded files. If I am playing 8 HD files simultaneously on a single high-power machine, will each playback run on a separate thread inside a single TD instance? The videos do not need to be frame-synchronised (in fact they will not share the same frame rate, as they come from a large archive). So will TD's NotchLC playback run the CPU-bound part of loading and playing each file in its own thread?

You’ll want to run some tests to see, but if you can do this on a single GPU using Datapath splitters, that’s definitely the easier way to work.

Our video playback is heavily threaded in general. NotchLC itself doesn’t currently lend itself easily to multi-threaded decoding of a single file, but you will get one thread decoding each file you are playing.
If you go with GPU affinity, then yes it works with the RTX cards.
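The "one thread per file" model described above can be illustrated with a toy: each clip gets its own worker, paced by its own frame rate, with no synchronisation between clips, so total wall time tracks the slowest clip rather than the sum of all clips. This is only a structural sketch under that assumption; `time.sleep` stands in for real decode work, and it says nothing about how TD implements NotchLC decoding internally (including the point that a single file is not split across threads).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def decode_clip(name, fps, num_frames):
    """Stand-in for one file's decoder: paced by that clip's own frame rate."""
    interval = 1.0 / fps
    for _ in range(num_frames):
        time.sleep(interval)  # placeholder for decoding one frame; sleep releases the GIL
    return name, num_frames


def play_all(clips, num_frames=3):
    """Decode every clip on its own worker thread (one thread per file)."""
    with ThreadPoolExecutor(max_workers=len(clips)) as pool:
        futures = [pool.submit(decode_clip, name, fps, num_frames)
                   for name, fps in clips.items()]
        return dict(f.result() for f in futures)
```

With 8 clips at mixed rates (24, 25, 29.97, 30, 50, 60 fps), `play_all` finishes in roughly the time of the slowest clip instead of the sum of all eight.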

Oh OK, that is great and changes things a lot for me.
The docs here still suggest it is only for Quadro cards:

" GPU Affinity is not supported on Geforce level cards. It is not recommended that you use multiple Geforce cards in a single system. In Windows 7 or later all the work will get done by the fastest GPU and then final images will be copied to the other GPUs to be displayed on-screen. However the performance of this vs. Quadros is unknown. Even on Quadro’s working without GPU affinity is not a suggested way to work."

And yes I will be doing a lot of tests before committing to hardware.
Many thanks for the fast response.

@fred_dev please note that RTX does not mean GeForce or Quadro, as both GeForce and Quadro have RTX-type cards. GPU Affinity only works on Quadro cards (RTX or not).

Yep, sorry, I meant the Quadro RTX cards.

Thanks for the clarification.