60 fps Stuttering

Hello, all!

I’m working on an installation and we’re running into a strange issue. In short, Touch is reporting a solid 60 fps, but we’re experiencing an intermittent stutter on output. We have a couple of theories as to why it may be happening (it may be a combination of the two, or something else entirely), and I’d love to get some input from the community to help work through it.

Our setup:

Xeon CPU, Win 10
2x Quadro P6000 (1 attached to UHD display output, 1 to 1080p operator display)
4x Render instances of Touch (2 per GPU)
1x Compositing instance running on GPU 1 (outputting to single UHD display)

Every instance has its GPU affinity set to the correct GPU and is assigned to the correct NUMA node.

Theory 1: Not Full Screen Exclusive

For some reason, the compositing instance does not seem to be going into Full Screen Exclusive (FSE) mode. The Window Op is set to fill the entirety of the UHD display, and I believe all of the other settings are set correctly to allow FSE. However, the telltale “blink” doesn’t happen when we switch application focus, and we’re seeing this stutter, which leads me to believe it’s not exclusive.

Is there something I’m missing that is preventing enabling FSE? The TD doc page mentions it “may need mosaic,” but is this only the case when you’re attempting to span displays? Is there a problem with having one display in fullscreen while another is displaying the desktop? Is there some way to force FSE? We looked into Win 10’s “Fullscreen optimizations” but enabling / disabling seemed to have no effect.

Theory 2: Render Instances Out of Sync With Compositor

For performance, we’ve offloaded the majority of the rendering to multiple render instances whose output is composited together just before output in another instance. We have to use the Shared Mem TOP (instead of Spout) to get frames from GPU 0 to GPU 1; however, I’ve seen the stutter happen using Spout as well. All instances are running at 60 fps, but we think that if the render time varies within the frame (or the point within the frame at which each render instance writes to its shared mem varies), the compositor may occasionally pick up an old frame, so the same frame is displayed twice in a row.

Here’s a diagram of what I think’s going on (using the compositor and one render instance):

In this case, the first two frames would draw as expected, the third would appear to stutter.
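To make the suspected timing problem concrete, here is a minimal sketch of it in plain Python. It is only a model, not TouchDesigner code: the function names and the offset values are made up for illustration, and it assumes the receiver simply shows whichever frame was most recently published at the moment it reads.

```python
def displayed_frames(write_offsets, read_offsets):
    """Return the sender frame index the receiver shows on each of its frames.

    Offsets are fractions of one frame period in [0, 1). Sender frame k is
    published at time k + write_offsets[k]; receiver frame k samples the
    shared texture at time k + read_offsets[k]. None means nothing has been
    published yet.
    """
    shown = []
    latest = -1  # index of the newest sender frame published so far
    for k, read_offset in enumerate(read_offsets):
        read_time = k + read_offset
        # Advance past every sender frame published at or before this read.
        while (latest + 1 < len(write_offsets)
               and (latest + 1) + write_offsets[latest + 1] <= read_time):
            latest += 1
        shown.append(latest if latest >= 0 else None)
    return shown


def count_repeats(shown):
    """Count receiver frames that repeat the previous receiver frame."""
    return sum(1 for a, b in zip(shown, shown[1:]) if a is not None and a == b)


# Hypothetical timings: the sender usually publishes early in the frame,
# but frame 2 is published late, after the receiver has already read.
shown = displayed_frames([0.1, 0.1, 0.6, 0.1], [0.5, 0.5, 0.5, 0.5])
print(shown, count_repeats(shown))
```

With these made-up offsets, both processes tick at exactly 60 fps, yet the receiver shows sender frame 1 twice and never shows frame 2, which matches the symptom of a stutter with no reported frame drop.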

We’ve tried a number of things: increasing the frame rate of the render nodes, changing CPU priorities (all the way up to realtime), showing / minimizing render instance windows, putting certain windows in the foreground, making them focused / defocused, realtime flag on / off, turning vsync on / off, etc. Nothing has completely solved the problem. We seem to have had the best luck with all instances at realtime priority, drawing all windows (with all visible), with the render instance in focus and with a Hog CHOP to increase compositor frame time, but it may be coincidental (it’s a very intermittent problem).

So, if this is what’s going on, the question becomes: is there a way to sync the TD processes so that the compositor draws only after each render instance has updated its texture?

This has led to a number of questions in our search for a solution:

  • Is vsync a good “metronome” to keep the separate processes in sync? Can we keep windows minimized and not drawn while still vsyncing?
  • Is there anything like OnPreRender (à la Unity) that would let us wait until the end of the frame to get frames / composite?
  • Does Shared Mem Out “publish” its texture on cook or is it just maintaining a reference to a texture? If it’s on cook, would OnFrameStart or End be a way of reliably scheduling the texture update?
  • Is there any way to get when Shared Mem In has a new texture? The Info CHOP doesn’t seem to have any relevant outputs.
  • Would Sync In / Out CHOPs help? Sync Out from the compositor, when it’s no longer waiting for any clients via sync_external, then render?

If you’ve made it to the bottom of this post, thank you! I’d love thoughts on these theories or any others you might have!

Ps. I found some related posts that may be relevant (thoughts?):

This one is a little old, and doesn’t quite seem like our symptoms (also we can’t currently afford to run at 120hz)

This one mentions issues sharing across GPUs, but for performance reasons (we’ve found both shared mem and NDI performant and stable aside from this occasional frame hitch).

Thanks again!

I just put together a simple demo of what I think the problem is.
The sender is sending a 60hz flicker (white every other frame), and the receiver occasionally drops frames, though both report running at 60.

Sender.toe (3.8 KB)
Receiver.toe (3.7 KB)

To repro:

  • open both toes,
  • put both in perform (open a viewer on the trail if you’d like)
  • place windows side by side- the sender should be a steady flicker, the receiver will occasionally stutter.

I experienced different levels of stuttering depending on what was in the foreground, and I bet it depends on hardware as well.

Just FYI!

Neither the Shared Mem In nor Spout have any notion of a frame queue or sync. They publish their data at some point within the frame, and then the receiver will pick it up at some point within the frame. The nodes themselves can cook at any point within the frame, and that point can change from frame to frame, so the hand-off is not guaranteed at all. There is no publish mechanism available to Spout (which we don’t control), and Shared Mem just doesn’t have the feature currently.

When drawing is turned off, vsync is ignored for the process. It is only used when actually drawing the frame; otherwise, only an internal clock is used.

You may be able to get away with using the Sync In/Out CHOP along with shared memory. The key would be to ensure that the up-chain nodes from the Sync Out CHOP cause the Shared Mem Out TOP to cook (output some info channel from it, like its cook count, for example). That would ensure the Shared Mem Out TOP has cooked and filled its memory before the Sync In CHOP processes start reading it. I’ve never tried this, though, so I’m not 100% sure it will solve your issue.
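The ordering this suggestion is after (every sender publishes, and only then does the receiver read) can be sketched outside of TouchDesigner as a barrier. The snippet below is an analogy in plain Python threads, not the Sync In/Out CHOP implementation; the names `published`, `renderer` and `compositor` are hypothetical, and a list stands in for the shared memory segments.

```python
import threading

N_RENDERERS = 2
N_FRAMES = 5

published = [None] * N_RENDERERS               # stand-in for shared memory
barrier = threading.Barrier(N_RENDERERS + 1)   # all renderers + compositor
composited = []                                # what the compositor read


def renderer(slot):
    for frame in range(N_FRAMES):
        published[slot] = frame  # "cook" and publish this frame
        barrier.wait()           # rendezvous 1: everything is published
        barrier.wait()           # rendezvous 2: compositor has finished reading


def compositor():
    for _ in range(N_FRAMES):
        barrier.wait()           # wait until every renderer has published
        composited.append(tuple(published))
        barrier.wait()           # release the renderers for the next frame


threads = [threading.Thread(target=renderer, args=(i,)) for i in range(N_RENDERERS)]
threads.append(threading.Thread(target=compositor))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(composited)
```

Because nobody proceeds until everyone has reached the same point, the compositor can never read a stale or half-updated set of frames. The cost is that the slowest process sets the pace for all of them.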

You should also test with content generated purely on the process that is outputting to the UHD screen. I say this because if you aren’t getting the FSE flicker, you may still run into unexplained stutters on that process even if you fix the Shared Mem In/Out hand-off issues.

NDI does have a queue, so it shouldn’t suffer from these drops offhand, but that’s not to say it can’t drop frames sometimes.

Thanks, Malcom!

We went back to NDI and it’s much more stable! We experienced a stutter at one point, but it may have been something else…

Is there a Derivative-recommended method of doing this sort of frame-sync between instances? I bet we’re not the first people to have a setup like this. Until now, I always thought Syphon/Spout was the most reliable.

Thanks again!