Timing system overnight reset

I have been diving down system and hardware rabbit holes for weeks now trying to resolve an issue which looks increasingly like it has some component within TouchDesigner itself.

I am running a TouchDesigner 2025.32050 / Windows 11 25H2 26200.7840 project with a StereoLabs ZED (SDK 5.2.0) camera. Overnight, and specifically overnight (apparently not related to uptime), the camera stalls. The last valid frame of video is conspicuously nighttime-dark, lit only by a couple of always-on displays.

Seeking to determine the time of the stall (suspecting it to be some Windows service interference), I threw in a quick Clock CHOP-Feedback-Switch circuit so when the frame-over-frame difference from the ZED feed drops to 0, the system time is latched.
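 
For reference, the same latch can be expressed as a few lines of script instead of a Feedback/Switch pair. This is only a minimal sketch of the idea: it assumes the frame-over-frame difference has already been reduced to a single channel (here an Analyze CHOP named 'analyze1') which a CHOP Execute DAT is watching with Value Change enabled.

```python
# Minimal sketch of the latch in a CHOP Execute DAT (operator names assumed).
# While the camera is alive the diff channel is nonzero, so the "last good"
# timestamps keep refreshing; when the feed freezes the channel stops
# changing and the stored values stay latched.
import datetime

def onValueChange(channel, sampleIndex, val, prev):
    if val > 0:
        parent().store('lastGoodAbsTime', absTime.seconds)
        parent().store('lastGoodWallClock', datetime.datetime.now().isoformat())
    return
```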

This circuit, using the built-in Clock CHOP, functions fine when I throw Constant test feeds into the Switch, yet for several days straight it reported a stall time of shortly after program launch despite clearly running all day. (e.g. I would get into work, resolve the previous night’s crash by unplugging & re-plugging the camera, relaunch TD shortly after 9am, watch it run all day until I leave the office at 5pm, and get back the next morning ~8:50am to a crash with a latched time of 9:04am the previous day.)

So I rebuilt the same circuit using a custom Python Exec→Table for the time data rather than the Clock CHOP, and saw the same symptom: overnight stall, with a latched stall time of approximately when TD was launched.

Intrigued, and knowing that the ZED is very sensitive to time changes, I wrote an independent NodeJS app watching both Date.now() (system time) and performance.now() (monotonic, irrespective of system time changes), and while it caught enough subtle drift to show it was working, it never latched a change of more than a few ms.
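 
(For anyone who wants to run the same watchdog from Python instead of Node, a minimal equivalent is below; the 0.05 s tolerance is just an assumption.)

```python
# Minimal Python equivalent of the NodeJS watchdog: sample the wall clock
# and a monotonic clock once a second and flag any interval in which the
# two disagree, which would indicate a system-time jump.
import time

TOLERANCE = 0.05  # seconds of allowed disagreement per sample (assumption)

wall_prev, mono_prev = time.time(), time.monotonic()
while True:
    time.sleep(1.0)
    wall_now, mono_now = time.time(), time.monotonic()
    # how far the wall clock moved vs how much real (monotonic) time passed
    delta = (wall_now - wall_prev) - (mono_now - mono_prev)
    if abs(delta) > TOLERANCE:
        print(f'system clock jumped by {delta:+.3f}s at {time.ctime(wall_now)}')
    wall_prev, mono_prev = wall_now, mono_now
```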

At a loss, I added an absTime.seconds term to the readout/latch circuit, and knowing full well that the .toe was running for hours yesterday (with the circuit-latched last-good-frame time shown onscreen and advancing by several hours), returned this morning to a stall with a latched time of 43.783 seconds after program launch (quite possibly consistent with when I entered Perform mode).

My feedback loop only captures one frame of prior time data. The conclusion, then, seems unavoidable that some event coincident with the ZED feed failure is throwing TouchDesigner & Python’s whole timing stack back to near-launch, but only for a few frames. By morning, the live time is progressing accurately, with only the latch circuit holding the inexplicable timestamp. I have no idea whether this is due to the ZED crashing and disrupting TD’s update flow, or whether the ZED node itself is being tripped up by a sudden and drastic backwards delta in the perceived system time and stalling as a result. And other than perhaps cascade-chaining my time-latch logic back a few more frames, I don’t know where else to look for evidence.

Update: a chain of ~10 feedback nodes off the main Clock CHOP with a similar latch-Switch at the end suggests that this is not a 1-frame artifact, and the reversion to near-launch-time persists for at least ~1/6 second. I may need to fall back on more advanced scripting to actually capture the time and duration of the clock shift…

Hi @nndeveloper,

from your description it sounds like the network is not cooking when not viewed. To analyze why it’d be good to see your approach.

This attached workflow here should work though:

I’m using a Cache TOP and a Composite TOP with the Operation parameter set to “Difference”, followed by an Analyze TOP that returns the “Sum” of all pixel values, which in turn controls the Initialize and Start inputs of a Timer CHOP. With the Timer CHOP running the Done callback after 1 second of the video not changing, the Add Frame parameter of the Movie File Out TOP is pulsed, which causes the input to the Movie File Out TOP to cook and calculate the imprinted timecode string of the Text TOP.
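 
The Done callback in the Timer CHOP’s callbacks DAT is essentially a one-liner along these lines (the Movie File Out TOP’s pulse parameter is assumed to be addframe here; check the parameter name on the operator itself):

```python
# Timer CHOP callbacks DAT: after 1 second of the video not changing,
# pulse the Movie File Out TOP so it cooks its input and writes one
# frame with the imprinted timecode.
def onDone(timerOp, segment, interrupt):
    op('moviefileout1').par.addframe.pulse()
    return
```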

base_imgStaticTime.tox (9.7 KB)

Hope this helps
cheers
Markus

Yeah, save for the specific flavor of composition, that’s essentially my logic.
Source into a Diff against a Feedback of the Source, into an Analyze (max), into a Threshold, extracted to a CHOP, into a Switch that selects either the live time (if the source diff max is nonzero) or the feedback of the Switch’s own output (if it is zero). So essentially, when the source stops changing, the time latches and holds, and I in theory know when the camera stopped.

The problem is that I can watch the camera run and the time advance for literal hours, and then at the moment of failure every available representation of the system time (including absTime) jumps back to a value roughly correlating with when I entered Perform mode (give or take a minute or three).

Time is used all over the greater graph and continues as normal around/after the blip, to the extent that it has been hard to capture the specific non-perturbed time of failure. I’ve ultimately resorted to a couple of Exec scripts and an ugly logging Table DAT (sketched after the list below) to latch

  • the highest absTime reached before the stall (thus far 12k-42k seconds),
  • the time to which absTime reverted at the point of stall (21-41 seconds), and
  • in theory, the first absTime in excess of the last-time-before-the-stall once the stall has been detected (which I haven’t been able to test to the point of confidence since I lost access to the symptomatic machine after only 1 night/failure on my latest logic; nonetheless, all prior not-quite-correct versions of the scripting indicate absTime is very much resuming, with triangulation between experiments suggesting time-disruption durations of between 1 and 10+ frames)
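 
For anyone curious, the gist of that logging script is below. It is a rough sketch rather than the exact scripts I’m running: it assumes an Execute DAT with Frame Start enabled and an empty Table DAT named 'log1' next to it.

```python
# Rough sketch of the absTime logger (operator names assumed). Each frame
# it compares absTime.seconds to the highest value seen so far; a backwards
# jump records the peak and the reverted value, and the first frame to climb
# back past the old peak records the resume time.
prevPeak = 0.0
stalled = False

def onFrameStart(frame):
    global prevPeak, stalled
    t = absTime.seconds
    log = op('log1')
    if not stalled and t < prevPeak:
        log.appendRow(['peak_before_stall', prevPeak])
        log.appendRow(['reverted_to', t])
        stalled = True
    elif stalled and t > prevPeak:
        log.appendRow(['resumed_at', t])
        stalled = False
    prevPeak = max(prevPeak, t)
    return
```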

So yeah. Latching the theoretical time of failure is trivial. The bug is that at the time of failure the available representations of time have a nonsensical value, inconsistent with values already rendered to screen during Perform mode, which persists for one or more frames before returning to normal.

You might be on to something, though, with the idea that in Perform mode nodes work from an initial cached absTime until they’re cooked, and whatever causes or precipitates from the camera failure disrupts the cook to the point where CHOP/Exec logic on some independent update trigger runs before the cook occurs and sees the cold-cached time on the sourced nodes…

Hi @nndeveloper

the Feedback CHOP is something I would be skeptical of - can you give my solution a try and see if the cook chain stays intact?

cheers
Markus

Will give it a go. I’ve taken the liberty of wiring up inputs & outputs just so this can run blackbox alongside the remainder of the graph but still give me easy eyeballs on its status.

I suspect this will actually show the “correct” timecode not because it is using a different mechanism but because it will end up sampling the timecode a full second after the stall, by which time every other experiment shows absTime to be resolving correctly again. It has only been the instant surrounding the stall at which the time appears to revert.


okay… I can only run the machine headless right now, and I think that’s yielding failures with some significantly different timings. But results at a glance seem to confirm my hypothesis:

  • my trap inferred a failure time of 2026.3.11 13:28:45 after an absTime runtime of 12716 seconds, which blipped down to 2252 seconds for what smells like a single frame at the moment of failure before resuming with a time equal to or higher than the failure time.
  • your tox was running and showing a time in the 00:00:02:xx range on the live output branch (consistent with a count-up from when I exited Perform mode, while I was checking my DAT) but had saved 2 jpg files, one showing 10:30 on a black square with a Windows timestamp of 10:07am, one showing 03:31:57:17 on the last good camera frame with a Windows timestamp of 1:28pm.

Hi @nndeveloper

sorry for not completely following, but it sounds like it captures the failure correctly now, as your solution and mine both seem to trigger after the same runtime?

cheers
Markus

Yes. I had already determined a way to capture the time of failure, it just needs to be roundabout in order to circumvent what still appears to be an edge-case bug in TD.

My solution effectively captures the correct time on the frame before the failure, yours captures the correct time a second after the failure once timing behavior is restored.

But on the exact frame that the ZED node/camera fails, every available representation of system time or absTime, whether from the built-in Clock CHOP, a direct inline absTime reference, or a full Python Exec script using Python’s normal timing calls, seems to jump backwards, sometimes by several hours, to a timestamp of approximately (but not exactly) when TD entered Perform mode.

I do not know, however, which is the cause and which is the effect. It is equally believable to me that some other issue with the underlying ZED stack (external to TD) causes the ZED to crash, TD gets jammed up for a frame, and a bunch of nodes/cooks don’t evaluate properly, perhaps skipping over proper time updates; or that some deep and gnarly one-frame runtime bug within TD throws a drastic frame-over-frame time delta, and the ZED stack enters a bad state when all of its inference models are hit with a dt of minus several minutes/hours rather than plus a few ms.