Hi Malcolm,
I’m building a Custom POP that runs an object-detection model (RF-DETR, ONNX → TensorRT) on the contents of a TOP and emits one POP point per detection with bbox / score / class_id attributes.
The POP architecture is a great fit for this.
The bottleneck is the input read.
From a Custom POP, the only way I can find to access a TOP’s pixels is OP_TOPInput::downloadTexture(), which is a GPU→CPU readback. At 1280×720 RGBA8 that’s ~3-5 ms per cook before any inference happens.
After that I do a CPU bilinear resize + ImageNet normalize to feed the model, which is another ~3-5 ms. Net cook is ~11 ms even though the TRT inference itself is 4 ms.
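For reference, here is a minimal sketch of what that CPU step amounts to. It is illustrative only, not the exact code from my plugin: `resizeNormalizeCHW` is a hypothetical helper (not part of the TouchDesigner SDK), and it assumes the downloaded buffer is tightly packed RGBA8 in R, G, B, A byte order and that the model wants planar float32 CHW input with the standard ImageNet mean/std.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not an SDK function.
// src: tightly packed RGBA8 pixels (assumed layout), srcW x srcH, row-major.
// Returns planar float32 CHW data (3 x dstH x dstW), ImageNet-normalized.
std::vector<float> resizeNormalizeCHW(const std::uint8_t* src, int srcW, int srcH,
                                      int dstW, int dstH)
{
    assert(src != nullptr && srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);

    // Standard ImageNet per-channel statistics (R, G, B).
    static const float kMean[3] = {0.485f, 0.456f, 0.406f};
    static const float kStd[3]  = {0.229f, 0.224f, 0.225f};

    std::vector<float> dst(std::size_t(3) * dstW * dstH);
    const float scaleX = float(srcW) / float(dstW);
    const float scaleY = float(srcH) / float(dstH);

    for (int y = 0; y < dstH; ++y) {
        // Map the destination pixel centre into source space, clamped to the image.
        const float fy = std::min(std::max((y + 0.5f) * scaleY - 0.5f, 0.0f), float(srcH - 1));
        const int y0 = int(fy);
        const int y1 = std::min(y0 + 1, srcH - 1);
        const float wy = fy - float(y0);

        for (int x = 0; x < dstW; ++x) {
            const float fx = std::min(std::max((x + 0.5f) * scaleX - 0.5f, 0.0f), float(srcW - 1));
            const int x0 = int(fx);
            const int x1 = std::min(x0 + 1, srcW - 1);
            const float wx = fx - float(x0);

            for (int c = 0; c < 3; ++c) {  // alpha (channel 3) is dropped
                const float p00 = src[(std::size_t(y0) * srcW + x0) * 4 + c];
                const float p10 = src[(std::size_t(y0) * srcW + x1) * 4 + c];
                const float p01 = src[(std::size_t(y1) * srcW + x0) * 4 + c];
                const float p11 = src[(std::size_t(y1) * srcW + x1) * 4 + c];

                // Bilinear blend: horizontally on both rows, then vertically.
                const float top = p00 + wx * (p10 - p00);
                const float bot = p01 + wx * (p11 - p01);
                const float v = (top + wy * (bot - top)) / 255.0f;

                dst[(std::size_t(c) * dstH + y) * dstW + x] = (v - kMean[c]) / kStd[c];
            }
        }
    }
    return dst;
}
```

All of this work is per-pixel and independent, so it would map directly onto a single CUDA kernel if the TOP were available as a CUDA array.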
What’s already in td_headers:
- OP_TOPInput::getCUDAArray() exists (CPlusPlus_Common.h:971) but the comment says it can only be called from a C++ TOP / Custom TOP in TOP_ExecuteMode::CUDA, otherwise it returns nullptr.
- OP_Context::beginCUDAOperations / endCUDAOperations are already on the base context that POPs derive from.
- POPs can already output CUDA buffers via POP_BufferLocation::CUDA (POP_CPlusPlusBase.h:123).
So the input side is the only piece missing for a fully GPU-resident pipeline inside a POP.
Request: allow Custom POPs to call OP_TOPInput::getCUDAArray() — either by lifting the TOP-only restriction, or by adding a POP_ExecuteMode::CUDA (or a getGeneralInfo flag) that opts the POP into the same begin/end CUDA bracket TOPs use today.
Use case is exactly the same as a Custom TOP doing CUDA: zero-copy access to the TOP texture, then run kernels / ORT IoBinding / TRT against it.
Am I correct, or is there already a way to access a TOP's pixels as a CUDA array from a POP?
I imagine it’s something you are already doing internally for some of the native POP nodes.
I’m using TouchDesigner 2025.32460 on Windows 11.
Cheers,
Colas