Guidelines for creating a GPU-context C++ node with an option to copy to system RAM

Before I start building the GPU node in the diagram,
I'd like to get some guidelines.

I want the node to receive three inputs:
matrices (on the CPU), plus tens of RGB and mono textures that already exist in GPU memory.
This data will be fed into an existing custom CUDA DLL inside a GPU SOP container.

The output is a mesh and a texture, both still in the GPU context.
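One way to keep the textures on the GPU is to have the DLL's entry point accept handles to memory that is already on the device instead of host buffers, and to tag every result with the frame it came from so the mesh and texture are always published as a matched pair. The C++ sketch below models only that contract on the CPU side and does not call TouchDesigner or CUDA. Every name in it (DeviceTexture, FrameInputs, FrameOutputs, validInputs, readyToPublish, textureBytes) is hypothetical, and it assumes 4x4 float matrices and 8-bit textures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-frame contract between the TD C++ operator and the custom
// CUDA DLL. None of these names are TouchDesigner or CUDA API.

// Non-owning handle to an image that already lives on the GPU, for example a
// CUDA device pointer or a cudaArray_t obtained through graphics interop.
// The CPU side never dereferences it; it is only passed through to the DLL.
struct DeviceTexture {
    const void* handle = nullptr;
    int width = 0;
    int height = 0;
    int channels = 0;  // 3 = RGB, 1 = mono
};

struct FrameInputs {
    std::uint64_t frameIndex = 0;
    std::vector<float> matrices;          // small, stays in system RAM
    std::vector<DeviceTexture> textures;  // tens of textures, stay on the GPU
};

struct FrameOutputs {
    std::uint64_t frameIndex = 0;        // copied from the inputs that produced it
    const void* meshVertices = nullptr;  // GPU-side vertex buffer
    std::size_t vertexCount = 0;
    DeviceTexture texture;               // GPU-side output texture
};

// Check the inputs before handing them to the DLL (assumes 4x4 float matrices).
bool validInputs(const FrameInputs& in) {
    if (in.matrices.empty() || in.matrices.size() % 16 != 0) return false;
    for (const DeviceTexture& t : in.textures) {
        if (t.handle == nullptr || t.width <= 0 || t.height <= 0) return false;
        if (t.channels != 1 && t.channels != 3) return false;
    }
    return true;
}

// Publish mesh and texture together, and only when both come from the same
// frame as the inputs, so downstream nodes never see a mismatched pair.
bool readyToPublish(const FrameInputs& in, const FrameOutputs& out) {
    return out.frameIndex == in.frameIndex && out.meshVertices != nullptr &&
           out.vertexCount > 0 && out.texture.handle != nullptr;
}

// Size of an 8-bit texture of this shape (assumption: 1 byte per channel),
// e.g. for allocating the output texture. Call only on validated textures.
std::size_t textureBytes(const DeviceTexture& t) {
    assert(t.width > 0 && t.height > 0 && t.channels > 0);
    return static_cast<std::size_t>(t.width) * t.height * t.channels;
}
```

In a real operator the handles would presumably come from whatever CUDA interop path the operator uses to access the input TOPs, and the DLL would launch its kernels on them directly, which removes the round trip through system RAM.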

1. Is it possible to pass pointers to the TOPs in GPU memory directly to the SOP container and use them
   in the CUDA-based DLL?
   (At the moment I copy them back to system RAM and then send them to CUDA again.)

2. Again, I will need to output the texture and the mesh synchronously (I asked about this before). Any tips for that?

3. Besides rendering the mesh with the texture that already exists on the GPU,
   how do I copy the data back to system memory so I can save it to disk?
   (I know CUDA allows this, but are there any restrictions in TD?)
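On the CUDA side, copying back is normally a device-to-host copy: cudaMemcpy with cudaMemcpyDeviceToHost, or cudaMemcpy2DFromArray when the texture is a cudaArray_t. Copying into pinned memory from cudaMallocHost with cudaMemcpyAsync keeps the host thread from blocking while the copy runs. The C++ sketch below covers only what happens after that copy: writing the host-side mesh as a Wavefront OBJ file and the texture as a binary PGM (mono) or PPM (RGB) image. HostMesh, HostImage, imageBytes, writeObj and writeNetpbm are hypothetical names, and the sketch assumes 8-bit, tightly packed pixels with rows stored top to bottom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Host-side copy of one frame's results. In the real pipeline these vectors
// would be filled by a device-to-host copy, e.g.
//   cudaMemcpy(dst, devPtr, bytes, cudaMemcpyDeviceToHost);
struct HostMesh {
    std::vector<float> positions;        // x, y, z per vertex
    std::vector<std::uint32_t> indices;  // 3 per triangle, 0-based
};

struct HostImage {
    int width = 0;
    int height = 0;
    int channels = 0;                  // 1 = mono, 3 = RGB
    std::vector<std::uint8_t> pixels;  // row-major, tightly packed, 8-bit
};

// Bytes in a tightly packed 8-bit image; callers validate the shape first.
std::size_t imageBytes(const HostImage& img) {
    assert(img.width > 0 && img.height > 0 && img.channels > 0);
    return static_cast<std::size_t>(img.width) * img.height * img.channels;
}

// Write the mesh as a Wavefront OBJ file (OBJ face indices are 1-based).
bool writeObj(const std::string& path, const HostMesh& mesh) {
    if (mesh.positions.size() % 3 != 0 || mesh.indices.size() % 3 != 0) return false;
    std::ofstream out(path);
    if (!out) return false;
    for (std::size_t i = 0; i < mesh.positions.size(); i += 3)
        out << "v " << mesh.positions[i] << ' ' << mesh.positions[i + 1] << ' '
            << mesh.positions[i + 2] << '\n';
    for (std::size_t i = 0; i < mesh.indices.size(); i += 3)
        out << "f " << mesh.indices[i] + 1 << ' ' << mesh.indices[i + 1] + 1 << ' '
            << mesh.indices[i + 2] + 1 << '\n';
    return static_cast<bool>(out);
}

// Write the texture as binary PGM (P5, mono) or PPM (P6, RGB), maxval 255.
bool writeNetpbm(const std::string& path, const HostImage& img) {
    if (img.channels != 1 && img.channels != 3) return false;
    if (img.width <= 0 || img.height <= 0) return false;
    if (img.pixels.size() != imageBytes(img)) return false;
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << (img.channels == 1 ? "P5" : "P6") << '\n'
        << img.width << ' ' << img.height << "\n255\n";
    out.write(reinterpret_cast<const char*>(img.pixels.data()),
              static_cast<std::streamsize>(img.pixels.size()));
    return static_cast<bool>(out);
}
```

A texture read back from the GPU may instead be RGBA, 16- or 32-bit float, or stored bottom row first, so a conversion step may be needed before writing. Running the file writes on a worker thread keeps disk I/O out of the frame loop.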

Thanks for any ideas, tips, or guidelines.

Barak.