How to make use of the new cudaMemory() method in the Python TOP class?

Did anybody play with the recently added cudaMemory() method in the Python TOP class?
It gives me a pointer to, and the size of, the raw CUDA memory block containing the TOP’s content. Now I’m a bit unsure how to convert that raw CUDA memory block into a valid CuPy array or OpenCV UMat.


Wow, that’s exciting. Did you try the CUDA GpuMat class from OpenCV? https://docs.opencv.org/3.4/d0/d60/classcv_1_1cuda_1_1GpuMat.html I think you’d want a preallocated GpuMat of the right size that gets reused frame to frame. Then you’d use a library like PyCUDA to cudaMem2DarrayCopy (I forget the exact name) from TouchDesigner’s CUDA array into the GpuMat’s ptr.


I would think you can also use the constructors that take a data pointer and make the GpuMat point to the memory directly. You just need to make sure your CUDAMemory object stays around for the life of the GpuMat, since the CUDA memory will be deallocated when the CUDAMemory object’s reference count goes to 0.

Thanks guys!
So far I’ve managed to make it work in CuPy, without copying anything. So that’s super cool already.

But to use it in OpenCV and GpuMat I realized I first have to recompile OpenCV & opencv-contrib to include CUDA support. So I’ll continue my GpuMat attempts after this recompile mission…
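For anyone wanting to reproduce the zero-copy CuPy route, here is a minimal sketch of wrapping the pointer/size from cudaMemory() in a CuPy array. The helper names are mine, and the 8-bit RGBA layout is an assumption about the TOP’s pixel format; treat it as a starting point, not the thread author’s exact code:

```python
def top_nbytes(width, height, channels=4, bytes_per_channel=1):
    # Expected size of an 8-bit RGBA TOP's pixel block (assumed layout).
    return width * height * channels * bytes_per_channel

def cuda_ptr_to_cupy(ptr, width, height):
    """Wrap a raw CUDA device pointer (e.g. from a TOP's cudaMemory())
    in a CuPy array without copying. Assumes 8-bit RGBA pixels.
    The TouchDesigner CUDAMemory object must outlive the returned array."""
    import cupy as cp  # lazy import: needs a CUDA-capable GPU
    nbytes = top_nbytes(width, height)
    mem = cp.cuda.UnownedMemory(ptr, nbytes, owner=None)
    memptr = cp.cuda.MemoryPointer(mem, 0)
    return cp.ndarray((height, width, 4), dtype=cp.uint8, memptr=memptr)
```

Inside TouchDesigner the call would look roughly like cuda_ptr_to_cupy(mem.ptr, top.width, top.height), keeping a reference to the CUDAMemory object alive as @malcolm noted above.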

It seems OpenCV is shifting to OpenCL, as they have moved all CUDA code out of core and into opencv-contrib. I would love to use the UMat in OpenCV, as it is part of their ‘Transparent API’: all OpenCV functions that work with a Mat also work with a UMat. But the UMat is OpenCL-only, and I see no option to reach CUDA memory from OpenCL or vice versa, so I see no way to use the UMat with TouchDesigner’s cudaMemory() method - I guess the only way would be if @malcolm feels inclined to add an additional openCLMemory() method to the TOP class to write the contents to raw OpenCL memory :grin:

Last time I used OpenCL, I found the interop with OpenGL was terrible, basically requiring a full pipeline stall to sync the two APIs. Maybe things are better with Vulkan, though? Fingers crossed.


I tried using cupy.cuda.UnownedMemory in a separate Python shell, passing in the pointer and size returned from cudaMemory(). The initialization doesn’t fail, but my attempts to actually get any data out do. However, I’m not very familiar with CUDA and CuPy; maybe someone else has an idea of where to go from here?

Do you mean a separate Python process? CUDA memory pointers can’t be shared between processes like that.


Ah, thanks for the clarification on my mistake, Malcolm. I was crossing my fingers that the raw pointer would work. Guessing there’s a per-process virtualized memory address table for GPU memory, similar to CPU memory?

Ya, exactly. There are ways of doing cross-process communication with CUDA memory, though; you’d have to check the CuPy docs to see what they’ve exposed.


So, could you please tell us how to access that CUDA memory block?


So here’s a working demo. This uses CuPy, which you will need to install first: https://docs.cupy.dev/en/stable/install.html#install-cupy

This example:

  1. copies the ‘source’ TOP to raw CUDA memory every frame, using its cudaMemory() method

  2. then reads that raw CUDA memory block into a CuPy array (on the GPU)

  3. just for fun, flips the CuPy array/texture vertically on the GPU

  4. then copies the texture from GPU to CPU memory, into a NumPy array

  5. then writes the CPU NumPy array to the Script TOP
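Since CuPy deliberately mirrors the NumPy API, the flip and download steps can be illustrated on the CPU with NumPy as a stand-in (with CuPy, np becomes cp and the download step becomes cp.asnumpy; the tiny frame here is just made-up sample data):

```python
import numpy as np

# Stand-in for an RGBA frame: 2 rows x 4 columns x 4 channels.
frame = np.arange(2 * 4 * 4, dtype=np.uint8).reshape(2, 4, 4)

# Flip vertically: reversing the row axis is a view, so no pixel
# data is copied yet (same on the GPU with CuPy).
flipped = frame[::-1]

# Download to a contiguous CPU array (with CuPy: cp.asnumpy(flipped)).
cpu_copy = np.ascontiguousarray(flipped)
```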

cheers, Idzard

copy_via_CUDA_to_CuPy_to_CPU_to_Script_TOP.tox (1.7 KB)


I’ve just made some quick tests!
It’s really amazing: using that method, even converting to a NumPy array on the CPU is much faster than using the native Script TOP .numpyArray() method, like 2 ms against 8-9 ms.
I think if there were a way to upload the GPU array directly to a GpuMat, it could be even faster!
But I haven’t found a way yet…

I’ve had no luck finding a way to return a GpuMat yet. You can do a CUDA memcpy from a CuPy object to a GpuMat, I think.

I tried again to create a GpuMat from a CUDA pointer but failed; this looks like a limitation of the OpenCV Python bindings. According to the docs you can create a GpuMat from a CUDA pointer: https://docs.opencv.org/4.5.0/d0/d60/classcv_1_1cuda_1_1GpuMat.html
But if I check what’s available in Python by doing help(cv2.cuda_GpuMat), it looks like those constructors are not exposed to Python.

Yeah, I’m finding the same thing. I’m thinking you need to use CuPy to do a cudaMemcpy from the pointer I return to the GpuMat you allocate yourself.

Thanks for the tip about cudaMemcpy, @malcolm. I could not find anything like that in CuPy, but I found a similar method in PyCUDA and got it working with OpenCV 4.5 :grinning:

This example requires at least TouchDesigner 2020.44130 (experimental release) and PyCUDA, which you will need to install first (“pip install pycuda” should do it).

======================================

This example:

  1. copies the ‘source’ TOP to raw CUDA memory every frame, using its cudaMemory() method

  2. creates an OpenCV GpuMat, preallocated at the correct size and type

  3. then copies the raw CUDA memory from 1) to the GpuMat’s CUDA address using pycuda.driver.memcpy_dtod

  4. to prove we were there, applies a threshold filter to the image using OpenCV CUDA

  5. then copies the CUDA memory from the OpenCV GpuMat to the Script TOP using the latest and greatest copyCUDAMemory() in the latest experimental !!
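A sketch of the device-to-device copy in step 3, with helper names of my own invention. One caveat worth hedging on: OpenCV may allocate a GpuMat with padded rows (step > width * elemSize), in which case a single flat memcpy_dtod is only safe when the rows are actually contiguous:

```python
def rows_are_contiguous(step, width, elem_size=4):
    # True when the GpuMat has no row padding, so one flat copy is safe.
    # step is GpuMat.step (bytes per row); elem_size is 4 for CV_8UC4.
    return step == width * elem_size

def copy_top_to_gpumat(src_ptr, gpu_mat, nbytes):
    """Device-to-device copy from a TOP's raw CUDA memory into a
    preallocated OpenCV GpuMat. Requires PyCUDA (initialized once)
    and an OpenCV build with CUDA support; untested sketch."""
    import pycuda.driver as drv  # lazy import: needs a CUDA context
    drv.memcpy_dtod(int(gpu_mat.cudaPtr()), int(src_ptr), nbytes)
```

GpuMat.cudaPtr() is exposed in the OpenCV 4.x Python bindings and returns the device address the copy targets.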

copy_TOP_CudaMem_to_OpenCV_GpuMat_and_back_to_TD_using_copyCUDAMemory.tox (1.7 KB)

cheers Idzard


Hey @nettoyeur,

after pip installing PyCuda, I get

AttributeError: module 'pycuda' has no attribute 'driver'
Results of run operation resulted in exception.

is this something you have come across before?
Cheers
Markus

Hmm, I now see the same error on my machine too. Sorry about this, @snaut. It has something to do with CUDA initialization, which I missed in my posted source code.

PyCUDA needs to be initialized by an

import pycuda.autoinit

at the top of your script.
Unfortunately, creating a CUDA context this way causes the TD op('source').cudaMemory() method to fail.
Interestingly enough, if you then delete the import pycuda.autoinit line and restart TD, everything works. That is why I had not added this line to my posted code; I thought it was not needed after all. But I guess that only lasts until you restart your computer(?)… Back to the drawing board!

I’ve faced the same problem, but I’ve solved it like this:

  1. Downloaded a precompiled PyCUDA binary from this page: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycuda

  2. Reinstalled the CUDA Toolkit matching that version (10.2 in my case)

  3. Reinstalled PyCUDA from the downloaded .whl (use the --force-reinstall argument)

  4. In the code, instead of “import pycuda”, I put “import pycuda.driver as cuda_drv”

and basically that’s it!
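As another possible workaround (my own untested guess, not something verified in this thread): instead of letting pycuda.autoinit create a fresh context, PyCUDA can attach to the device’s primary context, which might avoid clashing with the context TouchDesigner already owns:

```python
def attach_to_primary_context(device_index=0):
    """Initialize PyCUDA against the device's existing primary CUDA
    context instead of creating a new one with pycuda.autoinit.
    Untested inside TouchDesigner; a sketch only."""
    import pycuda.driver as drv  # lazy import: needs a CUDA-capable GPU
    drv.init()
    ctx = drv.Device(device_index).retain_primary_context()
    ctx.push()  # make it current for subsequent PyCUDA calls
    return ctx  # caller should pop() / detach() when done
```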


Excuse me, I have a question about this line:

gpuMat = cv2.cuda_GpuMat((src.shape.width,src.shape.height),24)

What does the ‘24’ argument mean?
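For what it’s worth (this answer is mine, not from the thread): the 24 is OpenCV’s packed type code, combining bit depth and channel count, and 24 decodes to CV_8UC4, i.e. 8-bit unsigned with 4 channels, matching a TOP’s RGBA pixels. A sketch of the encoding:

```python
# OpenCV depth constants (as defined in opencv2/core/hal/interface.h).
CV_8U, CV_8S, CV_16U, CV_16S, CV_32S, CV_32F, CV_64F = range(7)

def cv_maketype(depth, channels):
    # Mirrors OpenCV's CV_MAKETYPE macro: low 3 bits hold the depth,
    # the upper bits hold the channel count minus one.
    return (depth & 7) + ((channels - 1) << 3)

print(cv_maketype(CV_8U, 4))  # 24, i.e. cv2.CV_8UC4
```

So passing cv2.CV_8UC4 instead of the bare 24 would make the intent clearer.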