I’m trying to get texture memory working in the CUDA example with no success.
It compiles but gives me 0 value pixels.
The wiki is not really clear:
It says the memory needs to be allocated as an array, which is done by setting memType to TCUDA_MEM_TYPE_ARRAY.
The NVIDIA programming guide says it doesn’t have to be an array:
“A texture can be any region of linear memory or a CUDA array.”
From what I understand I need to use cudaBindTexture() if it’s linear memory, and cudaBindTextureToArray() if it’s an array, but this requires a cudaArray.
I tried both, including cudaBindTextureToArray() with topParam->data cast to (cudaArray*), but neither works.
Otherwise I have
texture<float4, 1, cudaReadModeElementType> texIn;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 32, 32, 32, cudaChannelFormatKindFloat);
and float4 p = tex1Dfetch(texIn,y * inw + x); in the kernel.
I’m not sure what Touch does for me and what it doesn’t.
Does anyone have a working example?
Thanks a lot,
Vincent
PS: I’ve attached one of my unsuccessful attempts. The faulty kernel is sampleTextureKernel; the other kernels, where I was testing swizzling, do work. cudaTOPTemplate_texture.zip (361 KB)
then you should be able to pass matrix_d into your cuda kernel.
sampleKerneld <<< grid, block >>> (matrix_d, …
or if you want to populate the array on the cpu you could do something like
float matrix[width*width*3];
//create the pointer to the device mem
float *matrix_d;
//fill in your array on the host
matrix[x] = y …
//allocate the device memory, then copy the host data to the device
cudaMalloc((void**)&matrix_d, sizeof(float)*width*width*3);
cudaMemcpy(matrix_d, matrix, sizeof(float)*width*width*3, cudaMemcpyHostToDevice);
// run your kernel
sampleKerneld <<< grid, block >>> (matrix_d, …
//and copy back to host
cudaMemcpy(matrix, matrix_d, sizeof(float)*width*width*3, cudaMemcpyDeviceToHost);
Not sure if that helps. It’s probably slower than keeping the data on the device.
Thanks for the reply, but it’s not what I’m looking for. Texture binding lets you fetch values much like texture2D() calls in fragment shaders. There’s supposed to be increased performance for 2D spatial locality, and it’s closer to the fragment shader paradigm, so it would be easier to port some GLSL code I have!
Yes I want to stay on the device as much as possible.
I found your old thread asking for a CUDA SOP or VBO; indeed, that would be nice.
For now you can always fetch values from a CUDA TOP in a vertex shader to move particles or vertices, but drawing directly from the VBO would be more convenient.
Hey, I’ll elaborate on that wiki note some more, seems like I misread the CUDA documentation when I wrote that.
For 1D textures you can’t use ARRAY type; it needs to be linear memory. However, your size calculation is incorrect. It’s width*height*4, but it needs to be width*height*4*sizeof(float). The 4 is the number of components per pixel.
For 2D textures you can use either array or linear, but you need to use cudaBindTexture2D if you are using linear, and make sure you calculate the pitch correctly (width*4*sizeof(float)), assuming 4 components and floating-point data.
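To make the two byte counts above concrete, here is a minimal host-side sketch in plain C++ (it should also compile unchanged inside a .cu file). The helper names linearSizeBytes and rowPitchBytes are made up for illustration and are not part of CUDA or Touch. It assumes tightly packed rows with no padding; if the memory came from something like cudaMallocPitch, you would use the pitch that call reports instead.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; not CUDA or Touch API.

// Total size in bytes of a tightly packed width x height image whose
// pixels have `components` float channels (4 for a float4 texture).
// This is the size to pass when binding linear memory as a 1D texture.
constexpr std::size_t linearSizeBytes(std::size_t width,
                                      std::size_t height,
                                      std::size_t components = 4) {
    return width * height * components * sizeof(float);
}

// Bytes per row (the pitch) of the same image, assuming no row padding.
// This is the pitch to pass when binding linear memory as a 2D texture.
constexpr std::size_t rowPitchBytes(std::size_t width,
                                    std::size_t components = 4) {
    return width * components * sizeof(float);
}
```

For example, linearSizeBytes(inw, inh) would be the size argument for the 1D case, and rowPitchBytes(inw) the pitch argument for the 2D linear case, where inw comes from the kernel in the first post and inh stands in for the matching height. The pitch times the height always equals the total size under the no-padding assumption.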