Better method to sample a texture in various width blocks and find average color of each block?

I have 15 tubes, all with different diameters. Each one will spin according to motion detected in optical flow. I need to split the optical flow texture into 15 blocks whose widths correspond to the diameter of each tube. I have a GLSL TOP (set to a resolution of 15x1) that does this, but it is super slow when the optical flow texture is high resolution (720p+).

uniform float uTotalWidth;
uniform samplerBuffer sWidth;

out vec4 fragColor;
void main()
{
    vec4 color = vec4(0.0, 0.0, 0.0, 1.0);
    vec2 texRes = uTD2DInfos[0].res.zw;

    // Which of the 15 output pixels this fragment writes.
    int block = int(gl_FragCoord.x);

    // This tube's width, normalized by the total width, then scaled to input pixels.
    float thisWidth = texelFetch(sWidth, block).r;
    thisWidth /= uTotalWidth;
    thisWidth *= texRes.x;

    // Sum the widths of all previous tubes to find where this block starts.
    float prevWidths = 0.0;
    for (int i = 0; i < block; i++) {
        prevWidths += texelFetch(sWidth, i).r;
    }
    prevWidths /= uTotalWidth;
    prevWidths *= texRes.x;

    // Average every input pixel that falls inside this block's column range.
    int startX = int(prevWidths);
    int endX = int(prevWidths + thisWidth);
    for (int x = startX; x < endX; x++) {
        for (int y = 0; y < int(texRes.y); y++) {
            color.r += texelFetch(sTD2DInputs[0], ivec2(x, y), 0).r;
        }
    }

    color.r /= float(max(endX - startX, 1)) * texRes.y;
    fragColor = TDOutputSwizzle(color);
}

It’s obvious why it’s slow: it samples every pixel and then adds them all together. If I put a Fit TOP or Res TOP before it and scale down the resolution, it performs better. What I’m wondering is whether there’s a way to do it without a Fit/Res TOP? Or rather, what’s the GLSL behind those two TOPs so I can include it in my shader?

irregular_sampling.toe (40.0 KB)

The way the Analyze TOP does it is in multiple passes. It takes a small group of pixels, 3x3 or 5x5, computes the average/min/max of that block, and writes it to one pixel. Then another pass does the same with the already-averaged pixels to reduce it down further.
It’s possible a compute shader using group-local memory could be a fast solution too, but that’ll be much more complex to get right.
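Each pass could be as simple as something like this (just a sketch, assuming the GLSL TOP’s output resolution is set to one third of its input in each dimension, and ignoring edges that don’t divide evenly):

// One 3x3 reduction pass: every output pixel averages a 3x3 block of the input.
out vec4 fragColor;
void main()
{
    ivec2 blockOrigin = ivec2(gl_FragCoord.xy) * 3;   // top-left texel of this block

    vec4 sum = vec4(0.0);
    for (int y = 0; y < 3; y++) {
        for (int x = 0; x < 3; x++) {
            sum += texelFetch(sTD2DInputs[0], blockOrigin + ivec2(x, y), 0);
        }
    }
    fragColor = TDOutputSwizzle(sum / 9.0);
}

Chain a few of these and the texture going into your 15x1 shader is tiny, so the final loops stay cheap.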


So to use passes correctly: in each of the 15 sections/regions, I would divide it into 3x3 clusters and maybe designate the middle pixel as the one that stores the summed/averaged color value. On the second pass I’d again make 3x3 clusters, except out of those middle pixels, and store the average color in the middle pixel of each (now that I’m writing it, it might be better to use the bottom-left pixel as the one that stores the running average, just so the pixel coordinate wouldn’t keep moving).

And then on the final pass, I average the remaining pixels and store that value in one of the 15x1 output pixels? Writing this shader might be more effort than it’s worth. I could see having to test for overlap in sampling and maybe reduce edge clusters to 1x3, and also having to detect when you can’t make 3x3 clusters anymore, because bigger regions will need more passes to get the average color, while smaller regions will need a built-in exception once they can’t be reduced any further.
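For the edge cases, I’m imagining a reduction pass along these lines (just a sketch; it only guards the texture border, so overlap between regions would still need separate handling):

// 3x3 reduction pass that skips samples falling outside the input texture,
// so edge clusters effectively become 1x3 / 3x1 / 1x1 instead of reading
// past the border, and the divisor only counts samples actually taken.
out vec4 fragColor;
void main()
{
    ivec2 inRes = ivec2(uTD2DInfos[0].res.zw);
    ivec2 blockOrigin = ivec2(gl_FragCoord.xy) * 3;

    vec4 sum = vec4(0.0);
    float count = 0.0;
    for (int y = 0; y < 3; y++) {
        for (int x = 0; x < 3; x++) {
            ivec2 coord = blockOrigin + ivec2(x, y);
            if (coord.x < inRes.x && coord.y < inRes.y) {
                sum += texelFetch(sTD2DInputs[0], coord, 0);
                count += 1.0;
            }
        }
    }
    fragColor = TDOutputSwizzle(sum / max(count, 1.0));
}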

You’ll want to use a different node for each pass, and reduce the resolution of each pass down to only what needs to be output.

Why not try a Replicator with 15 Bases? Each Base could have a customized Crop TOP feeding into an Analyze TOP.

That’s what I had before and it was consuming too many resources. I have to keep my project very thin and consolidate where I can. I’m going to try to use the Engine COMP where I can but so far not much luck, getting errors about insufficient samples.

Engine COMP isn’t a good solution for GPU operations anyway. Its main goal is to gain access to more CPU cores; you only have one GPU, so it’s more efficient to do all GPU work in the main process.

Yup. I’m mainly grouping my larger CHOP networks together into Engine COMPs. I managed to create one successfully with my audio processing, but there are still other instances where I get insufficient samples and I have no idea why.