Compute Shader atomic operations

Background info:
https://www.khronos.org/opengl/wiki/Atomic_Counter#Operations

I think that atomicCounterIncrement and atomicCounterAdd work correctly, but I’m not sure atomicCounter works correctly.

atomicCounter(pCount) is my attempt to get the current state of the atomic uint pCount, but it seems to always be zero even though atomicCounterIncrement increments it later in the code.

I’m also not sure atomicCounterDecrement and atomicCounterSubtract work correctly.

compute_shader_atomic_questions.tox (1.5 KB)

Any advice is appreciated.

Hey david,

I’m definitely no expert in those atomic operations, so don’t take my advice too seriously :slight_smile:
In your first example you initially set the atomic counter to 1000, then add one with the increment and use the result as a texture coordinate. That pixel is way outside the output buffer, which is why the value appears to be 0 (if you just write the color value to pixel 0,0 you’ll see it’s 1000).

The other thing that is happening, I think, is that those atomic operations are not completely ‘synced’. For example, if you run atomicCounterIncrement() twice on the same counter, you get unexpected, random-looking behaviour. (Add for example absTime.seconds as a uniform so the shader cooks every frame, and the pixels will jump all over the place.)
Maybe (not sure) there is some kind of barrier function to wait for all dispatches to sync.

I used it once to make a ‘sumTOP’, but in the end I still sometimes got random output values. So I guess I don’t fully understand it either… so I’m following this post :slight_smile:

cheers,
tim

Here is this sumTOP using atomicCounterAdd(). It seems to work, so it might just be an old nightmare I had :smiley:

SumCompute.tox (1.1 KB)

Thanks for checking it now and for catching that mistake. I meant for it to be initialized at 0 for all the examples. Even when it’s initialized at zero in the new tox I uploaded, the result for atomicCounter is different than for atomicCounterIncrement.

I think you’d be interested in the goal I’m trying to achieve. I have a buffer of particles (x, y, brightness).
For every particle in the buffer there are 3 situations:
The particle may be too “bright” → “kill” the particle by NOT copying it to the new buffer (and not incrementing pCount)
The particle is nicely “bright” → copy the particle over to the new buffer and increment pCount
The particle is too dim → place two new particles in the new buffer with positions similar to this particle and increment pCount twice total.

So I’m trying to use pCount and I don’t necessarily want to increment it on every thread.

This other tox is like a simpler case: a random shuffle from one buffer into the other. But again, only the atomicCounterIncrement way works.

compute_shader_random_shuffle.tox (1.2 KB)

Update:
I was able to write the code a little differently and avoid solving this problem. I only call increment if I really need to. I don’t just automatically do it in the middle of the main function. If I want to “kill” the particle, increment hasn’t been called yet.

Hey david,

I was just playing with it some more and I realized I truly don’t understand what is happening :). See the attached tox: if you update the compute shader every frame, the value of the atomic counter won’t be the same. It feels like we are missing something.
I saw @vinz99 using the imageAtomicAdd() function. I haven’t played around with it yet, but perhaps that gives some more understandable results :smiley:
(imageAtomicAdd - OpenGL 4 Reference Pages)

When solving your problem, did you still use an atomic counter? I’m not sure I’m following what you are trying to do with this pCount. From your description it seems like pCount will be incremented either once or twice every frame, which means it will grow without bound over time.

compute_shader_random_shuffle_over_time.tox (1.7 KB)

Yes I’m using atomic counters in this section of my project. pCount gets reset to zero every frame. At the end of the frame, it’s ideally about the same as the number of pixels in the image. So some particles get deleted (not copied over), some get copied over as is, some get copied twice. Hopefully you end up with a number of particles equal to what the size of the buffer can handle. If pCount goes above that, which is possible, don’t write at all, just return in main.
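For anyone reading along, the kill / copy / split logic described above could be sketched roughly like this. It is a sketch under assumptions, not the code from the project: pCount and mTDComputeOutputs come from this thread, while the counter binding, the sTD2DInputs[0] input holding (x, y, brightness), the uTooBright / uTooDim thresholds, the appendParticle helper and the split offset are all made up for illustration. It also assumes the input and output have the same resolution, and a build where GLSL atomic counters are available.

```glsl
// Sketch only. Hypothetical names: uTooBright, uTooDim, appendParticle, the split offset.
layout (local_size_x = 8, local_size_y = 8) in;
layout (binding = 0) uniform atomic_uint pCount;  // assumed binding; reset to 0 every frame

uniform float uTooBright;  // hypothetical: above this the particle is killed
uniform float uTooDim;     // hypothetical: below this the particle is split in two

// Claim one slot in the output buffer and write the particle there if it fits.
void appendParticle(vec3 particle, ivec2 size)
{
    // atomicCounterIncrement returns the value held BEFORE the increment,
    // so every call receives a unique slot index.
    uint slot = atomicCounterIncrement(pCount);
    if (slot >= uint(size.x * size.y))
        return;  // buffer is full: do not write at all
    ivec2 coord = ivec2(int(slot) % size.x, int(slot) / size.x);
    imageStore(mTDComputeOutputs[0], coord, vec4(particle, 1.0));
}

void main()
{
    ivec2 size = imageSize(mTDComputeOutputs[0]);
    ivec2 id = ivec2(gl_GlobalInvocationID.xy);
    if (id.x >= size.x || id.y >= size.y)
        return;

    vec3 p = texelFetch(sTD2DInputs[0], id, 0).xyz;  // assumed layout: x, y, brightness

    if (p.z > uTooBright)
        return;                      // too bright: kill, the counter is never touched

    if (p.z >= uTooDim)
    {
        appendParticle(p, size);     // nicely bright: copy once
        return;
    }

    // Too dim: two particles near the original (the offset is a placeholder).
    appendParticle(vec3(p.xy + vec2(0.001, 0.0), p.z), size);
    appendParticle(vec3(p.xy - vec2(0.001, 0.0), p.z), size);
}
```

The point of structuring it this way is that the counter is only incremented at the moment a slot is actually needed, so a killed particle never consumes one.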

What I implemented was “Weighted Linde-Buzo-Gray Stippling” https://kops.uni-konstanz.de/bitstream/handle/123456789/41075/Deussen_2-gu29mv4u87jh2.pdf;jsessionid=CBA553E15000B0419FF1BD6A5FB9B1A9?sequence=1

One step is getting a Voronoi diagram rendered at a big resolution like 1280x720 or 1920x1080. You can do this with either the cone method or the jump flooding algorithm. Every pixel needs to be an integer indicating which Voronoi seed is closest. No anti-aliasing is allowed.

The next step is a shader with imageAtomicAdd. The number of invocations needs to be the size of the input image, so 1280x720=921600. The output needs to have as many pixels as the number of Voronoi seeds*3. This is because imageAtomicAdd only works on a single scalar per pixel, not a vec4, and we need 3 numbers calculated for each seed. Those numbers are the centroid x, the centroid y, and the overall density. So for every pixel in the input image, visit it in the compute shader, determine which Voronoi cell index it belongs to and work out what to add to the output pixels.
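A rough sketch of that accumulation pass, under several assumptions that are mine rather than from the project: the seed index is stored as a float in the red channel of sTD2DInputs[0], the density image is sTD2DInputs[1], the output is a single row of seeds*3 pixels in 32-bit float mono, and float imageAtomicAdd is enabled through the extension mentioned elsewhere in this thread. It accumulates density-weighted sums; dividing the first two by the third afterwards would give the weighted centroid.

```glsl
// Sketch only. Needs "#extension GL_NV_shader_atomic_float : enable" set as a
// preprocessor directive, as in the other examples in this thread.
layout (local_size_x = 8, local_size_y = 8) in;

void main()
{
    ivec2 id = ivec2(gl_GlobalInvocationID.xy);
    ivec2 inSize = textureSize(sTD2DInputs[0], 0);
    if (id.x >= inSize.x || id.y >= inSize.y)
        return;

    // Assumed encoding: red channel holds the index of the closest seed.
    int seed = int(texelFetch(sTD2DInputs[0], id, 0).r + 0.5);
    // Assumed: second input holds the density at this pixel.
    float density = texelFetch(sTD2DInputs[1], id, 0).r;

    // Three consecutive output pixels per seed: sum(d*x), sum(d*y), sum(d).
    int base = seed * 3;
    imageAtomicAdd(mTDComputeOutputs[0], ivec2(base + 0, 0), density * float(id.x));
    imageAtomicAdd(mTDComputeOutputs[0], ivec2(base + 1, 0), density * float(id.y));
    imageAtomicAdd(mTDComputeOutputs[0], ivec2(base + 2, 0), density);
}
```

The output has to be cleared to zero before each dispatch, otherwise the sums carry over from the previous frame.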

In your example, if you add an Analyze TOP set to average and then multiply the red channel by 16 (the number of pixels), you get about the same sum as what the compute shader reports.

Aah, that looks interesting indeed! I’ll have to read the paper to fully understand it I think, but I’m very curious to see the end result! So did you manage to use this atomic counter in a stable way?
In the end I am indeed using the Analyze TOP * number of pixels, but what I was trying to achieve was a more optimized way of summing the pixels (used in TDNeuron to sum up big matrices). So if you did find a way to use it in a stable way, it would help me a lot :slight_smile:

Well I still don’t understand the function atomicCounter(pCount) because it doesn’t work how I expected. atomicCounterIncrement(pCount) works the way I want it to, fortunately.

Hello,

Sorry, late to the atomic counters party!
Besides the ImageAtomicAdd() function I had also been using the atomicCounterIncrement() function with success.
I think the atomicCounter() function had also left me baffled, but I took another look. It seems that, despite what Atomic Counter - OpenGL Wiki says, just reading the value is not an atomic operation and depends on what has been completed in the different threads.

(Interestingly, the atomicCounter page in the OpenGL 4 Reference Pages doesn’t say it reads the value atomically, but the atomicCounterIncrement page does.)

It seems that calling it after the atomicCounterIncrement() and after a memoryBarrier() gives the final result of the counter for all the threads. (The wiki says: "You will need to ensure internal visibility if you want to use ordering guarantees within a rendering command.")
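As a minimal sketch of that ordering (assuming the same pCount counter, an assumed binding of 0, and mTDComputeOutputs[0] as a float output), each thread can write both values out so they can be compared per pixel. Note that memoryBarrier() only orders this thread's own memory operations; it does not wait for other threads, so I would treat the second value as a snapshot rather than a guaranteed final total.

```glsl
// Sketch only: visualises what each thread sees, nothing more.
layout (local_size_x = 8, local_size_y = 8) in;
layout (binding = 0) uniform atomic_uint pCount;  // assumed binding; reset to 0 every frame

void main()
{
    ivec2 id = ivec2(gl_GlobalInvocationID.xy);

    // Value the counter held just before this thread's increment.
    uint before = atomicCounterIncrement(pCount);

    // Order this thread's earlier memory operations before the read below.
    memoryBarrier();

    // Whatever the counter holds at the moment this thread gets here.
    uint snapshot = atomicCounter(pCount);

    // Red = per-thread unique index, green = observed counter value.
    imageStore(mTDComputeOutputs[0], id, vec4(float(before), float(snapshot), 0.0, 1.0));
}
```

If the green channel differs between pixels or between frames, that would match the behaviour described in this thread: the read depends on how many other threads have finished their increment by then.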

Strangely enough, it still seems to give unstable results with atomicCounterAdd() after a texelFetch(), as in Tim’s sum example. This might have to do with the fact that you can’t sync different workgroups no matter what (barrier() only works within the workgroup).

And about David’s question on doing a decrement after the imageStore(): it seems all the increments are completed by the various threads before the write, so anything after it doesn’t matter.

And doing a decrement just after an increment seems to give random results because of a race condition.
Adding memoryBarrier() gives a more predictable outcome but since all the increments would then be complete before the first decrement, the counter still goes from 0 to max overall.

See attached tox for various tests compute_shader_atomic_questions_workgroup_sync_issues.1.toe (5.6 KB)

So in short it seems the only reliable thing is one increment per thread and the rest is kinda confusing ;D

Some interesting insights:

https://rauwendaal.net/2013/07/03/atomiccounters-indirectbuffercommands/

In any case, it seems atomic counters are going away with Vulkan; only imageAtomic operations remain.


Also @timgerritsen, what you probably want is the parallel prefix sum algorithm (Prefix sum - Wikipedia). There’s a sample in the OpenGL SuperBible samples (GitHub - openglsuperbible/sb7code: Source code and supporting material for the 7th Edition of OpenGL SuperBible).
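To give an idea of what a prefix sum looks like in a compute shader, here is a small Hillis-Steele style inclusive scan over one workgroup. This is not the SuperBible sample, just a sketch with my own assumptions: a single workgroup of 256 threads is dispatched, the values are in the red channel of the first row of sTD2DInputs[0], and that row is at most 256 pixels wide. The last in-range output pixel then holds the sum of the row.

```glsl
// Sketch only: single-workgroup inclusive scan of up to 256 values.
layout (local_size_x = 256, local_size_y = 1) in;

shared float scanBuf[512];  // two 256-element halves used as ping-pong buffers

void main()
{
    uint i = gl_LocalInvocationID.x;
    int width = textureSize(sTD2DInputs[0], 0).x;
    bool inRange = int(i) < width;

    // Load one value per thread, padding with zeros past the end of the row.
    scanBuf[i] = inRange ? texelFetch(sTD2DInputs[0], ivec2(int(i), 0), 0).r : 0.0;
    memoryBarrierShared();
    barrier();

    uint src = 0u;  // start offset of the half currently holding valid data
    for (uint offset = 1u; offset < 256u; offset *= 2u)
    {
        uint dst = 256u - src;  // the other half
        float v = scanBuf[src + i];
        if (i >= offset)
            v += scanBuf[src + i - offset];  // add the element 'offset' places to the left
        scanBuf[dst + i] = v;
        memoryBarrierShared();
        barrier();  // every thread finishes this step before the halves swap
        src = dst;
    }

    if (inRange)
        imageStore(mTDComputeOutputs[0], ivec2(int(i), 0), vec4(scanBuf[src + i]));
}
```

Because barrier() only synchronises within a workgroup, larger inputs would need several passes (scan each block, scan the block totals, add them back), which is what the more complete samples deal with.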

It might be easier to wrap the Thrust functions in a CUDA TOP :wink:
https://thrust.github.io/doc/group__prefixsums.html


vincent, wow, thanks for this valuable information! :slight_smile:
gonna dive into it later today!

cheers,
tim

Is imageAtomicCompSwap supported in TD?

Here’s my compute shader with this preprocess directive:
#extension GL_NV_shader_atomic_float : enable

layout (local_size_x = 1, local_size_y = 1) in;
void main()
{
    // this works:
    // imageAtomicAdd(mTDComputeOutputs[0], ivec2(1,1), int(5));

    // can either of these two work?
    imageAtomicCompSwap(mTDComputeOutputs[0], ivec2(1,1), uint(0), uint(5));
    //imageAtomicCompSwap(mTDComputeOutputs[0], ivec2(1,1), int(0), int(5));
}

The error:

error C1115: unable to find compatible overloaded function "imageAtomicCompSwap(struct image2D1x32_bindless, ivec2, uint, uint)"

imageAtomicCompSwap.tox (814 Bytes)

Since we don’t support integer textures in TD, all of our textures are regular image2D, not uimage2D. Usually atomics only work on integer image formats.

The GL_NV_shader_atomic_float extension adds versions of some functions that work on float textures, such as imageAtomicAdd. However, imageAtomicCompSwap is not one of the functions this extension adds float support to.

https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_atomic_float.txt


Thanks for explaining. Your info helped me find that imageAtomicExchange does work, which seems sufficient for what I need.

There might be a minor problem with imageAtomicAdd in the macOS Vulkan build. I’ve tested macOS (2022.21460) and Windows (2022.20150) with this and get the same error on both.
imageAtomicAdd_test.tox (878 Bytes)

Compute Shader Compile Results: 
WARNING: /project1/Boids/boids_gpu_simulation/get_bins/imageAtomicAdd_test/null_preprocessor_directives:1: '#extension' : extension not supported: GL_NV_shader_atomic_float
WARNING: /project1/Boids/boids_gpu_simulation/get_bins/imageAtomicAdd_test/null_preprocessor_directives:1: '#extension' : extension not supported: GL_NV_shader_atomic_float
ERROR: /project1/Boids/boids_gpu_simulation/get_bins/imageAtomicAdd_test/glslmulti1_compute:7: 'imageAtomicAdd' : required extension not requested: GL_EXT_shader_atomic_float
ERROR: 1 compilation errors.  No code generated.

The shader has read-write output access, is 32-bit float mono and contains:
int result = int(imageAtomicAdd(mTDComputeOutputs[0], ivec2(0,0), uint(1)));

The preprocessor directive is
#extension GL_NV_shader_atomic_float : enable

Using this preprocessor directive instead
#extension GL_EXT_shader_atomic_float : enable
fixes it all on Windows but crashes on macOS.

So it’s not a problem that I need to change the preprocessor on Windows, but macOS seems to not work.

As far as I know, atomic floats aren’t supported on macOS. So while it shouldn’t crash (I’ll look into that), it won’t work in the end anyway.

Darn, ok thanks for checking.