Particle system using a compute shader

Hi there,

Here is a small example of a particle system using a compute shader.
Uploaded for Georgios Cherouvim as a demonstration.

There aren't many comments in the code, my sincere apologies.
If you have any questions, let me know :slight_smile:


New version including comments & billboard renderer here: … ute-shader
computeParticles.tox (7.04 KB)


Hello, good day, very good contribution.
What programming language is it written in?
Do you have any online tutorials so I can practice and learn to create this type of system in code?

Thank you

Hi there,
The compute shader that calculates the positions and directions of all particles is written in GLSL. I haven’t made any tutorials about it (yet). I might do so in the near future.

To get started with writing compute shaders, I suggest checking the following URLs: … a_GLSL_TOP


Thank you! This is a clear and helpful example. I noticed that the resolution of the shader can be optimized a little further by finding a resolution that covers the desired number of particles but isn't necessarily square. Suppose you want 8 particles, and each particle needs 4 pixels. Right now you compute 8*4 = 32 total pixels and ceil(sqrt(32)) = 6, so a 6x6 texture (36 pixels) covers the 8 desired particles. More efficiently, the compute shader could be 4x8 (exactly 32 pixels). It's pretty minor, but why not if it works without messing up anything else (I hope…). Thanks again for sharing!
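A minimal sketch of that idea in Python (for example inside a Script CHOP). It assumes each particle occupies pixels_per_particle consecutive pixels on one row, so the width is kept a multiple of it. best_resolution(), square_resolution() and max_side are hypothetical names, not David's actual script: it picks the candidate with the fewest unused pixels and, on a tie, the most square one.

```python
import math


def best_resolution(num_particles, pixels_per_particle=4, max_side=16384):
    """Hypothetical helper: pick (width, height) covering every particle's pixels.

    Width is kept a multiple of pixels_per_particle (assumption: a particle's
    pixels sit side by side on one row). Candidates are ranked by unused
    pixels first, then by how far the shape is from square.
    """
    total = num_particles * pixels_per_particle
    best_key, best_res = None, None
    for width in range(pixels_per_particle, max_side + 1, pixels_per_particle):
        height = math.ceil(total / width)
        if height <= max_side:
            key = (width * height - total, abs(width - height))
            if best_key is None or key < best_key:
                best_key, best_res = key, (width, height)
        if width >= total:  # any wider and height stays 1, so only waste grows
            break
    return best_res


def square_resolution(num_particles, pixels_per_particle=4):
    """The original approach: the smallest square that fits all pixels."""
    side = math.ceil(math.sqrt(num_particles * pixels_per_particle))
    return side, side


print(best_resolution(8))    # (4, 8): 32 pixels, none wasted
print(square_resolution(8))  # (6, 6): 36 pixels, 4 wasted
```

Ranking by waste first can produce long thin strips for awkward counts (13 particles gives 4x13), so you might also want to cap the aspect ratio.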
computeParticles_2.tox (8.07 KB)


@davidbraun, great!! I didn't think of that, but it also makes it easier to communicate the number of particles to other shaders. Until now I was solving this by passing the real particle count as a uniform.

Thank you : D

Hey @timgerritsen, thanks again for sharing that example a while back. I learned a lot, and here is what I've done so far. … 22&t=11384

Since these are 2D particles, I reduced the data size from 4 to 3, but I'm still a little unclear on what the dispatch size and layout(local_size) do and how they work together. Am I right to assume that it has something to do with how the texture is divided up for multithreading on the graphics card?

hey @ch3,

When building a compute shader, you control how many work groups (parallel 'processes') are dispatched. Each dispatched work group then runs your shader code once per invocation in its local size, i.e. local_size_x * local_size_y * local_size_z times. So instead of being bound to the resolution of your output image, as with a normal pixel shader, the number of invocations can be different.

For example, if you have a dispatch size of 32x32 and a local size of 16x16, your shader code will run 1024 * 256 = 262144 times. If you write a pixel at the coordinates of gl_GlobalInvocationID, it will only fill an area of 512x512 pixels (32 * 16 = 512 along each axis), even though your TOP could have a resolution of 1024x1024.

So what I did in this particle shader is set dispatch size * local size to the number of particles. The code then writes multiple pixels per particle (4 * 3 floats) to the output buffer, so I set the output resolution to 4 times the dispatch size * local size.
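A sketch of that sizing, assuming (as in the layout described later in the thread) that a particle's pixels sit next to each other along x, so only the width is multiplied. output_resolution() is a hypothetical helper, not code from the .tox.

```python
def output_resolution(num_work_groups, local_size, pixels_per_particle=4):
    """One invocation per particle; each particle owns pixels_per_particle pixels in a row."""
    particles_x = num_work_groups[0] * local_size[0]
    particles_y = num_work_groups[1] * local_size[1]
    num_particles = particles_x * particles_y
    width = particles_x * pixels_per_particle  # the 4x factor goes along x (assumed)
    height = particles_y
    return num_particles, (width, height)


count, res = output_resolution((4, 4), (8, 8))
print(count)  # 1024 particles
print(res)    # (128, 32): 4 times as many pixels as invocations
```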

As far as I understand, the reason for this split is memory management. Invocations within the same local work group can access shared memory (variables declared with the shared qualifier), so they have a relatively fast way to 'communicate' with the other invocations of that group.
I haven't played with this much yet, so I'm not completely sure how it works :slight_smile:

Hope this makes a bit of sense. If not, check out this page:
and try playing with the built-in input variables by plotting them to the output buffer to see what they do:

gl_NumWorkGroups, gl_WorkGroupSize, gl_WorkGroupID,
gl_LocalInvocationID, gl_GlobalInvocationID, gl_LocalInvocationIndex.

For example, to get the UV coordinate of the current invocation (the built-ins are uvec3, so take .xy and convert to float before dividing):
vec2 uv = vec2(gl_GlobalInvocationID.xy) / vec2(gl_NumWorkGroups.xy * gl_WorkGroupSize.xy);
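If plotting isn't convenient, the relationships between those built-ins can also be checked on the CPU. This Python sketch emulates them for a 2D dispatch using the formulas from the GLSL specification (gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID, and gl_LocalInvocationIndex flattens the local ID). It is only a model; builtins_for() is a made-up name.

```python
def builtins_for(work_group_id, local_id, work_group_size, num_work_groups):
    """Emulate GLSL compute built-ins for one invocation of a 2D dispatch (z = 0)."""
    global_id = tuple(g * s + l for g, s, l in zip(work_group_id, work_group_size, local_id))
    # Flattened index within the work group: y * size_x + x (the z term is 0 here).
    local_index = local_id[1] * work_group_size[0] + local_id[0]
    # Same idea as the uv line: global ID divided by the total invocations per axis.
    uv = tuple(g / (n * s) for g, n, s in zip(global_id, num_work_groups, work_group_size))
    return {
        "gl_GlobalInvocationID": global_id,
        "gl_LocalInvocationIndex": local_index,
        "uv": uv,
    }


b = builtins_for(work_group_id=(1, 2), local_id=(3, 5),
                 work_group_size=(16, 16), num_work_groups=(32, 32))
print(b["gl_GlobalInvocationID"])    # (19, 37)
print(b["gl_LocalInvocationIndex"])  # 83
```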



Here is my version:
you can now define the shape of the floor with a TOP.


Great, Simone! You inspired me to rebuild the example using instancing instead of billboards.
I uploaded it as a community post here:


Hey Tim. I was looking at your Compute Particles file and I'm curious about some things in your compute shader. For instance, in your Read and Write functions you read and write different data (pos, vel, col) through the same image (mTDComputeOutputs[0]) four times. I would have thought each write would simply overwrite the previous data and/or cause an error. Why is this not the case?

Similarly, you only have three TOPs connected to the GLSL shader, yet the #define statements confuse me as well. Again, I would have thought '#define TEX_COLOR 0' would conflict with '#define DATA_POSITION 0', but obviously I'm wrong.
I would greatly appreciate help understanding this.

Hi electro666,

It looks a bit confusing indeed, but the trick is the IndexToXY() function. The second argument of imageLoad(image, ivec2) is the pixel coordinate to load from. I've gotten used to storing all the information of a particle inside one TOP instead of using multiple TOPs. This is done by assigning multiple pixels to each particle. IndexToXY() converts the particle ID and the type of data to a pixel coordinate.

For example, pixel (0,0) holds the position data of the first particle, pixel (1,0) the velocity of that particle, (2,0) the color, and (3,0) some other data (birth data and mass). So every particle has 4 pixels of data assigned to it, all stored in one TOP.
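To make the mapping concrete, here is a Python model of what an IndexToXY()-style function could compute. It is a guess, not the function from the .tox: it assumes the four data pixels of a particle are stored consecutively and wrap row by row across a texture of a given width. The DATA_* constants mirror the #define naming used in the shader, but their values and the name DATA_EXTRA (for the birth data and mass pixel) are assumed.

```python
# Assumed slot numbering, mirroring the shader's DATA_* defines (DATA_EXTRA is hypothetical).
DATA_POSITION, DATA_VELOCITY, DATA_COLOR, DATA_EXTRA = 0, 1, 2, 3
PIXELS_PER_PARTICLE = 4


def index_to_xy(particle_id, data_slot, width):
    """Pixel coordinate holding one kind of data for one particle (hypothetical model)."""
    linear = particle_id * PIXELS_PER_PARTICLE + data_slot
    return linear % width, linear // width


print(index_to_xy(0, DATA_VELOCITY, width=8))  # (1, 0)
print(index_to_xy(1, DATA_POSITION, width=8))  # (4, 0)
print(index_to_xy(2, DATA_EXTRA, width=8))     # (3, 1)
```

Because every (particle, slot) pair lands on a different pixel, the four reads and writes per particle touch four different coordinates and don't overwrite one another.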

The defines are something I got used to so I don't get confused about what is what. DATA_* just describes which pixel we're talking about; TEX_* describes the input textures. Those are two different things. (Notice TEX_* is only used in something like sTD2DInputs[TEX_…] and the DATA_* things only in the IndexToXY() function.)

Hope this makes sense.


Thank you so much for your quick response, Tim. I had a feeling that's what you were doing, but I have never seen anyone store all the info in one texture and use offsets to read/write it. Very interesting.

I will have to revisit the '#define' thing. I thought you were creating space for a texture or array, but you are literally just giving an integer a name for clarity.

Yes indeed, you can think of those #define statements as a search/replace at compile time.

About the saving all the data in 1 TOP, I personally like this approach since it’s very easy to extend a particle with extra data. No need to add multiple feedback TOPs etc.
Though I'm not 100% sure this is the correct way to go: it might give unexpected behavior when you start reading data that belongs to a different work group, as in a pixel being read before it has been updated (work groups run in no guaranteed order).
Since @vinz99 mentioned this, I started using a Feedback TOP instead: still saving everything in one TOP, but reading the previous frame with texelFetch() on sTD2DInputs[0] instead of using imageLoad().
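A toy CPU model of why the feedback approach is safer. Assume each particle's new value depends on its neighbour (the case where you read data owned by another work group). Updating one buffer in place gives results that depend on the order in which invocations happen to run, whereas reading last frame's copy (what a Feedback TOP input provides) and writing a fresh buffer does not. The sequential loops only stand in for unordered GPU work groups, and the update rule is made up.

```python
def update(own, neighbour):
    """Made-up update rule: average with the neighbouring particle."""
    return (own + neighbour) / 2


def step_in_place(data, order):
    """Read and write the same buffer (like imageLoad/imageStore on one image)."""
    data = list(data)
    for i in order:
        data[i] = update(data[i], data[(i + 1) % len(data)])
    return data


def step_feedback(prev):
    """Read only the previous frame, write a new buffer (like texelFetch on a feedback input)."""
    n = len(prev)
    return [update(prev[i], prev[(i + 1) % n]) for i in range(n)]


start = [0.0, 4.0, 8.0, 12.0]
print(step_in_place(start, [0, 1, 2, 3]))  # [2.0, 6.0, 10.0, 7.0]
print(step_in_place(start, [3, 2, 1, 0]))  # [2.75, 5.5, 7.0, 6.0]
print(step_feedback(start))                # [2.0, 6.0, 10.0, 6.0]
```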


Dig the idea of storing it all in one texture vs. multiple! Creating new textures for new attributes is always a pain to set up.


Yeah, it does make a certain kind of sense, although it may make the resolution math a bit trickier, as DBraun's script shows :wink:


Yes, true. I think I read somewhere that it's best to have a dispatch size that's a multiple (or even a power?) of 2, but I'm not sure about that, since it seems to work without :slight_smile:
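For what it's worth, a common pattern when the particle count isn't a multiple of the local size is to round the dispatch size up and let the extra invocations skip their work (for example, an early return in the shader when the particle index is past the count). A small Python sketch of that arithmetic; dispatch_size() is an illustrative name, and the bounds check itself would live in the shader.

```python
def dispatch_size(num_particles, local_size):
    """Work groups needed so that work groups * local_size >= num_particles (ceiling division)."""
    return -(-num_particles // local_size)


groups = dispatch_size(1000, 64)
idle = groups * 64 - 1000
print(groups)  # 16
print(idle)    # 24 invocations that must skip their work
```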

I still use David's algorithm a lot to find the best resolution (thanks @DavidBraun :)). I'm using it in a Script CHOP in my palette. Since it only runs once when the particle system is initialized, I don't mind the complexity that much. Like @lucasm mentioned, between the pain of setting up new attributes and the somewhat hacky single-texture setup, I would choose the latter :smiley: