Fast Blurs

Recently I built a bloom system using TD TOPs that worked decently but was a bit slow, so I decided to rebuild it in C++ to get better performance. While building the bloom system from scratch I did some research into different blur algorithms and came across some papers on pretty fast blurs that let you use much bigger kernels than the standard full-sampling Gaussian method, with a much lower hit on the GPU.

The component contains 4 blurs:

  • the TD Blur, for comparison
  • a separable 2-pass Gaussian blur that uses bilinear interpolation and sample offsets to get close to the actual Gaussian curve with half the samples
  • a Kawase blur, which is a multipass blur that gets closer to a Gaussian curve with each pass
  • a Moving Average blur, which uses a moving average algorithm to compute a box blur over multiple passes. Not so great for small blurs, but nearly no cost increase with large kernel sizes.
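For anyone curious why the moving average blur barely slows down with bigger kernels, here is a minimal CPU sketch in Python of the running-sum idea for a single row of pixels. It is an illustration only, not the code in the tox: the function name `box_blur_1d` and the clamped edge handling are assumptions, and the real version runs on the GPU over multiple passes.

```python
def box_blur_1d(values, radius):
    """Box blur of one row using a running sum (hypothetical helper).

    Each output needs one add and one subtract no matter how large
    the radius is, which is why big kernels cost almost nothing extra.
    Edges are clamped (an assumption, other edge modes are possible).
    """
    n = len(values)
    if n == 0:
        return []
    window = 2 * radius + 1

    def at(i):
        # Clamp out-of-range indices to the nearest valid pixel.
        return values[min(max(i, 0), n - 1)]

    # Sum of the window centred on the first pixel.
    total = sum(at(i) for i in range(-radius, radius + 1))
    out = []
    for x in range(n):
        out.append(total / window)
        # Slide the window right: add the entering pixel, drop the leaving one.
        total += at(x + radius + 1) - at(x - radius)
    return out


# A single bright pixel spreads evenly over the 3-pixel window.
print(box_blur_1d([0, 0, 9, 0, 0], 1))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

The fixed setup cost of the running sum is probably also why this approach is less attractive for small blurs, where a handful of direct samples is already cheap.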

fastBlurs.tox (4.64 MB)


Cheers! Adding blurs to a 21290x1080 canvas and watching my frame rate tank - this will be helpful!

Thanks for sharing !

Glad you guys might have a use for it.

One of the things I like best about the Bi-linear Gaussian blur in the tox is that all the weights are pre-calculated, so you can go from 1 to n samples with no frame drops (as long as n samples is small enough for your GPU to maintain the desired fps).
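As a rough illustration of what pre-calculating the weights can look like, here is a minimal Python sketch of the usual linear-sampling trick: two neighbouring Gaussian taps are merged into one bilinear fetch placed between them. The function names, the example sigma and the pairing scheme are assumptions for illustration, not the actual table used in the tox.

```python
import math


def gaussian_weights(radius, sigma):
    """Normalized weights for taps 0..radius on one side (centre first)."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(radius + 1)]
    # The centre counts once, every other tap is mirrored on both sides.
    norm = w[0] + 2.0 * sum(w[1:])
    return [x / norm for x in w]


def bilinear_taps(radius, sigma):
    """Merge taps (1,2), (3,4), ... into single bilinear fetches.

    Returns (offset, weight) pairs for one side, centre first. Sampling
    at the fractional offset lets the hardware filter blend both texels
    in the right proportion, so each pair costs one texture read.
    """
    w = gaussian_weights(radius, sigma)
    taps = [(0.0, w[0])]
    i = 1
    while i <= radius:
        if i + 1 <= radius:
            weight = w[i] + w[i + 1]
            offset = (i * w[i] + (i + 1) * w[i + 1]) / weight
            i += 2
        else:
            # Odd tap left over at the end: keep it as a plain sample.
            weight, offset = w[i], float(i)
            i += 1
        taps.append((offset, weight))
    return taps


for offset, weight in bilinear_taps(4, 2.0):
    print(f"offset {offset:.3f}  weight {weight:.4f}")
```

With a radius of 4 this gives 3 taps per side instead of 5. Building a table like this once for every sample count you want to support means changing the blur size at run time is just a lookup.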

Very nice! Thanks for sharing!

Also interesting to see the speed up using texture arrays instead of texture buffers.
I noticed you were sending linear values as the second channel in the weightOffsets, so I thought
I'd be smart and send only a float instead of a vec2 to get even more speed-up. Turns out that doesn't really matter :slight_smile:

Super nice, definitely gonna be using this for gigantic blurs :slight_smile:

