Recently I built a bloom system using TD TOPs that worked decently but was a bit slow, so I decided to rebuild it in C++ to get better performance. While building the bloom system from scratch, I did some research into different blur algorithms and came across some papers on pretty fast blurs that let you use much bigger kernels than the standard full-sampling Gaussian method, with a much lower hit on the GPU.
The component contains 4 blurs:
- the TD Blur for comparison
- a separable 2-pass Gaussian blur that uses bilinear interpolation and sample offsets to get close to the actual Gaussian curve with half the samples
- a Kawase blur, which is a multipass blur that gets closer to a Gaussian curve with each pass
- a Moving Average blur, which uses a moving average algorithm to compute a box blur over multiple passes. Not so great for small blurs, but there is nearly no cost increase with large kernel sizes.
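For anyone curious how the bilinear trick in the separable Gaussian can work, here is a rough CPU-side sketch in Python. It is not the shader from the component, just an illustration under my own assumptions: one pass of the separable blur on a 1D signal, clamp-to-edge addressing, and hypothetical function names (`gaussian_weights`, `linear_taps`, `sample_linear`). Two neighbouring texels with weights w1 and w2 are replaced by one interpolated fetch with weight w1 + w2, placed at the weighted average of their offsets.

```python
import math

def gaussian_weights(radius, sigma):
    """Normalized one-sided Gaussian weights for texel offsets 0..radius."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(radius + 1)]
    total = raw[0] + 2.0 * sum(raw[1:])  # both sides of the kernel
    return [w / total for w in raw]

def linear_taps(weights):
    """Merge texel pairs (1,2), (3,4), ... into single interpolated taps."""
    offsets, merged = [0.0], [weights[0]]
    for i in range(1, len(weights), 2):
        if i + 1 < len(weights):
            w = weights[i] + weights[i + 1]
            # Offset that makes the lerp reproduce both original weights.
            offsets.append((i * weights[i] + (i + 1) * weights[i + 1]) / w)
            merged.append(w)
        else:
            # Odd radius: the last texel has no partner, keep it as is.
            offsets.append(float(i))
            merged.append(weights[i])
    return offsets, merged

def sample_linear(signal, x):
    """Emulate a linearly filtered texture fetch with clamp-to-edge."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(signal) - 1)
    t = x - i0
    return signal[i0] * (1.0 - t) + signal[i1] * t

def blur_full(signal, weights):
    """Reference pass: one fetch per texel on each side of the centre."""
    n = len(signal)
    out = []
    for x in range(n):
        acc = weights[0] * signal[x]
        for k in range(1, len(weights)):
            acc += weights[k] * (signal[max(x - k, 0)] + signal[min(x + k, n - 1)])
        out.append(acc)
    return out

def blur_linear(signal, offsets, merged):
    """Same pass using the merged taps and interpolated fetches."""
    out = []
    for x in range(len(signal)):
        acc = merged[0] * signal[x]
        for o, w in zip(offsets[1:], merged[1:]):
            acc += w * (sample_linear(signal, x - o) + sample_linear(signal, x + o))
        out.append(acc)
    return out
```

With a radius of 4 this goes from 9 fetches per pixel per pass down to 5, and the result matches the full kernel up to floating point error. On the GPU the interpolation comes for free from the texture sampler, which is where the saving comes from.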
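The Kawase blur can be sketched in a similar way. This is my reading of the commonly described version of the filter, not necessarily what the component does: each pass takes four bilinear fetches on the diagonals, (d + 0.5) texels away from the pixel centre, so each fetch lands on a texel corner and averages a 2x2 block. The offset d grows from pass to pass. The names `kawase_pass` and `kawase_blur`, the clamp-to-edge addressing, and the pass offsets used below are assumptions for illustration.

```python
def kawase_pass(img, d):
    """One Kawase pass on a 2D list of floats with diagonal offset d."""
    h, w = len(img), len(img[0])

    def texel(x, y):
        # Clamp-to-edge addressing.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    def corner(x, y):
        # A bilinear fetch on the corner shared by texels
        # (x, y), (x+1, y), (x, y+1), (x+1, y+1) is their plain average.
        return 0.25 * (texel(x, y) + texel(x + 1, y)
                       + texel(x, y + 1) + texel(x + 1, y + 1))

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            row.append(0.25 * (corner(x + d, y + d)
                               + corner(x - d - 1, y + d)
                               + corner(x + d, y - d - 1)
                               + corner(x - d - 1, y - d - 1)))
        out.append(row)
    return out

def kawase_blur(img, pass_offsets):
    """Run several passes, e.g. pass_offsets = [0, 1, 2]."""
    for d in pass_offsets:
        img = kawase_pass(img, d)
    return img
```

A single pass with d = 0 works out to the familiar 3x3 kernel [1 2 1; 2 4 2; 1 2 1] / 16, and every extra pass with a larger offset widens and smooths the result while still costing only four fetches per pixel.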
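The reason a moving average blur barely gets more expensive with large kernels is that the window sum is updated instead of recomputed: moving one pixel over adds the texel entering the window and subtracts the one leaving it. Below is a minimal 1D sketch of that idea in Python, assuming clamp-to-edge addressing and hypothetical function names (`box_blur_running_sum`, `box_blur_naive`). It shows the general technique rather than the exact GPU implementation in the component.

```python
def box_blur_naive(signal, radius):
    """Reference box blur: sums the whole window for every output sample."""
    n = len(signal)
    width = 2 * radius + 1
    return [
        sum(signal[min(max(x + k, 0), n - 1)] for k in range(-radius, radius + 1)) / width
        for x in range(n)
    ]

def box_blur_running_sum(signal, radius):
    """Box blur with a sliding window: two texel reads per output sample."""
    n = len(signal)
    width = 2 * radius + 1
    # Window sum for x = 0, with clamp-to-edge addressing.
    acc = sum(signal[min(max(i, 0), n - 1)] for i in range(-radius, radius + 1))
    out = []
    for x in range(n):
        out.append(acc / width)
        # Slide the window right: add the entering texel, drop the leaving one.
        acc += signal[min(x + radius + 1, n - 1)] - signal[max(x - radius, 0)]
    return out

def multipass_box_blur(signal, radius, passes):
    """Repeated box blurs drift toward a Gaussian-like shape."""
    for _ in range(passes):
        signal = box_blur_running_sum(signal, radius)
    return signal
```

The inner loop does the same amount of work for a radius of 3 or 300, which is why the cost stays almost flat. The fixed setup cost is also why it does not pay off for small blurs.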
fastBlurs.tox (4.64 MB)