FIXED: GLSL TOP performance issue on Vulkan builds (NVIDIA GPU)

miu-lab · November 15, 2022, 10:45pm

Hi folks,

This is my first post here.
I have a performance issue with GLSL TOP on Vulkan builds w/ NVIDIA GPU.
More info below.

Many thanks for your help!

Issue

GLSL TOP performance drops with an NVIDIA GPU on Vulkan builds; see the attached .toe file.

Tested Machines

Desktop

CPU : AMD Ryzen Threadripper 3970x
GPU : NVIDIA GeForce RTX 3090 w/ driver 522.30 WHQL, then 526.86 WHQL
RAM : 256 GB

Laptop

CPU : AMD Ryzen 5800H
APU : 5800H w/ Radeon Graphics
RAM : 16 GB

Tested Builds

Builds 2021.XXXXX

Tested w/ 2021.16410

Laptop - OpenGL + AMD APU : Performance is GOOD (around 7ms / frame)
Desktop - OpenGL + NVIDIA GPU : Performance is GOOD (around 0.9ms / frame)

Builds 2022.XXXXX

Tested w/ 2022.26590 and 2022.29850

Laptop - Vulkan + AMD APU : Performance is GOOD (around 5ms / frame)
Desktop - Vulkan + NVIDIA GPU : Performance is BAD (around 260ms / frame)

Project File

glsl_TOP_bug.toe (26.4 KB)

malcolm · November 16, 2022, 1:52am

Thanks, I’ll take a look.

timgerritsen · November 16, 2022, 4:06pm

Hi there,

Just tested here on an NVIDIA GPU (4090) and it also runs terribly slowly :D. I did some investigating and I think it’s because of the number of ‘switch’ branches in the shader. On some GPUs (as far as I understand it) branches are bad for optimization: all branches are executed first, and the ones that are not valid/accessed are discarded afterwards. That makes the shader run all the different noise algorithms at once. For example, if you return from _fnlGenNoiseSingle3D() immediately before the switch, with a call to the OpenSimplex algorithm for instance, it runs super fast again.

I think it would be better to split those different algorithms into separate shaders and use a Switch TOP to select the one you need.
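A minimal sketch of the "one algorithm per shader" idea, assuming the variants are generated from a template rather than written by hand. The noise type names and the `noise_<name>` functions are hypothetical placeholders, not identifiers from the attached .toe; each generated string would still need its algorithm's source pasted in before it compiles in a GLSL TOP.

```python
# Hypothetical noise types; the real list lives in the shader from the attached .toe.
NOISE_TYPES = ["opensimplex", "perlin", "cellular"]

# Template for one single-purpose pixel shader. The noise_<name> functions are
# placeholders (assumed to be defined elsewhere), not functions from the original file.
TEMPLATE = """\
// Variant: {name} (one algorithm per shader, no branching on a mode)
out vec4 fragColor;
void main()
{{
    float n = noise_{name}(vec3(vUV.st, 0.0));
    fragColor = TDOutputSwizzle(vec4(vec3(n), 1.0));
}}
"""


def build_variants(names):
    """Return a dict mapping each noise type to its own shader source string."""
    return {name: TEMPLATE.format(name=name) for name in names}


variants = build_variants(NOISE_TYPES)

if __name__ == "__main__":
    # Print one variant to inspect the generated GLSL text.
    print(variants["opensimplex"])
```

Each string would go into its own GLSL TOP, with a Switch TOP picking between their outputs, so no single shader has to carry every algorithm.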

Also, a small side note: not sure if you are aware of the built-in GLSL function TDSimplexNoise(), which gives you some nice simplex noise relatively fast.

Hope this helps.
Cheers,
tim

miu-lab · November 16, 2022, 9:43pm

Hi tim,
Thanks for your feedback and tips, I’ll check that!

Achim · November 17, 2022, 3:48pm

I didn’t expect this to still be an issue in 2022.

Does anyone have a list of which GPUs “properly” support branches, i.e. without running all of them?

timgerritsen · November 17, 2022, 4:05pm

To be honest I thought the same :D. I only realized it while debugging miu-lab’s code (without the switch it suddenly runs super fast). It might also be a Vulkan thing, since the OpenGL version seems to run faster. So perhaps Malcolm can enlighten us.

malcolm · November 18, 2022, 1:06am

I don’t think any modern GPU will run all branches. Within a group of 32 or 64 pixels (64 on AMD), all pixels need to take the same branch for that part of the code to run at full speed. Otherwise each divergent branch also has to be taken, in serial (no more multi-GPU-core work), and parallel work can only continue once all the divergent branches have been resolved.
Yes, it’s likely a Vulkan bug in either the driver or TD; I’ll determine which soon.
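The divergence rule described above can be illustrated with a toy cost model. This is only a sketch under simplifying assumptions: the group size, the per-branch costs, and the `group_cost` helper are made up for illustration, and real hardware scheduling is more involved.

```python
def group_cost(branch_ids, branch_costs, group_size=32):
    """Toy model: each group of pixels pays once for every distinct branch taken in it.

    branch_ids   -- branch index chosen by each pixel, in pixel order
    branch_costs -- hypothetical cost of executing each branch once
    group_size   -- pixels that execute together (32 or 64 in the post above)
    """
    total = 0
    for start in range(0, len(branch_ids), group_size):
        group = branch_ids[start:start + group_size]
        # Divergent branches inside one group run one after another,
        # so their costs add up; a shared branch is paid only once.
        total += sum(branch_costs[b] for b in set(group))
    return total


# Four branches with the same made-up cost.
costs = {0: 10, 1: 10, 2: 10, 3: 10}

coherent = [2] * 64                      # every pixel picks the same branch
divergent = [i % 4 for i in range(64)]   # neighbouring pixels pick different branches

coherent_cost = group_cost(coherent, costs)
divergent_cost = group_cost(divergent, costs)

if __name__ == "__main__":
    print(coherent_cost, divergent_cost)
```

If the branch is selected by a value that is the same for every pixel, as with a shader "mode", all groups are in the coherent case of this model, which is consistent with the slowdown being a bug and not expected behaviour.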

malcolm · November 21, 2022, 8:59pm

This does seem like an Nvidia bug; I’ve reported it to them.
One thing you should consider doing with Vulkan, though, is to make use of specialization constants to assign some of these different types. It should improve performance in most cases, even after this bug is fixed.

malcolm · January 30, 2023, 7:21pm

Thanks for this report and the sample. Nvidia has fixed this bug and rolled it into upcoming drivers 530.89 and later.

I’ll mention, though, that the specialization constant workflow is at least 2x as fast as using uniforms for these branches, so you should definitely also consider starting to use those for shader ‘modes’.
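As a rough analogy only (this is plain Python, not GLSL and not TouchDesigner API), the difference between the two workflows can be sketched as "check the mode on every call" versus "resolve the mode once up front". In Vulkan GLSL a specialization constant is declared with `layout(constant_id = N) const ...`, and its value is fixed when the pipeline is created, which lets the shader compiler drop the branches that can never be taken. The function names and the two stand-in "algorithms" below are hypothetical.

```python
def noise_a(x):
    """Stand-in for one noise algorithm (hypothetical)."""
    return x * 0.5


def noise_b(x):
    """Stand-in for another noise algorithm (hypothetical)."""
    return x * x


ALGORITHMS = {0: noise_a, 1: noise_b}


def shade_with_uniform(x, mode):
    # The mode is examined on every call, like a branch on a uniform
    # whose value is only known at run time.
    if mode == 0:
        return noise_a(x)
    if mode == 1:
        return noise_b(x)
    raise ValueError(f"unknown mode {mode}")


def specialize(mode):
    # The mode is resolved once and the matching function is returned,
    # like a constant fixed at pipeline creation: the result contains
    # no branch on the mode at all.
    return ALGORITHMS[mode]


shade_mode1 = specialize(1)

if __name__ == "__main__":
    print(shade_with_uniform(3.0, 1), shade_mode1(3.0))
```

Both paths compute the same value; the sketch only shows where the mode decision is made, and says nothing about the actual speed difference on a GPU.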

malcolm · February 21, 2023, 7:58pm

Sorry, one update. I don’t think the full speedup will arrive until the 535.xx series. The 530 drivers (and even some newer 528.xx versions) will be far faster than they are now, but still quite slow compared to the AMD or Nvidia-OpenGL versions (7ms or so on my machine).

miu-lab · February 21, 2023, 8:20pm

Thank you Malcolm for your feedback. The solutions you proposed work as they are, but it’s always good to know that you’re on top of it! Also, I opened a new topic yesterday about another issue I ran into in compute shaders: NVIDIA’s hardware derivative functions don’t seem to be working on my end. Thank you for your help!