Just tested here on an Nvidia GPU (4090) and it also runs terribly slow :D. I did some investigating and I think it's because of the number of 'switch' branches in the shader. On some GPUs (as far as I understand it) branches are bad for optimization: they run all branches first and afterwards discard the ones that are not valid/accessed. That would make the shader run all the different noise algorithms at once. For example, if you return from _fnlGenNoiseSingle3D() immediately before the switch, with for example a call to the OpenSimplex algorithm, it works super fast again.
I think it would be better to separate the different algorithms into different shaders and use a Switch TOP to select the one you need.
Also, a small side note: not sure if you are aware of the built-in GLSL function TDSimplexNoise(), which gives you some nice simplex noise relatively fast.
To be honest I thought the same :D. I only realized it by debugging miu-lab's code (when not doing the switch it suddenly runs super fast). Might also be a Vulkan thing, since the OpenGL version seems to run faster. So perhaps Malcolm can enlighten us.
I don't think any modern GPUs will run all branches. Within a group of 32 or 64 pixels (AMD is 64), all of them need to take the same branch for that part of the code to run at full speed. Otherwise each divergent branch that any pixel in the group takes has to be executed, in serial (no more multi-GPU-core work), and parallel work can only continue once all the divergent branches have been resolved.
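To make the divergence point concrete, here is a rough Python sketch of that cost model. It is only a toy under stated assumptions: the function name `group_cost`, the branch costs and the pixel counts are all made up for illustration, and a real GPU scheduler is more complicated than this.

```python
def group_cost(branch_ids, branch_costs, group_size=32):
    """Toy SIMT cost model (illustrative assumption, not a real GPU profiler).

    branch_ids:   one entry per pixel, naming the branch that pixel takes
    branch_costs: hypothetical cost of executing each branch once
    group_size:   pixels that execute in lockstep (32 or 64 in the post above)

    Each group pays, one after the other, for every distinct branch
    that at least one of its pixels takes.
    """
    total = 0
    for start in range(0, len(branch_ids), group_size):
        group = branch_ids[start:start + group_size]
        total += sum(branch_costs[b] for b in set(group))
    return total


# Hypothetical per-branch costs for three noise algorithms.
costs = {0: 10, 1: 20, 2: 30}

# Every pixel takes the same branch (e.g. the mode comes from a uniform).
uniform_pixels = [1] * 64

# Neighbouring pixels take different branches.
divergent_pixels = [i % 3 for i in range(64)]

print(group_cost(uniform_pixels, costs))    # 2 groups x 20
print(group_cost(divergent_pixels, costs))  # 2 groups x (10 + 20 + 30)
```

In this model a switch driven by a single uniform value never diverges, since every pixel in every group picks the same case, so the switch itself should not cost much. That fits with the slowdown being a driver or TD issue rather than something inherent to branching.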
Yes, it's likely a Vulkan bug in either the driver or TD. I'll determine which soon.
Thanks for this report and the sample. Nvidia has fixed this bug and rolled it into upcoming drivers 530.89 and later.
I'll mention though that the specialization constant workflow is at least 2x as fast as using uniforms for these branches, so you should definitely also consider starting to use those for shader 'modes'.
Sorry, one update. I don't think the full speedup will arrive until the 535.xx series. The 530 (and even some newer 528.xx versions) will be far faster than they are now, but still quite slow compared to the AMD or Nvidia-OpenGL versions (7ms or so on my machine).
Thank you Malcolm for your feedback. The solutions you proposed work as they are, but it's always good to know that you're on top of it! Furthermore, I opened a new topic yesterday regarding another issue I encountered in compute shaders with NVIDIA's hardware derivative functions: they don't seem to be working on my end. Thank you for your help!