Shader recompile freezing with RayTK

In RayTK, many types of changes cause the generated shader to change.
Whenever the shader changes, the entire TD process hard-freezes for a significant amount of time.

The length of the freeze scales with the complexity of the network and the size of the generated shader.

I suspect it’s the GPU driver recompiling the new shader. I’m able to reproduce the stall by locking the final DAT and making a trivial change to it, which rules out everything upstream: the code generation, parameter processing, etc.

Does anyone have any suggestions for ways to reduce that time?
Reducing the amount of code overall helps, but many of the techniques that I’ve used to do that have other side effects.

For example, most operators have a bunch of different modes, switchable with a menu parameter, and the toolkit only injects the code for the currently selected mode. That reduces the overall shader size, but it means that every change to that parameter triggers a recompile.

I’ve started switching some of those over to runtime switching based on a uniform, so changing the parameter doesn’t force a recompile. But it means that the code to support every possible option always gets included.
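A minimal Python sketch of that tradeoff, assuming a hypothetical generator function (names like `combine` and `uCombineMode` are illustrative, not RayTK's actual generated code): the first strategy bakes only the selected mode into the shader text, so a parameter change produces new text and forces a recompile; the second emits every branch once and selects with a uniform at runtime.

```python
# Hypothetical mode bodies for an operator with a menu parameter.
MODE_BODIES = {
    'add': 'return a + b;',
    'multiply': 'return a * b;',
    'max': 'return max(a, b);',
}

def generate_compile_time(selected_mode):
    # Only the active branch appears in the shader text, so changing
    # the menu parameter changes the text and triggers a recompile.
    return f'float combine(float a, float b) {{ {MODE_BODIES[selected_mode]} }}'

def generate_runtime_switch():
    # All branches are always present; a uniform picks one at runtime,
    # so the shader text never changes when the parameter changes.
    branches = '\n'.join(
        f'  if (uCombineMode == {i}) {{ {body} }}'
        for i, body in enumerate(MODE_BODIES.values()))
    return ('uniform int uCombineMode;\n'
            'float combine(float a, float b) {\n'
            f'{branches}\n'
            '  return a;\n'
            '}')
```

The cost is visible in the second form: the full text of every mode is compiled whether or not it will ever be selected.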

There’s also pretty heavy usage of preprocessor macros in many areas. I’ve looked into applying the preprocessing in a separate, Python-based step before passing the code to the GPU driver, but that would definitely add a lot of complexity to the system.
Another option would be to pass the code through an optimizer of some sort.


Hi Tekt, good to meet you at the IIHQ Championship. Did you use “probe” in the palette to watch what gets heavily cooked - CPU and GPU - when you change things? Is it all in the Render TOP?

Almost all of the work happens in a single GLSL Multi TOP (at least while running normally). When the freeze happens it seems to completely lock up the TD process, so that Windows treats it as non-responsive until it resumes.

Here’s a screen recording where I use the probe while making a trivial change to the shader code, first with CPU time and then with GPU time selected.

Example with some notes:

Note that the example project is using the “compiled” version of the components, so it isn’t necessarily a good way to explore how the toolkit works, but it does show how a single change to the shader text triggers the freeze.

I’d love to talk through the design of the toolkit with you at some point. It’s possible that there’s some fundamental issue with the approach that it takes, but I’m hoping there are some ways to reduce the freezing without a total restructuring.

I’ve been experimenting with a few ways to improve the recompile performance.

One theory is that it’s just the sheer volume of code sent to the GPU, so I wrote an alternate preprocessor that filters the code instead of relying on preprocessor macros. Essentially it does the same thing as all the #if directives, but in Python, before the code gets sent to the GPU. It seems to help somewhat in some cases, but it isn’t that much of an improvement.
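A stripped-down sketch of that kind of Python-side filter (assumed semantics, not RayTK's real preprocessor): it resolves `#if NAME` / `#else` / `#endif` against a dict of flags so that disabled branches never reach the GPU compiler at all. Real GLSL `#if` accepts arbitrary expressions; this toy version handles only simple named flags.

```python
import re

def strip_inactive_blocks(source, defines):
    """Resolve `#if NAME` / `#else` / `#endif` in Python, keeping only
    the lines from active branches. Supports nesting via a stack."""
    out = []
    stack = []  # each entry: [currently_emitting, condition_was_true]
    for line in source.splitlines():
        stripped = line.strip()
        m = re.match(r'#if\s+(\w+)$', stripped)
        if m:
            cond = bool(defines.get(m.group(1), 0))
            parent_emitting = all(s[0] for s in stack)
            stack.append([parent_emitting and cond, cond])
            continue
        if stripped == '#else' and stack:
            parent_emitting = all(s[0] for s in stack[:-1])
            stack[-1][0] = parent_emitting and not stack[-1][1]
            continue
        if stripped == '#endif' and stack:
            stack.pop()
            continue
        if all(s[0] for s in stack):
            out.append(line)
    return '\n'.join(out)
```

The output shader text contains only the active code paths, which is the "filtering" approach described above.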

I’m also optimizing the Python involved in building the shader, but it seems like that stage is only a small fraction of the time involved in a recompile.

The main issue is the “Linking shader” item that shows up in Performance Monitor.

I tried switching to Python-based inlining of “#include” directives, but that didn’t seem to change much, so I’m assuming “linking” here means linking in the compiler sense rather than the preprocessor sense.

Does anyone know of ways to improve that linking time?

Yeah, there is a LOT of work involved in turning GLSL code into something the GPU can run. Preprocessing wouldn’t be particularly expensive, I’d think, and in fact using comments or #if 0 to block out portions of your code is a good way to reduce the amount of code compiled.
Unfortunately, the only real way to speed this up will be some sort of shader cache system that takes the GPU- and driver-specific output of a shader compilation and caches it, loading it directly whenever it encounters GLSL text it has already seen. Even then, the first time a given combination is seen within one run of TD there will likely still be a hiccup, since the shader/pipeline still needs to be loaded even when it isn’t being compiled.
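The cache idea amounts to keying the compiled artifact by the exact shader text. A minimal sketch, where `compile_fn` stands in for whatever actually produces the driver-specific binary (not TD's real internals):

```python
import hashlib

class ShaderCache:
    """Cache compiled shader output keyed by a hash of the GLSL text,
    so re-encountering identical text becomes a dictionary lookup
    instead of a full driver compile."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self._cache = {}

    def get(self, glsl_text):
        key = hashlib.sha256(glsl_text.encode('utf-8')).hexdigest()
        if key not in self._cache:
            # First sighting: pay the full compile cost once.
            self._cache[key] = self.compile_fn(glsl_text)
        return self._cache[key]
```

This only helps for shader text that recurs; the very first compile of any new text still stalls, which matches the caveat above.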

Vulkan has some tools for this that I’m going to be looking into.


That makes sense.

If there isn’t an appreciable difference between code that’s #if-ed out and code that’s stripped out beforehand, I’ll stop focusing on that.

Sounds like my time will be better spent just trying to optimize the code.

There are some sacrifices that I made to simplify interoperability that may need to be addressed to improve the performance. For example there’s a Context struct that gets passed between all operator function calls even in cases where it isn’t actually used. And there are cases where operator chains get called multiple times when they could potentially reuse a returned value.
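The second issue above is a common codegen optimization: when a chain's result is consumed more than once, emit it into a local variable instead of repeating the call. A hypothetical sketch of generated GLSL text (names like `opA` are illustrative):

```python
def emit_repeated(call_expr):
    # Naive form: the operator chain is evaluated twice.
    return f'float result = {call_expr} + {call_expr};'

def emit_reused(call_expr):
    # Hoisted form: evaluate the chain once and reuse the value.
    return (f'float tmp = {call_expr};\n'
            'float result = tmp + tmp;')
```

Beyond saving runtime work, collapsing repeated calls also shrinks the shader text, which may help the compile/link time that's causing the freeze.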

There’s also been some interesting work done with “ubershaders”, but that would involve rewriting all of the infrastructure of the toolkit.