CHOP Geo Instance vs TOP Geo Instance vs GLSL deform - why is this faster?

Hi all -
I’ve attached a TOE to this post that compares the 3 primary ways of animating instanced geometry: CHOP instancing, TOP instancing, and a custom GLSL deform.

Since the latest official is out, I’ve been taking a closer look at performance of geometry instancing. I’ve always noticed how the geometry COMP incurs a constant cpu cook time that seems mostly unrelated to the amount of instances, and more a general overhead cost of the instancing data animating.

My guess has always been that the CHOP (or now TOP) data has to get converted, or uploaded to the GPU, or something, and that there’s some CPU overhead here like there is in any TOP. For me that CPU overhead is roughly 0.25-0.35 ms a frame. That’s not too bad on its own, but if you’re using instancing a lot for UI systems or in replicators, it adds up quickly and makes the whole system more sluggish on the CPU.
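To put the "adds up" point in numbers, here is a back-of-the-envelope sketch in Python. Only the 0.25-0.35 ms range comes from the measurements above; the 60 fps target and the count of 40 instancing Geometry COMPs are hypothetical values picked for illustration.

```python
# Back-of-the-envelope: how a fixed per-COMP overhead eats into a frame budget.
# Assumptions (hypothetical): 60 fps target, 40 instancing Geometry COMPs.

def overhead_ms(num_comps, per_comp_ms):
    """Total CPU cook overhead in ms if every COMP pays the same fixed cost."""
    return num_comps * per_comp_ms

def budget_fraction(total_ms, fps=60.0):
    """Fraction of one frame's time budget consumed by total_ms."""
    frame_budget_ms = 1000.0 / fps
    return total_ms / frame_budget_ms

low = overhead_ms(40, 0.25)   # measured lower bound per COMP
high = overhead_ms(40, 0.35)  # measured upper bound per COMP
print(f"{low:.1f}-{high:.1f} ms per frame")
print(f"{budget_fraction(low):.0%}-{budget_fraction(high):.0%} of a 60 fps frame")
```

With those assumed numbers, the overhead alone would take well over half of a 60 fps frame before anything else cooks.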

I’ve also started leveraging custom GLSL shaders for handling the movement of instances, and that’s where I started noticing an odd and fairly unexplained (to me) difference in CPU performance.

If I feed the very same CHOP of instancing data into the shader, sample it as a samplerBuffer using the instance index, and offset the world position after TDDeform() (same visual result), the shader’s cook time is next to nothing and the Geometry COMP doesn’t cook at all.
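The per-vertex logic is: deform the vertex to world space, fetch this instance's offset from the buffer by instance index, and add it. The sketch below mimics that logic on the CPU in plain Python, purely as an illustration. It is not the shader from the attached TOE, and `deform`, `fetch`, `vertex_world_position` and `instance_buffer` are hypothetical stand-ins for the deform call, the texel fetch, the vertex stage and the samplerBuffer.

```python
# CPU-side emulation of the per-vertex logic described above (illustration only).
# instance_buffer plays the role of the samplerBuffer: one (tx, ty, tz) per instance.

def deform(local_pos):
    """Stand-in for the deform step; here an identity local-to-world transform."""
    return tuple(local_pos)

def fetch(buffer, index):
    """Stand-in for a texel fetch: read one entry by integer index, no filtering."""
    return buffer[index]

def vertex_world_position(local_pos, instance_id, instance_buffer):
    world = deform(local_pos)                     # deform first...
    offset = fetch(instance_buffer, instance_id)  # ...then look up this instance's offset
    return tuple(w + o for w, o in zip(world, offset))

instance_buffer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0)]
quad = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]

for instance_id in range(len(instance_buffer)):
    moved = [vertex_world_position(p, instance_id, instance_buffer) for p in quad]
    print(instance_id, moved[0])
```

The point of the pattern is that the lookup happens per vertex on the GPU, so the geometry itself never has to change on the CPU side.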

If I wanted to optimize this approach even further, I could stick the animated parameters in the Render TOP’s sampler inputs and let the shader access them from there (since the Render TOP cooks anyway), which would remove constant cooking from the shader entirely. That’s good for replicated networks, and anywhere lots of copies of this instancing are happening.

That said, when I switch between the 3 methods described above, the GLSL approach is clearly faster. Not only does it avoid that chunky CPU cost the Geometry COMP incurs, the Render TOP is actually slightly faster too, on both CPU and GPU, which makes me even more perplexed as to why.

Now to my actual question - What is the geometry COMP doing during instancing, that I am not doing during the glsl sample/deform stuff? Is it doing some critical things that are very useful in some other more complex use cases? or is this something that could be optimized?

With the addition of the custom instancing attributes and other super amazing additions, instancing with the Geometry COMP is becoming very attractive again, and I would like to see it be a bit more optimized for animated data. However, maybe this ask isn’t really realistic if there’s a reason for the extra cook time that isn’t obvious.

Thanks!

InstancingComparison.9.toe (17.7 KB)


Just wondering, what is the cook time for the Geometry COMP with the traditional methods when you are in Perform Mode? Maybe it is more a “problem” of UI elements, viewers, etc.?
But an interesting discovery anyway, following!

That’s a good question! I forgot to check that in my tests, but checking now, it seems that Perform Mode does not have any impact on the CPU cook time.

Would also be interested in the total frame time reported in the Performance Monitor versus what the OPs report directly, just to be sure of the numbers. Some time may be ‘unaccounted’ for in the node cook time and show up elsewhere (reported GPU time is more susceptible to this).


@ben good point, I hadn’t thought to check that. It seems to be consistent with what I’m finding in the network editor when I attach an Info CHOP. Here are the results for the 3 methods:

CHOP: [image]

TOP: [image]

GLSL:

What’s crazy is that the Render TOP drops from ~0.07 with the CHOP instancing method to ~0.027 with the GLSL method. Not only does the GLSL method avoid the cost of the Geometry COMP entirely, the Render TOP seems to have a much easier time dealing with it as well.

My only guess is that the geometry in the GLSL Geometry COMP is static, so it skips some per-frame preparation for rendering that the CHOP and TOP instancing methods have to go through?

As mentioned, the GLSL Geometry COMP doesn’t cook, so it doesn’t show up in the Performance Monitor, and the shader shows the same thing there as it does in the editor: a small cook time of ~0.02.

I had a small bug in my first TOE that was preventing the GLSL method from rendering (I had a Constant MAT in the Render TOP’s override material slot).

InstancingComparison.12.toe (17.2 KB)

This hasn’t affected the above findings in any way though.

CHOP: 0.495
TOP: 0.695
GLSL: 0.177

Unless I made a big mistake, those are the totals for each render method. The TOP render method seems to have some overhead because of the ChopToTop, so when working with TOPs alone it should be faster. (Try locking the instance TOP for a better comparison.)

Also, do you have any dynamic values in your instancing shader? If not, how does changing uniforms or samplers affect the performance?

Ya I agree, I left the ChopToTop out of my findings mostly for that reason, a more practically constructed network would probably have been tops all along if using that for instancing. Looking at just the geoCOMP alone though there’s hardly any difference between the CHOP and TOP method, maybe a tiny bit.

If I understand your question correctly, all three examples have animated tx/ty values, which I guess counts as dynamic for all three? For the GLSL method I am feeding the animated values in via CHOP, but I believe that CHOP gets turned into a texture buffer under the hood anyway, for use with texelFetch(), so I think using a sampler input instead of a samplerBuffer would probably be comparable.

That said, you have some pretty different results there! Or at least for the glsl part. What are you referring to when you say GLSL: 0.177? Is that how much time the glsl shader takes to cook on your computer?

This was actually just a bug in the upgrade to Sequential parameters for Custom Instance Attributes. It was searching for (and failing to find) parameters for all 12 custom instance attributes, which takes a long time.
This is now fixed, and the cook times for this node and the GLSL TOP are both around 0.03ms on my machine. Sorry about that and thanks for the report and analysis!


Awesome news! can’t wait to try that out later.

Thanks for looking into that Lucas and Malcolm! As someone who was heavily using the manual glsl method, great to know that performance is similar.

Looked very briefly at the file, and it doesn’t seem you’re testing rotations. I remember that being an interesting case, since the Geometry COMP would calculate the matrices on the CPU, whereas with GLSL you can also do it on the GPU. Which is fastest depends on geometry complexity vs number of instances (I would usually stick to the GPU).
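For context on why rotations change the picture: a rotated instance needs a full transform matrix rather than a simple offset, so one matrix has to be built per instance every frame. Below is a minimal Python sketch of that per-instance work, assuming rotation about Z only plus a translation. The function names are hypothetical, and this is not a description of how the Geometry COMP is actually implemented.

```python
import math

def instance_matrix(tx, ty, rz_degrees):
    """Row-major 4x4: rotate about Z by rz_degrees, then translate by (tx, ty, 0)."""
    a = math.radians(rz_degrees)
    c, s = math.cos(a), math.sin(a)
    return [
        [c,   -s,  0.0, tx],
        [s,    c,  0.0, ty],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point treated as a column vector with w = 1."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))

# One matrix per instance: this cost grows with instance count, not vertex count.
instances = [(0.0, 0.0, 0.0), (2.0, 0.0, 90.0), (0.0, 3.0, 180.0)]
matrices = [instance_matrix(tx, ty, rz) for tx, ty, rz in instances]
print(transform_point(matrices[1], (1.0, 0.0, 0.0)))
```

Building the matrices up front costs one matrix per instance, while doing the same math in the vertex stage repeats it for every vertex of every instance, which is the geometry complexity vs instance count trade-off.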

Haven’t dug into 2020 that much yet, but as a side note I’m getting an interesting visual bug with this file that I haven’t encountered before. Curious if others see the same:

Windows 10 home 1903, 2080TI nvidia studio driver 442.19

Ya I saw that too. It’s an issue with the cookbar COMP and its calculation of the network pane size. Needs investigation on our side.
