2020.27390 / 2020.44350 : erratic behavior with multiple moviein using hardware decode

Hello,

Curious what the hardware limitation is for multiple movieins with hardware decode?
On a 2080 Ti, I'm getting erratic behavior as soon as two 8K H.265 movies are playing:

You can see at some point the cook time starts ramping up (up to 6ms) for no apparent reason, and then the second TOP stops.

And a 3rd TOP never starts at all, with no error message.


44350 has a slightly more useful error message: “error loading frame”

Also curious what the GPU cook time refers to, since I thought decoding happens on dedicated hardware on the chip? Is it only the frame upload time? It's definitely non-negligible, though really fast (faster than NotchLC, and also faster than HapQ in 8 bits).

I used two movieins to try to get more accurate GPU cook times but it didn't quite help :wink:
Curious what I should expect for 8K 10-bit 30 fps? Is 2.5 ms accurate (TD runs at 60 fps)?
It seems so (I tried to confirm by adding a large blur until I hit 16 ms), though it's a little hard to tell.
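For what it's worth, here's a minimal sketch of one way to read those per-TOP timings from Python inside TD (assuming the gpuCookTime / cpuCookTime OP members are available in this build; the operator names are placeholders):

```python
# Minimal sketch: print per-TOP cook times from a Text DAT in TouchDesigner.
# Assumes the OP members gpuCookTime / cpuCookTime (milliseconds) exist in
# this build; 'moviefilein1' / 'moviefilein2' are placeholder operator names.
for name in ('moviefilein1', 'moviefilein2'):
    m = op(name)
    if m is not None:
        print(f"{m.name}: gpu={m.gpuCookTime:.2f} ms  cpu={m.cpuCookTime:.2f} ms")
```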

Thank you!

Edit: my test movie is 20 seconds long, so I suspect the issues arise when it loops?
Edit 2: I also noticed that 10-bit seems twice as slow as 8-bit. Is that truly the case (because of the fixed 16-bit conversion)? The other factor that seems to matter most is the bit rate, with higher bitrate being faster to decode, which makes sense.
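Rough napkin math on why 10-bit would roughly double the upload cost if 10-bit frames end up in 16-bit textures (the 4:2:0 chroma factor here is an assumption, not a measurement):

```python
# Napkin math: per-frame upload size for an 8K (7680x4320) frame.
# Assumes 4:2:0 chroma (1.5 samples per pixel) and that 10-bit content is
# held in 16-bit containers because of the fixed 16-bit conversion.
w, h, fps = 7680, 4320, 30
samples = w * h * 1.5                      # luma + subsampled chroma
for label, bytes_per_sample in (("8-bit ", 1), ("10-bit", 2)):
    mb = samples * bytes_per_sample / 1e6
    print(f"{label}: {mb:.0f} MB/frame, {mb * fps / 1000:.2f} GB/s at {fps} fps")
```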
Thanks!

H.264 hardware decode is only supported for two movies on GeForce cards. That might explain the third one staying black.
Quadro has “unlimited”.

BTW, I found this interesting hack for all the latest NVIDIA drivers which removes the artificial 2-session NVENC limit for GeForce cards. Not sure if it also helps for NVDEC decoding… Who is willing to try it on their card? :grimacing:
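If anyone wants to test the NVDEC side outside of TD, here's a rough sketch that spawns a few concurrent hardware decodes with ffmpeg (assumes an ffmpeg build with NVDEC/CUVID support; the file path and stream count are placeholders):

```python
# Rough check for concurrent NVDEC sessions, independent of TD.
# Assumes an ffmpeg build with NVDEC/CUVID support; adjust SRC to a real file.
import subprocess

N = 3                      # how many simultaneous hardware decodes to attempt
SRC = "test_8k_hevc.mp4"   # placeholder test file

procs = [
    subprocess.Popen(
        ["ffmpeg", "-v", "error", "-c:v", "hevc_cuvid", "-i", SRC,
         "-f", "null", "-"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    for _ in range(N)
]
print("exit codes:", [p.wait() for p in procs])  # non-zero = a session failed
```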

Thanks!

@alphamoonbase curious where you found that limit of two?
Found this https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new
linked from https://developer.nvidia.com/nvidia-video-codec-sdk

And oddly it seems it’s only 1 NVDEC for a 2080TI, but also only 1 for an RTX6000
For NVENC it gets more confusing; there are two columns: “Total # of NVENC” and “Max # of concurrent sessions”.

I wonder if “Max # of concurrent sessions” for NVDEC would be more than 1 but not shown in the chart?

Otherwise, I was talking about HEVC/H.265, though according to the chart it's the same. But if it's 2 decoders, that doesn't explain why it would error after a while? (See the gdrive recording above the screenshots.)
If it's only 1, I'm surprised it even lets me play 2?

For NVENC I’m able to encode 3 at the same time so the chart makes more sense.

I’m doing some dev on a 2080TI but final machine has an RTX6000 so I’ll be able to do quadro testing as well.

@malcolm any thought? Thank you!

I think it's only encode that is limited on the GeForce cards. Decode should be whatever the card can handle in terms of its power. What does the Decode usage graph show in the Windows Task Manager?
I would expect 8K 10-bit to be pretty close to the limit the chip can handle, so I wouldn’t expect 3 of those to work. Can you post your sample video so I can try here?

Thanks Malcolm, that makes sense. If it's a dedicated chip, I'm still curious how its usage relates to other GPU operations and what the GPU cook time shows in TD?

Here's the 8K file above; tried it again, and I'm reliably getting the playback to stop working with two decodes:

(encoded in TD from some 4k footage from here Free 4k Stock Video Footage - (6,806 Free Downloads))

This would suggest all Geforce cards max out at a single hw decode.

The 2080TI (Turing) hardware decoder can do H.265/HEVC at maximum 8192x8192 resolution, main profile up to Level 5.1, main10 and main12 profile. See NVDEC Video Decoder API Programming Guide - NVIDIA Docs
So my guess is you can decode multiple concurrent streams until you have filled up this resolution.

It’s admittedly a confusing chart, but I think the # of NVDEC is the total NVDEC engines, which is # of chips * NVDECs per chip, which is just basically 1 for everything except for the server/cloud stuff. But I think an engine can do more than one decode.

@vinz99 I don’t think the GPU cook times are really accurate for this one since it’s using the NVDEC + CUDA for the decode work while the GPU cook timing is mostly GL measurement. I’ve done one fix that helps with the issue with the error on the last frame in 40K:

The other thing to try is setting the Frame Read Timeout to 0. If the decoder starts to fall behind, I think it gets into a state where it's always playing catch-up and unable to serve the required frames fast enough.
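If you want to set that on every Movie File In TOP at once, something along these lines from the Textport should work (the internal parameter name framereadtimeout is an assumption here; check the parameter dialog in your build to confirm it):

```python
# Set Frame Read Timeout to 0 on every Movie File In TOP in the project.
# Assumes the parameter's internal name is 'framereadtimeout'; confirm it
# via the parameter dialog in your build before relying on this.
for m in root.findChildren(type=moviefileinTOP):
    if hasattr(m.par, 'framereadtimeout'):
        m.par.framereadtimeout = 0
        print('set Frame Read Timeout to 0 on', m.path)
```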

Also, what does the Windows Task Manager show in terms of memory usage and Decode usage?
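If the Task Manager graphs are hard to pin down, the same counters can be polled through NVML; a quick sketch assuming the pynvml bindings, run outside of TD while the movies are playing:

```python
# Poll NVDEC utilization and VRAM usage via NVML while the movies play.
# Assumes the 'pynvml' (nvidia-ml-py) bindings are installed; run outside TD.
import time
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
for _ in range(30):
    dec_util, _period = pynvml.nvmlDeviceGetDecoderUtilization(gpu)
    mem = pynvml.nvmlDeviceGetMemoryInfo(gpu)
    print(f"decode {dec_util:3d}%  vram {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
    time.sleep(1)
pynvml.nvmlShutdown()
```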


I do notice that each decode takes about 4GB of VRAM. So if you’re on an 11GB 2080ti, you are likely hitting your memory limit.
I'm on an RTX 5000 with 16GB and things work well for 3, but then I have the same issue when I try 4.
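For reference, some napkin math on how roughly 4GB per stream could add up, assuming 10-bit frames are held as 16-bit and a few dozen surfaces sit between the decoder's reference frames and the read-ahead queue (the surface count is a guess, not a measured figure):

```python
# Napkin math: how many 8K frames fit in ~4 GB of VRAM per decode.
# Assumes 10-bit content held as 16-bit (2 bytes per sample); 1.5 samples
# per pixel for 4:2:0 or 3.0 for 4:4:4. The implied surface count is a guess
# at decoder reference frames plus read-ahead, not a measured figure.
w, h = 7680, 4320
for chroma, samples_per_pixel in (("4:2:0", 1.5), ("4:4:4", 3.0)):
    frame_gb = w * h * samples_per_pixel * 2 / 1e9
    print(f"{chroma}: {frame_gb * 1e3:.0f} MB/frame -> ~{4 / frame_gb:.0f} frames per 4 GB")
```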

Hey Malcolm, thanks for the update and the new build.

In the end it seems we'll use 8-bit 4:2:0 for the file format since it's so much lighter.
With that format, decoding even three 8K videos seems perfectly stable (5.3 GB used, 84% GPU usage in Task Manager).
Four also works, though only at 40 fps.

So it does seem there's no limit to the number of decodes.

Otherwise, trying the 10-bit 4:4:4 videos again, the new build seems to help a little but it still eventually hangs with 2 videos. But again, with just 2 videos I'm already at 100% GPU and 9.1 GB used.

And setting the Frame Read Timeout to 0 made it worse, as far as I can tell.

Edit: if the GPU cook time is unreliable, I guess the only way to assess performance is comparing codecs with a more complex project? Or do you have other recommendations? Thanks!

Edit 2: I didn't realize Task Manager actually breaks down GPU usage between Decode and 3D, that's nice. So the global % seems to be the max (not the sum or average?), and the “Decode” part is much higher.
It does seem that the 3D usage goes up with each video, so that's probably what gets reflected by the TD GPU cook time: fast, but still an impact.

The 3 steps are me going from 0 to 3 H.265 movieins.