Bluefish444 latency & NVIDIA GPUDirect

Hi,
may I ask whether NVIDIA GPUDirect is supported with Bluefish444 I/O cards? I am looking for the lowest possible video I/O latency, but I wasn’t able to find any documentation on GPUDirect working with Bluefish444 in TD.

So far my experience with Blackmagic cards has been far from ideal in terms of latency. AJA performed better than Blackmagic, but I am quite curious how fast Bluefish444 might be…
(In my tests the total input + output delay was around 8 frames on the Blackmagic Quad 2 and around 5 frames on the AJA Corvid 88.)
Bluefish444 prices seem quite high, but their products (especially the KRONOS K8) look very nice. What do you think about their I/O cards: are they worth the money? Has anyone tested their latency with TD? Thanks.

Cheers
Monty

Our Bluefish support does not currently support GPUDirect, so AJA is the way to go for that right now. We are investigating upgrading the Bluefish SDK and adding support for the KRONOS K8 card, though.

Hi @malcolm, thank you very much for the information, that would be great. While browsing the Bluefish website I noticed AMD’s version of GPUDirect, called AMD DirectGMA. It looks quite interesting, as it completely eliminates copying the buffer to system memory. It allows the Bluefish card to write directly into a portion of GPU memory and also allows the GPU to write directly into the memory of the Bluefish card. I guess that could provide even faster transfers between the cards, especially considering that the new AMD Radeon Pro GPUs use PCIe 4.0. May I ask what you think about that? Is there a chance TD would support this technology in conjunction with Bluefish cards? Thanks.

I am attaching some pictures and citations.


With the conventional approach, video data is copied three times in the system memory, while with Nvidia GPUDirect the data copy is performed only twice (frame grabber to system memory and system memory to GPU memory).

AMD DirectGMA provides high-speed Peer-to-Peer DMA transfers between the memories of 2 GPUs or between the memories of the GPU and the FPGA. DirectGMA for Video is optimized for the likely best performance in order to exchange data between frame grabbers and GPUs with direct video streaming from Frame grabber to GPU, totally avoiding data copies to/from system memory.
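
To make the difference between the three transfer paths in the citations above a bit more concrete, here is a rough back-of-the-envelope sketch in Python. The copy counts come from the citations; the frame size, the 2 bytes per pixel (8-bit YUV 4:2:2) and the 25 frames per second for 1080i50 are my own assumptions, and the function names are purely illustrative (they are not part of any Bluefish, NVIDIA or AMD API).

```python
# Rough estimate of how much video data passes through system memory per second
# for each transfer path (one direction only). Copy counts are taken from the
# citations above; resolution, pixel format and frame rate are assumptions.

WIDTH, HEIGHT = 1920, 1080   # assumed HD raster
BYTES_PER_PIXEL = 2          # assumed 8-bit YUV 4:2:2
FRAMES_PER_SECOND = 25       # assumed: 1080i50 = 50 fields/s = 25 full frames/s

# Number of copies that touch system memory on the way to the GPU.
SYSTEM_MEMORY_COPIES = {
    "conventional": 3,
    "NVIDIA GPUDirect": 2,
    "AMD DirectGMA": 0,
}


def frame_bytes(width=WIDTH, height=HEIGHT, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one uncompressed video frame in bytes."""
    return width * height * bytes_per_pixel


def system_memory_traffic(copies, fps=FRAMES_PER_SECOND):
    """Bytes per second copied into or out of system memory."""
    return copies * frame_bytes() * fps


for path, copies in SYSTEM_MEMORY_COPIES.items():
    mb_per_s = system_memory_traffic(copies) / 1e6
    print(f"{path:18s} {copies} copies, ~{mb_per_s:.0f} MB/s through system memory")
```

This only counts memory traffic for a single input stream. It says nothing about the actual latency, which also depends on how many frames the driver and the application buffer.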

Key Features of Nvidia GPUDirect

  • Accelerated communication with video I/O devices
    • Avoid unnecessary system memory copies and CPU overhead by copying data directly to/from pinned host memory
  • Synchronised transfers between GPUs and Bluefish444 hardware
    • Allow devices to work together without artificial wait-states to ensure completion

Key Features of AMD DirectGMA

Hey, this currently isn’t on our task list. Sorry.

Oh, I see, all right. May I also ask what you think about Bluefish cards in general? Have you by any chance done any internal tests comparing AJA and Bluefish in terms of I/O latency? I am looking for a solution that would have a maximum of 3 to 4 frames of total input + output delay (1080i50). However, it seems to be impossible to get such a fast transfer with the cards I have tested so far. I was hoping Bluefish might be better at this, but I have never had a chance to test their hardware. Thanks.
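
For reference, the frame counts mentioned in this thread can be converted to milliseconds. A minimal sketch, assuming that one "frame" means a full interlaced frame (two fields), so that 1080i50 runs at 25 frames per second; if the delays were actually counted in fields, the numbers would be half as large. The function name is just illustrative.

```python
# Convert a delay measured in video frames to milliseconds.
# Assumption: 1080i50 carries 25 full frames per second (50 fields per second).

def frames_to_ms(frames, frame_rate_hz=25.0):
    """Return the duration of `frames` frames in milliseconds."""
    return frames * 1000.0 / frame_rate_hz


measurements = [
    ("target (lower bound)", 3),
    ("target (upper bound)", 4),
    ("AJA Corvid 88", 5),
    ("Blackmagic Quad 2", 8),
]

for label, frames in measurements:
    print(f"{label:22s} {frames} frames = {frames_to_ms(frames):.0f} ms")
```

Under that assumption the target of 3 to 4 frames corresponds to 120 to 160 ms in and out.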

I am also very interested in this topic. I have already worked with AJA and Blackmagic cards, and while latency with TD was good, I found the drivers for them were buggy and in some cases even crashy.
I would gladly invest in more expensive cards like Datapath or Deltacast, but I am not sure how they compare latency-wise with AJA/BM in TD.

Deltacast support is something some clients have asked me about as well. We’re looking at assembling some money to fund the dev hours. If other people want to pool towards this as well, that would be nice :slight_smile:

Some major media server vendors have recently moved from Bluefish to Deltacast because Bluefish wasn’t meeting production timetables.

Deltacast is a current OEM favourite. It’s the stock card delivered in hardware from Christie/Coolux, AV Stumpfl/Pixera, disguise/d3 and MA/VPU. Their FLEX architecture looks pretty neat. From what I know, they are not available for consumer/low-volume sales.

Hello, may I ask whether there is any news on upgrading the Bluefish SDK? Thanks. :slight_smile:

Hey, no news on this right now. I have it on my to-do list, but the new cards are at the office so I don’t have them to work on right now. I may grab them one day when going in.

Sure, no problem, I was just curious. Thanks for the reply.

Hello,
as I am still fighting with video I/O latency, I thought I might ask whether anything has changed regarding Bluefish. Thanks.

No news yet. Still working through other priorities so far.

All right, thanks for the info.

May I ask whether it would be possible for you to just test whether the KRONOS K8 works with the current TouchDesigner build? If it somehow worked even with the older SDK and without GPUDirect, while still providing a pass-through latency (1080i50 SDI In -> TD -> 1080i50 SDI Out) of around 3 to 4 frames, I would gladly buy it. I would test it myself, but I don’t have access to it and it is quite expensive for just a “test run”.

In case it doesn’t work with the older SDK (or performs worse than 3 to 4 frames of latency), could you recommend a video I/O card compatible with TD that would be able to process an SDI signal with such low latency?

I’m pretty sure it won’t work with the current SDK, since things have changed quite a bit between the new card and the old ones.
For now I would suggest AJA cards.

Aha, I see. Thanks for the info.
Unfortunately AJA doesn’t seem to meet the low latency requirements, as I have been getting around 5 frames of delay between SDI In -> TD -> SDI Out (sometimes it is 4 frames, but sometimes also 6, which is not acceptable :slightly_frowning_face:).

Earlier in this thread I saw a mention of funding some dev hours. I am not sure if this would be the way to go, but if it is possible to somehow help with this Bluefish integration, I would gladly participate, as it is quite important for me to produce a low-latency solution during the next month.

That seems high for AJA. What is your machine configuration?

I tested the AJA Corvid 88 on an HP Z820 with dual Xeon E5-2687W v2 3.40 GHz, Quadro K6000, 32GB RAM. I guess this won’t be much better on other machines, right?

Ya true, that should be a fair test offhand.

Hmm, I have thought of using something like Unreal Engine for Bluefish I/O and then Spout for transfer between UE and TD, but just writing that sounds too complicated to perform well :grinning:.