An IT architecture for IP video operations

By Scott Barella, Utah Scientific
Post Production, September 22nd 2017 at 12:20PM

Scott Barella is chief technology officer for Utah Scientific and a member of the board of directors for the Alliance for IP Media Solutions (AIMS), where he also serves as deputy chairman of the Technical Working Group.

The term "fabric" is commonly used to describe an Ethernet-based architecture, and for good reason. Fabric is a perfect analogy for how all of the connections in an Ethernet design are woven together to create a consistent underlying structure and a strong platform for studio video over IP. Ethernet fabric is, in essence, a composition of similar and dissimilar data flows that define a set of functions in and around themselves.

About multicast transmissions

Multicast is an Ethernet approach in which information is addressed to a group of destination computers simultaneously. In other words, it’s a simple method for a single video or audio transmitter to connect with many different video or audio receivers. Multicast is ideal for a modern broadcast environment because it behaves like an SDI router connecting a single source to multiple destinations. The same subnet rules for unicast also apply to multicast transmissions; i.e., the transmitter and receivers have to be connected within the same subnet. This connection foundation is often referred to as the “source port,” since the multicast traffic being emitted into the network has to originate from an IP port using an IP address.
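The same-subnet rule described above can be checked in a few lines. This is a minimal sketch using Python's standard ipaddress module; the addresses, the /24 prefix length and the helper names same_subnet and is_multicast_group are illustrative assumptions, not values from any real installation.

```python
import ipaddress

def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both host addresses fall inside the same subnet."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b

def is_multicast_group(addr: str) -> bool:
    """Return True if addr lies in the IPv4 multicast range 224.0.0.0/4."""
    return ipaddress.ip_address(addr).is_multicast

# Hypothetical transmitter and receiver addresses.
print(same_subnet("10.10.1.20", "10.10.1.87", 24))  # both hosts in 10.10.1.0/24
print(same_subnet("10.10.1.20", "10.10.2.87", 24))  # hosts in different /24 subnets
print(is_multicast_group("239.1.1.1"))              # a group address, not a host address
```

A check like this can be run against a planned address list before any devices are plugged in.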

The method used to join multiple receivers to a transmitter is Internet Group Management Protocol (IGMP), with an Ethernet switch managing the connection. Receivers use IGMP to tell the switch which multicast group they want to join, and the switch then forwards that group’s traffic from the transmitter only to the ports that have asked for it.
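On the receiving side, a join is typically requested through the operating system's socket API, which sends the IGMP membership report on the application's behalf. The following is a minimal sketch using Python's standard socket module; the function names are hypothetical, and any group address and port passed in are up to the caller.

```python
import socket

def build_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)

def open_multicast_receiver(group: str, port: int,
                            interface: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and join `group`.

    Setting IP_ADD_MEMBERSHIP makes the operating system send the IGMP
    membership report; the application never builds IGMP packets itself.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group, interface))
    return sock
```

Calling open_multicast_receiver("239.1.1.1", 5004) on a host with a multicast-capable interface would return a socket ready for recvfrom(); both values in that call are placeholders. Closing the socket leaves the group.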

Timing is everything

The most common means of transporting video is the User Datagram Protocol (UDP). UDP is very basic, but this simplicity is also why it is so common; since UDP datagrams don’t require any direct connection management, they’re also highly versatile. However, UDP has no built-in way to number its datagrams. This is problematic given the frame-by-frame structure of video, which is played out at a fixed number of frames per second. It’s critical that these frames arrive in the correct order, and that’s where the Real-time Transport Protocol (RTP) comes in.

RTP is ideal for video because it solves the vital task of sequencing the packets. Each RTP packet travels inside a single UDP datagram and carries its own sequence number, and RTP packets can even be time-stamped. RTP timecode exists as an entirely separate data stream, rather than an actual marker placed on the video, eliminating the old-school sync pulse that’s existed since the beginning of video.
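To make the sequencing idea concrete, here is a minimal sketch that decodes the 12-byte fixed RTP header defined in RFC 3550 and uses the sequence number to restore packet order. The class and function names are hypothetical, the payload type of 96 is an arbitrary illustrative value, and sequence-number wraparound is deliberately ignored to keep the example short.

```python
import struct
from typing import NamedTuple

class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int

def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=byte0 >> 6,           # top two bits of the first byte
        marker=bool(byte1 & 0x80),    # top bit of the second byte
        payload_type=byte1 & 0x7F,    # remaining seven bits
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )

def in_sequence_order(packets: list[bytes]) -> list[bytes]:
    """Sort packets by RTP sequence number (16-bit wraparound not handled)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p).sequence_number)

# Two hand-built packets that arrive out of order.
late = struct.pack("!BBHII", 0x80, 96, 2, 3000, 0x1234) + b"second"
early = struct.pack("!BBHII", 0x80, 96, 1, 3000, 0x1234) + b"first"
ordered = in_sequence_order([late, early])
print([p[12:] for p in ordered])
```

A production receiver would also need to handle the sequence number rolling over from 65535 to 0, along with lost packets.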

The method used to time-stamp the RTP packets is Precision Time Protocol (PTP), by which the transmitting device is responsible for reading the PTP packets on the network and stamping the RTP packets as they are emitted to the Ethernet network. In this manner, PTP packets serve as a synchronisation source.
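The stamping step can be sketched as simple arithmetic: the sender converts the current PTP time into ticks of the media clock and keeps the low 32 bits, which is the width of the RTP timestamp field. The function name is hypothetical, and the 90 kHz (video) and 48 kHz (audio) clock rates used below are common values assumed for illustration rather than figures taken from this article.

```python
RTP_TIMESTAMP_MODULUS = 2 ** 32  # the RTP timestamp field is 32 bits wide

def rtp_timestamp(ptp_time_ns: int, clock_rate_hz: int) -> int:
    """Convert a PTP time in nanoseconds into a 32-bit RTP timestamp."""
    ticks = ptp_time_ns * clock_rate_hz // 1_000_000_000
    return ticks % RTP_TIMESTAMP_MODULUS

one_second_ns = 1_000_000_000
print(rtp_timestamp(one_second_ns, 90_000))  # video media clock (assumed 90 kHz)
print(rtp_timestamp(one_second_ns, 48_000))  # audio media clock (assumed 48 kHz)
```

Because every sender derives its timestamps from the same PTP time, a receiver can line up video, audio and ancillary packets that were stamped at the same instant, even though they arrive as separate streams.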

The SMPTE ST 2059 standard utilises PTP for achieving time stamps relevant to video packets in Ethernet networks. As a pure Ethernet timing method, PTP is the ideal synchronisation mechanism for the new SMPTE ST 2110 standard — itself built from the ground up on Ethernet. The SMPTE ST 2110 soup contains PTP packets (2110-10), video RTP packets (2110-20), audio RTP packets (2110-30) thanks to AES67, and ancillary data RTP packets (2110-40). But putting it all together in a workable combination requires a little planning.

Network design

An important aspect of SMPTE ST 2110 is the size of the pipes needed to carry all this data in some organised fashion. Video will naturally demand 10 Gb/s pipes, while audio and data have much smaller bandwidth requirements (100 Mb/s and 1 Gb/s, respectively). Therefore, it’s possible to manage video on 10 Gb/s switches and audio on smaller 1 Gb/s switches. Keep in mind that 10 Gb/s ports can be rather expensive, so it’s important to use bandwidth accordingly and aggregate signals in order to maximise port capacity. Keeping all the signals in groups according to workflow may be a good tactic, given that SDI workflows have tied routers together for years.
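A rough port-capacity estimate shows why aggregation matters. The sketch below is a back-of-the-envelope calculation under stated assumptions: uncompressed 1080p video at 60 frames per second with 20 bits per pixel (10-bit 4:2:2), eight channels of 48 kHz 24-bit audio, packet overhead ignored, and an arbitrary 90 per cent usable share of each port. None of these figures come from the article, and the function names are hypothetical.

```python
def video_bitrate_bps(width: int, height: int, frames_per_second: int,
                      bits_per_pixel: int) -> int:
    """Active-picture bit rate of uncompressed video, ignoring packet overhead."""
    return width * height * frames_per_second * bits_per_pixel

def audio_bitrate_bps(sample_rate_hz: int, bits_per_sample: int,
                      channels: int) -> int:
    """Bit rate of uncompressed PCM audio, ignoring packet overhead."""
    return sample_rate_hz * bits_per_sample * channels

def flows_per_port(port_bps: int, flow_bps: int, usable_percent: int = 90) -> int:
    """Whole flows that fit in the usable share of a port."""
    usable_bps = port_bps * usable_percent // 100
    return usable_bps // flow_bps

video = video_bitrate_bps(1920, 1080, 60, 20)  # assumed 1080p60, 10-bit 4:2:2
audio = audio_bitrate_bps(48_000, 24, 8)       # assumed 8 channels, 48 kHz, 24-bit

print(video, flows_per_port(10_000_000_000, video))  # video flows on a 10 Gb/s port
print(audio, flows_per_port(100_000_000, audio))     # audio flows within 100 Mb/s
```

Under these assumptions a single 10 Gb/s port carries only a handful of uncompressed HD video flows, which is why grouping signals by workflow before assigning ports is worth the planning effort.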

Another element of network design is control and management. C&M addresses and their associated signaling are usually kept entirely separate from the “business end” of the audio, video and data essence flows, with the control network existing on separate 1 Gb/s Ethernet switches and connected separately.  

Example multicast network

The diagram at top offers a simplified example of how all of the Ethernet threads can be woven to create a strong fabric for a video network. Here, the management and control network is separate and therefore exists on an entirely separate subnet, with all control devices connected to a simple 1 Gb/s switch. This control subnet manages configuration as well as the device flows, with a separate switch that can be divided into two control VLANs as indicated by the black and red lines.

The video, audio, and data flows are shown in the centre on one main and one alternate subnet, with transmitters represented by the blocks on the left and receivers on the right. There are two PTP GrandMaster timing generators locked to GPS, and both supply their PTP packets to each subnet.


While the evolution of professional video over IP is underway, there will be plenty of challenges for early network designers. A thorough understanding of the “threads” in the underlying Ethernet fabric will go a long way toward addressing these challenges and helping engineers usher in the new era of studio video over IP.