The world of IP-based signal distribution has many different elements that can become quite confusing for the AV professional who’s looking to take his or her first steps into the arena. However, the benefits of IP, coupled with already-broad adoption, put those hesitant to start working with networked AV at a significant competitive disadvantage. In order to help remove common barriers to entry, we present a two-part overview of audio-over-IP (AoIP).
In the first part, we will cover some basics of networking: just enough information to make sense of the technical jargon that manufacturers like to throw around, and to make it easier to communicate with customers and other systems designers. That foundation will make it simpler to differentiate between various AoIP solutions and to judge when each might be suitable for a specific project. Building on that information, in part two, we will look at several current AoIP methodologies, comparing their capabilities and limitations. We’ll provide enough detail that you should be able to design and implement a basic system rather quickly.
So, let’s dive in….
Why Audio Networking?
Prior to networked AV solutions being released into the market, all signals in a system would typically run point to point or through matrix switches. That required individual cables and terminations on every signal path and, oftentimes, heavy multi-core snakes. The number of cables required in such systems could create issues where conduit capacity was limited or where cable pulls were challenging. Reconfiguration was not always easy, and growth could become expensive if exceeding the maximum available I/O of an existing system required a forklift upgrade of the equipment. Signal degradation with distance could be an issue, particularly in the analog world. Yet, on the upside, there was minimal distribution-related latency.
Networked audio solutions require minimal terminations and cables, because one CatX cable with RJ45 connectors can carry dozens, or even hundreds, of signal paths. This simplifies cable pulls and reduces required conduit capacity, saving many hours of labor on the installation side and potentially dramatically cutting system-acquisition costs. The physical distance between networked system components is limited only by the coverage of the network. Reconfiguration and growth are as simple as adding endpoints at any network drop, without the concern of exceeding the I/O capacity of a matrix switch. Logical changes to the system are also simplified because most routing can be accomplished through software connected to the network. Although there is the potential for latency, it can be mitigated to the degree that it’s a non-issue for most applications.
Networking Basics And The OSI Model
When describing network-related signal-distribution options, manufacturers often cite various “layers” of the OSI model. The OSI model is a common method to describe what happens in a network. The model is just a theoretical description of how data is handled from the time an end user is working with an application through to when the data is received by an application on the other end of the network. That could mean all the steps from how you interact with the piece of software you use to write an email, to what the recipient of that email sees when he or she opens the spam you just sent them, to everything in between. (Because of the ubiquity and easily understood nature of email, we’ll use it in several of our examples. Yet, when it comes to working with many of the networking concepts we’ll cover, the ideas are the same, regardless of whether you’re working with email, audio, funny cat videos, control or any other payload data.)
The OSI model breaks down what happens into elements called layers, and each layer adds its own functions. These include things like encryption, multiplexing multiple pieces of data into a single data stream, routing data across a network to a specific device, routing within the final physical device to a specific application (such as Outlook, instead of Chrome), moving data across a wire and many others. We’re going to summarize what happens at each of the layers, focusing on the portions related to AoIP. AoIP equipment manufacturers often cite the highest layer their offering utilizes in order to hint at either the underlying capabilities or the technology of the system. Refer to Table 1 as we go through the layers; it’ll make the discussion easier to follow. Because the OSI model is a generalization, there are occasional gray areas as to which layer is responsible for a specific function.
If you’re looking at your email, you’re working with Layer 7 of the OSI model. This is called the Application Layer because…well…you’re working with an application. When you send the email, it takes the raw data and applies any required compression and encryption in Layer 6, the Presentation Layer. From there, it continues on its way down through these theoretical layers, each one adding its own function. When it gets down to Layer 1, the Physical Layer, our data is moving across wires (or wirelessly) between devices. When the data reaches the destination device, the journey continues up through the various layers. Each layer looks at its associated portion of the data to perform that layer’s functions until, finally, it reaches the destination’s Application Layer.
Starting at the bottom, let’s look through each of these layers and their functions as they relate to AoIP.
Layer 1 – Physical Layer
The Physical Layer contains the cables and connections of a network, and it’s responsible for moving bits from place to place. The Physical Layer also includes hubs. A hub is a piece of hardware where all the network devices connect together, each on its own physical port. Data sent from one network device goes into the hub, and then it’s sent out of all ports on the hub to all connected devices. Data collisions can occur when multiple devices attempt to send information at the same time. To avoid collisions, switches (with Layer 2 capabilities) are now used in place of hubs.
Layer 2 – Data Link Layer
The Data Link Layer is responsible for collision avoidance. Switches are devices similar to hubs, but a lot smarter. Instead of sending all information out to all ports, a switch sends information only to the specific port with the intended receiving device. How does that work? Well…every networkable device has a unique physical address, called a MAC address. (MAC stands for “Media Access Control,” and it’s not related to Mac computers. So, even your Windows PC and your Android phone have MAC addresses.) In most cases, the MAC address is assigned from the factory, cannot be changed and is exclusive to that individual unit. A switch learns the MAC address of each device from the traffic that device sends when it first connects. The switch then builds a table that lists the MAC addresses of the devices connected to each of its ports. When data comes into the switch, the switch reads the MAC address of the intended recipient, and then it sends the data out only to the corresponding device.
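The learn-then-forward behavior described above can be sketched in a few lines of code. This is only an illustration, not how switch firmware is actually written; the device addresses and port numbers are hypothetical.

```python
# A minimal sketch of how a Layer 2 switch learns MAC addresses and decides
# where to forward frames. Addresses and port numbers are hypothetical.

class Switch:
    def __init__(self):
        self.mac_table = {}  # learned mapping: MAC address -> physical port

    def receive(self, port, src_mac, dst_mac):
        """Handle a frame arriving on `port`; return the egress port(s)."""
        # Learn: note which port the sending device is connected to.
        self.mac_table[src_mac] = port
        # Forward: if the destination is known, send out that one port only.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # Unknown destination: flood to the other known ports (a real switch
        # floods every port except the one the frame arrived on).
        return [p for p in self.mac_table.values() if p != port]

switch = Switch()
switch.receive(1, "AA:BB:CC:00:00:01", "AA:BB:CC:00:00:02")  # destination not learned yet
out = switch.receive(2, "AA:BB:CC:00:00:02", "AA:BB:CC:00:00:01")
print(out)  # [1] -- the switch now forwards to port 1 only, not to every port
```

Notice that the very first frame has to be flooded, hub-style; only once both devices have spoken does the table allow targeted forwarding.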
There are two general types of switches: unmanaged and managed. Unmanaged switches are plug-and-play; yet, they lack advanced features that might be helpful in some cases. Managed switches provide some configuration options. One of the options on a managed switch is to create a Virtual Local Area Network (VLAN). A VLAN is a method of grouping ports for isolation purposes. This can be helpful for security or if you are trying to segment media traffic (i.e., audio and video) to keep it separate from other corporate traffic.
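The port-grouping idea behind a VLAN can be shown with a tiny sketch. The port assignments and VLAN IDs below are hypothetical, and real switches enforce this in hardware rather than in application code.

```python
# Sketch of VLAN isolation on a managed switch: each port is assigned a VLAN
# ID, and frames only pass between ports in the same VLAN. Values hypothetical.
port_vlan = {1: 10, 2: 10, 3: 20, 4: 20}  # switch port -> VLAN ID

def can_forward(src_port, dst_port):
    """A frame may pass between two ports only if their VLAN IDs match."""
    return port_vlan[src_port] == port_vlan[dst_port]

print(can_forward(1, 2))  # True: both ports sit on the media VLAN (10)
print(can_forward(1, 3))  # False: the corporate VLAN (20) is isolated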
If we were to look at Data Link Layer traffic, it would contain our original payload information (i.e., audio, email or whatnot), along with the MAC address of the receiving device. This combination of data is called a frame.
Layer 3 – Network Layer
Sometimes, it is beneficial for network administrators to separate data types for efficiency or security. That is where Layer 3, the Network Layer, comes into play. The Network Layer adds the ability to divide a network into smaller networks (i.e., subnets). It also allows for connection to larger networks, such as the internet. Layer 3 also includes information on message priority. For example, can your email wait a few extra milliseconds so that real-time audio traffic can pass undeterred? In Layer 3, additional information is added to our frame that allows the Network Layer to perform its functions. That additional information, combined with the frame, is called a packet.
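The subnet concept can be made concrete with Python’s standard `ipaddress` module. The addresses below are hypothetical, but the membership test is exactly the check a Layer 3 device performs to decide whether traffic can be delivered directly or must cross a router.

```python
import ipaddress

# A hypothetical subnet that a network admin might dedicate to media traffic.
# The /24 mask means the first 24 bits identify the network itself.
media_subnet = ipaddress.ip_network("192.168.10.0/24")

audio_endpoint = ipaddress.ip_address("192.168.10.25")
office_pc = ipaddress.ip_address("192.168.20.14")

print(audio_endpoint in media_subnet)  # True: same subnet, delivered directly
print(office_pc in media_subnet)       # False: traffic must cross a router
```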
Layer 4 – Transport Layer
Up from the Network Layer we move to Layer 4, the Transport Layer. The Transport Layer is responsible for data order, reliability, flow control, multiplexing and similar functions. Within Layer 4, port numbers are added to the data stream to facilitate routing within a device to a specific application. These ports are different from the hardware ports on a network switch, mentioned previously. Layer 4 also brings into play two protocols from the internet protocol suite that impact the world of commercial AV: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). The use of TCP or UDP is typically determined by the solution manufacturer, and it’s not something that is addressed by system designers. However, TCP and UDP are brought up in marketing materials often enough that they warrant a closer look.
TCP is used when reliability is more important than speed. With TCP, every data packet is acknowledged by the receiving device. That means every packet must include both the receiving device’s IP address and the sending device’s IP address for the return message, and every outgoing packet generates a corresponding return message. Both the additional addressing information and the return data add to the amount of network traffic, which increases both bandwidth consumption and latency.
By comparison, UDP is used when timely delivery is more critical than guaranteed accuracy, such as with real-time audio. With UDP, there is no delivery confirmation and the headers are smaller, requiring less bandwidth and creating less latency as compared to TCP. On the downside, lost packets are never retransmitted, which can be a problem on unreliable networks. Many AoIP methodologies will use UDP, with some of them providing an additional level of reliability through the Application Layer. For example, when looking at Dante in Part 2, we will see that connection reliability is automatically confirmed when routing is established between devices within Dante’s Controller software.
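UDP’s “fire and forget” nature is easy to demonstrate with standard sockets. This is a hypothetical localhost demo, not AoIP traffic: the sender transmits a datagram with no handshake and gets no acknowledgment back, which is exactly the trade-off described above.

```python
import socket

# UDP needs no connection setup, so a sender can transmit immediately --
# and there is no built-in acknowledgment, which is why AoIP protocols add
# their own reliability checks at higher layers. Localhost demo only.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # let the OS pick a free port number
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"audio-sample-block", ("127.0.0.1", port))  # no handshake

data, addr = receiver.recvfrom(2048)
print(data)  # b'audio-sample-block'

sender.close()
receiver.close()
```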
Layers 5 To 7
After Layer 4 come Layers 5 and 6, which are the Session and Presentation Layers. Layers 5 and 6 are mostly software- and application-related. They are not of primary concern to AoIP integrators as, mostly, they involve behind-the-scenes processes. As a frame of reference, they perform functions such as data compression/decompression, encryption/decryption and multiplexing. Layer 6 also prepares the data for the Application Layer (Layer 7) with which we interact.
To recap, the OSI model is a tool to help visualize and describe what happens in a network. Data moves down through layers on the way out, and back up on the way in. This model can be applied to describe any network. Other models exist to describe the same functions differently. However, the OSI model is the most popular.
Proponents of various methodologies (or, more specifically, their marketing departments) will often refer to a specific layer of the OSI model to imply capabilities—whether they currently exist, or they’re on the roadmap for the future. As such, some liberties can be taken with interpretation of the various layers, and assumptions of functionality cannot be based solely on the layer referenced by a manufacturer. That disclaimer being made, signal-distribution methodologies claimed to function as Layer 2 (i.e., data with MAC addresses) typically can route between switches on a LAN, but they’re not able to send information across routers to different networks. Methods described as Layer 3 that use IP addresses might have the ability to route beyond a LAN to other networks; however, being a Layer 3 methodology is not a guarantee that routing outside the LAN is viable. For example, Dante is a Layer 3 methodology that, until recently, could only route within a wired LAN. The release of Dante Domain Manager expanded the routing capabilities, making it suitable for use across multiple LANs.
Independent of the AoIP methodology in use, or the OSI layers used to describe it, successfully sending AoIP requires methods for managing priority, timing and synchronization. These are common items also used to describe the capabilities of a particular methodology.
Priority, Sync And Timing
Priority refers to the preferential treatment of some data, making sure it is moved to the front of the line when other network traffic is present. Sync makes sure all data that is intended to arrive at the same time actually does. For example, perhaps an email can wait a few extra milliseconds; however, a worship leader’s microphone must stay in sync with the large, projected image to avoid a distracting lip-sync error.
DiffServ and QoS (short for “Differentiated Services” and “Quality of Service,” respectively) refer to mechanisms that provide priority by data type. Clock sync and audio packets might need the highest priority. The priority is established by tags in the IP header of Layer 3, and managed switches read those tags to decide which packets move first. If this sounds daunting, don’t worry; chances are, you’ll never have to set this up yourself, because it falls to the IT department. However, knowing the general idea will make it easier to understand the marketing-speak of manufacturers, as well as to facilitate communication with an IT department, should it be required. The good news is, DiffServ and QoS are not required in most applications; they typically only become necessary on a converged network, where non-media traffic might be present alongside our audio.
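For the curious, here is roughly how software marks its own traffic for DiffServ, sketched with standard sockets. This assumes a POSIX system where the `IP_TOS` socket option is available; the DSCP class “Expedited Forwarding” (decimal 46) is the one commonly associated with real-time audio.

```python
import socket

# Sketch: tagging outgoing packets with a DSCP value so QoS-aware switches
# can prioritize them. DSCP occupies the top six bits of the IP TOS byte,
# hence the two-bit shift. Assumes IP_TOS is supported (e.g., Linux).
DSCP_EF = 46           # "Expedited Forwarding": common class for live audio
tos = DSCP_EF << 2     # 184 (0xB8) once shifted into the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)

# Read the option back to confirm it took effect (OS behavior may vary).
readback = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(readback)
sock.close()
```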
Layer 2 switches can provide hardware-based time stamping on each frame. This is used by the Precision Time Protocol (PTP), which can provide sub-microsecond-accurate synchronization. This can be very helpful if, for example, you have multiple microphones set up on a drum kit. A protocol that utilizes PTP can ensure the signals remain in sync and don’t suffer from phasing due to latency dissimilarities between signals.
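The arithmetic at the heart of PTP is worth seeing once. A master clock stamps when it sends a sync message (t1) and when it receives a delay request back (t4); the follower stamps the corresponding receive (t2) and send (t3) on its own skewed clock. The timestamps below are hypothetical, in microseconds.

```python
# Worked example of PTP's clock-offset math with hypothetical timestamps.
# Here the follower's clock runs 50 us fast and the one-way path delay is
# 15 us, values we then recover from the four timestamps alone.
t1 = 1_000.0   # master sends Sync (master clock)
t2 = 1_065.0   # follower receives Sync (follower clock: +50 skew, +15 delay)
t3 = 1_100.0   # follower sends Delay_Req (follower clock)
t4 = 1_065.0   # master receives Delay_Req (master clock: 1_050 + 15 delay)

offset = ((t2 - t1) - (t4 - t3)) / 2   # follower clock minus master clock
delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay, assumed symmetric

print(offset)  # 50.0 -> follower steps its clock back by 50 us
print(delay)   # 15.0
```

The symmetric-delay assumption is why PTP benefits from hardware timestamping in the switches: queuing delays that differ per direction would corrupt the offset estimate.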
Real-time Transport Protocol (RTP) helps make sure data arrives in the right order. It’s a timing protocol that typically rides on UDP, placing a sequence number and timestamp in each packet’s header. A related protocol, Real Time Streaming Protocol (RTSP), is used to control streaming sessions and typically runs over TCP. Email and other non-real-time data can arrive in any order, as it can be resequenced, if required, prior to opening.
The preceding overview of network terminology associated with AoIP will help as we move into Part 2. In it, we’ll cover the specifics of three common AoIP methodologies, where they fit in, their strengths and limitations, and when they’re commonly used. We’ll provide enough information to help you get started with implementing a system.