Videowall Failures & Fault Tolerance

Detecting and dealing with component failure in digital signage systems.

Videowall systems can fail, and if yours is mission-critical or public-facing, those failures cause real problems. We’ll discuss common failures and their causes, as well as ways to detect and tolerate some of them, particularly in software-centric videowall systems.


Let’s assume that a videowall system consists of displays (LCD flat panels, direct-view LED meshes or projectors, for example), computers or hardware to drive those displays, infrastructure to connect all of it and software to control everything. Any of those components can fail, but not with equal probability. Because a videowall consists of many displays, the probability that at least one display fails increases with the number of displays, which is why it is important to use high-quality commercial displays rated for 24/7 operation. These are more reliable and better supported than consumer displays.
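To make the scaling concrete, here is a minimal sketch of how the chance of seeing at least one failure grows with the display count. It assumes that every display fails independently and with the same probability over some period. The 2% figure is purely illustrative, not a measured failure rate, and the function name is hypothetical.

```python
def prob_any_failure(p_single: float, count: int) -> float:
    """Probability that at least one of `count` units fails.

    Assumes independent failures with the same per-unit probability
    `p_single` over the period of interest (an idealization).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if count < 0:
        raise ValueError("count must not be negative")
    # Every unit survives with probability (1 - p_single) ** count,
    # so at least one fails with the complement of that.
    return 1.0 - (1.0 - p_single) ** count


if __name__ == "__main__":
    # Illustrative per-display failure probability, NOT a real figure.
    p = 0.02
    for n in (1, 4, 16, 64):
        print(f"{n:3d} displays: {prob_any_failure(p, n):.1%} chance of at least one failure")
```

The same arithmetic applies to any component that is replicated across the wall, such as per-display media players.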

When a computer or media player drives each display, the probability of a failure also increases as the count grows. The parts that usually fail in commodity computers are power supplies and moving parts, such as fans and hard drives. Fanless computers and solid-state storage help, but tend to increase cost or reduce storage capacity or clock speed.

The fans, power supplies and hard drives in controller computers or servers that drive many displays also fail. Redundant power supplies and solid-state storage can help, but can also significantly increase cost. CPU fans, as well as fans on graphics cards, often fail over time. This leads first to degraded performance (due to the temperature-based throttling built into most components these days), then to outright failure.
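Because a failing fan tends to show up as rising temperature before the component dies, temperature can serve as an early warning. The sketch below classifies a window of recent temperature readings. The function name and both thresholds are hypothetical; real limits depend on the specific CPU or GPU, and how the readings are collected is deliberately left out.

```python
from statistics import mean
from typing import Sequence


def classify_thermal_state(
    temps_c: Sequence[float],
    warn_c: float = 85.0,   # hypothetical warning threshold, in Celsius
    crit_c: float = 95.0,   # hypothetical critical threshold, in Celsius
) -> str:
    """Return "ok", "warning" or "critical" for recent temperature samples.

    Uses the average of the window so that a single spike does not
    trigger an alert on its own.
    """
    if not temps_c:
        raise ValueError("need at least one temperature sample")
    avg = mean(temps_c)
    if avg >= crit_c:
        return "critical"
    if avg >= warn_c:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up readings for illustration only.
    print(classify_thermal_state([61.0, 63.5, 62.0]))
    print(classify_thermal_state([86.0, 88.0, 90.5]))
```

A "warning" result would be the moment to schedule a fan replacement, before throttling turns into visibly degraded playback or a hard failure.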

Infrastructure Tends To Be Reliable
Infrastructure, including network switches and cables, tends to be reliable once it is stable and working correctly. Much of it is solid-state or passive, so it rarely fails. Fans in some infrastructure components can still fail, but they are often redundant, so a single fan failure doesn’t take the system down.

