Custom Automation   |   Automated Water Systems   |   Development   |   News   |   About Us   |   Home

SUNAPSYS CSIA

Redundancy

We all know that modern-day electronics sometimes fail, and control equipment is no exception. For many high-availability systems, a failure can be a serious problem. One solution is to implement controller redundancy, as shown in the figure below.

This system is constructed with a primary and a backup processor. Special logic circuitry selects which processor controls the system and which one serves as the backup. When the primary CPU fails, the logic detects the failure and switches control to the backup CPU, keeping the plant running.

While this seems like a logical and well-designed system, there are four issues with this approach to redundancy:
  1. Cost - typically this functionality is only available in a vendor’s most expensive, high-end processors. The special logic circuitry and options to support this approach are also an additional cost. Paying for two processors with duplicate features and functions in case the primary fails only pays off if the processor fails. If it doesn’t fail, the funds are tied up in an unused resource.

  2. PLCs are VERY reliable - The mean time between failures is so long that the entire control system, and even the plant itself, will likely be replaced before a CPU fails.

  3. Complexity - the redundancy logic shown above adds another layer of complexity to the overall system, which can actually make the system less reliable. The logic to perform the function of detecting a failed CPU and then switching all control over to a backup CPU is not a trivial function. Because of this complexity, the reliability of such systems is frequently less than that of the processors they are trying to back up. If this circuitry fails, the entire system fails.

  4. Single point of failure remains - redundancy is only implemented on one component in the system illustrated above. What about the rest of the system? On systems this critical, redundant I/O, redundant power supplies, redundant networking, and redundant power from the power company should also be implemented. These parts are more likely to fail than the CPUs, but it is easy to see how quickly costs can escalate to the point of being unaffordable.

So what other options are available? The first is to evaluate our true requirements. Do we truly need 100% uptime, or is 99.9% enough? Maybe all we need is a pre-programmed spare CPU sitting on a shelf in the panel next to the one in operation. If the running CPU fails, we pull it out, slide in the spare, and we are back up in minutes.
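To put numbers on the uptime question, the sketch below converts an availability percentage into hours of downtime per year and estimates availability from the standard steady-state relation MTBF / (MTBF + MTTR). The MTBF and swap-time values are hypothetical, used only to illustrate the shelf-spare idea.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


print(f"99.9% uptime allows {downtime_hours_per_year(0.999):.2f} h/year of downtime")

# Hypothetical numbers: a CPU with a 100,000 h MTBF and a shelf spare
# that can be swapped in within 30 minutes.
a_spare = availability(mtbf_hours=100_000, mttr_hours=0.5)
print(f"shelf spare: {a_spare:.6%} available")
```

A 99.9% target leaves roughly 8.76 hours of downtime per year. With the assumed figures, a spare that can be swapped in within half an hour stays well inside that budget as far as the CPU alone is concerned.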

Another approach is to look at redundancies that are already built into our facility. Many times there are two pumps configured in a duplex arrangement so that if one fails, operation can continue while the first one is repaired, similar to the idea of having a primary and backup CPU. Or, there may be parallel production lines in the plant, all performing the same function and dividing the work load between them. If we look further, we will find there are frequently many such redundancies already existing in the mechanical systems in our plants, allowing control system redundancy to be implemented as seen below.



With the shrinking costs of microprocessors and other electronics, the approach shown above is not only feasible, but may actually be less expensive. PLC manufacturers now make very economical “micro” PLCs that can perform many of the same functions as their larger counterparts. Thanks to the Internet, the cost for networking these inexpensive controllers with Ethernet interfaces has also become very economical.

The savings of this approach go beyond just the hardware and system redundancy. We now only have to design, engineer, and program a control system to run just a small section of our plant. These smaller systems are easier to design, program, install and start up than large redundant-backup plant-wide systems. We only have to design a small sub-system once, then we can replicate the schematics, programming and documentation for all the same systems of that type in our plant. If we repeat this process for all the different kinds of sub-systems in our plant, we end up with a robust, distributed control system.
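The "design once, then replicate" idea can be sketched as a single template that is instantiated for each sub-system of the same type. Everything named here (`SubsystemConfig`, the program name, tag prefixes, and IP addresses) is hypothetical and stands in for whatever a real project would track.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SubsystemConfig:
    """Hypothetical description of one micro-PLC sub-system."""
    name: str
    ip_address: str
    program: str      # control program shared by this sub-system type
    tag_prefix: str


# Designed and tested once.
template = SubsystemConfig(
    name="PUMP_STATION",
    ip_address="0.0.0.0",
    program="duplex_pump_v1",
    tag_prefix="PS",
)

# Replicated for every station of the same type; only the identity changes.
stations = [
    replace(
        template,
        name=f"PUMP_STATION_{n}",
        ip_address=f"192.168.10.{10 + n}",
        tag_prefix=f"PS{n}",
    )
    for n in range(1, 4)
]

for s in stations:
    print(s.name, s.ip_address, s.program, s.tag_prefix)
```

Each copy shares the same program and differs only in its name, address, and tag prefix, which is what keeps the engineering effort close to that of a single sub-system.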

Now if a CPU fails, its impact will be limited to just the portion of the plant it controls. The remainder of the plant continues to operate normally. Meanwhile, the redundant counterparts can cover the load while we are making repairs. We can also run the affected area in manual control, if necessary.
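A rough way to compare the two architectures is to ask how likely a total outage is in each. The sketch below assumes independent controller failures and ignores common causes such as shared power or networking. The per-controller health probability and the number of parallel lines are hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p_up: float) -> float:
    """Probability that at least k of n independent units are running."""
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i)
        for i in range(k, n + 1)
    )


p_up = 0.99  # hypothetical chance that any one controller is healthy
lines = 4    # hypothetical number of parallel lines, one controller each

# Centralized: one CPU runs everything, so one failure stops the plant.
p_total_outage_central = 1 - p_up

# Distributed: a total outage needs every controller to fail at once.
p_total_outage_distributed = (1 - p_up) ** lines

print(f"central, whole plant down:     {p_total_outage_central:.2e}")
print(f"distributed, whole plant down: {p_total_outage_distributed:.2e}")
print(f"distributed, >= 3 of 4 lines:  {prob_at_least(3, lines, p_up):.6f}")
```

With these assumed figures, a simultaneous loss of all four lines is far less likely than the loss of a single central CPU, and at least three of the four lines are running about 99.94% of the time.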

Any way we look at it, this beats the risk of having our entire plant down, and we achieve this reliability at lower cost and with less complexity than an expensive redundant backup system.


Control & Information System Integrator
