So You Need Control System Redundancy?

Written by Sam Lacasse | Jan 11, 2024 3:30:00 PM

Over the past few years, we have seen an increase in the use of redundancy within industrial control systems. In this blog, I would like to share my insights on true redundancy: the major considerations, the advantages, and some possible alternatives.

Redundant (Merriam-Webster, 2023): serving as a duplicate for preventing failure of an entire system (such as a spacecraft) upon failure of a single component.

Redundancy has long been part of electrical, mechanical, computing (hardware and virtualized), and network infrastructure design. Examples include electrical power feeds from different utility sources, multiple pumps connected to the same or separate sources, mirrored servers, and dual fiber runs to a network switch. These designs allow a system to keep operating when an unexpected failure occurs.

When determining the requirements for industrial control redundancy, start with the obvious and work your way from the top down. I like to start with a site or building infrastructure review to identify single points of failure and how each would affect the control system within the scope of the project.

Site

Security access
Emergency response
Local support capabilities

Utilities

Electrical distribution (Utility Feeds, Generators, UPS, etc.)
Safety systems (Fire Alarm, Eye Wash, Toxic Gas Monitoring Systems (TGMS), etc.)
Heating and cooling systems (Clean Steam, Chilled Glycol, etc.)
Process (Chilled Water, Potable/Non-Potable, Compressed Air, etc.)
Ventilation (AHUs, Safety Exhaust, etc.)

IT Infrastructure

Security (Physical, Configuration, Management, etc.)
Network distribution (Fiber, VLANs, etc.)
Servers (Physical, Virtual, etc.)
Data storage
Email

Process

Process specific (Water for Injection (WFI), Hydrogen, Nitrogen, etc.)
Production equipment (Batch, Assembly, Modeling Phases, etc.)
Inventory control (SAP, Oracle, etc.)
Packaging (Automated, Robotic, etc.)
Alarm notification (Response Center, Email/Text (Uni/Bi-directional), etc.)
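One way to make the top-down review more systematic is to record each utility as a dependency graph and test which single element, if lost, cuts the control system off from every source. The sketch below is illustrative only: the element names (`utility_A`, `dist_panel_1`, `plc_panel`, and so on) are hypothetical, and it assumes that any one surviving path is enough to keep the load running.

```python
from collections import deque


def reachable(graph, sources, target, removed):
    """Breadth-first search from any source to target, skipping the removed element."""
    queue = deque(s for s in sources if s != removed)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, sources, target):
    """Return every element whose loss alone disconnects the target from all sources."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs}
    return sorted(
        n for n in nodes
        if n != target and not reachable(graph, sources, target, removed=n)
    )


# Hypothetical power distribution for one control panel: two utility feeds
# that both pass through the same distribution panel.
power = {
    "utility_A": ["dist_panel_1"],
    "utility_B": ["dist_panel_1"],
    "dist_panel_1": ["ups_1", "ups_2"],
    "ups_1": ["plc_panel"],
    "ups_2": ["plc_panel"],
}

print(single_points_of_failure(power, ["utility_A", "utility_B"], "plc_panel"))
# ['dist_panel_1']
```

Each utility (power, network, cooling, and so on) would be checked as its own graph; the sketch does not model an element that needs several utilities at the same time.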

To determine the redundancy requirement for a system or process, several factors should be considered.

Criticality of the System

Assess the operational, safety, and financial consequences of a system failure. The higher the criticality, the greater the need for redundancy.

Reliability of Components

Evaluate the reliability and failure rates of individual components within the system. If certain components are known to have a higher probability of failure, redundancy may be necessary to mitigate this risk.
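Component reliability data is often quoted as MTBF (mean time between failures) and MTTR (mean time to repair). A common textbook approximation turns these into steady-state availability, A = MTBF / (MTBF + MTTR), and treats a redundant pair as unavailable only when both units are down at once. The sketch assumes independent failures and a perfect switchover, which the field shortcomings listed later in this post show is not always true; the MTBF and MTTR figures are hypothetical.

```python
def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable component."""
    return mtbf_h / (mtbf_h + mttr_h)


def parallel_availability(a: float, n: int = 2) -> float:
    """Availability of n redundant units where any one keeps the system running.
    Assumes independent failures and an instantaneous, perfect switchover."""
    return 1.0 - (1.0 - a) ** n


# Hypothetical component: 50,000 h MTBF, 8 h mean time to repair.
single = availability(50_000, 8)
pair = parallel_availability(single)
print(f"single: {single:.6f}  redundant pair: {pair:.9f}")
```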

Downtime Tolerance

Determine the acceptable downtime for the system. If rapid restoration or continuous operation is required, redundancy becomes more important to ensure minimal interruptions.
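Downtime tolerance can be checked against an availability figure. The sketch below, with hypothetical limits, converts availability into expected hours of downtime per year and also compares MTTR with the longest single outage the process can accept, since a good yearly average can still hide one unacceptably long outage.

```python
HOURS_PER_YEAR = 8760.0


def expected_annual_downtime_h(availability: float) -> float:
    """Long-run average downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def within_tolerance(availability: float, mttr_h: float,
                     max_annual_h: float, max_outage_h: float) -> bool:
    """Check both the yearly total and the length of a typical single outage."""
    return (expected_annual_downtime_h(availability) <= max_annual_h
            and mttr_h <= max_outage_h)


# Hypothetical requirement: under 4 h/yr in total, no single outage over 2 h.
print(expected_annual_downtime_h(0.999))  # about 8.76 h/yr
print(within_tolerance(0.999, mttr_h=8, max_annual_h=4, max_outage_h=2))  # False
```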

Cost Implications

Weigh the financial cost of system downtime against the cost of implementing redundancy. The investment in redundancy should be proportionate to the losses it is expected to prevent.
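A simple, undiscounted comparison of avoided downtime cost against the cost of redundancy can frame this discussion. All figures below are hypothetical, and a real evaluation would also account for maintenance, testing, and the time value of money.

```python
def avoided_loss(downtime_without_h: float, downtime_with_h: float,
                 cost_per_hour: float, years: float) -> float:
    """Undiscounted downtime cost avoided by adding redundancy over `years`."""
    return (downtime_without_h - downtime_with_h) * cost_per_hour * years


def net_benefit(downtime_without_h: float, downtime_with_h: float,
                cost_per_hour: float, years: float, redundancy_cost: float) -> float:
    """Positive result: redundancy is expected to pay for itself over the period."""
    return avoided_loss(downtime_without_h, downtime_with_h,
                        cost_per_hour, years) - redundancy_cost


# Hypothetical figures: 6 h/yr of downtime without redundancy, 0.5 h/yr with it,
# $10,000 per hour of lost production, $200,000 installed cost, 5-year horizon.
benefit = net_benefit(6.0, 0.5, 10_000, 5, 200_000)
print(f"Net benefit over 5 years: ${benefit:,.0f}")  # Net benefit over 5 years: $75,000
```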

Scalability and Future Growth

Consider the system's scalability and potential growth. If there are plans to expand or increase system capacity in the future, incorporating redundancy early on can help accommodate future requirements.

Environmental Factors

Assess the environmental conditions in which the system operates. Harsh or unstable environments may increase the likelihood of component failure, necessitating redundancy for increased robustness.

Regulatory or Compliance Requirements

Determine if there are any specific regulatory or compliance standards that dictate redundancy requirements for the system. Ensure compliance with relevant guidelines or industry-specific regulations.

Historical Failure Data

Analyze historical data on system failures or incidents to identify patterns or trends. This information can help inform the level of redundancy required to mitigate similar risks in the future.
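One way to look for patterns in a maintenance log is to group failures by component and estimate an observed MTBF and MTTR for each. The log format and component names below are hypothetical, and treating the observation period minus that component's repair time as uptime is a rough approximation.

```python
from collections import defaultdict


def per_component_stats(log, period_h):
    """log: iterable of (component, repair_hours) entries, one per failure.
    Returns {component: (failures, observed MTBF in h, MTTR in h)}."""
    repairs = defaultdict(list)
    for component, hours in log:
        repairs[component].append(hours)
    stats = {}
    for component, hrs in repairs.items():
        uptime = period_h - sum(hrs)
        stats[component] = (len(hrs), uptime / len(hrs), sum(hrs) / len(hrs))
    return stats


# Hypothetical three-year log for one controller.
log = [("io_card", 4.0), ("power_supply", 12.0), ("io_card", 2.0), ("io_card", 3.0)]
stats = per_component_stats(log, period_h=3 * 8760)

# Lowest observed MTBF first: the most likely candidates for added redundancy.
for name, (n, mtbf, mttr) in sorted(stats.items(), key=lambda kv: kv[1][1]):
    print(f"{name}: {n} failures, MTBF ~ {mtbf:,.0f} h, MTTR ~ {mttr:.1f} h")
```

Components with no recorded failures do not appear at all, so a short or incomplete log can make a system look more reliable than it is.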

Remember that redundancy should be carefully designed and implemented based on a comprehensive evaluation of these factors to strike the right balance between reliability, cost, and system performance.

Industrial control system manufacturers offer both hardware and software redundancy solutions, and each manufacturer provides design guidelines that must be followed to implement redundancy correctly.

Hardware

PLC redundancy typically duplicates everything within the main controller: chassis, power supplies, controllers, sync modules, I/O modules, and network connections.

Software

Software redundancy uses a primary and a secondary server for each function, such as visualization, alarming, and I/O data.
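How a primary/secondary server pair decides when to switch is vendor-specific. As a generic illustration only, the sketch below uses heartbeat timestamps: if the active server has not reported within a timeout and the standby has, the standby becomes active. The class name, server labels, and 5-second timeout are assumptions, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    timeout_s: float
    active: str = "primary"
    last_heartbeat: dict = field(default_factory=dict)

    def heartbeat(self, server: str, now: float) -> None:
        """Record that a server reported healthy at time `now` (seconds)."""
        self.last_heartbeat[server] = now

    def healthy(self, server: str, now: float) -> bool:
        t = self.last_heartbeat.get(server)
        return t is not None and now - t <= self.timeout_s

    def evaluate(self, now: float) -> str:
        """Switch to the standby only if the active server is stale and the standby is not."""
        if not self.healthy(self.active, now):
            standby = "secondary" if self.active == "primary" else "primary"
            if self.healthy(standby, now):
                self.active = standby
        return self.active


monitor = FailoverMonitor(timeout_s=5.0)
monitor.heartbeat("primary", 0.0)
monitor.heartbeat("secondary", 0.0)
print(monitor.evaluate(1.0))          # primary
monitor.heartbeat("secondary", 6.0)   # the primary has stopped reporting
print(monitor.evaluate(7.0))          # secondary
```

The sketch deliberately does not fail back when the primary recovers, which avoids switching back and forth on an intermittent fault; fail-back could instead be left as an operator action.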

Alternatives

In some applications, after a thorough system review, it may be determined that full redundancy is not required.

Here are a few alternatives to consider for control system redundancy:

Fail-Safe Mechanisms

Implementing fail-safe mechanisms can help prevent system failures and minimize their impact. Fail-safe mechanisms include safety checks, emergency shutdown procedures, and protective measures that are activated in case of a failure. For example, a duplicate communication module that handles Modbus TCP/IP communications to critical devices can be activated automatically or manually if the primary module fails.
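The duplicate Modbus TCP/IP module example can be sketched as a path selector with an automatic mode and two manual (forced) modes. The read functions are hypothetical stand-ins for whatever Modbus client the system uses; no specific library is assumed. If both paths fail, the error is passed to the caller, which would then apply the fail-safe action (for example, driving outputs to a safe state).

```python
from typing import Callable

Reader = Callable[[], list]  # hypothetical: returns register values or raises ConnectionError


class CommPathSelector:
    """Chooses between a primary and a backup communication module.
    mode "auto" falls back on failure; "primary" or "backup" forces a path (manual)."""

    def __init__(self, read_primary: Reader, read_backup: Reader, mode: str = "auto"):
        if mode not in ("auto", "primary", "backup"):
            raise ValueError(f"unknown mode: {mode}")
        self.read_primary = read_primary
        self.read_backup = read_backup
        self.mode = mode
        self.last_path = None

    def read(self) -> list:
        if self.mode == "backup":
            self.last_path = "backup"
            return self.read_backup()
        try:
            values = self.read_primary()
            self.last_path = "primary"
            return values
        except ConnectionError:
            if self.mode == "primary":
                raise  # manual mode: operator has forced the primary path
            self.last_path = "backup"
            return self.read_backup()  # may also raise; caller applies the safe state


def primary_down() -> list:
    raise ConnectionError("primary module offline")


def backup_ok() -> list:
    return [120, 45]


selector = CommPathSelector(primary_down, backup_ok)
print(selector.read(), selector.last_path)  # [120, 45] backup
```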

Fault Detection and Diagnosis

Employing advanced fault detection and diagnosis techniques can help identify potential failures or deviations from normal system behavior. By continuously monitoring system parameters and comparing them to expected values, faults can be detected early, allowing for prompt corrective action.
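A common form of this is residual checking: compare each measurement with an expected value (from a model, a setpoint, or a related measurement) and declare a fault only if the difference stays above a threshold for several consecutive samples, which filters out momentary noise. The threshold, persistence count, and chilled-water temperatures below are hypothetical.

```python
class ResidualMonitor:
    """Flags a fault when |measured - expected| exceeds a threshold
    for `persistence` consecutive samples."""

    def __init__(self, threshold: float, persistence: int):
        self.threshold = threshold
        self.persistence = persistence
        self.count = 0  # consecutive out-of-tolerance samples

    def update(self, measured: float, expected: float) -> bool:
        if abs(measured - expected) > self.threshold:
            self.count += 1
        else:
            self.count = 0  # any in-tolerance sample resets the counter
        return self.count >= self.persistence


# Hypothetical chilled-water supply temperature (°C) expected at 7.0,
# allowed deviation 1.5 °C, fault declared after 3 bad samples in a row.
monitor = ResidualMonitor(threshold=1.5, persistence=3)
readings = [7.2, 6.8, 9.1, 9.4, 9.6, 9.8]
print([monitor.update(r, expected=7.0) for r in readings])
# [False, False, False, False, True, True]
```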

Redundant Sensors

Adding redundant sensors can provide additional measurements for critical system variables. If one sensor fails or provides inaccurate readings, the redundant sensor can serve as a backup, ensuring that reliable information is still available for control purposes.
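With three redundant transmitters, a typical approach is median select: the control value is the middle reading, so a single sensor that fails high or low cannot drive the loop, and any sensor that disagrees with the median by more than an allowed spread is flagged for maintenance. The tag names and spread limit below are hypothetical.

```python
from statistics import median


def select_measurement(readings: dict, max_spread: float):
    """Median-select across redundant sensors.
    readings: {tag: value, or None if the sensor reports bad quality}.
    Returns (control value, sorted list of tags that deviate from the median)."""
    good = {tag: v for tag, v in readings.items() if v is not None}
    if not good:
        raise ValueError("no valid sensor readings")
    mid = median(good.values())
    suspect = sorted(tag for tag, v in good.items() if abs(v - mid) > max_spread)
    return mid, suspect


value, suspect = select_measurement(
    {"TT-101A": 72.1, "TT-101B": 71.9, "TT-101C": 85.0}, max_spread=2.0
)
print(value, suspect)  # 72.1 ['TT-101C']
```

With only two valid readings, the median is their average, so a disagreement can be detected but not attributed to either sensor; this is one reason three sensors are often preferred for critical measurements.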

Diverse Control Algorithms

Utilizing diverse control algorithms can enhance system resilience. By employing multiple control algorithms with different approaches and assumptions, the system can switch to an alternative algorithm in case the primary one fails or behaves abnormally.
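A minimal illustration of switching between two diverse algorithms: a proportional controller as the primary and an on/off controller with a deadband as the fallback. The supervisor switches (and latches) to the fallback if the primary raises an arithmetic error or returns a non-finite output. The algorithms, gains, and switching criterion are assumptions for illustration; a real plausibility check would depend on the process.

```python
import math
from typing import Callable

Controller = Callable[[float, float], float]  # (setpoint, process value) -> output %


def proportional(kp: float, bias: float = 50.0) -> Controller:
    """Primary algorithm: simple P control around a bias (illustrative)."""
    return lambda sp, pv: bias + kp * (sp - pv)


def on_off(hysteresis: float) -> Controller:
    """Fallback algorithm: on/off control with a deadband; needs no tuning."""
    state = {"on": False}

    def step(sp: float, pv: float) -> float:
        if pv < sp - hysteresis:
            state["on"] = True
        elif pv > sp + hysteresis:
            state["on"] = False
        return 100.0 if state["on"] else 0.0

    return step


class DiverseController:
    def __init__(self, primary: Controller, fallback: Controller,
                 out_min: float = 0.0, out_max: float = 100.0):
        self.primary, self.fallback = primary, fallback
        self.out_min, self.out_max = out_min, out_max
        self.using_fallback = False

    def step(self, sp: float, pv: float) -> float:
        if not self.using_fallback:
            try:
                out = self.primary(sp, pv)
                if math.isfinite(out):
                    return min(max(out, self.out_min), self.out_max)
            except ArithmeticError:
                pass
            self.using_fallback = True  # latched until an operator resets it
        return self.fallback(sp, pv)


ctrl = DiverseController(proportional(kp=2.0), on_off(hysteresis=1.0))
print(ctrl.step(60.0, 55.0))                  # 60.0 from the primary
ctrl.primary = lambda sp, pv: float("nan")    # simulate an abnormal primary
print(ctrl.step(60.0, 55.0))                  # 100.0 from the fallback
```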

Robust Control Strategies

Implementing robust control strategies can help compensate for uncertainties and disturbances in the system. Robust control techniques account for variations and uncertainties in the system parameters, ensuring stable performance even in the presence of disturbances or component failures.

System Monitoring and Maintenance

Regular system monitoring and preventive maintenance can help identify potential issues before they lead to failures. By implementing a comprehensive monitoring and maintenance program, system reliability can be improved, reducing the need for redundancy in the first place.

Remember that the specific choice of alternatives depends on the requirements and constraints of the control system.

Summary

I have deployed many of the systems identified here. Many of the designs I have seen, especially once they reached the construction phase, have fallen short of the intended function of redundancy. Redundancy design needs to extend beyond the control system's purview and take a holistic approach that includes all project disciplines. There needs to be a shift in education and awareness among engineers and the trades regarding the correct methods for deploying true control system redundancy.

Here is a short list of shortcomings to be mindful of:

Redundant power feeds and network connections originating from the same distribution panel or being run within the same raceway.

Resolution - We suggested rerouting the power and network cables through separate raceways or along a different path. This was not accepted by the Construction Manager because it was not in the original design and the project was being closed out.
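This kind of common-mode problem can be caught during design review by listing every element along each redundant path and checking which elements appear in all of them. The panel and raceway labels below are hypothetical.

```python
def shared_elements(*paths):
    """Elements that appear in every redundant path: losing any one of them
    takes out all paths at once, so each is a single point of failure."""
    if not paths:
        return []
    return sorted(set(paths[0]).intersection(*paths[1:]))


# Hypothetical "redundant" power feeds, traced from source to power supply.
feed_a = ["utility_A", "panel_LP1", "raceway_R12", "psu_1"]
feed_b = ["utility_B", "panel_LP1", "raceway_R12", "psu_2"]

print(shared_elements(feed_a, feed_b))  # ['panel_LP1', 'raceway_R12']
```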

A network switch was added within a control panel as part of a design change to reduce the cost of cable runs. The end device supported redundant connections on its own and did not require a switch, so adding one created an additional point of failure.

Resolution - bypass the network switch and run redundant network cables to the end device. This was a simple solution that reduced complexity, increased reliability, and had minimal cost impact.
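A simple reliability block calculation shows why the added switch hurts: the switch sits in series with the two redundant cables, so the combined availability can never exceed the switch's own. The availability figures below are hypothetical, and failures are assumed to be independent.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All elements must work: multiply their availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one element is enough: 1 minus the chance that all are down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


cable, switch = 0.9999, 0.9995  # hypothetical availabilities

# Redundant cables that both pass through one added switch.
through_switch = series(switch, parallel(cable, cable))
# Redundant cables run directly to a device with two network ports.
direct = parallel(cable, cable)

print(f"through switch: {through_switch:.8f}  direct: {direct:.8f}")
```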

Redundant controllers and network equipment for multiple systems were installed within the same electrical enclosure. The controllers for each critical system should have been located in separate enclosures.

Resolution - None. It was too costly to re-engineer the panels and room configuration.

Control panel ventilation in a high-temperature area had no backup cooling fan. When one of the dual power feeds failed, the exhaust fan shut off and the panels overheated.

About the author

Sam Lacasse is a Senior Process Controls Engineer for Hallam-ICS with 22 years of experience. He graduated from New England Institute of Technology in 1993 with an Associate of Science (A.S.) degree. He has extensive experience in Toxic Gas Monitoring, Food & Beverage, Robotics/Vision/Motion Control, and large-scale Water/Wastewater applications.


About Hallam-ICS

Hallam-ICS is an engineering and automation company that designs MEP systems for facilities and plants, engineers control and automation solutions, and ensures safety and regulatory compliance through arc flash studies, commissioning, and validation. Our offices are located in Massachusetts, Connecticut, New York, Vermont, North Carolina, Texas, and Florida, and our projects take us worldwide.
