TGMS Maintenance – How to Effectively Manage and Maintain Your System

by Sam Lacasse on Aug 20, 2020 10:30:00 AM

Your department head calls you into his office: “Good news, John, you have been promoted! Part of your new position is to maintain the Toxic Gas Monitoring System.” You think to yourself, “What do I know about maintaining a system like this?”

Technical systems require a team environment to be successful. If the team is not yet in place, consider recruiting a few members to be part of a diverse team.

Understanding the system is an important first step. Here are a few questions I ask when I enter a new environment.

Is the system new to this facility?
Who are the primary stakeholders?
What is the site safety philosophy?
What is an acceptable level of down time?

To effectively create a maintenance strategy, we must understand all the pieces that make up the system. This is especially true when a system has been newly installed, upgraded, or has simply changed hands from one administration team to another. A new system, in most cases, will have fewer challenges to overcome than a mature system that contains some idiosyncrasies known only to someone with practical hands-on experience and site history. The expectation is that a newly installed project will have a comprehensive documentation set and training program to aid the operation and maintenance approach.

To be effective, a monitoring system must eliminate nuisance events so that users maintain their confidence in it.

Assessment

Assessing the system down to the ‘nuts and bolts’ is a very important step. The graphic below is a typical concept of a monitoring system.

Have a long-term vision of what the system needs to be. Are there any major upgrades planned to the building or process? You need to know where the system is and then have a strategic plan to get to where it needs to be. Requirements differ from industry to industry. A User Requirement Specification (URS) is a great document to have: it defines the general requirements of the system and serves as the primary reference for everyone involved.

Detection

This part of the system will be the primary focus of this section. The detection components will have some combination or variation of the following:

Detectors that come in a variety of sensor types and technologies:
- Gas detectors - portable, catalytic combustion, electrochemical, infrared and photoionization.
- Discrete – air flow, coaxial pressure, emergency button
Programmable Logic Controller (PLC) based controls with distributed Inputs and Outputs (IO).
Centralized computer system or Human Machine Interface (HMI) with a Graphical User Interface (GUI).
1. Options
  - Historical database.
  - Alarm database.
Alarm notification software (email/text/voice).
Reporting software.
Development/maintenance software.

Gas Detectors

Gas detectors, cartridges and other consumables have a life expectancy. Be sure to enter all equipment into a maintenance database that is routinely monitored and sends alerts when equipment is due for servicing. Label each piece of equipment with a unique ID and with either the last service date or the next service or replacement due date. Generate monthly reports of progress and activities. Note the warranty period and consider a vendor or supplier maintenance contract if internal resources are not available.
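To make the idea of a monitored maintenance database concrete, here is a minimal sketch in Python. It is only an illustration: the `Asset` fields, the example IDs, the service intervals and the 30-day warning window are all assumptions made for the example, not values from any particular product or site.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    asset_id: str               # unique ID, matching the label on the equipment
    description: str
    last_service: date
    service_interval_days: int  # hypothetical interval; use the manufacturer's figure

    @property
    def next_due(self) -> date:
        return self.last_service + timedelta(days=self.service_interval_days)


def due_for_service(assets, today, warn_days=30):
    """Return assets that are overdue or due within warn_days, soonest first."""
    cutoff = today + timedelta(days=warn_days)
    return sorted((a for a in assets if a.next_due <= cutoff),
                  key=lambda a: a.next_due)


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = [
        Asset("GD-001", "Electrochemical sensor cartridge", date(2020, 1, 15), 180),
        Asset("GD-002", "Infrared gas detector", date(2020, 6, 1), 365),
    ]
    for asset in due_for_service(inventory, today=date(2020, 8, 20)):
        print(f"{asset.asset_id}: {asset.description} due {asset.next_due.isoformat()}")
```

In practice this logic usually lives inside the site maintenance software; the point is simply that every item carries a unique ID, a last service date and an interval, so the due list can be generated rather than remembered.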

PLC, IO and HMI

Generally, PLC based systems rely on a stable and reliable power source, usually backed by an Uninterruptible Power Supply (UPS); a Central Processing Unit (CPU) with specific firmware; compiled software in the CPU to run the specific application; network infrastructure; and IO hardware to interface with the field signals originating from a detection device. Items to consider:

Each component should have a unique ID.
Any field replaceable batteries or consumables should be entered into a maintenance database.
Updated and complete back-ups of configuration and application files must be readily available.
All documentation must be accurate and maintained.
A printed copy of the control panel drawings should be kept in each respective control panel.

Maintaining this type of system is usually overlooked until an update or major upgrade is required. Get familiar with the details, using an internal or external resource if necessary. It is much easier to understand a system while it is functional than after something goes wrong. Know your Bill of Materials (BOM), which is usually included as part of the control drawings, and identify core components, suppliers and availability. If a BOM does not exist, have one created, possibly in your site maintenance software system.

Most PLC manufacturers maintain a good online resource for technical manuals, knowledgebase articles, firmware upgrades, support status, etc. This is a good place to start mapping out the PLC hardware life cycle. Nothing lasts or is supported forever. Be proactive and have a plan in place to upgrade a system once it is 8 to 10 years old. This is a general rule of thumb; the right timing depends on many variables, such as manufacturer and installation quality, whether the equipment is exposed to a clean or dirty environment, the pace of technology, etc.
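As a rough illustration of the 8 to 10 year rule of thumb, the sketch below classifies equipment by age. The function name, the status labels and the use of exactly 8 and 10 years as thresholds are assumptions for the example; a real plan would also weigh the manufacturer's published support status and the other variables mentioned above.

```python
from datetime import date

PLAN_UPGRADE_YEARS = 8    # lower bound of the rule of thumb
UPGRADE_DUE_YEARS = 10    # upper bound of the rule of thumb


def lifecycle_status(installed: date, today: date) -> str:
    """Classify equipment by approximate age in years."""
    age_years = (today - installed).days / 365.25
    if age_years >= UPGRADE_DUE_YEARS:
        return "upgrade overdue"
    if age_years >= PLAN_UPGRADE_YEARS:
        return "plan upgrade"
    return "in service"


if __name__ == "__main__":
    # Hypothetical install date, for illustration only.
    print(lifecycle_status(date(2011, 1, 1), today=date(2020, 8, 20)))
```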

PLC equipment tends to be labeled a ‘black box’. By that I mean people know the acronym but not much more than that. These types of systems are very reliable and do not require much attention until something stops functioning. Most events I have been called in for stem from an external source, such as an Ethernet switch failure, a construction worker inadvertently cutting power, or a communication line being disconnected. Electronic hardware failures at the PLC level are very rare, but they do happen on occasion.

Central Computer System

I’ll try to keep this section brief. Most users today are more familiar with a computer-based system and often have an entire department dedicated to maintaining, monitoring and updating these systems. Historically, a safety system was isolated from the outside world. This was partly because there were not enough resources available to monitor, maintain and protect a system with routine network maintenance or from a dreaded cyber-attack. The need for instant notification and the cooperation of IT professionals has set this trend in a different direction. More and more, we are seeing all types of systems fully integrated together to achieve a common goal: reducing response time. Virtual Machine (VM) technologies have been utilized to increase capacity and almost eliminate downtime caused by a failed piece of PC hardware. Long gone are the days of maintaining old Operating Systems (OS) and specific hardware drivers required by the conventional PC. Now a computer has been reduced to a file, a very large file.

Maintaining this system will be a bit more involved. As mentioned in the previous section, be sure to have an active back-up system in place for configuration files and programs, and a disaster recovery plan (usually part of the IT scope). Periodically test and verify system backups, restore procedures and historical data (alarms and process values). Review system capacity over time. The more historical records that are created, the more storage is used. Systems that are properly configured have incremental back-ups with external storage of back-up files to reduce the risk of having all your data in a single location.
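One simple way to test a restore is to compare file checksums between the original files and a test restore. The sketch below is a minimal illustration using only the Python standard library; the function names and folder layout are assumptions for the example, and most backup products provide their own verification tools that should be preferred where available.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large historian files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's relative path to its SHA-256 hash."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict, restored_folder: Path) -> list:
    """Return relative paths that are missing or differ after a test restore."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = restored_folder / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

A typical use would be to call `build_manifest` on the live configuration folder when the back-up is taken, store the result alongside the back-up, and run `verify_restore` against a scratch folder during each periodic restore test. An empty list means every file came back intact.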

Development software, if not already included in your system, may be something to consider. It allows you to make changes from within the system and to monitor and record system configuration changes. In the event of a failure, it would greatly reduce the time needed to diagnose and correct a problem. Example: If you have to call in someone to diagnose a PLC problem, they would make a site visit, plug into a network switch, make sure they have a non-conflicting IP address so they don’t cause an issue, obtain the latest program version, and then start the diagnostic process. These initial steps could take hours.
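The non-conflicting IP address step is easier when the site keeps a list of addresses already in use. The sketch below shows the kind of check a visiting technician might run before plugging in, using Python's standard `ipaddress` module. The function name, the subnet and the address list are hypothetical; the real values would come from your network diagram.

```python
import ipaddress


def check_service_ip(candidate: str, subnet: str, in_use: set) -> str:
    """Classify a proposed laptop address against the control network plan."""
    network = ipaddress.ip_network(subnet)
    address = ipaddress.ip_address(candidate)
    if address not in network:
        return "outside subnet"
    if address in (network.network_address, network.broadcast_address):
        return "reserved address"
    if address in {ipaddress.ip_address(a) for a in in_use}:
        return "conflict"
    return "ok"


if __name__ == "__main__":
    # Hypothetical control network and documented device addresses.
    documented = {"192.168.10.10", "192.168.10.11"}
    print(check_service_ip("192.168.10.50", "192.168.10.0/24", documented))
```

This only checks against the documented list, so it is as good as the documentation. It does not detect an undocumented device that is already using the address.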

A detail that folks are usually unaware of is the recurring cost of software maintenance plans. Most software manufacturers have adopted a licensing model that is either specific to a single software package or based on a concurrent user pool. Software is licensed to an end user (a specific user in most cases) for a specific version of the software. Manufacturers offer a form of maintenance plan where you pay, each year, some percentage of the price of the software you currently own. This gives you access to a web site where you can download updates. I recommend that all my clients remain ‘in support’; it is much more cost effective in the long run because, if your support lapses for too long, the cost of reinstating it is comparable to buying new software licenses.

Security

On a safety system, it is important to identify outside sources of service interruption. Ask the following:

How is our system network isolated either physically or administratively?
1. A network diagram should be reviewed and frequently updated. A picture is worth a thousand words. (Fred R. Barnard)
Are computer terminals and local displays protected by a username and password?
1. Sharing common passwords between users is not recommended.
What password policy is in place? Must passwords be changed yearly, monthly, or never?
1. Discourage writing passwords on sticky notes under keyboards/monitors. You’d be surprised what we have seen!
Are there physical security measures in place protecting our equipment?
1. Walk each area and document what you find, e.g. badge access, lock and key, passcode, etc.

Training and Support

Once you have identified your system and its critical infrastructure and components, have a meeting to find out what the team knows and which areas need outside resources or additional training. Contact your local representatives and see what they offer for training and support. It is likely to be more cost effective to train your team to do some of the more day-to-day support functions while leaving the heavy lifting to vendors. The level of support required varies from location to location.

Run through a hypothetical ‘what if’ scenario to get a feel for who would do what during a system outage. This should be a good test to identify possible areas of improvement or development.

Summary

You will be in a much better place if you are proactive rather than reactive.
Have a plan.
Maintain records and documentation of routine tasks that are performed.
Generate schedules and reports to inform everyone involved of the system’s health and performance.
Don’t go it alone!
Stay safe!

About the author

Sam Lacasse is a Senior Process Controls Engineer for Hallam-ICS with 22 years of experience. He graduated from New England Institute of Technology in 1993 with an A.S. degree. He has extensive experience in Toxic Gas Monitoring, Food & Beverage, Robotics/Vision/Motion control and large scale Water/Wastewater applications.

About Hallam-ICS

Hallam-ICS is an engineering and automation company that designs MEP systems for facilities and plants, engineers control and automation solutions, and ensures safety and regulatory compliance through arc flash studies, commissioning, and validation. Our offices are located in Massachusetts, Connecticut, New York, Vermont and North Carolina and our projects take us world-wide.