Manufacturing Intelligence - Data Storage

by Steve Comeau on Jan 9, 2020 10:30:00 AM

<span id="hs_cos_wrapper_name" class="hs_cos_wrapper hs_cos_wrapper_meta_field hs_cos_wrapper_type_text" style="" data-hs-cos-general-type="meta_field" data-hs-cos-type="text" >Manufacturing Intelligence - Data Storage</span>

This is part three of a five-part series on Manufacturing Intelligence (MI). This blog post describes data storage for Manufacturing Intelligence solutions. It is intended for all persons involved in the data stream, including: Automation Engineers, Plant Managers, Data Analysts, and engineers involved in Continuous Improvements and EH & S. This topic is not limited to manufacturing and is useful for all facilities with a need to collect and use large amounts of automated data.

Part 1 - What is Manufacturing Intelligence?

Part 2- Data Sources And Transactions For Manufacturing Intelligence

Part 3- Manufacturing Intelligence - Data Storage

There are two types of databases that are frequently used for data storage in Manufacturing Intelligence (MI) applications:

Relational databases such as SQL Server: Data is stored in tables that can be linked together using related columns.
Historian databases: Data is stored as a time series in a sequence of binary files, with a timestamp, tag name, value, and data quality code.

In this post we will focus on SQL Server databases. Microsoft SQL Server is frequently used as the database of choice for storing data from PLCs, field devices, and SCADA systems. Although you can use open source SQL databases such as MySQL and PostgreSQL, we generally recommend SQL Server since it is an industry standard, high-performance, and contains essential tools for database development and reports.

For small projects you can use the free SQL Express Edition. On most projects the choice will be SQL Standard Edition, as this edition does not have any constraints on database size and is full-featured. We recommend using the most recent version available that is compatible with the software that will be used to log data and generate reports. The most recent version is SQL Server 2019. You may find that your application software is not yet tested with a recent version, but is specified as compatible with SQL Server 2017, SQL Server 2016, or even a previous version.

Tables

The data in the SQL database is stored in tables. In one scenario, tables are built for your data logging application. Although you may be concerned that the database is a “custom build”, this is actually a positive, since the database is designed and built for your application; therefore, it will meet your needs and perform well. The key element though is the design process. The database must be properly designed and documented with diagrams and a data dictionary.

Some application software will automatically build data logging tables and log data to these application-specific tables. This option works well for basic projects and for quick and easy data logging. This is also a good choice for trending using the SCADA application integrated trending and reporting tools. For more advanced projects well-designed tables will likely be much more effective, especially for building reports and querying data. Both approaches are valid and useful, and both approaches can be used on a project.

Stored Procedures and Functions

Stored Procedures and user-defined Functions provide programmable capabilities to your stored data. Instead of simply logging data, stored procedures can provide the means to perform calculations and process the data, thereby providing a much richer information source. Stored procedures can be used as part of the data logging process so that data can be processed during database inserts and updates. They can also be used as a data source for reports, so that complex calculations and data aggregations can occur for report datasets.

Security

SQL Server has two types of security for users:

Logins for the Server-Level security
Database users for Database-level security

When SQL Server is installed, the default system admin SQL user is ‘sa’. Other Windows users or groups can and should also be added as System Admin. The sa user should be assigned a complex password and the password stored securely by IT. You should not need to use sa again. It is very important that you never use the sa user for data logging or reports. You will see plenty of code examples that use sa, but do not do that. There are several reasons not to use sa:

User sa has full access to server and any server database. The password should only be known to database admins.
Users performing database admin or development tasks should log in using Windows credentials so their activity can be logged with an individual identity and the users can be assigned the appropriate permissions.
Your data logging application users or report users only need access to the specific database(s) with the data and should have minimal permissions.
If the sa user password is changed by the database admin or IT, it will break the application.

It is a good practice to create a specific SQL Server user for the data logging application and assign the user the minimum permissions needed to perform the required tasks.

Backups

It is not good enough that the files on the server are backed up or the VM has a snapshot. Automated database backups should be performed, and these backup files should be stored securely. The automation is implemented by adding a job (or jobs) to the SQL Server Agent which performs the backup tasks. The jobs are scheduled to run periodically or on specific days and times. The SQL Server Agent service must be running and set to start automatically.

You may need to recover a database if it becomes corrupted; or accidental changes are made that can’t be reversed; or if data is accidentally deleted or updated.

SQL Server has three recovery models:

Simple
Full
Bulk Logged

For non-essential databases or during development, the Simple Recovery model will work well. This model is simple to implement and to recover a database. At a minimum, use Simple Recovery with a daily Full backup. You can also add more frequent Differential backups. The advantage of the Simple Recovery model is that restoring the database is simple. The downside is that there can be data loss from the time of the last backup to the time of the restore.

For production databases, use the Full Recovery model. With the Full Recovery model, you must perform database backups and transaction log backups. You will have the ability to restore the database to a point in time and likely not have any data loss. It is critical to include Transaction Log backups, since if you do not, the Transaction Log will continue to grow until it finally fills the hard drive. You really do not want the hard drive to fill on the database server. In most cases, the Full recovery model backups will be implemented and maintained by IT personal, but if that is not the case, be sure to document the backup and restore process; verify that the backups will restore properly; and verify that the Transaction Log is not growing indefinitely.

Next steps

In the next blog post of this series, we examine Database Integration, including:

Integrating with other databases
ETLs and data transformation
Aggregating data for reports

About the Author

Steve has retired from Hallam-ICS, but his contributions to the company continue to be valued.

Read My Hallam Story

About Hallam-ICS

Hallam-ICS is an engineering and automation company that designs MEP systems for facilities and plants, engineers control and automation solutions, and ensures safety and regulatory compliance through arc flash studies, commissioning, and validation. Our offices are located in Massachusetts, Connecticut, New York, Vermont and North Carolina and our projects take us world-wide.