This blog posts explores the common database types and schemas used for manufacturing intelligence and facilities reporting.
Manufacturing sites and large facilities can generate a large amount of process data that is stored in databases, which are used as data sources for reports.It is worth examining the various types of databases and structures used to store that data, so informed choices can be made when selecting databases and schemas used for reports.
When selecting a database type it is important to address the user requirements of the report end users. The reporting solution must balance report generation speed, flexibility, and data availability.
Speed can become an issue as database size and complexity grows or as users demand more data over large time spans. The database design choice and structure can greatly affect speed and responsiveness of the system.
Flexibility allows access to data in multiple ways so users can easily slice, group, and filter data. Consideration must also be given to handling changes in business requirements and to handling changes to the data contextual information.
Data Availability, in this context, is the ability of the system to be available for data collection and be available to users for obtaining data. One of the important drivers of data availability is the choice of data collection methods. Choices such as collector redundancy, local data caching, and application reliability will affect the data availability.
Time-Series Data Historian
The time-series data historian is a standard way of logging manufacturing and process data. Its strength is in logging huge amounts of measured data using a small amount of storage space by using various sampling techniques and sample intervals. The data is identified with a tag name, and the values are stored as a numeric data type or string. The data historian is a database type that is used for high-performance trending applications. A weakness of the data historian is that tag names give little context to the data, so an additional database is needed to associate the data with attributes. It is also difficult to align the data from multiple tags into a single unit of information, such as a batch record.
The data historian can provide data very fast when paired with an appropriate client application, such as a client trending tool. But data retrieval can be very slow when using integration software such as an OLE DB connector that is used to query the historian using SQL syntax.
Data historians can be configured for high availability by setting up redundant collectors with failover, so that if the primary collector fails, the standby collector will take over immediately. The collectors can also cache data locally when the data archive server is not available, and then log the data to the data archiver once it is back running.
SQL Database – Flat Tables
If you are looking for one of the simplest and cheapest ways to log data from a SCADA system, logging into database tables is the way to go. A typical setup will include a Microsoft SQL Server database server to store the data and either a SCADA system or other logging application.
Here are some options:
- From a SCADA system such as GE iFIX, logging can be setup using SQT/SQD blocks which can periodically log data using either triggers or periodic intervals. This option also provides a local file cache in cases where the SQL Server is not available.
- GE iFIX also supports using Visual Basic for Applications (VBA) within iFIX. The iFIX Scheduler can be used to trigger an iFIX script which then uses the Active Data Objects (ADO) object library to connect to a database and log data. This requires programming, but this method can be used to handle complex calculations, business logic, and error logging. This is a good choice for complex situations and cases where the data is being logged in response to operator actions.
- Software such as the OPC Data Logger by Software Toolbox, Inc. can be used to log data to a database table from OPC servers. The data can be logged to Microsoft Access, MySQL, Oracle, SQL Server, and text files.
Data can be logged to wide tables, where a single record includes a timestamp (or start time and end time) and values from multiple sources. This is a good choice for batch records and production events. Data retrieval can be fast and is simple to query.
Data can also be logged to tall tables, where there is a single timestamp, tag identifier, value, and other fields that indicate data quality and other attributes about the value. This method is similar to the data historian. Data retrieval will be more flexible, but slower than the wide-table method and more difficult to query.
SQL Database – Relational Database Normalized Schema
In some cases, data is logged not into simple flat tables, but into tables that are part of an existing normalized relational database. This is useful for providing the data more context, such as how it relates to equipment and products. Here the data is logged so that it has a unique ID that allows it to be related to other tables in the database. Logging the data using calls to stored procedures is useful for this type of situation. If you purchase an MES package, the data may be logged to a normalized relational database that is part of the software product.
SQL Database – OLAP Database Star Schema
The Star Schema data base is a variation of the relational database that is designed for reporting instead of database transactions. The star schema design is used in an Online Analytics Processing (OLAP) database. Data would not typically be logged to an OLAP database directly, but would instead be logged to a normalized relational database(s) and then the data would be automatically extracted from the source databases using an Extraction, Transform, and Load (ETL) process.
The Star Schema design is very useful for providing data to Business Intelligence (BI) tools such as reports, PowerPivot, and Microsoft Power BI. This is a good choice if data flexibility and report speed performance are critical requirements. Implementing the star schema design is a more involved and complex endeavor than data logging into a relational database, but it can provide a much more useful data environment for data analysis users who need to filter, slice, dice, and group data in many different ways to gain insights.
Future blog posts will explore in detail the Star Schema design.
OLAP Data Cubes
To gain even higher performance from an OLAP database, a data cube is used to aggregate the OLAP database data. A product such as Microsoft Analysis Services (SSAS) multidimensional server, which is part of SQL Server, can be used to implement OLAP data cube. These solutions would likely be developed by a corporate IT team; and are therefore outside the scope of the typical Manufacturing Intelligence solutions.
An alternative to Analysis Services is to use PowerPivot, a free download from Microsoft and an add-in for Excel. PowerPivot is designed for self-service BI. Data can be obtained from a star schema database using queries or for better control using stored procedures. Users can then develop their own pivot tables and pivot charts that use PowerPivot as a fast in-memory data source.
About the Author
Steve Comeau is a Senior Business Intelligence Developer for Hallam-ICS. Steve has over 20 years in control systems, specializing in data historians, database programming, and report development.
Hallam-ICS is an engineering and automation company that designs MEP systems for facilities and plants, engineers control and automation solutions, and ensures safety and regulatory compliance through arc flash studies, commissioning, and validation. Our offices are located in Massachusetts, Connecticut, New York, Vermont and North Carolina and our projects take us world-wide.