A Detailed Overview of OnPing

1 An Overview Of OnPing

OnPing is a suite of data and device management systems designed to make interacting with industrial machine data simpler. This post walks through the various systems involved in OnPing one by one. OnPing has many capabilities and interacting parts, and that can be overwhelming. Having everything written down in one place, though, makes it possible to find one part you understand and use it to understand the others. I hope this post helps advanced users of OnPing see how the systems relate. I also thought it would be useful to people trying to architect similar systems.
There are a lot of moving parts!

1.1 Major Systems in OnPing

OnPing has a lot of moving parts. Each box in the graph below represents a major system in OnPing. This is not a complete list of the parts and pieces of OnPing, but it does give a sense of the way data flows through the system. A single block might actually represent a dozen separate systems that can be thought of together. It is useful to see how much is involved in an effective data and device management system.

OnPing Major Systems Graph

2 Lumberjack Application System

The Lumberjack Application System (LAS) is an application manager designed to run on top of a Linux build. It contains drivers for an extensive set of field devices, including EtherNet/IP, Totalflow, and Modbus. OnPing’s systems are able to access these devices directly and send data to and from them easily.

The LAS is designed to run on very small devices, as small as a Raspberry Pi or ODROID device.

Every running application on the LAS is completely managed. The data it contains is backed up. The network settings it has are remotely configurable. The system makes every effort to minimize the amount of data sent over the wire back to its sources. The permissions and identity systems are both stored locally and synced with cloud devices.

OnPing’s Lumberjack Application System takes security seriously: all data is encrypted and all transmission is client initiated, which eliminates the need for the static IPs that so often become vectors for attack. Permissions on the network can be revoked remotely.

All this attention to the networking and security settings is essential to everything else that is done in OnPing. The goal is a system where, everywhere else, you can simply talk about Lumberjack A talking to Lumberjack B and leave it at that.

3 Data Management System

The Data Management systems encompass the ways we store historical and configuration data.

Data flows from the Lumberjacks to the data management system. The important parts are…

  • The RTU-Client – Lumberjack application responsible for interacting with the various drivers and applications running on the lumberjack.
  • The RTU-Manager – Server side application responsible for gathering data from lumberjacks and feeding it to cloud applications.
  • The tach-db time series database – reads data in from field devices and stores it in a multi-resolution format for quick display.
  • The tach-db recovery system – recovers data in chunks in the event of comm loss.

Data Management System

3.1 The RTU Client

The RTU Client links all the drivers that pull data from various devices in a Lumberjack Application System. It mixes this data with permissions and allows applications to manipulate it locally through a unified interface. In addition, it manages data pushes to the rtu-manager, which maintains a server side cache of this data. The emphasis is on the word cache: the source of truth for data in OnPing is always whatever is closest to the machine.

3.2 The RTU Manager

The RTU Manager provides parameter data to all the cloud services that interact with data in OnPing. It is responsible for receiving all data from devices running LAS. It pushes data to tachdb and is designed to be highly available for very fast queries in OnPing.

3.3 The TachDB

TachDB is our time series database. It is designed to allow very fast multi-trend data requests. Data is stored in approximation intervals of finer and finer resolution until you get to the real value. This lets us trend hundreds of parameters across massive timelines incredibly fast. TachDB also has many techniques to help with some common problems with time series data.
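To make the idea of finer and finer approximation levels concrete, here is a minimal sketch. It is not how TachDB is actually implemented: the averaging step, the `factor` of 4, and the names `build_levels` and `pick_level` are all assumptions chosen for illustration.

```python
from statistics import mean


def build_levels(samples, factor=4):
    """Return a list of levels, coarsest first, ending with the raw samples.

    Each coarser level averages `factor` consecutive points of the level
    below it (averaging is an assumption, not TachDB's documented method).
    """
    levels = [list(samples)]
    while len(levels[0]) > factor:
        finer = levels[0]
        coarser = [mean(finer[i:i + factor]) for i in range(0, len(finer), factor)]
        levels.insert(0, coarser)
    return levels


def pick_level(levels, max_points):
    """Pick the finest level that still fits within max_points.

    Falls back to the coarsest level if nothing fits.
    """
    chosen = levels[0]
    for level in levels:
        if len(level) <= max_points:
            chosen = level
    return chosen


raw = [float(i) for i in range(64)]
levels = build_levels(raw)
overview = pick_level(levels, max_points=20)  # a 16-point approximation
```

A trend over a long timeline can be drawn from a coarse level first, and the raw values only need to be read when the user zooms all the way in.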

3.4 The TachDB Recovery

The TachDB Recovery system sends data over in the event of a communications outage. Many systems that do this send the data in the same format as the original system. This often leads to data choke points. By sending the data in a highly compressed format we avoid these issues.
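As a rough sketch of the idea, buffered points can be packed into one compressed chunk instead of being replayed one message at a time. The JSON encoding, the use of zlib, and the function names here are assumptions for illustration only; the actual recovery format is not described in this post.

```python
import json
import zlib


def pack_chunk(points):
    """Serialize buffered (timestamp, value) pairs into one compressed blob."""
    payload = json.dumps(points, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload, 9)


def unpack_chunk(blob):
    """Reverse pack_chunk, restoring the list of (timestamp, value) tuples."""
    decoded = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [tuple(point) for point in decoded]


# Hypothetical data buffered on a lumberjack during a communications outage.
buffered = [(1700000000 + 10 * i, 42.0) for i in range(1000)]
blob = pack_chunk(buffered)
uncompressed_size = len(json.dumps(buffered))
```

Comparing `len(blob)` with `uncompressed_size` shows how much less has to cross the wire once communication is restored.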

4 OnPing Visualization and Frontend

Currently most of the visualization system resides in the OnPing binary directly, but as each part is asked to do more I see this changing soon. The visualizations include the following.

4.1 Custom Tables – arbitrary tables used to assemble data into meaningful displays for users

Custom Tables started out as a way of allowing our customers to display data in arbitrary ways.
More and more they also provide structure and meaning to the data in OnPing. It is not
uncommon to see tables used to organize machines across a company or particular stats from various locations.

Tables are used to generate reports, and they serve as inputs for templates and for maps. Much of what makes OnPing so customizable is powered by these.
OnPing Custom Table

4.2 HMI – Human Machine Interface, graphical displays designed to mimic machine style ones.

Our HMI system is able to connect data from any driver and any place together in simple views.
They can be shared and linked together to form very complicated and configurable systems.
OnPing HMI
The HMIs in OnPing are also localizable. This means you can deploy an HMI with a separable authorization system to multiple locations on our Lumberjacks (see section 2). When deployed, changes to the master copy in OnPing will be propagated to all other instances of the HMI.

4.3 Graphs and Trends – Time series graphs to display changes over time.

OnPing supports both quick trending, by clicking on a point, and trending with a specific set of parameters that have been preselected. OnPing allows users to pick the resolution of graphs to trade off loading speed against granularity as needed for various applications.
OnPing Area Trend
Different sorts of trends can be designed with multiple axes for very dense data display.
multiaxis

4.4 Maps – Add geo-spatial data to OnPing.

OnPing gathers location info for every device being polled. This allows maps to be created ad-hoc from any Custom Table in the system.

Map Config

Maps provide a great way of marrying together data from wide areas into easy to understand presentations.
Map Icons

5 OnPing Identity Management

There are two items in OnPing that require identity management.

  1. The users of OnPing. This is primarily handled through the plowtech auth server.
  2. The lumberjacks in OnPing. This is handled through the lumberjack identity server.

Every User in OnPing has an identity based on their email address. They also have a set of groups they are members of that provide permissions for them. Although this differs from the more common role-based authorization system, it shares many features with that approach.

5.1 The Double Rooted Tree of Users and Groups

OnPing assigns permissions using a double rooted tree structure.

  • Every User in OnPing has a Parent User and a Parent Group.
  • Every Group in OnPing has a Parent User and a Parent Group.
  • Users can also be members of groups

The figure below shows how these relationships work. Black lines denote hierarchy, colored lines denote membership.
User Management

  • Every Entity in OnPing is a member of a group and has a set of permissions.

Plow Dashboard
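The relationships above can be sketched as a small data structure. This is only an illustration: the class and field names, the example emails and group names, and the choice to give the two roots no parents are assumptions, not the actual OnPing schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Group:
    name: str
    parent_user: Optional["User"] = None    # hierarchy link
    parent_group: Optional["Group"] = None  # hierarchy link


@dataclass(eq=False)
class User:
    email: str
    parent_user: Optional["User"] = None    # hierarchy link
    parent_group: Optional[Group] = None    # hierarchy link
    member_of: list = field(default_factory=list)  # membership links


def group_lineage(group):
    """Walk parent_group links from a group up to the root group."""
    names = []
    while group is not None:
        names.append(group.name)
        group = group.parent_group
    return names


# The two roots: one user and one group (assumed here to have no parents).
root_user = User("root@example.com")
root_group = Group("root", parent_user=root_user)

ops = Group("operations", parent_user=root_user, parent_group=root_group)
field_crew = Group("field-crew", parent_user=root_user, parent_group=ops)
alice = User("alice@example.com", parent_user=root_user,
             parent_group=ops, member_of=[field_crew])

lineage = group_lineage(field_crew)
```

Hierarchy (the parent links) and membership (`member_of`) are kept as separate fields, mirroring the black and colored lines in the figure.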

6 Alarm System

The alarm system in OnPing is split across a few different services. Alarming is a very complicated activity whose ubiquity often makes people forget its complexity. To do it properly, a system must be highly distributed, highly available, and dependably consistent, and those things don’t mix well! The alarmdb is responsible for combining all the disparate portions of an alarm into a unified concept of an alarm that can be tested to see whether it needs to be run. These tasks include:

  • Gathering the rules about whether an alarm is in a Tripped State
  • Gathering whether an alarm has been enabled or disabled
  • Gathering the status of the Alert options currently selected by users
  • Combining all this information into a single presentation to be distributed.

Once this has been done it is sent to an alarmstate server by means of an alarm coordinator. The Alarm Coordinator presents the Alarm DB system with routing information that lets the service know how to send data where it is needed.

The Alarm State Server does the state machine work and the callout work in the system. Currently it is split into dozens of nodes, each responsible for thousands of alarms. Alarms depend on a lot of timing rules that can be quite complicated. Once you know an alarm has tripped, the following must still be determined:

  • How long has the alarm been tripped?
  • How long until a tripped alarm should call out?
  • When an alarm has called someone, how long should it wait to call again?
  • If something has happened with the phone system or some other error occurred on callout, how long should it wait to try again?
  • Should clear calls happen?
  • Has the state of the alarm changed in the middle of checking all of the above?
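A few of these timing questions can be sketched as a single decision function. This is a simplified illustration, not the Alarm State Server's actual logic: the field names, the three delay settings, and the rule that a failed call uses a shorter retry delay are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmTiming:
    callout_delay: float  # seconds an alarm must stay tripped before the first call
    recall_delay: float   # seconds to wait after a successful call before calling again
    retry_delay: float    # seconds to wait after a failed call attempt


@dataclass
class AlarmState:
    tripped_at: Optional[float] = None    # None means the alarm is clear
    last_call_at: Optional[float] = None  # None means nobody has been called yet
    last_call_failed: bool = False


def should_call(state, timing, now):
    """Decide whether a callout is due at time `now` (seconds)."""
    if state.tripped_at is None:
        return False
    if state.last_call_at is None:
        return now - state.tripped_at >= timing.callout_delay
    wait = timing.retry_delay if state.last_call_failed else timing.recall_delay
    return now - state.last_call_at >= wait


timing = AlarmTiming(callout_delay=60, recall_delay=900, retry_delay=30)
tripped = AlarmState(tripped_at=0)
called = AlarmState(tripped_at=0, last_call_at=60)
failed = AlarmState(tripped_at=0, last_call_at=60, last_call_failed=True)
```

Even this toy version has to be re-evaluated whenever the alarm state changes underneath it, which is the last question in the list.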

What a headache. Alarms are complicated! Lastly, we need to log what has happened:

  • Who was alerted
  • How were they alerted
  • When were they alerted

alarm system graph

7 Virtual Parameters

Virtual parameters are used by alarms and visualization alike. They provide on demand calculation using OnPing-mask-script to do the calculating. They are a distributed application in much the same way our Alarm system is.

7.1 VP DB

The VPDB contains all the metadata for all the virtual parameters, and also the script database. Each virtual parameter has an ID (VPID), a list of sources
(i.e. the parameters that the virtual parameter is made of), and a Script ID. The Script ID can be used to query the script database in order to get the actual script text.
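The shape of that metadata can be sketched as follows. The in-memory dictionaries, the example IDs, and the placeholder script text are hypothetical stand-ins for illustration; the real VPDB storage is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualParameter:
    vpid: int
    sources: tuple  # IDs of the parameters this virtual parameter is made of
    script_id: int


# Hypothetical in-memory stand-ins for the script database and the metadata.
scripts = {7: "<script text would be stored here>"}
virtual_parameters = {
    101: VirtualParameter(vpid=101, sources=(11, 12), script_id=7),
}


def script_for(vpid):
    """Resolve a VPID to its script text by way of the Script ID."""
    vp = virtual_parameters[vpid]
    return scripts[vp.script_id]
```

Keeping the script text behind a Script ID means several virtual parameters can point at the same script.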

7.2 VP Router

The job of the VP Router is to store a list of addresses of VP Calculators and provide these addresses to clients. It chooses which address to give to a client depending on the load that has already been given to each VP Calculator. The addresses are listed in a YAML file, and any changes to the list are automatically loaded by the VP Router. Adding or removing VP Calculators should therefore be effortless.
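A minimal sketch of load-based selection might look like the following. The class name, the least-loaded rule, the `cost` argument, and the example addresses are assumptions; reading the address list from the YAML file is left out, and `reload` simply takes the new list.

```python
class VPRouter:
    """Hands out VP Calculator addresses, preferring the least-loaded one."""

    def __init__(self, addresses):
        self.load = {addr: 0 for addr in addresses}

    def reload(self, addresses):
        """Apply a changed address list, keeping load counts for survivors."""
        self.load = {addr: self.load.get(addr, 0) for addr in addresses}

    def assign(self, cost=1):
        """Return the address with the least assigned load and record the cost."""
        addr = min(self.load, key=self.load.get)
        self.load[addr] += cost
        return addr


router = VPRouter(["calc-a:9000", "calc-b:9000"])
first = router.assign()         # calc-a:9000 (ties go to the first listed)
second = router.assign()        # calc-b:9000
third = router.assign(cost=5)   # calc-a:9000, now carrying a heavier request
fourth = router.assign()        # calc-b:9000

# Simulate an edited address list: calc-a removed, calc-c added.
router.reload(["calc-b:9000", "calc-c:9000"])
fifth = router.assign()         # calc-c:9000, since it has no load yet
```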

7.3 VP Calculator

The VP Calculator is the process that performs the heavy task of calculating the time values of a virtual parameter. You can actually use it without virtual parameters, since all that it does is to query parameters from TachDB and combine them using a script. It has its own caching system to alleviate TachDB’s load.
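The query, combine, and cache steps can be sketched like this. It is an illustration under stated assumptions: `fetch` is a hypothetical stand-in for a TachDB query, a plain Python function stands in for the script, and the cache is a simple dictionary keyed by parameter and time range.

```python
class VPCalculator:
    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical stand-in for a TachDB query
        self.cache = {}

    def series(self, pid, start, end):
        """Return one parameter's values, querying the database only on a cache miss."""
        key = (pid, start, end)
        if key not in self.cache:
            self.cache[key] = self.fetch(pid, start, end)
        return self.cache[key]

    def calculate(self, sources, script, start, end):
        """Combine the source parameters point by point using the script."""
        columns = [self.series(pid, start, end) for pid in sources]
        return [script(*row) for row in zip(*columns)]


calls = []


def fake_fetch(pid, start, end):
    """Pretend database: records each query and returns pid * t for each t."""
    calls.append(pid)
    return [pid * t for t in range(start, end)]


calc = VPCalculator(fake_fetch)
total = calc.calculate([1, 2], lambda a, b: a + b, 0, 3)
diff = calc.calculate([2, 3], lambda a, b: b - a, 0, 3)  # parameter 2 is served from cache
```

After both calculations, `calls` shows that parameter 2 was only queried once.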

8 Managing Complexity

I get asked a lot about what it takes to stand up highly configurable SCADA systems, and I wanted to outline most of the major systems involved in OnPing. Each one of these systems has taken a huge amount of work to architect correctly. Together they provide a unique and integrated system for managing data and devices from industrial applications. The main lesson we have learned is that managing complexity is the key to keeping these systems understandable and maintainable.

We have designed the OnPing systems to scale well and to have a high degree of flexibility to absorb new technology and features into the system. There are many features we have added that weren’t discussed, but this post gives a nice overview of the primary systems in OnPing.

The goal here is to absorb the complexity of orchestration for our users so they can concentrate on using their automation to its fullest capabilities.