Tag Archives: big data

Industry 4.0 and the sensor data analytics problem

That sensor data problem

A few weeks ago, I met with a number of IT consultants who had been hired to provide data science knowledge for an Industry 4.0 project at a large German industrial company. The day I saw them they looked frazzled and frustrated. At the beginning of our meeting they spoke about the source of their frustration: ‘Grabbing a bunch of sensor data’ from a turbine had turned out to be a pretty daunting task. It had looked so simple on the surface. But it wasn’t.

Industrial time series data

Data hungry Industry 4.0

In my last blog post, I looked at the Industry 4.0 movement. It’s an exciting and worthy cause but it requires a ton of data if executed well. Sensor data (aka industrial time-series data) from various assets and control systems is key. But acquiring this type of data, processing it in real-time, archiving and managing it for further analysis turns out to be extremely problematic if you use the wrong tools. So, what’s so difficult? Here are the common problems people encounter.

1. The asset jungle

When we look at a typical industrial environment such as a packaging line, a transmission network or a chemical plant, we find a plethora of equipment from different manufacturers, assets of different ages (it’s not unusual for industrial equipment to operate for decades) and control and automation systems from different vendors (e.g. Rockwell, Emerson, Siemens). To make things worse, there is also a multitude of different communication standards and protocols, such as OPC DA, IEEE C37.118 and Modbus, just to name a few. As a result, it’s not easy to communicate with industrial equipment. There is no single standard. Instead, you typically need to develop and operate a multitude of interfaces. Just ‘grabbing’ a bunch of sensor data suddenly turns out to be difficult. There is no one-size-fits-all solution.

2. Speedy data

Once you have started communicating with an asset, you will find that its data can arrive quite fast. It’s not unusual for an asset to send data in the millisecond or second range. Capturing and processing something this fast requires special technology. And we do want to capture data at this resolution, as it could provide critical insights. How about analyzing and monitoring that data in real-time? This is often a requirement for Industry 4.0 scenarios.

High speed data vs slow: what could you be missing?
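
To make that question concrete, here is a minimal Python sketch with synthetic data. The signal, the 10 ms sample interval and the spike are all made-up assumptions. A short vibration spike that is clearly visible at 10 ms resolution disappears entirely when the same signal is only captured once per second.

```python
# Synthetic example: all values below are invented for illustration.
SAMPLE_INTERVAL_MS = 10   # assumed fast sensor: one value every 10 ms
DURATION_MS = 5_000       # five seconds of data


def vibration(t_ms: int) -> float:
    """Flat baseline of 1.0 with a 200 ms spike starting at t = 2300 ms."""
    return 9.0 if 2_300 <= t_ms < 2_500 else 1.0


# Full-resolution series as (timestamp in ms, value) pairs
fast = [(t, vibration(t)) for t in range(0, DURATION_MS, SAMPLE_INTERVAL_MS)]

# The same signal, but captured only once per second
slow = [(t, v) for t, v in fast if t % 1_000 == 0]

max_fast = max(v for _, v in fast)
max_slow = max(v for _, v in slow)

print(f"{len(fast)} fast samples, peak value {max_fast}")
print(f"{len(slow)} slow samples, peak value {max_slow}")
```

The slow series never sees the spike, so any monitoring or model built on it would treat those five seconds as uneventful.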

3. Big data volumes

Not only is the data super fast, it’s also big. Modern assets can easily send around 500 to 10,000 distinct signals or tags (e.g. bearing vibration, temperature, etc.). A modern wind turbine has more than 1,000 important signals. A complex packaging machine for the pharmaceutical industry captures 300 to 1,000 signals.
The sheer volume creates a number of problems:
  • Storage: Think about the volume of data that is being generated in a day, week or month: 10k signals per second can easily grow to a significant amount of data. Storing this in a relational database can be very tricky and slow. You are quickly looking at many terabytes.
  • Context: Sensors usually have a signal/tag name that can be quite confusing. The local engineer might know the context, but what about the data scientist? How would she know that tag AC03.Air_Flow is related to turbine A in Italy and not pump B in Denmark?
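
A back-of-the-envelope calculation shows how quickly the storage side adds up. The sketch below assumes 10,000 signals that each deliver one value per second and 16 bytes per stored value (an 8-byte timestamp plus an 8-byte float). The byte size and the absence of compression or index overhead are assumptions, not measurements.

```python
SIGNALS = 10_000                    # from the 10k-signals example
VALUES_PER_SIGNAL_PER_SECOND = 1    # assumed update rate
BYTES_PER_VALUE = 16                # assumed: 8-byte timestamp + 8-byte float
SECONDS_PER_DAY = 24 * 60 * 60


def raw_bytes(days: int) -> int:
    """Raw, uncompressed storage needed for the given number of days."""
    per_second = SIGNALS * VALUES_PER_SIGNAL_PER_SECOND * BYTES_PER_VALUE
    return per_second * SECONDS_PER_DAY * days


for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]:
    print(f"per {label}: {raw_bytes(days) / 1e9:,.1f} GB")
```

Under these assumptions a single asset produces roughly 14 GB of raw values per day and about 5 TB per year, before indexes or additional assets are taken into account.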

Signal/tag names can be extremely confusing
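
One common way to preserve that context is to keep a small metadata record next to each tag. The sketch below is purely illustrative: the `TagInfo` fields, the `describe` helper, the units and the second tag name are hypothetical. Only `AC03.Air_Flow` and its turbine/Italy mapping come from the example in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagInfo:
    asset: str
    site: str
    measurement: str
    unit: str  # assumed unit, for illustration only


# Hypothetical catalog that maps cryptic tag names to their context
TAG_CATALOG = {
    "AC03.Air_Flow": TagInfo("Turbine A", "Italy", "air flow", "m3/h"),
    # Invented tag name for the pump mentioned in the text
    "PB17.Disch_Press": TagInfo("Pump B", "Denmark", "discharge pressure", "bar"),
}


def describe(tag: str) -> str:
    """Return a human-readable description of a tag, if context is known."""
    info = TAG_CATALOG.get(tag)
    if info is None:
        return f"{tag}: no context available"
    return f"{tag}: {info.measurement} [{info.unit}] of {info.asset} ({info.site})"


print(describe("AC03.Air_Flow"))
print(describe("XX99.Unknown"))
```

With such a catalog in place, a data scientist can filter by asset or site instead of guessing what a tag name means.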

4. Tricky time-series

Last but not least, managing and analyzing industrial time series data is not that easy. Performing time-based calculations such as averages requires specific functions that are not readily available in common tools such as Hadoop, SQL Server and Excel. To make things worse, units of measure are also tricky when it comes to industrial data. This can be a huge problem, especially when you work across different regions (think about degrees Celsius vs Fahrenheit). You really have to make sure that you are comparing apples to apples.
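
To illustrate why a plain average is not enough, here is a small Python sketch. It assumes each value holds until the next reading arrives (a step-wise interpretation, which is an assumption and not the only option) and weights every value by how long it was valid. The readings and the helper names `time_weighted_average` and `fahrenheit_to_celsius` are made up for illustration.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def time_weighted_average(readings, end_time):
    """Average of (timestamp_seconds, value) readings, sorted by time.

    Each value is assumed to hold until the next reading (or end_time).
    """
    if not readings:
        raise ValueError("need at least one reading")
    duration = end_time - readings[0][0]
    if duration <= 0:
        raise ValueError("end_time must be after the first reading")
    # Pair every reading with the timestamp at which its value stops holding
    next_times = [t for t, _ in readings[1:]] + [end_time]
    total = sum(v * (t_next - t) for (t, v), t_next in zip(readings, next_times))
    return total / duration


# Invented readings from a site that reports in degrees Fahrenheit
readings_f = [(0, 68.0), (50, 176.0), (60, 68.0)]

# Convert to degrees Celsius first, so sites can be compared like for like
readings_c = [(t, fahrenheit_to_celsius(v)) for t, v in readings_f]

plain_mean = sum(v for _, v in readings_c) / len(readings_c)
weighted = time_weighted_average(readings_c, end_time=100)

print(f"plain mean:            {plain_mean:.1f} degrees C")
print(f"time-weighted average: {weighted:.1f} degrees C")
```

The plain mean gives the 10-second excursion to 80 degrees the same weight as the long stretches at 20 degrees, which is why the two results differ so much.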

5. Analytics ready data

An often overlooked problem is that sensor data is not necessarily clean. Data is usually sent at uneven points in time: there might be a sensor failure, or a value just doesn’t change very often. As a result, you always end up with unevenly spaced data, which is really hard to manage in a relational database (just google the problem). Data scientists usually require equidistant data for their analytics projects. Getting the data into the right shape can be immensely time-consuming (think about interpolations etc.).

Unevenly spaced sensor data
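
As a minimal sketch of what getting the data into the right shape can involve, the following Python function resamples unevenly spaced readings onto a fixed grid using linear interpolation. Linear interpolation is only one possible choice (holding the last value is often more appropriate for set points or states), and the sample values and the function name `resample_linear` are invented for illustration.

```python
from bisect import bisect_right


def resample_linear(readings, step):
    """Resample (timestamp, value) readings onto an equidistant grid.

    readings must be sorted by strictly increasing timestamp. The grid runs
    from the first timestamp up to the last one in increments of step.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to interpolate")
    if step <= 0:
        raise ValueError("step must be positive")

    times = [t for t, _ in readings]
    result = []
    t = times[0]
    while t <= times[-1]:
        i = bisect_right(times, t) - 1  # last reading at or before t
        if i == len(times) - 1:
            value = readings[i][1]      # t falls exactly on the last reading
        else:
            t0, v0 = readings[i]
            t1, v1 = readings[i + 1]
            value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        result.append((t, value))
        t += step
    return result


# Invented, unevenly spaced readings: (seconds, value)
uneven = [(0, 10.0), (3, 16.0), (4, 18.0), (10, 30.0)]
even = resample_linear(uneven, step=2)

for t, v in even:
    print(t, v)
```

Note that the grid does not have to coincide with the original timestamps: the reading at second 3 never appears in the output, it only shapes the interpolated value at second 2.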

That tricky sensor data

To summarize this: ‘grabbing a bunch of sensor data’ is anything but easy. Industry 4.0 initiatives require a solid data foundation as discussed in my last post. Without it you run the risk of wasting a ton of time & resources. Also, chances are that the results will be disappointing. Imagine a data scientist attempting to train a predictive maintenance model with just a small set of noisy and incomplete data.
To do this properly, you need special tools such as the OSIsoft PI System. The PI System provides a unique real-time data infrastructure for all your Industry 4.0 projects. In my next post, I will describe how this works.
What are your experiences with industrial time-series data?

Industry 4.0 & Big Data

Industry 4.0

If you work in a manufacturing-related industry, it’s difficult to escape the ideas and concepts of Industry 4.0. A brainchild of the German government, Industry 4.0 is a framework that is intended to revolutionize the manufacturing world. Similar to what the steam engine did during the first industrial revolution, smart usage of modern technology will allow manufacturers to significantly increase effectiveness.
While there is a general framework that describes what Industry 4.0 should be, I have noticed that most companies have developed their own definitions. As a matter of fact, most of my clients lump the terms Industry 4.0, Digitalization and IoT together. Also, the desired objectives have a wide range and include items such as:
  • Improve product quality
  • Lower cost
  • Reduce cycle time
  • Improve margins
  • Increase revenue

Industry 4.0 initiatives

With a wide definition of Industry 4.0/Digitalization comes an equally wide interpretation of what type of tactics and initiatives should be undertaken to achieve the desired outcomes. Based on my own experience, I see companies look at a variety of activities that include:

When you think about it, each one of these programs requires a ton of data. How else would you go about it? Consider the easiest example: energy management. Reducing the amount of money spent on energy throughout a large plant by gut-feel or experience is almost impossible. It is the smart use of data that allows you to identify energy usage patterns, and hot spots of consumption. Data must therefore be the foundation of every Industry 4.0 undertaking.

Big Data & Industry 4.0

What type of data does Industry 4.0 require? It depends. Typical scenarios could include relational data about industrial equipment (such as maintenance intervals, critical component descriptions etc.), geospatial data (e.g. equipment location, routes, etc.) and, most importantly, sensor data (e.g. temperature, pressure, flow rates, vibration etc.).

Sensor data enriched with geospatial information

Sensors and automation systems are the heart of your Industry 4.0 program: they pump a vast amount of highly critical time series data through your various initiatives. Just like the vital signs of a human being allow a doctor to diagnose a disease, industrial time series data allows us to learn more about our operations and to diagnose problems with our assets & processes early on.

The value of industrial time series data

Assets such as turbines, reactors, tablet presses, pumps or trains are complex things. Each one of them has thousands of valves, screws, pipes etc. Instead of relying on intuition, hard-earned experience and luck, we can collect data about their status through sensors. It’s not unusual for specific assets to produce upwards of 1,000 to 5,000 signals. Combine a number of assets for a specific production process and you end up with some really BIG DATA. This data, however, allows engineers and data scientists to monitor operations in real-time, to detect specific patterns, to gain new insights and to ultimately increase the effectiveness of their operations.

What’s next?

Industry 4.0/Digitalization is an exciting opportunity for most companies. While many organizations have already run similar improvement projects in the past, the hype around Industry 4.0 allows project teams to secure funds for value-add initiatives. It surely is an exciting time for that reason.
But is dealing with industrial time series data easy? Collecting, archiving and managing this type of data can be a huge problem if not done properly. In the next blog post, I will speak about the common challenges and ideas for making this easier.

Big Data – Can’t ignore it?

Big data

2012 is almost over and I just realized that I have not yet posted a single entry about big data. Clearly a big mistake – right? Let’s see: Software vendors, media and industry analysts are all over the topic. If you listen to some of the messages, it seems that big data will create billions of jobs, solve all problems and will make us happier individuals. Really? Not really – at least in my humble opinion. It rather seems to me that big data fills a number of functions for a select group of people:

  • It provides analysts with a fresh and fancy-sounding topic
  • Media have something big to write about
  • BI companies obtain a ‘fresh’ marketing message
  • Professionals can have ‘smart’ discussions
  • Consultants can sell new assessment projects

Big data – really?

I do apologize for sounding so negative. But I have a hard time finding big value in this big data discussion. Please don’t get me wrong – I would be the last person to deny that there is a tremendous amount of value in big data. But it does not deserve the hype. On the contrary, I personally find that the current discussions ignore the fact that most of us do not have the skills to do big data. We need to get the foundation right and make sure that we can tame the ‘small data lion’ before we tackle the big data Godzilla. Don’t believe me? Consider the following:

  • Spreadsheets are still the number one data analysis tool in most organizations.
  • Managers still argue about whose revenue and unit numbers are correct.
  • Knowledge workers have yet to learn how to make sense of even simple corporate data sets.
  • 3D pie charts are floating around boardrooms.
  • Companies spend over 6 months collecting and aggregating budgets only to find that a stupid formula mistake messed up the final report.
  • Hardly any professional has ever read a book or attended a course about proper data analysis.


Here is the thing: Dealing with big data is a big challenge. It will require a lot more skills than most of us currently have (try finding meaning in a gazillion TBs of data using a 3D pie chart!).

A big data problem

Earlier this year, I acquired a 36 megapixel camera. You can take some amazingly gorgeous photos with it. But it comes at a cost. Each photo consumes 65-75 MB on my sad hard drive. Vacations now create a big data challenge for me. But guess what: this camera is anything but easy to handle. You have to really slow down and put 100% effort into each and every photo. 36 MP have the ability to reveal every single flaw: The slightest camera shake is recorded & exposed. Minimal focus deviations that a small camera would not register kill an otherwise solid photo. In other words: this big data camera requires big skills. And here is something else: The damn camera won’t help you create awesome photos. No, you still need to learn the basics such as composition, proper lighting etc. That’s the hard stuff. But let me tell you this: If you know the basics, this big data camera certainly does some magic for you.

Big data – what’s next

Ok. That was my big data rant. I love data and analytics. No doubt – there is a tremendous amount of value we can gain from those new data sources. But let’s not forget that we need to learn the basics first. A Formula 1 driver learned his skills on the kart track. At the same time, there is a lot of information hidden in our ‘small data’ sources such as ERP systems, CRMs and historians. Let’s take a step back and put things into perspective. Big data is important but not THAT important.

With that: Thank you for following this blog. Happy holidays and see you next year!

Christoph

Visual Analytics – The new frontier? (Guest Post)

WHAT IS VISUAL ANALYTICS – BY DR JOERN KOHLHAMMER

Massive sets of data are collected and stored in many areas today. As the volumes of data available to business people or scientists increase, it becomes harder and harder to use the data effectively. Keeping up to date with the flood of data using standard tools for data management and analysis is far from easy. The field of visual analytics tries to provide people with better and more effective ways to understand and analyze these massive data sets, while helping them to follow up on their findings immediately, in real-time. Visual analytics integrates the analytic capabilities of the computer and the abilities of the human. This means the human is empowered to take control of the analytical process; he or she is not just the final stage of a reporting process. Visual analytics sheds light on unexpected and hidden insights, which may lead to innovation and increased profits. For example, many key performance indicators are simply calculated using statistical models. But the true relations between data, models and business objectives often remain unclear. If visualization is included as an integral part of the analysis process, then comprehension of the models as well as of the data is increased. Errors in the basic assumptions of the models can be recognized early on, and newly discovered dependencies in the data can lead to new and possibly better reporting indices.