
Survey of Time Series Analysis Techniques for Sensor Data

Rajesh Sampathkumar
Senior Consultant
The Data Team
The Data Team: What We Do

Premier consulting organization


Business transformation by data science at scale
IT transformation by big data
Deep experience in the data space
Industries: banking, financial services, manufacturing and telecommunications
Presenter Background

Senior Consultant @ The Data Team

~10 years of engineering, statistics, problem solving and data science

Industry experience: automotive, discrete manufacturing, aerospace, technology and consulting

3 Confidential
Talk Overview

Sensor data and the Internet of Things


Designing for Data
Sensor characteristics and analysis challenges
Aggregate sensor-data analysis
Time-series sensor data analysis
Sensor data analysis opportunities
Concluding remarks
Sensor Data and the Internet of Things

The Explosion of Sensor Data
[Figure: cheap, compact, compute-ready sensors drive the IoT explosion. Smart processes and smart products each build up from measurement to management to intelligence.]
Sources of Sensor Data

Typical Sensor Fusion Architecture
[Figure: vehicle sensor fusion architecture]
Layers shown: analog measurement interface, analog processing, analog to digital conversion layer, digital record creation layer, vehicle area network interface, computer memory layer
Signals shown: steering position (rad), steering rate (rad/s), speedometer speed (m/s), acceleration (m/s2), gearbox gear number, brake load (N)
Blog post: Sensing and Measurement Characterization for IoT
Physical and Digital Data-Centric Products
[Figure: a physical system and a digital system, each shown with its inputs. Labels: hardware parameters, user input parameters, control parameters, application-based parameter data, noise, environmental.]
Physical examples: connected cars, data-enabled wind farms, wearable technologies like smart watches, medical devices
Digital examples: web server logs, mobile application data, internet data/metadata
Design For Data

Design for Data
Digital transformation of products and services by data
Sensor data is at the heart of this new paradigm
Application needs define sensor integration and data management
Considerations of Data-Centric products: sensor measurement and sampling, power requirements, user interactions, network considerations, data security, diversity of protocols, real time processing, static analysis considerations, dynamic analysis considerations
Sensor Characteristics and Analysis Challenges
Sensor Data Characteristics
Characteristics take on importance based on use cases
Periodic sensor calibration
Interactions of characteristics are also important
Static characteristics: range, linearity, bias, accuracy, precision
Dynamic characteristics: stability, sensitivity, hysteresis, saturation, dynamic range
Sensor Characteristics Illustrated
[Figure panels: Accuracy, Bias, Stability, Linearity, Accuracy: Precision, Subgroups Comparison]
Gauge evaluation approaches include R&R analysis, logical validation, Type 1 studies
Domain Considerations in Sensor Characteristics
Sensor characteristic: Product Manufacturing Considerations (Examples)
Accuracy: Reduce sampling standard error; high accuracy for precision manufacturing
Precision: Emphasis on high measurement reliability in high technology manufacturing, medical systems, drug testing
Sensitivity and Hysteresis: Sensor sensitivity and hysteresis are extremely important to understand in high frequency state changes such as electrical switches and rotating machinery (turbines, engines)
Range and Dynamic Range: Sensor range and dynamic range are important in video data and image processing applications. In video applications, dynamic sensitivity also matters.
Modes of Sensor Data Analysis
Query processing: stream mode query processing; batch and micro-batch mode query processing
Transformation and numerosity handling: ETL pipelines; numerosity reduction; feature engineering
Data summaries, profiling and aggregate analysis: descriptive and inferential statistics; machine learning models for aggregate analysis; anomaly detection and anomaly handling
Time series data analysis: time domain considerations; autoregressive models; multivariate models and cross correlation
Time Ordered Sensor Data: Key Challenges
Managing and summarizing large data sets
Avoiding spurious correlations in Machine Learning
Studying evolving or changing systems
Identifying points of failure, or failure modes
Detecting anomalies and supporting problem solving
Aggregate Sensor Data Analysis

Aggregate Analysis: IID Data
What makes data independent and identically distributed?
Criterion: Testing Approaches
Time-ordered data: System design or architecture awareness, time-based indexing
Independence (autocorrelation): Durbin-Watson test, Ljung-Box test
Distribution assumptions (distribution fit tests): QQ-plots, Shapiro test, KS-test, AD-test
Time-based location invariance (stationarity): Augmented Dickey-Fuller test
Time-based scale invariance (homoscedasticity): Bartlett's test, Breusch-Pagan test
* Methods applicable to IID data are listed.
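As a rough illustration of the independence checks listed above, the following sketch computes the Durbin-Watson statistic and the Ljung-Box Q statistic directly with NumPy. The simulated series, the seed and the lag count are assumptions made for the example; they are not taken from the talk.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: values near 2 suggest no lag-1 autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def ljung_box_q(x, lags=10):
    """Ljung-Box Q statistic over the first `lags` sample autocorrelations."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(xc[k:] * xc[:-k]) / denom   # lag-k autocorrelation
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(0)
white = rng.normal(size=500)     # IID-like readings (simulated)
walk = np.cumsum(white)          # strongly autocorrelated readings (simulated)

dw_white = durbin_watson(white - white.mean())
dw_walk = durbin_watson(walk - walk.mean())
q_white = ljung_box_q(white)
q_walk = ljung_box_q(walk)
```

Under the null hypothesis of no autocorrelation, Q is compared against a chi-squared distribution with `lags` degrees of freedom; a very large Q, as for the random walk here, argues against treating the data as IID.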


Testing Assumptions for Aggregate Statistics
[Figure panels: Correlation Studies, Normality Tests and QQ Plots, Ljung-Box and Correlograms; axis labels: Speed, Temp]
Other tests of relevance:
1. Breusch-Pagan test (heteroskedasticity)
2. Variance Inflation Factor (multicollinearity)
ML for Aggregate Sensor Data Analysis
Regression, Classification and Clustering
Regression: OLS; regularized (L1, L2); kernel methods; tree methods
Classification: significance testing; tree methods; kernel methods
Clustering: distance based methods; distribution based methods
Feature engineering and transformation algorithms may also be applicable


Sensor-Specific ML for Aggregate Data

Anomaly Detection (product, process)


Failure Mode Classification (product, process)
Graph-based analytics (consumer data)
System-state inference (product, process)
System failure prediction (product, process)

System State and Sensor Data
Individual sensor readings can be misleading about system state
Sanitizing sensor data is key
End-to-end domain considerations
[Figure: sensor states that aren't explained by operational definitions of measurements]
Sensor Data Regression Caveats
Spurious correlations in linear models
Different system states over time
Control systems impact measurement
Known and unknown system behavior
Environmental changes
Violation of linear model assumptions
Check for multicollinearity (variance inflation factors)
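The multicollinearity check mentioned above can be sketched as follows. This is a minimal NumPy version of the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the other columns. The channel names (speed, temp, feed) and the simulated relationship between them are assumptions for illustration only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # intercept + other columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
speed = rng.normal(50, 5, 300)
temp = 0.8 * speed + rng.normal(0, 1, 300)   # temperature tracks speed closely
feed = rng.normal(10, 2, 300)                # unrelated channel
vifs = vif(np.column_stack([speed, temp, feed]))
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a regressor is largely explained by the others; here the speed and temp columns should show high values while feed stays near 1.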

System State Classification (GMM): Scikit-Learn
Gaussian Mixture Models are effective for known/modeled phenomena
GMMs are valuable approximations for repetitive system behavior measurements
Insight: State-specific data sampling and real time analysis
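A minimal sketch of GMM-based state labeling with scikit-learn follows. The two operating states (idle and cutting), the speed and temperature channels, and all numeric values are hypothetical; the slide's own data set is not available.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical two-state machine: idle (low speed, low temp) and cutting.
idle = rng.normal([5.0, 30.0], [1.0, 2.0], size=(200, 2))
cutting = rng.normal([60.0, 75.0], [4.0, 5.0], size=(200, 2))
readings = np.vstack([idle, cutting])            # columns: speed, temp

# Assumption: the number of states is known in advance from the system design.
gmm = GaussianMixture(n_components=2, random_state=0).fit(readings)
states = gmm.predict(readings)                   # one state label per reading
state_probs = gmm.predict_proba(readings[:1])    # soft assignment for one reading
```

Once states are labeled, later analysis can be run per state, which is one way to read the "state-specific data sampling" insight above.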

Failure Mode Classification (Tree): Scikit-Learn
Supervised multi-class classifiers can be used to identify known system failure modes
Ensemble (tree) algorithms are scalable to large distributed clusters
Insight: real-time failure mode handling in manufacturing systems
Novel behavior? Exploration possible in unsupervised algorithms
[Figure labels: Burr, Undercut]

One Class SVM Example: Outlier Detection
Novel behavior may not always be bad; it depends on the domain
Physical significance of extreme deviations (anomalies)
Insight: For small-rate data sets, real-time outlier/failure mode detection
[Figure: lateral movement analysis in drilling operation; identification of novel behavior and outliers]
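The outlier detection idea can be sketched with scikit-learn's `OneClassSVM`. The training data below is simulated and stands in for readings assumed to come from normal operation; the `nu` value and the two test points are illustrative choices, not settings from the talk.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
normal_ops = rng.normal(0.0, 1.0, size=(300, 2))     # assumed normal readings
new_points = np.array([[0.1, -0.2], [8.0, 8.0]])     # typical vs extreme reading

# Scale first: kernel methods are sensitive to feature scale.
scaler = StandardScaler().fit(normal_ops)
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(scaler.transform(normal_ops))
labels = detector.predict(scaler.transform(new_points))   # +1 inlier, -1 outlier
```

A label of -1 only marks a reading as unlike the training data. Whether that novelty is a fault still needs the physical interpretation the slide calls for.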
Anomaly Detection Considerations
Some algorithms (like SVMs) don't lend themselves to parallelization easily
Data transformation is important for SVMs/kernel classifiers, at a computational expense
Physical interpretation of anomalous readings is important

Sensor Data Time Series Analysis

Considerations for Time Series Analysis
Checks: time ordered data sets, autocorrelation, distribution fits, location invariance, scale invariance
Time series modeling considerations:
No. of differences to stationary behavior
Periodicity and Seasonality/Oscillations
Trends in data
Blog post: Time Series Analysis Considerations for Sensor Data
Autoregressive Models for Sensor Data
Autocorrelation tests and tests for IID, stationarity
AR and ARMA models
ARIMA models
VAR based models
[Figure: model selection tree. IID data: aggregate statistics/ML. Non-IID data, multivariate: VAR-based models. Non-IID data, stationary (special case): AR, ARMA. Non-IID data, non-stationary (general case): ARIMA.]
ACF/PACF, Stationarity and IID tests: Statsmodels
Do the tests, but visualize the data
Correlograms are most useful when viewed together
Stationarity and autocorrelation tests before and during differencing
[Figure panels: ACF Plot, PACF Plot]

Time Series Differencing and Decomposition
Use differencing to improve model quality
Prefer additive/multiplicative decomposition over naive decomposition
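To illustrate the differencing step only (not a full decomposition), the sketch below applies first and seasonal differencing to a synthetic signal made of a linear trend plus one oscillation. The period of 24 samples and the signal itself are assumptions for the example.

```python
import numpy as np

t = np.arange(240)
period = 24                                   # assumed cycle length in samples
signal = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / period)   # trend + oscillation

first_diff = np.diff(signal)                          # removes the linear trend
seasonal_diff = signal[period:] - signal[:-period]    # removes the periodic part
```

After first differencing, the trend collapses to a roughly constant offset with the oscillation still present; after seasonal differencing, the oscillation cancels and only the trend's contribution over one period remains.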

Scale and Location Variance: Good Practices
Choose your moving statistic window based on sensor and system
Use scalable implementations (expected in Spark-TS)
Consider ARCH class of models to address variance changes over time
* GARCH = Generalized Autoregressive Conditional Heteroskedasticity
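Moving statistics are a simple first look at scale and location changes. The sketch below is a plain NumPy version on a single machine, not the scalable implementation the slide anticipates. The window length and the simulated change in noise level are assumptions for illustration.

```python
import numpy as np

def rolling_stats(x, window):
    """Mean and sample standard deviation for each full window of length `window`."""
    x = np.asarray(x, dtype=float)
    starts = range(len(x) - window + 1)
    means = np.array([x[i:i + window].mean() for i in starts])
    stds = np.array([x[i:i + window].std(ddof=1) for i in starts])
    return means, stds

rng = np.random.default_rng(5)
# Hypothetical channel whose noise level quadruples halfway through.
readings = np.concatenate([rng.normal(0, 1, 300), rng.normal(0, 4, 300)])
window = 50      # assumption: tune to the sensor's sampling rate and system dynamics
roll_mean, roll_std = rolling_stats(readings, window)
variance_shift = roll_std[-1] / roll_std[0]   # ratio of late to early spread
```

A ratio well above 1 indicates the spread is not constant over time, which is the situation where ARCH-type models become worth considering.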

ARIMA Model for Sensor Data: Statsmodels
Ensure input to ARIMA is stationary (or differenced)
Ensure residuals are not autocorrelated
Choice of order arguments is non-trivial: carefully evaluate ACF/PACF
ARIMA order (p, d, q):
p: order of autoregressive part
d: degree of first differences to stationarity
q: order of moving average part

Vector Autoregression: Multivariate Time Series
Always provide stationary series to VAR
Consider which set of equations you really want (there are sets)
Experiment with different lags and evaluate models
Key insight: Reduced order models simplify but do not oversimplify
[Figure: lagged factors (L1, L2) feed a VAR model that predicts a response variable. Labels: Feed, Speed, Temp.]
Granger Causality: Factor-Response Relationship
Use causality tests to experiment with different factors
Confirm causality based on data and system understanding (domain)
H0: No Factor-Response Causality
HA: Factor-Response Causality Exists
Insight: System state reasoning based on causality

Vector Autoregression: Forecasting Differenced Variable Data
Standard errors still mean a small % of anomalies are typical
Take Bayesian views in addition to frequentist views
Key insight: Real time multivariate time series modeling for sensor systems
[Figure: VAR Fit and Forecast for Difference]
VAR-based Anomaly Detection
Real-time multivariate anomaly detection can consider dynamic behavior
Differenced data for building real-time anomaly detection engines
Smaller footprint compared to ANN based approaches
[Figure: pipeline from sensor data to data store, standard transformations, differenced data, reduced order VAR model, residual analysis, and known vs new behavior]
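The residual analysis step of such a pipeline can be sketched as follows. This is a hand-rolled VAR(1) fitted by least squares in NumPy, standing in for the reduced order VAR model. The data is simulated and assumed to be already stationary (for example, after differencing); the injected fault and the 5-sigma threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x = np.zeros((n, 2))
for i in range(1, n):
    x[i] = 0.5 * x[i - 1] + rng.normal(scale=0.5, size=2)
x[400] += np.array([6.0, -6.0])            # injected fault (hypothetical)

train = x[:300]                            # assumed fault-free history
# Fit x_t = c + A x_{t-1} + e_t by ordinary least squares.
lagged = np.column_stack([np.ones(len(train) - 1), train[:-1]])
coef, *_ = np.linalg.lstsq(lagged, train[1:], rcond=None)
sigma = (train[1:] - lagged @ coef).std(axis=0)   # residual scale per channel

# Score all data: one-step-ahead residuals in units of training sigma.
all_lagged = np.column_stack([np.ones(n - 1), x[:-1]])
z = (x[1:] - all_lagged @ coef) / sigma
anomalies = np.where(np.abs(z).max(axis=1) > 5.0)[0] + 1   # indices into x
```

Because only the coefficient matrix and the residual scale need to be kept, scoring a new reading is a single matrix product, which is consistent with the small footprint noted above.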

Model Evaluation: Information Criteria

Information criteria (ICs) score a model's fit while penalizing its number of parameters, and are used as evaluation criteria
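Two widely used information criteria, AIC and BIC, can be computed from a model's maximized log-likelihood as sketched below. The two candidate models and their log-likelihood values are hypothetical numbers chosen to show the penalty at work.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits of two candidate models to the same 500 observations.
small = {"loglik": -720.0, "k": 3}
large = {"loglik": -717.5, "k": 9}

aic_small = aic(small["loglik"], small["k"])
aic_large = aic(large["loglik"], large["k"])
bic_small = bic(small["loglik"], small["k"], 500)
bic_large = bic(large["loglik"], large["k"], 500)
```

In this made-up case the larger model fits slightly better but both criteria still prefer the smaller one, because the gain in likelihood does not cover the penalty for six extra parameters.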

Evaluation: Scale Dependent Metrics/Methods
Error based model evaluation: RMS Error (RMSE); Mean Absolute Error (MAE); Mean Absolute % Error (MAPE); Mean Abs. Scaled Error (MASE)
Residual diagnostics: good models will have uncorrelated residuals; residuals centered around zero; homoscedastic; normality
Insights:
Sophisticated models don't necessarily do better
Consider error metrics in conjunction with ICs
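The four error metrics listed above can be written out in a few lines. The arrays below are small made-up values, and MASE is scaled here by the in-sample error of a naive one-step forecast, which is one common convention.

```python
import numpy as np

def rmse(actual, pred):
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

def mae(actual, pred):
    return float(np.mean(np.abs(actual - pred)))

def mape(actual, pred):
    # Undefined when `actual` contains zeros.
    return float(100.0 * np.mean(np.abs((actual - pred) / actual)))

def mase(actual, pred, train):
    # Scale by the mean absolute error of a naive one-step forecast on `train`.
    naive_mae = np.mean(np.abs(np.diff(train)))
    return float(mae(actual, pred) / naive_mae)

train = np.array([2.0, 4.0, 6.0, 8.0])        # hypothetical training history
actual = np.array([10.0, 12.0, 14.0, 16.0])   # hypothetical held-out readings
pred = np.array([11.0, 12.0, 13.0, 18.0])     # hypothetical model forecasts

scores = {
    "rmse": rmse(actual, pred),
    "mae": mae(actual, pred),
    "mape": mape(actual, pred),
    "mase": mase(actual, pred, train),
}
```

A MASE below 1 means the model's forecasts beat the naive previous-value forecast on average, which gives a baseline that RMSE and MAE alone do not.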

Related Time-Series Techniques
Variants of VAR: Factor augmented VAR; VAR Moving Average
Models that consider variance: ARCH; GARCH
Bayesian Inference Methods
Sophisticated Decomposition Models: Additive; Multiplicative
State Space Models
Artificial Neural Networks: RNNs; LSTMs
Sensor Data and Analysis Opportunities

Capabilities for Sensor Data Analysis
Scale: sensor data can quickly reach large scales; big storage and big compute needs
Aggregate analysis: regression; classification and clustering; feature engineering; text analytics
Time Series: TS data profiling and IID tests; signal processing; AR/VAR models; decomposition models
Distributed data processing specifically for time series
Scalable system state modeling for large sensor data sets (Spark-ts specific plan)
Better data profiling frameworks and tools, aggregate and time series
A Call to Action: Design for Data
Offering: methodology for IoT-enabled business transformation by a data-rich core; end to end consulting; data science and support
Value Proposition: operational efficiency; predictive maintenance; new revenue opportunity; better cross-selling
Industry Focus: manufacturing, energy, healthcare
[Figure labels: Strategy, Instrumentation, Data Science, Design, Implementation, Support]
Concluding Remarks

Design for Data


Testing underlying statistical assumptions
Insights on aggregate data analysis and ML
Insights on time series data analysis
Sensor data analysis opportunities
Questions and Comments
Thank You
Meet The Data Team at Booth #4 at Strata

Email: contact@thedatateam.in
Twitter: @carpedata
LinkedIn: https://www.linkedin.com/company/the-data-team
