Rajesh Sampathkumar
Senior Consultant
The Data Team
The Data Team: What We Do
3 Confidential
Talk Overview
The Explosion of Sensor Data
IoT explosion: cheap, compact, compute-ready sensors
[Figure: Smart Processes and Smart Products, each layered as Measurement, Management, Intelligence]
Sources of Sensor Data
Typical Sensor Fusion Architecture
[Figure: sensor fusion architecture for a vehicle]
Sensors and signals:
Steering: position (rad), rate (rad/s)
Speedometer: speed (m/s)
Acceleration (m/s2)
Gearbox: gear number
Processing layers: Vehicle Area Network Interface, Analog to Digital Conversion Layer, Digital Record Creation Layer, Computer Memory Layer
Other diagram labels: Control Parameter, Noise, Physical System, Digital System, Environmental
Design For Data
Design for Data
Considerations of Data-Centric Products
Digital transformation of products and services by data
Sensor data is at the heart of this new paradigm
Application needs define sensor integration and data management
[Figure: considerations around a data-centric product]
Sensor measurement and sampling
Power requirements
User interactions
Network considerations
Data security
Diversity of protocols
Real-time processing
Static analysis considerations
Dynamic analysis considerations
Sensor Characteristics and Analysis Challenges
Sensor Data Characteristics
Characteristics take on importance based on use cases
Periodic sensor calibration
Interactions of characteristics are also important
Static characteristics: Range, Linearity, Bias, Accuracy, Precision
Dynamic characteristics: Stability, Sensitivity, Hysteresis, Saturation, Dynamic Range
Sensor Characteristics Illustrated
[Figure panels: Accuracy, Bias, Stability, Linearity, Accuracy: Precision, Subgroups Comparison]
Gauge evaluation approaches include R&R analysis, logical validation, and Type 1 studies
Domain Considerations in Sensor Characteristics
Sensor characteristic: Product manufacturing considerations (examples)
Accuracy: Reduce sampling standard error; high accuracy for precision manufacturing
Range and Dynamic Range: Sensor range and dynamic range are important in video data and image processing applications. In video applications, dynamic sensitivity is also important.
Modes of Sensor Data Analysis
Query processing: stream mode query processing; batch and micro-batch mode query processing
Transformation and numerosity handling: ETL pipelines; numerosity reduction; feature engineering
Data summaries, profiling and aggregate analysis: aggregate analysis; descriptive and inferential statistics; machine learning models for aggregate analysis
Time series data analysis: time domain considerations; autoregressive models; multivariate models and cross correlation
Anomaly detection and anomaly handling
Time Ordered Sensor Data: Key Challenges
Managing and summarizing large data sets
Avoiding spurious correlations in Machine Learning
Studying evolving or changing systems
Identifying points of failure, or failure modes
Detecting anomalies and supporting problem solving
Aggregate Sensor Data Analysis
Aggregate Analysis: IID Data
What makes data independent and identically distributed?
Table columns: Criterion, Testing Approaches
2. Variance Inflation Factor (Multicollinearity)
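A minimal sketch of the variance inflation factor check named above, assuming NumPy is available. It relies on the identity that the VIF of each column equals the matching diagonal entry of the inverse correlation matrix. The column names (speed, temp, feed) and all values are synthetic, chosen only for illustration.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are samples, columns are sensors)."""
    corr = np.corrcoef(X, rowvar=False)      # correlation matrix of the columns
    # VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal of the inverse correlation matrix
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
speed = rng.normal(size=500)                       # synthetic sensor channel
temp = rng.normal(size=500)                        # independent of the others
feed = 0.9 * speed + 0.1 * rng.normal(size=500)    # nearly a copy of speed

vifs = variance_inflation_factors(np.column_stack([speed, temp, feed]))
for name, vif in zip(["speed", "temp", "feed"], vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

A VIF near 1 suggests a channel carries independent information; large values (a common rule of thumb is above 5 or 10) suggest the channel is largely explained by the others.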
ML for Aggregate Sensor Data Analysis
System State and Sensor Data
Individual sensor readings can be misleading about system state
Sanitizing sensor data is key
System State Classification (GMM): Scikit-Learn
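The original code for this slide did not survive extraction. The following is only a sketch of how system states might be clustered with scikit-learn's GaussianMixture. The two operating states ("idle", "running"), the sensor columns, and all numbers are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic readings with two hypothetical columns (speed, temperature) from two operating states
idle = rng.normal(loc=[0.0, 20.0], scale=[0.5, 1.0], size=(200, 2))
running = rng.normal(loc=[10.0, 60.0], scale=[1.0, 2.0], size=(200, 2))
readings = np.vstack([idle, running])

# Fit a two-component mixture; each component is treated as one system state
gmm = GaussianMixture(n_components=2, random_state=0).fit(readings)
states = gmm.predict(readings)              # hard state label per reading
state_probs = gmm.predict_proba(readings)   # soft membership per reading
```

The soft memberships are useful when a reading sits between states, which is where individual sensor values tend to be most misleading.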
Failure Mode Classification (Tree): Scikit-Learn
Supervised multi-class classifiers can be used to identify known system failure modes
[Figure label: Burr]
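As an illustration of a supervised multi-class classifier for known failure modes, here is a small decision tree sketch with scikit-learn. The failure mode names, the two features (vibration amplitude, temperature), and the data are all invented for this example and are not taken from the talk.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Hypothetical failure modes with synthetic (vibration, temperature) cluster centers
modes = ["normal", "overheat", "vibration"]
centers = {"normal": [1.0, 40.0], "overheat": [1.0, 90.0], "vibration": [6.0, 45.0]}
X = np.vstack([rng.normal(centers[m], [0.5, 3.0], size=(150, 2)) for m in modes])
y = np.repeat(modes, 150)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
# A shallow tree keeps the learned rules easy to read and review
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

This only covers failure modes that appear in the labeled training data; unseen behavior needs a separate novelty or outlier detector.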
One Class SVM Example: Outlier Detection
Novel behavior may not always be bad; it depends on the domain
Physical significance of extreme deviations (anomalies)
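A minimal sketch of outlier detection with scikit-learn's OneClassSVM, trained only on readings assumed to be normal. The data is synthetic, and the `nu` and `gamma` settings are illustrative assumptions, not recommendations from the talk.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
# Synthetic two-channel readings representing normal operation
normal_readings = rng.normal(loc=0.0, scale=1.0, size=(300, 2))

# nu is roughly the fraction of training points allowed to fall outside the boundary
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_readings)

new_readings = np.array([[0.1, -0.2],    # close to the training data
                         [8.0, 8.0]])    # far from anything seen in training
labels = detector.predict(new_readings)  # +1 means similar to training data, -1 means novel
```

A label of -1 marks a reading as novel, not as a fault. Whether it is harmful is a domain decision.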
Sensor Data Time Series Analysis
Considerations for Time Series Analysis
Time ordered data sets
Autocorrelation
Distribution fits
Location invariance
Scale invariance
Autoregressive Models for Sensor Data
Autocorrelation tests and tests for stationarity
Stationarity and autocorrelation tests before and during differencing
[Figure panels: IID data, Non-IID data, ACF plot, PACF plot]
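To make the ACF idea concrete, here is a small NumPy sketch of the sample autocorrelation function applied to an IID series and to a random walk. The series are synthetic. In practice, statsmodels provides `acf`, `pacf`, and `adfuller` in `statsmodels.tsa.stattools` for the same purpose.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
noise = rng.normal(size=1000)     # IID readings
drift = np.cumsum(noise)          # random walk: non-stationary, strongly autocorrelated

acf_noise = sample_acf(noise, 10)
acf_drift = sample_acf(drift, 10)
band = 2.0 / np.sqrt(len(noise))  # approximate 95% band for white noise

print("IID lag-1 autocorrelation:", round(acf_noise[1], 3), "band:", round(band, 3))
print("random walk lag-1 autocorrelation:", round(acf_drift[1], 3))
```

Autocorrelations that stay well outside the band and decay slowly are a typical sign that the series should be differenced before fitting an autoregressive model.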
Time Series Differencing and Decomposition
Use differencing to improve model quality
Prefer Additive/Multiplicative decomposition over naive decomposition
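A short sketch of first and seasonal differencing with NumPy, on a synthetic series built from a linear trend plus a period-4 pattern. The period and the values are assumptions for illustration; decomposition itself is not shown here.

```python
import numpy as np

period = 4
t = np.arange(48)
trend = 0.5 * t
season = np.tile([0.0, 2.0, 0.0, -2.0], 12)   # repeating pattern with period 4
series = trend + season

# First difference: removes the linear trend but keeps a seasonal pattern
first_diff = np.diff(series)

# Seasonal difference: removes the repeating pattern; the linear trend becomes a constant
seasonal_diff = series[period:] - series[:-period]

print(first_diff[:8])
print(seasonal_diff[:8])
```

Each differencing step shortens the series, so forecasts made on differenced data have to be integrated back to the original scale.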
Scale and Location Variance: Good Practices
ARIMA Model for Sensor Data: Statsmodels
Ensure input to ARIMA is stationary (or differenced)
Ensure residuals are not autocorrelated
ARIMA order (p, d, q):
p: order of the autoregressive part
d: degree of first differencing needed to reach stationarity
q: order of the moving average part
Vector Autoregression: Multivariate Time Series
Always provide stationary series to VAR
Consider which set of
[Figure: lagged factors such as Feed (L1, L2), Temp, and Speed (L1, ...) feed a VAR model for the response variable]
Vector Autoregression: Forecasting
[Figure: Differenced Variable Data]
Standard errors still mean a small % of anomalies are typical
VAR-based Anomaly Detection
Real-time multivariate anomaly detection can ... detection engines
Smaller footprint compared ...
[Figure labels: Sensor data, Data, Residual analysis, Known vs new behavior]
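One way residual analysis can flag new behavior is sketched below. To keep the example dependency-light it fits a VAR(1) by ordinary least squares in NumPy, which is a stand-in and not the talk's implementation. The data, the injected fault at index 450, and the threshold of 4 standard deviations are all assumptions.

```python
import numpy as np

def fit_var1(data):
    """Least-squares VAR(1): data[t] is modeled as c + data[t-1] @ B."""
    X = np.column_stack([np.ones(len(data) - 1), data[:-1]])
    coef, *_ = np.linalg.lstsq(X, data[1:], rcond=None)
    return coef                                   # shape (1 + k, k)

def one_step_residuals(data, coef):
    """Residual of the one-step-ahead prediction for data[1:]."""
    X = np.column_stack([np.ones(len(data) - 1), data[:-1]])
    return data[1:] - X @ coef

rng = np.random.default_rng(6)
n = 600
A = np.array([[0.5, 0.2], [0.1, 0.4]])
data = np.zeros((n, 2))
for t in range(1, n):
    data[t] = A @ data[t - 1] + rng.normal(scale=0.5, size=2)
data[450] += np.array([6.0, -6.0])                # injected synthetic fault

coef = fit_var1(data[:400])                       # train on known behavior only
sigma = one_step_residuals(data[:400], coef).std(axis=0)

z = np.abs(one_step_residuals(data[400:], coef)) / sigma
# Residual i belongs to data[401 + i]; the step right after a fault may also be flagged
anomalous = np.where((z > 4.0).any(axis=1))[0] + 401
print(anomalous)
```

Because residuals of a good model still have spread, a small share of flagged points is expected even in normal operation, so the threshold sets the trade-off between missed faults and false alarms.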
Model Evaluation: Information Criteria
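The content of this slide did not survive extraction. As a reminder of the standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters, n the number of observations, and L the maximized likelihood. The sketch below applies them to hypothetical candidate models; the log-likelihood values are made up for illustration.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as n_obs grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical candidates: (name, maximized log-likelihood, number of parameters)
candidates = [("AR(1)", -520.0, 2), ("AR(2)", -512.0, 3), ("AR(5)", -510.5, 6)]
n_obs = 400

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, ll, k in candidates}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

Both criteria only rank models fitted to the same data; the absolute values carry no meaning across data sets.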
Evaluation: Scale Dependent Metrics/Methods
Error based Model Evaluation
RMS Error (RMSE)
Mean Absolute Error (MAE)
Mean Absolute % Error (MAPE)
Mean Abs. Scaled Error (MASE)
Residual diagnostics
Good models will have uncorrelated residuals
Residuals centered around zero
Homoscedastic
Normality
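The four error metrics listed on this slide can be written in a few lines of plain Python. This is a sketch with made-up numbers. Note that RMSE and MAE are in the units of the data, while MAPE and MASE are scale-free; MAPE is undefined when an actual value is zero, and MASE here scales by the in-sample error of a naive one-step forecast.

```python
import math

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error; requires non-zero actual values."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, train):
    """MAE scaled by the in-sample MAE of the naive forecast (previous value)."""
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae(actual, forecast) / naive_mae

train = [8.0, 9.0, 11.0, 10.0]     # illustrative training history
actual = [10.0, 12.0, 14.0]        # illustrative held-out values
forecast = [11.0, 12.0, 13.0]      # illustrative model forecasts

print(rmse(actual, forecast), mae(actual, forecast))
print(mape(actual, forecast), mase(actual, forecast, train))
```

A MASE below 1 means the model beats the naive previous-value forecast on average, which makes it a convenient baseline check across sensors with different units.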
Related Time-Series Techniques
Variants of VAR: Factor augmented VAR, VAR Moving Average
Models that consider variance: ARCH, GARCH
Bayesian Inference Methods
Sophisticated Decomposition Models: Additive, Multiplicative
State Space Models
Artificial Neural Networks: RNNs, LSTMs
Sensor Data and Analysis Opportunities
Capabilities for Sensor Data Analysis
Scale
Sensor data can quickly reach large scales
Big storage and big compute needs
Aggregate analysis
Regression
Classification and Clustering
Feature engineering
Text analytics
Time Series
TS data profiling and IID tests
Signal processing
AR/VAR models
Decomposition models
Opportunities
Distributed data processing specifically for time series
Scalable system state modeling for large sensor data sets (Spark-ts specific plan)
Better data profiling frameworks and tools, aggregate and time series
A Call to Action: Design for Data
Offering Value Proposition
Concluding Remarks
Email: contact@thedatateam.in
Twitter: @carpedata
LinkedIn: https://www.linkedin.com/company/the-data-team