Introduction
As machine learning tools become increasingly easy to use,
the crucial challenge for data science researchers shifts to
data manipulation: building properly designed data-sets that
can be used to test ideas and validate architectures.
Data-set generation
The MIT-BIH Arrhythmia Database contains 48 records, each with
2 signals of 650,000 samples. That totals over 60 million
samples, or roughly 120 thousand fragments 500 samples wide.
This number can easily be increased by signal augmentation,
generation of artificial signals, inclusion of other databases,
reducing the fragmenting window's width and stride, and so on.
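The fragmenting step above can be sketched as a sliding window over each signal (the `fragment` helper and its default values are illustrative, not from the article; a stride smaller than the width yields overlapping fragments and multiplies the example count):

```python
import numpy as np

def fragment(signal, width=500, stride=500):
    """Slice a 1-D signal into fixed-width fragments.

    A stride smaller than the width produces overlapping
    fragments, increasing the number of training examples.
    """
    starts = range(0, len(signal) - width + 1, stride)
    return np.stack([signal[s:s + width] for s in starts])

# One 650,000-sample channel yields 1,300 non-overlapping
# 500-sample fragments; halving the stride roughly doubles that.
record = np.zeros(650_000)
print(fragment(record).shape)              # (1300, 500)
print(fragment(record, stride=250).shape)  # (2599, 500)
```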
Splitting the data boils down to choosing the ECG records for
each of the data-sets.
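A minimal sketch of such a record-level split (the `split_records` helper, the split fractions, and the seed are assumptions for illustration; assigning whole records, rather than individual fragments, keeps data from one patient out of multiple splits):

```python
import random

def split_records(record_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole ECG records to train/validation/test splits."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "test": ids[:n_test],
        "validation": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }

# Placeholder ids stand in for the 48 MIT-BIH record numbers.
splits = split_records(range(48))
print({k: len(v) for k, v in splits.items()})
```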
Data-set access
To train any neural network on this data, we need to
provide a mechanism for extracting randomized batches of
any given shape. You might have come across a TensorFlow
implementation for the MNIST data set. We are going to use a
similar approach, focusing on the next_batch() function.
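As a rough sketch of the kind of interface meant here (this `DataSet` class is illustrative and not TensorFlow's actual MNIST helper; it serves shuffled batches and reshuffles after each full pass):

```python
import numpy as np

class DataSet:
    """Minimal MNIST-style data-set wrapper for batched training."""

    def __init__(self, data, labels, seed=0):
        self._data = data          # shape: (num_examples, fragment_width)
        self._labels = labels
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(len(data))
        self._pos = 0

    def next_batch(self, batch_size):
        # Reshuffle once the current epoch is exhausted.
        if self._pos + batch_size > len(self._data):
            self._order = self._rng.permutation(len(self._data))
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self._data[idx], self._labels[idx]

data = np.zeros((1000, 500))
labels = np.zeros(1000, dtype=int)
xs, ys = DataSet(data, labels).next_batch(32)
print(xs.shape, ys.shape)  # (32, 500) (32,)
```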
Stay tuned for part 2.