You are on page 1of 17

28/06/2018 Machine Learning with ML.Net and C#/VB.

Net - CodeProject

Machine Learning with ML.Net and C#/VB.Net


Dirk Bahle, 28 Jun 2018

Solving the Classification problem with ML.Net Version 0.2.

Wikipedia_SentimentAnalysis.zip Wikipedia_SentimentAnalysis_VB.zip
YouGotSpam_Analysis.zip YouGotSpam_Analysis_VB.zip
LanguageDetection.zip LanguageDetection_VB.zip
IrisClassification.zip IrisClassification_VB.zip
IrisClassification_uint.zip IrisClassification_uint_VB.zip

Index
Introduction
Background
Overview

Supervised Machine Learning

Binary Classification
Sentiment Analysis Wikipedia

Training Stage
Prediction Stage

You Got Spam

Multiclass Classification
Language Detection
Iris Flower Classification

Version 1
Version 2

Conclusions
References
History

Introduction
This article introduces machine learning in .Net without touching the mathematical side of things. It will focus on essential work-
flows and their structures of the data handling in .Net to facilitate experimentation with what is available in an open source project
ML.Net version 0.2.

The ML.Net project version 0.2 is available for .Net Core 2.0 and .Net Standard 2.0 with support for x64 architecture only (Any
CPU will not compile right now). It should, thus, be applicable in any framework where .Net Standard 2.0 (eg.: .Net Framework
4.6.1) is applicable. The project is currently on review. APIs may change in the future.

Background

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 1/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
Learning the basics of machine learning has not not been easy, if you want to use an object oriented language like C# or VB.Net.
Because most of the time you have to learn Python, before anything else, and then you have to find tutorials with sample data that
can teach you more. Even looking at object oriented projects like [1] Accord.Net, Tensor.Flow, or CNTK is not easy because each of
them comes with their own API, way of implementing same things differently, and so on. I was thrilled by the presentations at Build
2018 [2] because they indicated that we can use a generic work-flow approach that allows us to evaluate the subject with local
data, local .Net programs, local models, and results, without having to use a service or another programming language like Python.

Overview
Machine learning is a subset of Artificial Intelligence (AI) and it can answer 5 types of questions [3]:

Supervised

1. Classification (Binary and Multiclass)


Question: What class does it belong to?

2. Regression
Question: How much or how many?

Unsupervised

1. Ranking
Question: What should I do next?

2. Clustering
Question: How is this organized?

3. Anomaly Detection
Question: Is this weird?

Each type of question has many applications and in order to use the correct machine learning approach we must first try to
determine if we want to answer any of the given questions, and if so, whether we have the data to support it.

Supervised Machine Learning


This article discusses working .Net examples (source code including sample data) for binary and multiclass classifications. This
type of machine learning algorithm assumes that we can tag an item to determine whether it belongs to:

1. One of two groups (binary classification) or


2. One of many groups (multiclass classification)

A binary classification can be applied when you want to answer a question with a true or false answer. You usually find yourself
sorting an item (an image or text) into one of 2 classes. Consider, for instance, the question of whether a customer feedback to your
recent survey is in a good mood (positive) or not (negative).

Answering this question with machine learning requires us to tag sample items (eg: images or text) as belonging to either group.
The normal work-flow requires two independent sets of tagged data:

1. A Training Data Set (to train the machine learning algorithm) and
2. An Evaluation Data Set (to measure the efficiency of the ml algorithm).

A tagged line of text may look like this:

1 Grow up you biased child.


0 I hope this helps.

where "1" in the first column denotes a negative sentiment and "0" in the first column denotes a positive sentiment. The rule of
thumb is usually that the ml algorithm will work better if we have more training data. And it should also be assured that the training
data and the data used later on is clean and of high quality to support an effective algorithm.

The overall work-flow to determine an effective algorithm using KPIs is denoted by the diagram on the left side below, where we
(ideally) find a model that reflects our classification problem best. The model is not explained in more detail here. It is, in the case
of ML.Net, a zip file containing the persisted facts learned from the tagged training data.

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 2/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject

The second independent data set for evaluation is used to determine KPIs towards the efficiency of the learned classification. This
steps estimates how good our algorithm will classify items in the future by comparing the result from the machine learning algorithm
with the available tag (without using the tag in the algorithm). A KPI to measure efficiency is, for example, the percentage of the
number of items classified right versus the wrong classified items. We can always go back to the training step, and adjust
parameters, or swap one algorithm for the other, if we find that our KPIs do not meet our expectations and we need ways to
optimize the model.

The Training stage hopefully ends with an effective model which can be applied in the second Prediction stage to classify each
item that we see in the future. This stage requires the model from the previous stage and the item to classify, which is used to
output a prediction of a classification (eg.: positive or negative sentiment).

This is a brief overview on the work-flow for attended machine learning. We need to understand this to work with the code samples
discussed in this article further below. So, lets look at each sample in turn.

Binary Classification

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 3/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject

Sentiment Analysis Wikipedia


The sample discussed in this section is based on A Sentiment Analysis Binary Classification Scenario from the ML.Net tutorial.

Wikipedia_SentimentAnalysis.zip

Training Stage
The work-flow discussed in the previous section is implemented to some degree in the demo projects attached to this article. The
demo project contains two executable projects:

Training and
Prediction

We get the following output if we compile and start the Training project:

Training Data Set


-----------------
Not adding a normalizer.
Making per-feature arrays
Changing data from row-wise to column-wise
Processed 250 instances
Binning and forming Feature objects
Reserved memory for tree learner: 1943796 bytes
Starting to train ...
Not training a calibrator because it is not needed.

Evaluating Training Results


---------------------------

PredictionModel quality metrics evaluation


------------------------------------------
Accuracy: 61,11%
Auc: 96,30%
F1Score: 72,00%

We see here how the program first trains a model and evaluates the result in the second step.

The Training and the Prediction modul share a reference to the previously mentioned Model.zip file (most be copied manually -
see details below), a reference to ML.Net library, and a common model of the data input and the classification output defined in the
Models project:

public class ClassificationData


{
[Column(ordinal: "0", name: "Label")]
public float Sentiment;

[Column(ordinal: "1")]
public string Text;
}

public class ClassPrediction


{
[ColumnName("PredictedLabel")]
public bool Class;
}

Public Class ClassificationData


<Column("0", "Label")>
Public Sentiment As Single

<Column("1")>
Public Text As String
End Class

Public Class ClassPrediction


<ColumnName("PredictedLabel")>
Public [Class] As Boolean
End Class

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 4/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
The properties defined in ClassificationData map each column into an input that is present in the text input file. The
Label column defines the item that contains the class definition that we want to train against for each line of text. The Text
property itself cannot be labeled as a "Feature" because it consists of more than one "column" (int the text file). This is why we need
to add the new TextFeaturizer("Features", "Text") line in the pipeline below to read the text into the input
data structure.

The ClassificationData is a rough description of our input and how it should be mapped into either a Label or a Feature.
Try removing the Label column definition, compile and execute, to verify that the system will throw an exception, if a column named
Label cannot be found in the input text.

The ClassPrediction states only one binary output result, which is expected to be a Boolean value that maps the input
to either binary class. This part is relevant to:

1. Verify whether learning was succesful (with known input in the test phase) and
2. Determine the actual classifiaction of the machine learning algorithm when using its model in production.

In summery: The ClassificationData is used to descripe how we whish to process the input (always consisting of Label
and Features), and the ClassPrediction maps this input to a learned result.

The Training pipline that consumes the text input via the ClassificationData definition looks like this:

internal static async Task<PredictionModel<ClassificationData, ClassPrediction>>


TrainAsync(string trainingDataFile, string modelPath)
{
var pipeline = new LearningPipeline();

pipeline.Add(new TextLoader(trainingDataFile).CreateFrom<ClassificationData>());

pipeline.Add(new TextFeaturizer("Features", "Text"));

pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5,


MinDocumentsInLeafs = 2 });

PredictionModel<ClassificationData, ClassPrediction> model =


pipeline.Train<ClassificationData, ClassPrediction>();

// Saves the model we trained to a zip file.


await model.WriteAsync(modelPath);

// Returns the model we trained to use for evaluation.


return model;
}

Async Function TrainAsync(


ByVal trainingDataFile As String,
ByVal modelPath As String)
As Task(Of PredictionModel(Of ClassificationData, ClassPrediction))

Dim pipeline = New LearningPipeline()

pipeline.Add(New TextLoader(trainingDataFile).CreateFrom(Of ClassificationData)())

pipeline.Add(New TextFeaturizer("Features", "Text"))

pipeline.Add(New StochasticDualCoordinateAscentBinaryClassifier())

Dim model As PredictionModel(Of ClassificationData, ClassPrediction) =


pipeline.Train(Of ClassificationData, ClassPrediction)()

' Saves the model we trained to a zip file.


Await model.WriteAsync(modelPath)

' Returns the model we trained to use for evaluation.


Return model
End Function

The ML.Net framework comes with an extensible pipeline concept in which the different processing steps can be plugged in as
shown above. The TextLoader step loads the data from the text file and the TextFeaturizer step converts the given
input text into a feature vector, which is a numerical representation of the given text. This numerical representation is then fed into
something that the ML community calls a learner. The learner in this case a FastTreeBinaryClassifier.

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 5/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
A learner or trainer is the component that converts the numerical feature vectors into a model that can later be used to classify input
in the future. The documentation for these learners is currently under construction and not all learners are fully implemented and
tested, yet. For binary classifications, there are a few alternative learners that could be used as an alternative (just edit the
constructor as shown below):

Accu Auc F1S


Classification Method
racy core

new AveragedPerceptronBinaryClassifier() 61.1 81. 72.0


1% 48 0%
%

new FastForestBinaryClassifier() { NumThreads=2, NumLeaves = 25, 72.2 97. 78.2


NumTrees = 25, MinDocumentsInLeafs = 2 } 2% 53 6%
%

new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, 61,1 96, 72,0


MinDocumentsInLeafs = 2 } 1% 30 0%
%

new GeneralizedAdditiveModelBinaryClassifier() 50.0 83. 66.6


0% 95 7%
%

new LinearSvmBinaryClassifier() 72.2 90. 76.1


2% 12 9%
%

new LogisticRegressionBinaryClassifier() 50.0 86. 66.6


0% 42 7%
%

new StochasticDualCoordinateAscentBinaryClassifier 83.3 98. 85.7


3% 77 1%
%

new StochasticGradientDescentBinaryClassifier() 55.5 90. 69.2


6% 12 3%
%

Testing all of the above learners and shows that the StochasticDualCoordinateAscentBinaryClassifier
works best based on the measured KPIs. These KPIs are measured by an instance of the
BinaryClassificationMetrics which also offers other KPIs, such as, Precision and Recall. Note that you can still
analyse more KPIs, , such as, memory consumption and processing time, which are also not measured here. The test presented is
rather small and brief. We can also use different settings of individual learners which may still reveal significant improvements.
Being able to play with these different scenarious looks like an interesting excercise when we face the problem of an automated
classification of a large amount of items (text or images etc).

So, this in a nutshell how machine learning can work. The machine consumes data (text), converts it into numerical vectors, and
integrates the vectorized data into a model. The Model is the main output of the first stage. Lets have a look at the Classification
stage to understand the complete work-flow.

Prediction Stage
The Prediction stage is the modul that represents the code that runs in production and classifies data as new items arrive in the
system. This part is implemented in the PredictAsync method of the Prediction project in the Wikipedia_SentimentAnalysis
solution. The code for this method looks like this:

async Task<PredictionModel<ClassificationData, ClassPrediction>> PredictAsync(


string modelPath,
string[] classNames,
IEnumerable<ClassificationData> predicts = null,
PredictionModel<ClassificationData, ClassPrediction> model = null)
{
if (model == null)
model = await PredictionModel.ReadAsync<ClassificationData, ClassPrediction>(modelPath);

if (predicts == null) // do we have input to predict a result?


return model;

IEnumerable<ClassPrediction> predictions = model.Predict(predicts);

Console.WriteLine("Classification Predictions");

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 6/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
IEnumerable<(ClassificationData sentiment, ClassPrediction prediction)>
sentimentsAndPredictions =
predicts.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));

foreach (var item in sentimentsAndPredictions)


{
string textDisplay = item.sentiment.Text;

if (textDisplay.Length > 80)


textDisplay = textDisplay.Substring(0, 75) + "...";

Console.WriteLine("Prediction: {0} | Text: '{1}'",


(item.prediction.Class ? classNames[0] : classNames[1]),
textDisplay);
}
Console.WriteLine();

return model;
}

Async Function PredictAsync(ByVal modelPath As String,


ByVal Optional classNames As String() = Nothing,
ByVal Optional predicts As IEnumerable(Of ClassificationData) =
Nothing,
ByVal Optional model As PredictionModel(Of ClassificationData,
ClassPrediction) = Nothing)
As Task(Of PredictionModel(Of ClassificationData, ClassPrediction))
If model Is Nothing Then
model = Await PredictionModel.ReadAsync(Of ClassificationData, ClassPrediction)
(modelPath)
End If

If predicts Is Nothing Then Return model

Console.WriteLine()
Console.WriteLine("Classification Predictions")
Console.WriteLine("--------------------------")

For Each Item In predicts


Dim predictedResult = model.Predict(Item)

Dim textDisplay As String = Item.Text

If textDisplay.Length > 80 Then


textDisplay = textDisplay.Substring(0, 75) & "..."
End If

Dim resultClass = classNames(0)


If predictedResult.[Class] = False Then resultClass = classNames(1)

Console.WriteLine("Prediction: {0} | Text: '{1}'", resultClass, textDisplay)


Next
Console.WriteLine()

Return model
End Function

The PredictionModel.ReadAsync line in the method loads the model from the file system into an in-memory
PredictionModel:

PredictionModel<ClassificationData, ClassPrediction> model = await


PredictionModel.ReadAsync<ClassificationData, ClassPrediction>
(modelPath);

model = Await PredictionModel.ReadAsync(Of ClassificationData, ClassPrediction)(modelPath)

The model loaded is stored in the project's Learned folder. This Model.zip file has to be copied from the Training moduls
output whenever we find a significant improvement and want to take advantage of it in the Prediction modul.

Everything below the model loading code line evaluates input against the loaded model and outputs a predicted classification in the
last part of the method. You can use the interactive input prompt to test sample texts of your own and test on a small scale what

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 7/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
was learned and what was not. Remember that the learned data is usually cleaned (not the same as the original input) and that you
can only test on a small scale like this. A better and more reasonable test is probably to feed in the last n text lines from a real data
source, get their classification, and see if an independent reviewer has a closely matching result or not.

We have seen in this section how binary classification can work for sentiment analysis in a very "simple" scenario. But the real
strength of ml is that each type of question (here: Is this A or B?) can be applied in a wide variety of applications. Let's review one
more sample in the next section to review another binary classification use case.

You Got Spam


YouGotSpam_Analysis.zip
YouGotSpam_Analysis_VB.zip

The data for the sample discussed in this section is based on the codeproject article You've Got Spam. The aim of this binary
classification project is that we want to know determine whether a given text should be classified as Spam or not.

The source code attached to this article for the YouGotSpam_Analysis solution is almost identical to the code explained in the last
section. Even the execute-able projects are in fact almost identical. The only difference here is the datasource for training and test
evaluation, which is in this case the test data from the above codeproject article (see Data folder in Training project). The Training
project produces this output:

Training Data Set


-----------------
Not adding a normalizer.
Making per-feature arrays
Changing data from row-wise to column-wise
Processed 2000 instances
Binning and forming Feature objects
Reserved memory for tree learner: 24082752 bytes
Starting to train ...
Not training a calibrator because it is not needed.

Evaluating Training Results


---------------------------

PredictionModel quality metrics evaluation


------------------------------------------
Accuracy: 100,00%
Auc: 100,00%
F1Score: 100,00%

...which indicates that we can reach the same KPIs as indicated in the original article based on Python.

You can again use the Prediction project to load a model from the file system and test it with further input.

The projects discussed so far have shown that ML.Net can be helpful to determine a binary classification in an automated fashion.
But what if I want to classify more than 2 classes (eg: negative, neutral, and positive sentiment)? The next section examines
classifying data for this use case.

Multiclass Classification
Language Detection
LanguageDetection.zip
LanguageDetection_VB.zip

The data for the sample discussed in this section was downloaded from http://wortschatz.uni-leipzig.de and pre-processed
(removed quote character ") for improved parsing experience.

The multiclass classifications use case discussed here is the detection of a language based on a given text. Just imagine, you have
teams of social media agents and you are trying to relay online customer feedback (eg. chats), in different languages, to the correct
team that speaks that language.

The LanguageDetection solution attached to this section follows the structure of the previously discussed binary classification
samples. We have a Training project, a Prediction project, and a Models class library that is shared between the executables.
The Training project can be used to create a model with a particular learner. A successful model can then be copied from the
Training project to the Prediction project for consumation and multiclass classification of future input.

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 8/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
The Prediction project of the LanguageDetection solution differs in the way of how we define the LanguageClass property
in the ClassificationData class and the Class property in the ClassPrediction class. Both properties must be
of the data type float to support multible classifications:

public class ClassificationData


{
[Column(ordinal: "0", name: "Label")]
public float LanguageClass;

[Column(ordinal: "1")]
public string Text;
}

public class ClassPrediction


{
[ColumnName("PredictedLabel")]
public float Class;
}

Public Class ClassificationData


<Column("0", "Label")>
Public LanguageClass As Single

<Column("1")>
Public Text As String
End Class

Public Class ClassPrediction


<ColumnName("PredictedLabel")>
Public [Class] As Single
End Class

The input mapping in ClassificationData is the same as the one in the binary classification problem. The only difference
is not that we have more than two values in the Label column of the text file that is being fed in.

The output mapping in ClassPrediction is different because we now have to map to a float value in order to classify
towards more than one class.

The required training pipeling looks like this:

async Task<PredictionModel<ClassificationData, ClassPrediction>>


TrainAsync(string trainingDataFile, string modelPath)
{
var pipeline = new LearningPipeline();

pipeline.Add(new TextLoader(trainingDataFile).CreateFrom<ClassificationData>());

pipeline.Add(new Dictionarizer("Label"));
pipeline.Add(new TextFeaturizer("Features", "Text"));

pipeline.Add(new StochasticDualCoordinateAscentClassifier());
//pipeline.Add(new LogisticRegressionClassifier());
//pipeline.Add(new NaiveBayesClassifier());

pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn =


"PredictedLabel" });

// Train the pipeline based on the dataset that has been loaded, transformed.
PredictionModel<ClassificationData, ClassPrediction> model =
pipeline.Train<ClassificationData, ClassPrediction>();

await model.WriteAsync(modelPath); // Saves the model we trained to a zip file.

return model;
}

Async Function TrainAsync(ByVal trainingDataFile As String,


ByVal modelPath As String)
As Task(Of PredictionModel(Of ClassificationData, ClassPrediction))

Dim pipeline = New LearningPipeline()

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 9/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject

pipeline.Add(New TextLoader(trainingDataFile).CreateFrom(Of ClassificationData)())

pipeline.Add(New Dictionarizer("Label"))

pipeline.Add(New TextFeaturizer("Features", "Text"))

pipeline.Add(New StochasticDualCoordinateAscentClassifier())
'pipeline.Add(new LogisticRegressionClassifier());
'pipeline.Add(new NaiveBayesClassifier());

pipeline.Add(New PredictedLabelColumnOriginalValueConverter() With


{
.PredictedLabelColumn = "PredictedLabel"
})

' Train the pipeline based on the dataset that has been loaded, transformed.
Dim model As PredictionModel(Of ClassificationData, ClassPrediction) =
pipeline.Train(Of ClassificationData, ClassPrediction)()

' Saves the model we trained to a zip file.


Await model.WriteAsync(modelPath)

' Returns the model we trained to use for evaluation.


Return model
End Function

The Dictionarizer("Label"); step maps each line with a labeled input value (0-5) into a bucket. The
PredictedLabelColumnOriginalValueConverter maps the predicted value (a vector) to the original values
datatype (a float).

Compiling and running the Training modul gets us this output:

Training Data Set


-----------------
Not adding a normalizer.
Using 4 threads to train.
Automatically choosing a check frequency of 4.
Auto-tuning parameters: maxIterations = 48.
Auto-tuning parameters: L2 = 2.778334E-05.
Auto-tuning parameters: L1Threshold (L1/L2) = 1.
Using best model from iteration 8.
Not training a calibrator because it is not needed.

Evaluating Training Results


---------------------------

PredictionModel quality metrics evaluation


------------------------------------------
Accuracy Macro: 98.66%
Accuracy Micro: 98.66%
Top KAccuracy: 0.00%
LogLoss: 7.50%

PerClassLogLoss:
Class: 0 - 11.18%
Class: 1 - 4.08%
Class: 2 - 5.95%
Class: 3 - 10.43%
Class: 4 - 7.86%
Class: 5 - 5.52%

There are three multiclass classification learners in ML.Net Version 0.2. and their KPIs compare as indicated below:

Classification Method Output

new StochasticDualCoordinateAscentClassifier()
Accuracy Macro: 98.66%
Accuracy Micro: 98.66%
Top KAccuracy: 0.00%
LogLoss: 7.50%

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 10/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject

PerClassLogLoss:
Class: 0 - 11.18%
Class: 1 - 4.08%
Class: 2 - 5.95%
Class: 3 - 10.43%
Class: 4 - 7.86%
Class: 5 - 5.52%

new LogisticRegressionClassifier()
Accuracy Macro: 98.52%
Accuracy Micro: 98.52%
Top KAccuracy: 0.00%
LogLoss: 8.63%

PerClassLogLoss:
Class: 0 - 13.32%
Class: 1 - 4.67%
Class: 2 - 7.09%
Class: 3 - 11.50%
Class: 4 - 8.98%
Class: 5 - 6.19%

new NaiveBayesClassifier()
Accuracy Macro: 96.58%
Accuracy Micro: 96.58%
Top KAccuracy: 0.00%
LogLoss: 3,453.88%

PerClassLogLoss:
Class: 0 - 3,453.88%
Class: 1 - 3,453.88%
Class: 2 - 3,453.88%
Class: 3 - 3,453.88%
Class: 4 - 3,453.88%
Class: 5 - 3,453.88%

So, this is how we can multiclass classify text based on one Feature input column. The same machine learning approach (binary of
multiclass) is also available for more than one feature input column, as we will see next.

Iris Flower Classification


Version 1
IrisClassification.zip
IrisClassification_VB.zip

The Multiclass classification problem discussed in this section is a well known reference test in the pattern recognition community
[4]. The original database was created by Ronald Fisher in 1936 and ML.Net sample reviewed here comes from the Get Started
section of the ML.Net tutorial. The problem statement is to create an algorithm that will accept an input vector of multiple float
values (representing properties of the flower), and the output of that algorithm should be the most likely name of the flower.

Doing this in ML.Net requires us to create an input mapping with more than one column:

public class ClassificationData


{
[Column("0")]
public float SepalLength;

[Column("1")]
public float SepalWidth;

[Column("2")]
public float PetalLength;

[Column("3")]

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 11/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
public float PetalWidth;

[Column("4")]
[ColumnName("Label")]
public string Label;
}

Public Class ClassificationData


<Column("0")>
Public SepalLength As Single

<Column("1")>
Public SepalWidth As Single

<Column("2")>
Public PetalLength As Single

<Column("3")>
Public PetalWidth As Single

<Column("4")>
<ColumnName("Label")>
Public Label As String
End Class

We are inputing a set of feature columns (namely SepalLength, SepalWidth, PetalLength, PetalWidth) that is later combined into
one Features vector. The Label is in this case a string that is given as last column to identify each data row during the training and
test stage of the algorithm.

The result of the predicted class should be (not surprisingly) be a string:

public class ClassPrediction


{
[ColumnName("PredictedLabel")]
public string Class;
}

Public Class ClassPrediction


<ColumnName("PredictedLabel")>
Public [Class] As String
End Class

The training code for this case is very similar to the previous section:

async Task<PredictionModel<ClassificationData, ClassPrediction>>


TrainAsync(string trainingDataFile, string modelPath)
{
var pipeline = new LearningPipeline();

pipeline.Add(new TextLoader(trainingDataFile).CreateFrom<ClassificationData>(separator:
','));

pipeline.Add(new Dictionarizer("Label"));

pipeline.Add(new ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength",


"PetalWidth"));

pipeline.Add(new StochasticDualCoordinateAscentClassifier());

pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn =


"PredictedLabel" });

// Train the pipeline based on the dataset that has been loaded, transformed.
PredictionModel<ClassificationData, ClassPrediction> model =
pipeline.Train<ClassificationData, ClassPrediction>();

await model.WriteAsync(modelPath);

return model;
}

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 12/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject

Async Function TrainAsync(ByVal trainingDataFile As String,


ByVal modelPath As String)
As Task(Of PredictionModel(Of ClassificationData, ClassPrediction))

Dim pipeline = New LearningPipeline()

pipeline.Add(New TextLoader(trainingDataFile).CreateFrom(Of ClassificationData)


(separator:=","c))

pipeline.Add(New Dictionarizer("Label"))

pipeline.Add(New ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength",


"PetalWidth"))

pipeline.Add(New StochasticDualCoordinateAscentClassifier())

pipeline.Add(New PredictedLabelColumnOriginalValueConverter() With {


.PredictedLabelColumn = "PredictedLabel"
})

' Train the pipeline based on the dataset that has been loaded, transformed.
' Saves the model we trained to a zip file.
Dim model As PredictionModel(Of ClassificationData, ClassPrediction) =
pipeline.Train(Of ClassificationData, ClassPrediction)()

Await model.WriteAsync(modelPath)

' Returns the model we trained to use for evaluation.


Return model
End Function

There are only two new things here. The raw input data is in this case a comma seperated list, therefore, we have to use a
separator: ',' parameter when loading the data from the text file in the pipeline. And we use the
ColumnConcatenator to convert the set of feature columns into one column consisting of a vector named Features.
The output is similar to what we've seen before (and we can again experiment with the other two learners as shown in the last
section):

Training Data Set


-----------------
Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn
this behavior off.
Using 4 threads to train.
Automatically choosing a check frequency of 4.
Auto-tuning parameters: maxIterations = 45452.
Auto-tuning parameters: L2 = 2.667051E-05.
Auto-tuning parameters: L1Threshold (L1/L2) = 0.
Using best model from iteration 1956.
Not training a calibrator because it is not needed.

Evaluating Training Results


---------------------------

PredictionModel quality metrics evaluation


------------------------------------------
Accuracy Macro: 95.73%
Accuracy Micro: 95.76%
Top KAccuracy: 0.00%
LogLoss: 8.19%

PerClassLogLoss:
Class: 0 - 0.72%
Class: 1 - 10.62%
Class: 2 - 13.43%

Again, we can use the Training module of the IrisClassification solution to train different learners and settings and use the
Prediction module to predict new classifications with the previously determined model.

We have seen in this section how 4 input columns (SepalLength, SepalWidth, PetalLength, PetalWidth) are converted into one
vectorized Features column using the ColumnConcatenator converter. An equivalent approach that does not require us to
use a ColumnConcatenator in the pipeline code is to use the following input class definition:

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 13/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject

public class ClassificationData


{
public float SepalLength
{
get { return Features[0]; }
set { Features[0] = value; }
}

public float SepalWidth


{
get { return Features[1]; }
set { Features[1] = value; }
}

public float PetalLength


{
get { return Features[2]; }
set { Features[2] = value; }
}

public float PetalWidth


{
get { return Features[3]; }
set { Features[3] = value; }
}

[Column("0-3")]
[ColumnName("Features")]
[VectorType(4)] public float[] Features = new float[4];

[Column("4")]
[ColumnName("Label")]
public string Label;
}

Public Class ClassificationData


Public Property SepalLength As Single
Get
Return Features(0)
End Get
Set(ByVal value As Single)
Features(0) = value
End Set
End Property

Public Property SepalWidth As Single


Get
Return Features(1)
End Get
Set(ByVal value As Single)
Features(1) = value
End Set
End Property

Public Property PetalLength As Single


Get
Return Features(2)
End Get
Set(ByVal value As Single)
Features(2) = value
End Set
End Property

Public Property PetalWidth As Single


Get
Return Features(3)
End Get
Set(ByVal value As Single)
Features(3) = value
End Set
End Property

<Column("0-3","Features")>

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 14/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
<VectorType(4)>
Public Features As Single() = New Single(3) {}
<Column("4", "Label")>
Public Label As String
End Class

But it is a bad practice to define the actual feature set through the ClassificationData definition as shown above. We
should, therefore, remove the [ColumnName("Features")] line and add the new
ColumnConcatenator("Features", nameof(Digit.Features)) in the pipeline code instead. This design
can give us more flexability when trying to evaluate different feature configurations.

Version 2
IrisClassification_uint.zip
IrisClassification_uint_VB.zip

Let us suppose for a moment that we do not want the machine learning algorithm to handle strings (since we really want to localize
that part of the application). It would be a better practice to go back to handling integer values and tread each integer as an index to
indicate the classification (type of flower). But how exactly can this be done? We can change the definition of the input and
predicted output like so:

public class ClassificationData


{
public float SepalLength
{
get { return Features[0]; }
set { Features[0] = value; }
}

public float SepalWidth


{
get { return Features[1]; }
set { Features[1] = value; }
}

public float PetalLength


{
get { return Features[2]; }
set { Features[2] = value; }
}

public float PetalWidth


{
get { return Features[3]; }
set { Features[3] = value; }
}

[Column("0-3")]
[ColumnName("Features")]
[VectorType(4)] public float[] Features = new float[4];

[Column("4")]
[ColumnName("Label")]
public float Label;
}

public class ClassPrediction


{
[ColumnName("PredictedLabel")]
public uint Class;
}

Public Class ClassificationData


Public Property SepalLength As Single
Get
Return Features(0)
End Get
Set(ByVal value As Single)
Features(0) = value
End Set
End Property
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 15/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject

Public Property SepalWidth As Single


Get
Return Features(1)
End Get
Set(ByVal value As Single)
Features(1) = value
End Set
End Property

Public Property PetalLength As Single


Get
Return Features(2)
End Get
Set(ByVal value As Single)
Features(2) = value
End Set
End Property

Public Property PetalWidth As Single


Get
Return Features(3)
End Get
Set(ByVal value As Single)
Features(3) = value
End Set
End Property

<Column("0-3")>
<ColumnName("Features")>
<VectorType(4)>
Public Features As Single() = New Single(3) {}

<Column("4")>
<ColumnName("Label")>
Public Label As Single
End Class

Public Class ClassPrediction


<ColumnName("PredictedLabel")>
Public [Class] As UInteger
End Class

Next, we will have to remove the PredictedLabelColumnOriginalValueConverter from the pipeline of the
previous solution and this is how we can adjust for this scenario (assuming we adjusted the data as well). This approach can also
be verified in the attached IrisClassification_uint solution.

Conclusions
The reviewed sample applications have shown that ML.Net has an interesting value (even at version 0.2) when it comes to
delivering machine learning into the .Net framework. We have seen that binary and multiclass classification can be based on
different types of input and output. This input and output always requires:

a Label and a Features column as input and


a PredictedLabel column as output.

The data types of the inputs and outputs are flexible because converters can be used to convert values into numbers and vectors
when feeding the input into the engine and the same conversion is obviously possible when we have to interprete the result of a
classification.

I hope this article was useful and helps getting started with the subject. Give me your feedback in the form of stars or let me know if
you see essential things to add or change since this could help us all to develop this ML.Net based apps even further.

References
[1] Machine Learning Frameworks
ML.Net
Accord.Net
Tensor Flow
Microsoft Cognitive Toolkit (CNTK)

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 16/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
[2] ML.Net at Build 2018

Introducing ML.NET : Build 2018


Fun ways to Explore ML.NET : Build 2018
Introducing ML.NET

[3] Machine Learning for the Absolute Beginner

Data Science for Beginners video 1: The 5 questions data science answers
Demystifying Machine and Deep Learning for Developers : Build 2018
ML.Net Gloassary

[4] Pattern Recognition with the Iris Data Set


Iris flower data set (Wikipedia)
Implementing Simple Neural Network using Keras – With Python Example
Train Iris Data by Batch using CNTK and C#
Java Machine Learning Library (Java-ML)Java Machine Learning Library (Java-ML)
Iris flower classification using mlp in matlab

History
2018-Jun-18 Added VB.Net samples and minor bugfix in Wikipedia sample (changed default learner to best default learner
instead of using worst learner by default).

License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author


Dirk Bahle
G
e
r
m
a
n
y

The Windows Presentation Foundation (WPF) and C# are among my favorites and so I developed Edi

and a few other projects on GitHub. I am normally an algorithms and structure type person but WPF has such interesting UI
sides that I cannot help myself but get into it.

https://de.linkedin.com/in/dirkbahle

Comments and Discussions


0 messages have been posted for this article Visit https://www.codeproject.com/Articles/1249611/Machine-Learning-with-
ML-Net-and-Csharp-VB-Net to post and view comments on this article, or click here to get a print view with messages.

Permalink | Advertise | Privacy | Cookies | Terms of Use | Mobile Article Copyright 2018 by Dirk Bahle
Web03-2016 | 2.8.180627.1 | Last Updated 28 Jun 2018 Everything else Copyright © CodeProject, 1999-2018

https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 17/17

You might also like