Genetic Programming for Classification with Unbalanced Data
A Research Paper by:
Urvesh Bhowan, Mengjie Zhang, Mark Johnston
(Evolutionary Computation Research Group, Victoria University of Wellington, New Zealand)
Presented by
Noorulain
Amina Asif
Pattern Recognition Lab
Department of Computer Science & Information Sciences
Pakistan Institute of Engineering & Applied Sciences
OUTLINE
Abstract of the paper
Introduction to the basic concepts
Classification
Unbalanced data
Performance bias
Introduction
Classification
A way of predicting the class membership of a set of examples using the properties of those examples.
Unbalanced Data Set
A data set with an uneven distribution of examples across classes:
Minority class: the class with a small number of examples in the data set
Majority class: the class that makes up a large part of the data set
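A minimal sketch of how the minority and majority classes can be identified from a list of labels. The function name `class_distribution`, the 0/1 labels, and the 95/5 split are hypothetical and chosen only for illustration.

```python
from collections import Counter


def class_distribution(labels):
    """Return (minority_label, majority_label, counts) for a two-class label list."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common() orders classes by count, largest first
    (majority, _), (minority, _) = counts.most_common()
    return minority, majority, dict(counts)


# Toy labels: 95 examples of class 0 and 5 examples of class 1
labels = [0] * 95 + [1] * 5
minority, majority, counts = class_distribution(labels)
imbalance_ratio = counts[majority] / counts[minority]
print(minority, majority, counts, imbalance_ratio)  # prints: 1 0 {0: 95, 1: 5} 19.0
```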
Genetic Programming for Classification with Unbalanced Data
Introduction
Performance Bias:
Poor accuracy on the minority class but high accuracy on the majority class
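A small illustration of performance bias on made-up labels (95 majority, 5 minority); all names are hypothetical. A classifier that always predicts the majority class reaches 95% overall accuracy while classifying none of the minority examples correctly.

```python
def accuracy(y_true, y_pred):
    """Fraction of all examples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracy(y_true, y_pred, cls):
    """Accuracy computed only over the examples whose true label is `cls`."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == cls]
    return sum(t == p for t, p in pairs) / len(pairs)


# Toy data: label 0 = majority class (95 examples), label 1 = minority class (5)
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class
y_pred = [0] * 100

overall = accuracy(y_true, y_pred)                 # 0.95
majority_acc = class_accuracy(y_true, y_pred, 0)   # 1.0
minority_acc = class_accuracy(y_true, y_pred, 1)   # 0.0
```

Overall accuracy alone hides the failure; reporting the per-class accuracies exposes it.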
Solution?
Assign higher misclassification costs to minority class examples
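A minimal sketch of a cost-sensitive error, assuming a fixed cost per misclassified example that is larger for the minority class. The cost values 5.0 and 1.0, the function name, and the two toy classifiers are illustrative assumptions, not values from the paper.

```python
def cost_sensitive_error(y_true, y_pred, minority=1,
                         minority_cost=5.0, majority_cost=1.0):
    """Total misclassification cost; minority-class mistakes are charged more.

    The cost values are illustrative assumptions, not values from the paper.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t != p:
            total += minority_cost if t == minority else majority_cost
    return total


y_true = [0] * 95 + [1] * 5
# Always predicts the majority class: misses all 5 minority examples
always_majority = [0] * 100
# Finds every minority example but wrongly flags 10 majority examples
balanced = [1] * 10 + [0] * 85 + [1] * 5

print(cost_sensitive_error(y_true, always_majority))  # 25.0
print(cost_sensitive_error(y_true, balanced))         # 10.0
```

With equal costs (minority_cost=1.0) the first classifier would look better (5 errors against 10); the higher minority cost reverses that ordering.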
GP Approaches
Two GP approaches are discussed:
Adapting the fitness function
Multi-Objective Genetic Programming (MOGP)
Classification Strategy
Translates the output of a genetic program (a floating-point number) into one of two class labels, using zero as the dividing point between non-negative and negative outputs
Minority class: positive or zero output
Majority class: negative output
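A minimal sketch of this zero-threshold strategy. The evolved program is stood in for by a plain Python function, `program`, which is hypothetical; only the thresholding rule comes from the slide.

```python
def classify(output: float) -> str:
    """Map a genetic program's floating-point output to a class label."""
    # Zero or positive output -> minority class; negative output -> majority class
    return "minority" if output >= 0 else "majority"


def program(x1: float, x2: float) -> float:
    """Hypothetical stand-in for an evolved GP program tree."""
    return 0.5 * x1 - x2


examples = [(4.0, 1.0), (1.0, 3.0), (2.0, 1.0)]
labels = [classify(program(a, b)) for a, b in examples]
print(labels)  # ['minority', 'majority', 'minority']
```

The third example produces an output of exactly 0.0, which the rule assigns to the minority class.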
                              Predicted Positive   Predicted Negative
Actual Positive (minority)    TP                   FN
Actual Negative (majority)    FP                   TN
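A minimal sketch of how the four confusion-matrix counts can be tallied, assuming the minority class is encoded as the positive label 1; the function name and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN), treating `positive` as the minority (positive) class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1  # minority example correctly predicted as minority
        elif t == positive:
            fn += 1  # minority example missed
        elif p == positive:
            fp += 1  # majority example wrongly predicted as minority
        else:
            tn += 1  # majority example correctly predicted as majority
    return tp, fn, fp, tn


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 4)
```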
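Weighted average classification accuracy
$$Ave = W \times \frac{TP}{TP + FN} + (1 - W) \times \frac{TN}{TN + FP}, \qquad 0 < W < 1$$
Where,
W: weight given to TPs
1 - W: weight given to TNs
TP / (TP + FN): proportion of correctly classified minority class objects
TN / (TN + FP): proportion of correctly classified majority class objects
When W > 0.5, minority class accuracy is given more importance than majority class accuracy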
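A minimal sketch of the weighted average accuracy, assuming Ave = W * TP/(TP+FN) + (1 - W) * TN/(TN+FP) with 0 < W < 1. The function name and the toy counts (5 minority and 95 majority examples) are hypothetical.

```python
def weighted_average_accuracy(tp, fn, fp, tn, w=0.5):
    """Ave = W * TP/(TP+FN) + (1 - W) * TN/(TN+FP), assuming 0 < W < 1."""
    if not 0 < w < 1:
        raise ValueError("w must lie strictly between 0 and 1")
    minority_acc = tp / (tp + fn)  # proportion of minority examples classified correctly
    majority_acc = tn / (tn + fp)  # proportion of majority examples classified correctly
    return w * minority_acc + (1 - w) * majority_acc


# Always predicts the majority class: every minority example is missed
always_majority = weighted_average_accuracy(tp=0, fn=5, fp=0, tn=95)  # 0.5
# Finds every minority example but misclassifies 10 majority examples
balanced = weighted_average_accuracy(tp=5, fn=0, fp=10, tn=85)        # about 0.947
```

Under plain overall accuracy the first classifier scores 0.95 and the second 0.90; the weighted average reverses that ranking. Calling the function with w=0.7 pushes the gap further (about 0.3 against about 0.968).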
Goal
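Explore the effectiveness of a separability-based evaluation metric: the correlation ratio
Uses the overall mean of the program outputs
Incorporates the class ordering preference (minority class outputs non-negative, majority class outputs negative)
Values close to 2 => optimal fitness
Values are scaled between 0 and 1:
1 => highest level of error
0 => no error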
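A sketch of the standard correlation ratio for two classes of program outputs, assuming the slide's metric is the usual square root of the between-class sum of squares over the total sum of squares. The function name and toy outputs are hypothetical, and the ordering-preference term and the scaling to [0, 1] used in the fitness are left out of this sketch.

```python
import math


def correlation_ratio(outputs_by_class):
    """Correlation ratio: sqrt(between-class sum of squares / total sum of squares).

    `outputs_by_class` holds one list of program outputs per class.
    """
    all_outputs = [x for outputs in outputs_by_class for x in outputs]
    overall_mean = sum(all_outputs) / len(all_outputs)
    # Spread of the class means around the overall mean, weighted by class size
    between = sum(
        len(outputs) * (sum(outputs) / len(outputs) - overall_mean) ** 2
        for outputs in outputs_by_class
    )
    # Spread of every individual output around the overall mean
    total = sum((x - overall_mean) ** 2 for x in all_outputs)
    if total == 0:
        return 0.0  # all outputs identical: the classes are not separated
    return math.sqrt(between / total)


separated = correlation_ratio([[2.0, 2.0], [-1.0, -1.0, -1.0]])  # close to 1.0
overlapping = correlation_ratio([[1.0, -1.0], [1.0, -1.0]])      # 0.0
```

The ratio only measures how far apart the class outputs are, not which class lies on which side of zero: swapping the signs in the first example gives the same value. That is consistent with the slide's point that a separate ordering preference is incorporated into the fitness.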
Pair-wise comparisons
Larger value => more separability
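One common separability measure built from pair-wise comparisons is the Wilcoxon-Mann-Whitney statistic, which compares every minority output with every majority output and estimates the area under the ROC curve. Whether the slide's pair-wise formula is exactly this statistic is an assumption; the function name is hypothetical, and counting ties as half is a choice made for this sketch.

```python
def wmw_separability(minority_outputs, majority_outputs):
    """Fraction of (minority, majority) pairs where the minority output is larger.

    Ties count as half a correct ordering; 1.0 means perfect separation.
    """
    score = 0.0
    for m in minority_outputs:
        for n in majority_outputs:
            if m > n:
                score += 1.0
            elif m == n:
                score += 0.5
    return score / (len(minority_outputs) * len(majority_outputs))


print(wmw_separability([2.0, 0.5], [-1.0, 1.0]))  # 0.75
print(wmw_separability([3.0, 4.0], [1.0, 2.0]))   # 1.0
```

Requiring minority outputs to be larger matches the classification strategy, where non-negative outputs are labelled minority.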
Summary
An unbalanced data set can cause performance bias towards the majority class
In GP, class imbalance problems can be treated in two ways:
Adapting the fitness function (introducing new metrics)
Discussed today:
Weights
Correlation ratio (separability)
To be discussed in the next presentation:
Levels of error
MOGP
THANK YOU