
The Apriori Algorithm

Andrew Kusiak
Intelligent Systems Laboratory
2139 Seamans Center
The University of Iowa
Iowa City, Iowa 52242-1527
Tel: 319-335-5934  Fax: 319-335-5669
andrew-kusiak@uiowa.edu
http://www.icaen.uiowa.edu/~ankusiak

Introduction

• Mining for associations among items in a large database of sales transactions is an important database mining function.

• For example, the information that a customer who purchases a keyboard also tends to buy a mouse at the same time is represented in the association rule below:
  Keyboard ⇒ Mouse [support = 6%, confidence = 70%]

Association Rules

• Based on the types of values, association rules can be classified into two categories: Boolean Association Rules and Quantitative Association Rules.

• Boolean Association Rule: Keyboard ⇒ Mouse [support = 6%, confidence = 70%]

• Quantitative Association Rule: (Age = 26…30) ⇒ (Cars = 1, 2) [support = 3%, confidence = 36%]

Minimum Support Threshold

• The support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true.

• IF A ⇒ B:

  support(A ⇒ B) = (# of tuples containing both A and B) / (total # of tuples)
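To make the support calculation concrete, here is a minimal Python sketch (my own illustrative code, not part of the original slides; the function and data names are made up):

```python
def support(transactions, antecedent, consequent):
    """support(A => B): fraction of all transactions containing every item of A and B."""
    items = set(antecedent) | set(consequent)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

# Toy data: two of four baskets contain both a keyboard and a mouse.
baskets = [{"Keyboard", "Mouse"}, {"Keyboard", "Mouse", "Monitor"}, {"Keyboard"}, {"Monitor"}]
print(support(baskets, {"Keyboard"}, {"Mouse"}))  # 0.5, i.e. support = 50%
```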

Minimum Confidence Threshold

• Confidence is defined as the measure of certainty or trustworthiness associated with each discovered pattern.

• IF A ⇒ B:

  confidence(A ⇒ B) = (# of tuples containing both A and B) / (# of tuples containing A)

Itemset

• A set of items is referred to as an itemset.

• An itemset containing k items is called a k-itemset.

• An itemset can also be seen as a conjunction of items (or a predicate).
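Analogously, a minimal sketch of the confidence calculation defined above (again illustrative code with made-up names):

```python
def confidence(transactions, antecedent, consequent):
    """confidence(A => B): of the transactions containing A, the fraction also containing B."""
    a, both = set(antecedent), set(antecedent) | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_a if n_a else 0.0

baskets = [{"Keyboard", "Mouse"}, {"Keyboard", "Mouse", "Monitor"}, {"Keyboard"}, {"Monitor"}]
print(confidence(baskets, {"Keyboard"}, {"Mouse"}))  # 2 of 3 keyboard baskets -> ~0.67
```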

Frequent Itemsets

• Suppose min_sup is the minimum support threshold.

• An itemset satisfies minimum support if the occurrence frequency of the itemset is greater than or equal to min_sup.

• If an itemset satisfies minimum support, then it is a frequent itemset.

Strong Rules

• Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong.

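As a small illustration of the threshold test (the counts and threshold below are made up), the frequent itemsets are simply those whose occurrence count is at least min_sup:

```python
# Hypothetical occurrence counts for a few itemsets.
counts = {("I1",): 6, ("I2",): 7, ("I4",): 2, ("I1", "I4"): 1}
min_sup = 2  # minimum support, expressed here as an absolute count

frequent = {itemset: c for itemset, c in counts.items() if c >= min_sup}
print(frequent)  # ('I1',), ('I2',) and ('I4',) qualify; ('I1', 'I4') does not
```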

Association Rule Mining

• Find all frequent itemsets.
• Generate strong association rules from the frequent itemsets.

Apriori Algorithm (1)

• The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules.


Apriori Algorithm (2)

• Uses a level-wise search, where k-itemsets (an itemset that contains k items is a k-itemset) are used to explore (k+1)-itemsets, to mine frequent itemsets from a transactional database for Boolean association rules.

• First, the set of frequent 1-itemsets is found. This set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. (A sketch of this candidate-generation step follows the process outline below.)

Association Rule Mining Process

• Find all frequent itemsets:
  – The support S of each of these frequent itemsets must be at least equal to a pre-determined min_sup (an itemset is a subset of the items in I, such as A).
• Generate strong association rules from the frequent itemsets:
  – These rules must come from the frequent itemsets and must satisfy both min_sup and min_conf.
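The Lk-1 join Lk-1 step mentioned above can be sketched in a few lines of Python. This is an illustrative reconstruction (the function name apriori_gen and the sample L2 are mine, not from the slides); it also applies the Apriori property by discarding any candidate with an infrequent (k-1)-subset:

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join L(k-1) with itself to form candidate k-itemsets, then prune
    every candidate that has an infrequent (k-1)-subset."""
    prev = {tuple(sorted(i)) for i in prev_frequent}
    items = sorted({item for itemset in prev for item in itemset})
    candidates = set()
    for itemset in prev:
        for item in items:
            cand = tuple(sorted(set(itemset) | {item}))
            if len(cand) == k and all(sub in prev for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    return candidates

L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"), ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
print(apriori_gen(L2, 3))  # {('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')}
```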

Apriori Algorithm: Flowchart Steps

• Step 1: Scan the transaction database to get the support S of each 1-itemset, compare S with min_sup, and get the set of frequent 1-itemsets, L1.
• Step 2: Use Lk-1 join Lk-1 to generate a set of candidate k-itemsets, and use the Apriori property to prune the infrequent k-itemsets from this set.
• Step 3: Scan the transaction database to get the support S of each candidate k-itemset in the final set, compare S with min_sup, and get the set of frequent k-itemsets, Lk.
• Step 4: Is the candidate set Null? If NO, repeat from Step 2; if YES, continue to Step 5.
• Step 5: For each frequent itemset l, generate all nonempty subsets of l.
• Step 6: For every nonempty subset s of l, output the rule "s ⇒ (l − s)" if the confidence C of the rule "s ⇒ (l − s)" (= support S of l / support S of s) ≥ min_conf.

Apriori Property

• Reducing the search space matters because finding each Lk requires one full scan of the database.

• If an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent, that is, P(I) < min_sup.

• If an item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A) cannot occur more frequently than I. Therefore, I ∪ A is not frequent either, that is, P(I ∪ A) < min_sup.

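Putting the six steps together, the following is a compact, self-contained Python sketch of the whole procedure (my own reconstruction under the slides' notation, not code from the slides; min_sup is an absolute count, min_conf is a fraction, and the function names are illustrative):

```python
from itertools import combinations

def apriori(transactions, min_sup, min_conf):
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    count = lambda itemset: sum(1 for t in transactions if itemset <= t)

    # Step 1: frequent 1-itemsets L1.
    items = {i for t in transactions for i in t}
    Lk = {frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}
    frequent = set(Lk)

    # Steps 2-4: join, prune with the Apriori property, scan, repeat until empty.
    k = 2
    while Lk:
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        Lk = {c for c in candidates if count(c) >= min_sup}
        frequent |= Lk
        k += 1

    # Steps 5-6: from each frequent itemset l, emit rules s => (l - s) that meet min_conf.
    rules = []
    for l in frequent:
        for r in range(1, len(l)):
            for s in map(frozenset, combinations(l, r)):
                conf = count(l) / count(s)
                if conf >= min_conf:
                    rules.append((set(s), set(l - s), count(l) / n, conf))
    return frequent, rules

# Usage on the 9-transaction example that follows (min support = 2, min confidence = 70%):
# freq, rules = apriori([{"I1", "I2", "I5"}, {"I2", "I4"}, ...], min_sup=2, min_conf=0.7)
```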

Apriori Algorithm: Example

TID     List of item_IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

Min support = 2

Candidate 1-itemsets and their support counts:
  I1: 6,  I2: 7,  I3: 6,  I4: 2,  I5: 2
All five satisfy the minimum support, so they form the frequent 1-itemsets L1.

Join L1 with L1 to obtain the candidate 2-itemsets and their support counts:
  {I1, I2}: 4,  {I1, I3}: 4,  {I1, I4}: 1,  {I1, I5}: 2,  {I2, I3}: 4,
  {I2, I4}: 2,  {I2, I5}: 2,  {I3, I4}: 0,  {I3, I5}: 1,  {I4, I5}: 0

Prune with min support = 2 to obtain the frequent 2-itemsets L2:
  {I1, I2}: 4,  {I1, I3}: 4,  {I1, I5}: 2,  {I2, I3}: 4,  {I2, I4}: 2,  {I2, I5}: 2

Join L2 with L2 (and prune) to obtain the frequent 3-itemsets L3:
  {I1, I2, I3}: 2,  {I1, I2, I5}: 2

support(A ⇒ B) = (# of tuples containing both A and B) / (total # of tuples)
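The support counts in these tables can be double-checked with a brute-force count (a short sketch of mine; unlike Apriori, it simply enumerates every 1-, 2-, and 3-item combination that occurs):

```python
from collections import Counter
from itertools import combinations

db = {
    "T100": {"I1", "I2", "I5"}, "T200": {"I2", "I4"}, "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"}, "T500": {"I1", "I3"}, "T600": {"I2", "I3"},
    "T700": {"I1", "I3"}, "T800": {"I1", "I2", "I3", "I5"}, "T900": {"I1", "I2", "I3"},
}

counts = Counter()
for items in db.values():
    for k in (1, 2, 3):
        counts.update(combinations(sorted(items), k))

min_sup = 2
for itemset, c in sorted(counts.items()):
    if c >= min_sup:
        print(itemset, c)  # e.g. ('I1', 'I2') 4 ... ('I1', 'I2', 'I5') 2
```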

Problem data

An example with a transactional database D containing a list of 5 transactions in a supermarket:

TID   List of items (item_IDs)
1     Beer (I1), Diaper (I2), Baby Powder (I3), Bread (I4), Umbrella (I5)
2     Diaper (I2), Baby Powder (I3)
3     Beer (I1), Diaper (I2), Milk (I6)
4     Diaper (I2), Beer (I1), Detergent (I7)
5     Beer (I1), Milk (I6), Coca Cola (I8)

Solution Procedure

Step 1: min_sup = 40% (2/5)

C1:
Item_ID   Item          Support
I1        Beer          4/5
I2        Diaper        4/5
I3        Baby powder   2/5
I4        Bread         1/5
I5        Umbrella      1/5
I6        Milk          2/5
I7        Detergent     1/5
I8        Coca Cola     1/5

L1 (itemsets meeting min_sup):
Item_ID   Item          Support
I1        Beer          4/5
I2        Diaper        4/5
I3        Baby powder   2/5
I6        Milk          2/5

support(A ⇒ B) = (# of tuples containing both A and B) / (total # of tuples)

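For comparison, the same L1 (and the later frequent itemsets) can be obtained with an off-the-shelf implementation. The sketch below assumes the third-party mlxtend package, which is not mentioned in the slides; its apriori function works on a one-hot encoded pandas DataFrame:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

transactions = [
    ["Beer", "Diaper", "Baby Powder", "Bread", "Umbrella"],
    ["Diaper", "Baby Powder"],
    ["Beer", "Diaper", "Milk"],
    ["Diaper", "Beer", "Detergent"],
    ["Beer", "Milk", "Coca Cola"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# min_support = 0.4 corresponds to min_sup = 40%, i.e. 2 of the 5 transactions.
print(apriori(onehot, min_support=0.4, use_colnames=True))
```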

Solution Procedure

Step 2: join L1 with L1 to obtain the candidate 2-itemsets, C2:
Item_ID    Item                  Support
{I1, I2}   Beer, Diaper          3/5
{I1, I3}   Beer, Baby powder     1/5
{I1, I6}   Beer, Milk            2/5
{I2, I3}   Diaper, Baby powder   2/5
{I2, I6}   Diaper, Milk          1/5
{I3, I6}   Baby powder, Milk     0

Step 3: keep the candidates that satisfy min_sup to obtain L2:
Item_ID    Item                  Support
{I1, I2}   Beer, Diaper          3/5
{I1, I6}   Beer, Milk            2/5
{I2, I3}   Diaper, Baby powder   2/5

Step 4: L2 is not Null, so repeat Step 2. The candidate 3-itemsets are:
Item_ID         Item
{I1, I2, I3}    Beer, Diaper, Baby powder
{I1, I2, I6}    Beer, Diaper, Milk
{I1, I3, I6}    Beer, Baby powder, Milk
{I2, I3, I6}    Diaper, Baby powder, Milk

Each of these has a 2-item subset that is not in L2, so all are pruned and C3 = Null.


Solution Procedure

Step 5: for each frequent itemset, compute the support and confidence of the candidate rules.

min_sup = 40%   min_conf = 70%

Rule       Item                     Support (A ∪ B)   Support (A)   Confidence
I1 ⇒ I2    Beer ⇒ Diaper            60%               80%           75%
I1 ⇒ I6    Beer ⇒ Milk              40%               80%           50%
I2 ⇒ I3    Diaper ⇒ Baby powder     40%               80%           50%
I2 ⇒ I1    Diaper ⇒ Beer            60%               80%           75%
I6 ⇒ I1    Milk ⇒ Beer              40%               40%           100%
I3 ⇒ I2    Baby powder ⇒ Diaper     40%               40%           100%

support(A ⇒ B) = (# of tuples containing both A and B) / (total # of tuples)
confidence(A ⇒ B) = (# of tuples containing both A and B) / (# of tuples containing A)

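The support and confidence columns above, and the strong-rule filter applied in Step 6, can be verified with a short script (illustrative code of mine, using the five supermarket transactions):

```python
transactions = [
    {"Beer", "Diaper", "Baby Powder", "Bread", "Umbrella"},
    {"Diaper", "Baby Powder"},
    {"Beer", "Diaper", "Milk"},
    {"Diaper", "Beer", "Detergent"},
    {"Beer", "Milk", "Coca Cola"},
]
n = len(transactions)
freq = lambda items: sum(1 for t in transactions if items <= t)

candidate_rules = [
    ({"Beer"}, {"Diaper"}), ({"Beer"}, {"Milk"}), ({"Diaper"}, {"Baby Powder"}),
    ({"Diaper"}, {"Beer"}), ({"Milk"}, {"Beer"}), ({"Baby Powder"}, {"Diaper"}),
]

min_conf = 0.7
for a, b in candidate_rules:
    support = freq(a | b) / n
    conf = freq(a | b) / freq(a)
    verdict = "strong" if conf >= min_conf else "discarded"
    print(f"{sorted(a)} => {sorted(b)}: support {support:.0%}, confidence {conf:.0%} ({verdict})")
```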

Solution Procedure

Step 6: min_sup = 40%   min_conf = 70%

Strong rules                       Support   Confidence
I1 ⇒ I2   Beer ⇒ Diaper            60%       75%
I2 ⇒ I1   Diaper ⇒ Beer            60%       75%
I6 ⇒ I1   Milk ⇒ Beer              40%       100%
I3 ⇒ I2   Baby powder ⇒ Diaper     40%       100%

Results

• Some rules are believable, like Baby powder ⇒ Diaper.

• Some rules need additional analysis, like Milk ⇒ Beer.

• Some rules are unbelievable, like Diaper ⇒ Beer.

• Note: this example could contain unrealistic results because of its small data set.

Reference

• J. Han and M. Kamber (2001), Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA.

