1. Open the data file (supermarket).
2. The data contains 217 attributes and 4,627 instances; each instance is one transaction.
3. Here I investigate the data in the Weka viewer.
The department attributes carry little meaning here, since we are looking at the items that were purchased. As you can see, it is a big data set:
4627 * 217 = 1,004,059 values
Some attributes are meaningless for this analysis, so I will remove them for now. We can exclude those attributes from the algorithm's run.
4. After selecting all department attributes in the data set, click ‘Invert’. Now all the attributes that interest us (the non-department ones) are selected.
[NOTE]
The final attribute is called ‘total’. It can be the amount the customer paid for a particular purchase, so it is not very important to us for now.
5. Now go to the Associate tab in Weka. We can change the algorithm's parameters by clicking on the option bar at the top.
6. Setting ‘Output item set’ to true shows the item sets in the results console.
7. Apriori [1] is an algorithm for frequent item set mining; the Weka Java class documentation for it is linked in [1]. It can rank rules by the following metric types:
Confidence
Lift
Leverage
Conviction
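The four metrics listed above can be computed from simple counts. The following is a minimal sketch using the textbook definitions, not Weka's own code; the function name `rule_metrics` and its parameters are hypothetical, and Weka's implementation may differ in small details. The counts in the usage lines are taken from the fruit ==> vegetables rule in the output later in this post.

```python
def rule_metrics(n_total, n_premise, n_consequent, n_both):
    """Textbook metrics for a rule A ==> B, computed from raw transaction counts."""
    p_a = n_premise / n_total       # P(A): support of the premise
    p_b = n_consequent / n_total    # P(B): support of the consequent
    p_ab = n_both / n_total         # P(A and B): support of the whole rule

    confidence = p_ab / p_a         # P(B | A)
    lift = confidence / p_b         # above 1: A and B co-occur more than by chance
    leverage = p_ab - p_a * p_b     # extra support compared with independence
    # Conviction is infinite when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - p_b) / (1 - confidence)

    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# fruit=t 2962 ==> vegetables=t 2207; vegetables=t occurs in 2961 of 4627 transactions
metrics = rule_metrics(n_total=4627, n_premise=2962, n_consequent=2961, n_both=2207)
for name, value in metrics.items():
    print(name, round(value, 2))
```

Rounded to two decimals this gives 0.75, 1.16, 0.07 and 1.41, which agrees with the conf, lift, lev and conv values Weka reports for that rule.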
Now click ‘Start’ and wait; you will see the result.
Here is a summary of the outcome. At the top you can see the logged parameters, such as ‘support’ and ‘confidence’, that we set for the algorithm.
Apriori
=======
Minimum support: 0.15 (694 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17
Best rules found:
1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92)
2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92)
3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 conf:(0.92)
4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92)
5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91)
6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 conf:(0.91)
7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 conf:(0.91)
8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91)
9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 conf:(0.91)
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91)
---------
Decision on outcome
Here we can see that conf is 0.92 for the first rule:
when a customer buys biscuits, frozen foods and fruit, and the total bill is high, he/she has also bought ‘bread and cake’ in about 92% of those transactions.
The next rules are much the same. Not much interesting information can be found, so now we can change some parameters and try again.
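To read these rule lines more systematically, they can be parsed and the confidence recomputed from the two counts. This is only a sketch: `parse_rule` and `split_items` are hypothetical helpers of my own, the pattern is written for the line layout shown above, and it assumes attribute values (such as `t` or `high`) contain no spaces.

```python
import re

# Layout assumed: "<n>. <premise items> <count> ==> <consequent items> <count> conf:(<value>)"
RULE_RE = re.compile(
    r"^\s*\d+\.\s+(?P<premise>.+?)\s+(?P<n_premise>\d+)\s+==>\s+"
    r"(?P<consequent>.+?)\s+(?P<n_rule>\d+)\s+conf:\((?P<conf>[\d.]+)\)"
)


def split_items(text):
    """Split 'frozen foods=t fruit=t' into ['frozen foods=t', 'fruit=t']."""
    return re.findall(r"(.+?=\S+)\s*", text)


def parse_rule(line):
    """Extract items and counts from one rule line and recompute its confidence."""
    match = RULE_RE.match(line)
    if match is None:
        raise ValueError("line does not look like a rule: " + line)
    n_premise = int(match["n_premise"])
    n_rule = int(match["n_rule"])
    return {
        "premise": split_items(match["premise"]),
        "consequent": split_items(match["consequent"]),
        "n_premise": n_premise,   # transactions containing the left-hand side
        "n_rule": n_rule,         # transactions containing both sides
        "confidence": n_rule / n_premise,
        "reported_conf": float(match["conf"]),
    }


line = ("1. biscuits=t frozen foods=t fruit=t total=high 788 ==> "
        "bread and cake=t 723 conf:(0.92)")
rule = parse_rule(line)
print(rule["premise"], "==>", rule["consequent"])
print(round(rule["confidence"], 2), rule["reported_conf"])
```

For the first rule, 723 of the 788 matching transactions also contain ‘bread and cake’, and 723 / 788 rounds to the reported 0.92.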
2nd try to get the best rules
Set the minimum support to 0.5 (2314 instances) and run the algorithm again to get the result.
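The instance counts that Weka prints next to the minimum support can be reproduced with a little arithmetic. The sketch below rests on two assumptions of mine: that the count is the support fraction times 4,627, rounded half up, and that Apriori starts at an upper bound of 1.0 and lowers the support by 0.05 per cycle (the values shown as -U 1.0 and -D 0.05 in the scheme line later in this post). The function names are hypothetical.

```python
N_INSTANCES = 4627  # transactions in the supermarket data


def support_instances(min_support, n_instances=N_INSTANCES):
    """Convert a relative support threshold into a whole number of instances."""
    # Round half up; this reproduces the counts printed in this post.
    return int(min_support * n_instances + 0.5)


def support_after_cycles(cycles, upper_bound=1.0, delta=0.05):
    """Support reached after lowering the threshold `cycles` times by `delta`."""
    return round(upper_bound - cycles * delta, 2)


# Thresholds that appear in the runs in this post
for support in (0.15, 0.5, 0.1, 0.45):
    print(support, support_instances(support))

# 17 cycles in the first run, 11 cycles in the conviction run
print(support_after_cycles(17), support_after_cycles(11))
```

Under these assumptions the counts come out as 694, 2314, 463 and 2082, and 17 and 11 cycles end at supports of 0.15 and 0.45, which matches the headers of the corresponding runs.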
Decision on outcome of the 2nd try
With the minimum support at 0.5 we were not able to get any rules, because the support threshold was very high.
In data mining we have to change the parameters and check for the best outcome.
Now we reduce the confidence to 0.5 and run the algorithm again. This time we get some rules, but they are not as strong as those of the 1st result, where the confidence was 0.92.
Best rules found:
1. milk-cream=t 2939 ==> bread and cake=t 2337 conf:(0.8)
2. fruit=t 2962 ==> bread and cake=t 2325 conf:(0.78)
3. bread and cake=t 3330 ==> milk-cream=t 2337 conf:(0.7)
4. bread and cake=t 3330 ==> fruit=t 2325 conf:(0.7)
[NOTE]
As we have a big data set, we can try to find more useful information with a good confidence rate.
Now we change the algorithm options and try again.
Note that you will need a fairly good machine, or you will have to increase the memory heap; otherwise you may see a memory-related message or alert.
Weka also supports clearing the heap.
Here are a few rules from the following run:
Scheme: weka.associations.Apriori -N 10 -T 1 -C 1.1 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1 (metric type Lift)
1. fruit=t 2962 ==> bread and cake=t vegetables=t 1791 conf:(0.6) < lift:(1.22)> lev:(0.07) [319] conv:(1.27)
2. bread and cake=t vegetables=t 2298 ==> fruit=t 1791 conf:(0.78) < lift:(1.22)> lev:(0.07) [319] conv:(1.63)
3. bread and cake=t fruit=t 2325 ==> vegetables=t 1791 conf:(0.77) < lift:(1.2)> lev:(0.07) [303] conv:(1.56)
4. vegetables=t 2961 ==> bread and cake=t fruit=t 1791 conf:(0.6) < lift:(1.2)> lev:(0.07) [303] conv:(1.26)
5. baking needs=t 2795 ==> margarine=t 1645 conf:(0.59) < lift:(1.19)> lev:(0.06) [262] conv:(1.23)
6. margarine=t 2288 ==> baking needs=t 1645 conf:(0.72) < lift:(1.19)> lev:(0.06) [262] conv:(1.41)
7. biscuits=t 2605 ==> frozen foods=t 1810 conf:(0.69) < lift:(1.18)> lev:(0.06) [280] conv:(1.35)
8. frozen foods=t 2717 ==> biscuits=t 1810 conf:(0.67) < lift:(1.18)> lev:(0.06) [280] conv:(1.31)
9. fruit=t 2962 ==> vegetables=t 2207 conf:(0.75) < lift:(1.16)> lev:(0.07) [311] conv:(1.41)
10. vegetables=t 2961 ==> fruit=t 2207 conf:(0.75) < lift:(1.16)> lev:(0.07) [311] conv:(1.41)
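The rules above come in pairs with the same lift but different confidence. A short sketch shows why: lift treats both sides of the rule alike, while confidence depends on which side is the premise. The helper names are hypothetical, and reading the bracketed number as the truncated count of transactions beyond what independence predicts is my interpretation of the output, not something taken from the Weka documentation.

```python
N = 4627  # total transactions in the supermarket data


def confidence(n_premise, n_both):
    """Share of premise transactions that also contain the consequent."""
    return n_both / n_premise


def lift(n_premise, n_consequent, n_both, n_total=N):
    """Observed joint count divided by the count expected under independence."""
    return (n_both * n_total) / (n_premise * n_consequent)


def excess_count(n_premise, n_consequent, n_both, n_total=N):
    """Joint transactions beyond what independence would predict."""
    return n_both - n_premise * n_consequent / n_total


# Rules 1 and 2: {fruit} (2962) and {bread and cake, vegetables} (2298), 1791 together
fruit, cake_and_veg, both = 2962, 2298, 1791

print(round(lift(fruit, cake_and_veg, both), 2),
      round(lift(cake_and_veg, fruit, both), 2))   # same in both directions
print(round(confidence(fruit, both), 2),
      round(confidence(cake_and_veg, both), 2))    # depends on the direction
print(int(excess_count(fruit, cake_and_veg, both)))
```

Both directions give a lift of 1.22, the confidences are 0.6 and 0.78, and the truncated excess count is 319, in line with rules 1 and 2 above.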
A few more results with Apriori
Minimum support: 0.1 (463 instances)
Minimum metric <leverage>: 0.1
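No rules came from this run, which is consistent with the leverage values of about 0.07 seen in the other runs being below the 0.1 threshold.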
-------------------
Minimum support: 0.45 (2082 instances)
Minimum metric <conviction>: 1.1
Number of cycles performed: 11
1. vegetables=t 2961 ==> fruit=t 2207 conf:(0.75) lift:(1.16) lev:(0.07) [311] < conv:(1.41)>
2. fruit=t 2962 ==> vegetables=t 2207 conf:(0.75) lift:(1.16) lev:(0.07) [311] < conv:(1.41)>
3. biscuits=t 2605 ==> bread and cake=t 2083 conf:(0.8) lift:(1.11) lev:(0.04) [208] < conv:(1.4)>
4. milk-cream=t 2939 ==> bread and cake=t 2337 conf:(0.8) lift:(1.1) lev:(0.05) [221] < conv:(1.37)>
5. fruit=t 2962 ==> bread and cake=t 2325 conf:(0.78) lift:(1.09) lev:(0.04) [193] < conv:(1.3)>
6. baking needs=t 2795 ==> bread and cake=t 2191 conf:(0.78) lift:(1.09) lev:(0.04) [179] < conv:(1.29)>
7. frozen foods=t 2717 ==> bread and cake=t 2129 conf:(0.78) lift:(1.09) lev:(0.04) [173] < conv:(1.29)>
8. vegetables=t 2961 ==> bread and cake=t 2298 conf:(0.78) lift:(1.08) lev:(0.04) [167] < conv:(1.25)>
9. bread and cake=t 3330 ==> milk-cream=t 2337 conf:(0.7) lift:(1.1) lev:(0.05) [221] < conv:(1.22)>
10. bread and cake=t 3330 ==> fruit=t 2325 conf:(0.7) lift:(1.09) lev:(0.04) [193] < conv:(1.19)>
References
[1] http://weka.sourceforge.net/doc.stable/weka/associations/Apriori.html
[2] http://weka.sourceforge.net/doc.dev/weka/associations/FilteredAssociator.html