Wednesday, June 4, 2014

Weka with Associations

1. Open the data file (supermarket).

image

2. The data set has 217 attributes and 4,627 instances; each instance is one transaction.

image

3. Here I inspect the data in Weka's viewer.

image

The department attributes carry little meaning here, since we are looking for the items that were purchased. As you can see, it is a big data set:

4627 * 217 = 1,004,059 values

Because the data set is this big and some attributes are meaningless for my purpose, I will remove them for now. That way they are excluded when the algorithm runs.

4. After selecting all the department attributes in the data set, click ‘Invert’. Now all the attributes that interest us (the non-department ones) are selected.
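To make the select-then-invert step concrete, here is a minimal sketch of the same logic outside Weka. The attribute names in the list are hypothetical stand-ins (the real data set has 217 of them), and the rule "a department attribute starts with the word department" is an assumption made for illustration.

```python
# Hypothetical attribute names; the real data set has 217 attributes.
attributes = ["department1", "department2", "bread and cake",
              "fruit", "vegetables", "total"]

# Step 1: select every attribute that looks like a department attribute
# (assumed naming convention: the name starts with "department").
selected = {name for name in attributes if name.startswith("department")}

# Step 2: 'Invert' flips the selection to everything that was NOT selected.
inverted = [name for name in attributes if name not in selected]

print(inverted)
```

The order of the remaining attributes is preserved, which matters later because Weka reports the last attribute (‘total’) as part of the rules.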

image

[NOTE]

The final attribute is called ‘total’. It may be the amount the customer paid for that particular purchase, so it is not very important to us for now.

5. Now go to the Associate tab in Weka. We can change the algorithm's parameters by clicking on the option bar at the top.

image

6. Setting ‘Output item set’ to true shows the item sets in the results console.

7. Apriori [1] is an algorithm for frequent item set mining. Weka's Java implementation of Apriori can rank rules by any of the following metrics:
Confidence
Lift
Leverage
Conviction
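For reference, these are the usual textbook definitions of the four metrics for a rule A ==> B over N transactions. They are given here as a guide for reading the output; Weka's own implementation may differ in small details such as rounding.

```latex
\begin{align*}
\mathrm{supp}(X) &= \frac{\mathrm{count}(X)}{N} \\
\mathrm{conf}(A \Rightarrow B) &= \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)} \\
\mathrm{lift}(A \Rightarrow B) &= \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)} \\
\mathrm{lev}(A \Rightarrow B) &= \mathrm{supp}(A \cup B) - \mathrm{supp}(A)\,\mathrm{supp}(B) \\
\mathrm{conv}(A \Rightarrow B) &= \frac{\mathrm{supp}(A)\,\bigl(1 - \mathrm{supp}(B)\bigr)}{\mathrm{supp}(A) - \mathrm{supp}(A \cup B)}
\end{align*}
```

A lift above 1, a leverage above 0, or a conviction above 1 each indicate that A and B occur together more often than they would if they were independent.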

 

Now click ‘Start’ and wait; the result will appear.

image

Here is a summary of the outcome. At the top you can see the parameters, such as ‘support’ and ‘confidence’, that were in effect for the algorithm.

Apriori
=======

Minimum support: 0.15 (694 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Best rules found:

1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723    conf:(0.92)
2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696    conf:(0.92)
3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705    conf:(0.92)
4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746    conf:(0.92)
5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779    conf:(0.91)
6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725    conf:(0.91)
7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701    conf:(0.91)
8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866    conf:(0.91)
9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757    conf:(0.91)
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877    conf:(0.91)
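The header values can be checked by hand. Weka's Apriori documentation describes it as iteratively reducing the minimum support until enough rules are found, so the sketch below assumes the search starts at an upper bound of 1.0 and drops by a delta of 0.05 per cycle (the values shown in the scheme line later in this post), and that the instance count is rounded half up. The rounding rule is inferred from the figures in the output, not taken from Weka's source.

```python
import math

N_INSTANCES = 4627   # transactions in the supermarket data
UPPER_BOUND = 1.0    # assumed starting point for minimum support
DELTA = 0.05         # assumed amount by which support drops each cycle

def support_after(cycles):
    """Minimum support in effect after the given number of cycles."""
    return round(UPPER_BOUND - cycles * DELTA, 2)

def required_instances(min_support):
    """Instance count for a support fraction, rounded half up (assumed)."""
    return math.floor(min_support * N_INSTANCES + 0.5)

# 17 cycles is this run; 11 cycles is the conviction run later in the post.
for cycles in (17, 11):
    support = support_after(cycles)
    print(cycles, support, required_instances(support))
```

Under these assumptions, 17 cycles lead to a support of 0.15 and 694 instances, which matches the header of this run.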


---------

 

Decision on outcome

Here we can see that the confidence of the first rule is 0.92.

When a customer buys biscuits, frozen foods and fruit and the total bill is high, he or she has also bought ‘bread and cake’ in 92% of the cases.
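Each rule line carries two counts: the number of transactions that match the left side, and the number that match both sides. As a rough sketch, the snippet below parses the first rule line with a regular expression and recomputes the confidence from those two counts. The pattern and the `parse_rule` helper are my own illustration and only handle the confidence-only line format shown above.

```python
import re

# Matches lines such as:
#   "1. a=t b=t 788 ==> c=t 723    conf:(0.92)"
RULE_PATTERN = re.compile(
    r"^\d+\.\s+(?P<lhs>.+?)\s+(?P<lhs_count>\d+)\s+==>\s+"
    r"(?P<rhs>.+?)\s+(?P<both_count>\d+)\s+conf:\((?P<conf>[\d.]+)\)"
)

def split_items(text):
    """Split 'frozen foods=t fruit=t' into ['frozen foods=t', 'fruit=t']."""
    return re.findall(r"\s*(.+?=\S+)", text)

def parse_rule(line):
    """Return the items and counts of one rule line plus its confidence."""
    match = RULE_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognised rule line: " + line)
    lhs_count = int(match.group("lhs_count"))
    both_count = int(match.group("both_count"))
    return {
        "antecedent": split_items(match.group("lhs")),
        "consequent": split_items(match.group("rhs")),
        "lhs_count": lhs_count,
        "both_count": both_count,
        # confidence = transactions with both sides / transactions with left side
        "confidence": both_count / lhs_count,
    }

LINE = ("1. biscuits=t frozen foods=t fruit=t total=high 788 ==> "
        "bread and cake=t 723    conf:(0.92)")
rule = parse_rule(LINE)
print(rule["antecedent"], "==>", rule["consequent"])
print(round(rule["confidence"], 2))
```

So 723 of the 788 matching transactions also contain bread and cake, which rounds to the 0.92 that Weka prints.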

The next rules say much the same thing. There is not much interesting information here, so now we can change some parameters and try again.

 

2nd try to get the best rules

Set Minimum support to 0.5 (2314 instances)

and run the algorithm.

Result:

image

Decision on outcome of 2nd try

Because the minimum support of 0.5 is very high, we were not able to get any rules.

In data mining we have to vary the parameters and check for the best outcome.

Now we reduce the confidence to 0.5 and run the algorithm again. This time we get some rules, but they are not as strong as those in the first result, where the confidence was 0.92.

image

 

Best rules found:

1. milk-cream=t 2939 ==> bread and cake=t 2337    conf:(0.8)
2. fruit=t 2962 ==> bread and cake=t 2325    conf:(0.78)
3. bread and cake=t 3330 ==> milk-cream=t 2337    conf:(0.7)
4. bread and cake=t 3330 ==> fruit=t 2325    conf:(0.7)
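The reason only four rules appear can be sketched with the counts printed in this post. The snippet below uses only the item and pair counts that show up in the rule listings here (the full data set has many more item sets), keeps the pairs that reach 2314 instances, and then applies the confidence threshold. It is a simplified illustration, not Weka's actual search.

```python
MIN_COUNT = 2314  # 0.5 * 4627 instances, as reported by Weka

# Counts copied from the rule listings in this post.
item_counts = {"bread and cake": 3330, "milk-cream": 2939,
               "fruit": 2962, "vegetables": 2961}
pair_counts = {
    ("milk-cream", "bread and cake"): 2337,
    ("fruit", "bread and cake"): 2325,
    ("vegetables", "bread and cake"): 2298,
    ("fruit", "vegetables"): 2207,
}

def rules(min_count, min_conf):
    """Rules from frequent pairs, strongest confidence first."""
    found = []
    for (a, b), both in pair_counts.items():
        if both < min_count:
            continue  # the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            conf = both / item_counts[lhs]
            if conf >= min_conf:
                found.append((lhs, rhs, conf))
    return sorted(found, key=lambda rule: -rule[2])

print(len(rules(MIN_COUNT, 0.9)))      # confidence threshold of the first try
for lhs, rhs, conf in rules(MIN_COUNT, 0.5):
    print(lhs, "==>", rhs, round(conf, 2))
```

Only two of these pairs reach 2314 instances, and none of their rules reaches a confidence of 0.9, which is consistent with getting nothing at 0.9 and four rules at 0.5.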

[NOTE]

Because we have a big data set, we can look for more useful information with a good confidence rate.

Now we change the algorithm and try again.

image 

Note that you will need a fairly powerful machine, or you will have to increase the memory heap; otherwise you may see a message or alert like this:

image

Weka also lets you clean up the heap:

image

Here are a few rules from

Scheme:       weka.associations.Apriori -N 10 -T 1 -C 1.1 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1 (metric type Lift)
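The scheme line is simply the class name followed by its command-line options. As a small sketch, the snippet below splits it into flag and value pairs and prints what each flag means according to the Apriori documentation [1]. It assumes every flag in this particular string takes exactly one value, which would not hold for switch-style flags.

```python
import shlex

# Meanings as described in the Weka Apriori documentation [1].
OPTION_MEANINGS = {
    "-N": "number of rules to output",
    "-T": "metric type (0=confidence, 1=lift, 2=leverage, 3=conviction)",
    "-C": "minimum metric score",
    "-D": "delta by which minimum support is lowered each cycle",
    "-U": "upper bound for minimum support",
    "-M": "lower bound for minimum support",
    "-S": "significance level (-1.0 means no significance testing)",
    "-c": "class index (-1 means the last attribute)",
}

scheme = ("weka.associations.Apriori -N 10 -T 1 -C 1.1 -D 0.05 "
          "-U 1.0 -M 0.1 -S -1.0 -c -1")

tokens = shlex.split(scheme)
class_name, args = tokens[0], tokens[1:]

# Assumption: flags and values strictly alternate in this string.
options = dict(zip(args[0::2], args[1::2]))

for flag, value in options.items():
    print(flag, value, "->", OPTION_MEANINGS[flag])
```

Here `-T 1` selects lift as the ranking metric and `-C 1.1` asks for rules with a lift of at least 1.1.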

 

1. fruit=t 2962 ==> bread and cake=t vegetables=t 1791    conf:(0.6) < lift:(1.22)> lev:(0.07) [319] conv:(1.27)
2. bread and cake=t vegetables=t 2298 ==> fruit=t 1791    conf:(0.78) < lift:(1.22)> lev:(0.07) [319] conv:(1.63)
3. bread and cake=t fruit=t 2325 ==> vegetables=t 1791    conf:(0.77) < lift:(1.2)> lev:(0.07) [303] conv:(1.56)
4. vegetables=t 2961 ==> bread and cake=t fruit=t 1791    conf:(0.6) < lift:(1.2)> lev:(0.07) [303] conv:(1.26)
5. baking needs=t 2795 ==> margarine=t 1645    conf:(0.59) < lift:(1.19)> lev:(0.06) [262] conv:(1.23)
6. margarine=t 2288 ==> baking needs=t 1645    conf:(0.72) < lift:(1.19)> lev:(0.06) [262] conv:(1.41)
7. biscuits=t 2605 ==> frozen foods=t 1810    conf:(0.69) < lift:(1.18)> lev:(0.06) [280] conv:(1.35)
8. frozen foods=t 2717 ==> biscuits=t 1810    conf:(0.67) < lift:(1.18)> lev:(0.06) [280] conv:(1.31)
9. fruit=t 2962 ==> vegetables=t 2207    conf:(0.75) < lift:(1.16)> lev:(0.07) [311] conv:(1.41)
10. vegetables=t 2961 ==> fruit=t 2207    conf:(0.75) < lift:(1.16)> lev:(0.07) [311] conv:(1.41)
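The extra columns can be recomputed from the counts in the listing. The sketch below uses the textbook formulas for confidence, lift, leverage and conviction, and treats the number in square brackets as leverage times the number of instances, truncated to an integer. Both points are assumptions that happen to agree with the printed values; Weka's own code may compute them slightly differently. The count of the right-hand side is taken from another rule in the same listing (for example, vegetables=t 2961 from rule 10).

```python
N = 4627  # transactions in the supermarket data

def metrics(lhs_count, rhs_count, both_count, n=N):
    """Confidence, lift, leverage, extra instances and conviction of a rule."""
    supp_lhs = lhs_count / n
    supp_rhs = rhs_count / n
    supp_both = both_count / n
    confidence = both_count / lhs_count
    lift = confidence / supp_rhs
    leverage = supp_both - supp_lhs * supp_rhs
    # Instances beyond what independence of the two sides would predict.
    extra = int(both_count - lhs_count * rhs_count / n)
    conviction = supp_lhs * (1 - supp_rhs) / (supp_lhs - supp_both)
    return confidence, lift, leverage, extra, conviction

# Rule 9: fruit=t 2962 ==> vegetables=t 2207, with vegetables=t 2961
rule9 = metrics(2962, 2961, 2207)
# Rule 1: fruit=t 2962 ==> bread and cake=t vegetables=t 1791,
# with bread and cake=t vegetables=t 2298 (taken from rule 2)
rule1 = metrics(2962, 2298, 1791)

for name, values in (("rule 9", rule9), ("rule 1", rule1)):
    conf, lift, lev, extra, conv = values
    print(name, round(conf, 2), round(lift, 2), round(lev, 2), extra,
          round(conv, 2))
```

All of the lift values in this run are only slightly above 1, so the items on both sides are bought together just a little more often than chance would suggest.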

 

A few more results with Apriori

Minimum support: 0.1 (463 instances)
Minimum metric <leverage>: 0.1

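No rules came out of this run. The largest leverage in the lift run above is 0.07, so a minimum leverage of 0.1 is probably too strict for this data.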

-------------------

Minimum support: 0.45 (2082 instances)
Minimum metric <conviction>: 1.1
Number of cycles performed: 11

1. vegetables=t 2961 ==> fruit=t 2207    conf:(0.75) lift:(1.16) lev:(0.07) [311] < conv:(1.41)>
2. fruit=t 2962 ==> vegetables=t 2207    conf:(0.75) lift:(1.16) lev:(0.07) [311] < conv:(1.41)>
3. biscuits=t 2605 ==> bread and cake=t 2083    conf:(0.8) lift:(1.11) lev:(0.04) [208] < conv:(1.4)>
4. milk-cream=t 2939 ==> bread and cake=t 2337    conf:(0.8) lift:(1.1) lev:(0.05) [221] < conv:(1.37)>
5. fruit=t 2962 ==> bread and cake=t 2325    conf:(0.78) lift:(1.09) lev:(0.04) [193] < conv:(1.3)>
6. baking needs=t 2795 ==> bread and cake=t 2191    conf:(0.78) lift:(1.09) lev:(0.04) [179] < conv:(1.29)>
7. frozen foods=t 2717 ==> bread and cake=t 2129    conf:(0.78) lift:(1.09) lev:(0.04) [173] < conv:(1.29)>
8. vegetables=t 2961 ==> bread and cake=t 2298    conf:(0.78) lift:(1.08) lev:(0.04) [167] < conv:(1.25)>
9. bread and cake=t 3330 ==> milk-cream=t 2337    conf:(0.7) lift:(1.1) lev:(0.05) [221] < conv:(1.22)>
10. bread and cake=t 3330 ==> fruit=t 2325    conf:(0.7) lift:(1.09) lev:(0.04) [193] < conv:(1.19)>

References

[1] http://weka.sourceforge.net/doc.stable/weka/associations/Apriori.html

[2] http://weka.sourceforge.net/doc.dev/weka/associations/FilteredAssociator.html
