Tuesday, September 30, 2008

MiRABIT ALGORITHM

The Apriori algorithm is the most basic algorithm for finding association rules in large databases but for those databases in which the number of items per transaction is less, the Apriori algorithm is not very optimum. For this purpose ‘MiRABIT’ Algorithm is used to optimize databases with low average of items per transaction. In Apriori algorithm in the first pass individual item support is counted and the items with at least minimum support are considered as large items. Later on for each pass candidate itemsets are found out. But the candidate itemset generation becomes important for the databases in which the items per transaction are less. For this purpose the MiRABIT algorithm generates candidate itemsets in each transaction rather than in the beginning of each pass. The MiRABIT algorithm makes multiple passes over the database. Like the Apriori algorithm in the first pass the individual item support is counted and all large items are considered. In the second pass at each transaction the items are grouped into sets of two items. Now for each set with two items it’s verified whether the subset is large or not. If it is large then the set support is counted or else it is not counted. Consequently all other passes are undergone and at the end of each pass all sets of candidates with at least minimum support are considered as large itemsets. This process continues until the large itemsets have only one element or are empty. The advantage of MiRABIT algorithm is that it generates a smaller candidate itemsets as compared to Apriori. Thus for the databases in which the average number of items per transaction is less, the performance of MiRABIT algorithm is better as compared to the Apriori algorithm.


Working Example for MiRABIT:

Database:
TID
ITEMS
100
XZ
200
MXZ
300
MXYZ
400
MN

Support = 50% of total number of transactions
=50% of 4
= 2.


Iteration-1:

Individual Item Support

ITEMSET
SUPPORT
{X}
3
{Z}
3
{M}
3
{Y}
1
{N}
1

In MiRABIT algorithm in Iteration-1 first the individual item support is counted as show in the table above.



Iteration – 2:
ITEMSET
SUPPORT
{XZ}
3
{MX}
2
{XY}
-
{MY}
-
{YZ}
-
{MZ}
2
{MN}
-










Then for Iteration-2 we find all candidate large 2- itemsets. Once this is done we need to find out support of only those candidate 2 -itemsets whose subsets are large. Thus in this example {Y} and {N} are not large itemsets because they have a support of only 1. Thus we only find out support of {XZ},{MX} and {MZ} and not for the others which involve {Y} or {N}. Thus {XZ},{MX} and {MZ} are large -2 itemsets.


Iteration-3:
ITEMSET
SUPPORT
{MXZ}
2

In iteration -3 we first find out all candidate large 3- itemsets from all large 2-itemsets. Since this contains only one itemset {MXZ} the algorithm terminates.

No comments: