Tuesday, September 30, 2008

CASE STUDY 1

Retail organizations generate large volumes of point-of-sale (POS) data through daily transactions. When collected and analyzed, this data can inform important business decisions and help retailers plan strategies for innovation and a competitive edge. The data is too large to analyze manually, so techniques such as data mining are used, supported by an appropriate framework, tool, or environment. This case study discusses the use of association rules to find the associations that exist between products in market basket data.

The association rules technique is widely used in the retail industry under the name “Market Basket Analysis”. When applied, it can deliver measurable benefits to an organization, such as improved profitability and improved quality of service. Transactional data can be challenging for data mining because of its:

Massiveness: The transactional data collected is vast with thousands of transactions.
Sparseness: A basket contains only a small fraction of the total possible items.
Heterogeneity: Purchasing behavior varies across individuals, and an individual’s purchasing pattern varies over time.

Strategy:
Because of the massive amount of data collected, exploring it for analysis is difficult: as many as 4 trillion possible rules can be generated from 10,000 transactions. The longer the period over which transactions accumulate, the harder the analysis becomes. A divide-and-conquer strategy, based on problem decomposition, solution, and aggregation, can overcome this. The data is collected and analyzed over smaller time intervals, and the output of each interval is then combined to solve the whole problem. A separate database stores the results from each interval, and interesting rules are retrieved from it. A rule is interesting if it remains stable, that is, if it satisfies a minimum threshold within a specific number of intervals. This strategy allows interesting rules with good support and confidence to be extracted on a daily basis. It also allows the raw data to be discarded after analysis, which reduces the amount of unwanted data in the database.
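The interval-by-interval idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method used in the study: it mines pairwise rules only, treats "stable" as "present in at least a given fraction of intervals", and the function names and threshold defaults are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_pair_rules(baskets, min_support=0.01, min_confidence=0.5):
    """Return {(lhs, rhs): (support, confidence)} for one interval (e.g. one day)."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = (support, confidence)
    return rules


def stable_rules(daily_baskets, min_stability=0.95, **thresholds):
    """Mine each interval separately, then keep rules that recur often enough.

    Returns {rule: (stability, mean_confidence)}, where stability is the
    fraction of intervals in which the rule passed the thresholds
    (an assumed definition).
    """
    history = defaultdict(list)  # plays the role of the separate results store
    for baskets in daily_baskets:
        for rule, stats in mine_pair_rules(baskets, **thresholds).items():
            history[rule].append(stats)
        # The raw baskets of this interval are not needed after this point.

    n_days = len(daily_baskets)
    result = {}
    for rule, stats in history.items():
        stability = len(stats) / n_days
        if stability >= min_stability:
            mean_confidence = sum(conf for _, conf in stats) / len(stats)
            result[rule] = (stability, mean_confidence)
    return result
```

Only the small per-interval rule summaries are kept in `history`, which mirrors the point that the raw transactions can be discarded once an interval has been mined.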

This strategy was applied to two supermarket stores in Porto Alegre, Brazil. Data corresponding to the purchases made by consumers over a period of one hundred and twenty days were used.

Mining Process:
The transactional data were recorded and associations were generated on a daily basis. Certain criteria were defined to reduce the number of rules discovered. The aggregation level was defined so that products were treated mainly by their function rather than by brand or major physical characteristics. This approach reduced the number of transactions to 4500 and, in turn, the total number of possible rules to 10 million. As a trade-off, each product had to be transformed prior to mining. This considerably reduced the time spent on analysis, although it added time for preparing the data. During this conversion, all operational transactions were also removed, leaving only the information useful for data mining.
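The function-level aggregation described here might look like the following sketch. The mapping and product names are invented for illustration; the study's actual taxonomy is not given.

```python
# Hypothetical product-to-function mapping (not taken from the study).
PRODUCT_TO_FUNCTION = {
    "BrandA lager 350ml": "beer",
    "BrandB pilsner 600ml": "beer",
    "BrandC salted chips 100g": "snacks",
    "BrandD cola 2l": "soft drink",
}


def generalize_basket(basket, mapping=PRODUCT_TO_FUNCTION):
    """Replace each product with its functional category and drop duplicates.

    Products missing from the mapping keep their original name (an
    assumption made here so that no item is silently lost).
    """
    return sorted({mapping.get(item, item) for item in basket})
```

Two different beer brands in one basket collapse into a single `beer` item, which is how this kind of transformation shrinks the space of candidate rules at the cost of an extra preparation step.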

To exclude products that contribute little to the stores’ total sales, the number of rules extracted each day was reduced by setting the minimum support parameter to 1%. Products below this limit were excluded and were not considered for rule formation. This step also excluded groups that did not represent 1% of the total number of transactions for that day. Products on special offer were excluded from the rule base for the days on which the offer ran, and were analyzed in a separate database so that they did not affect the final association results.
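A minimal sketch of this daily preparation step is shown below. The function name, its parameters, and the way the promotional set is built (full baskets that contain at least one item on offer) are assumptions made for illustration; the study does not describe these details.

```python
from collections import Counter


def prepare_day(baskets, promo_items=frozenset(), min_support=0.01):
    """Split one day's baskets into a regular set and a promotional set.

    Items appearing in fewer than `min_support` of the day's baskets are
    dropped. Items on special offer are removed from the regular set, and
    baskets containing them are copied to a separate promotional set.
    """
    n = len(baskets)
    counts = Counter(item for basket in baskets for item in set(basket))
    frequent = {item for item, c in counts.items() if c / n >= min_support}

    regular, promo = [], []
    for basket in baskets:
        items = set(basket) & frequent
        # Keep every basket, even if it ends up empty, so the day's
        # transaction count (the support denominator) stays unchanged.
        regular.append(sorted(items - promo_items))
        if items & promo_items:
            promo.append(sorted(items))
    return regular, promo
```

The `regular` list would feed the main rule base, while `promo` could be mined separately so that short-lived promotional effects do not distort the long-run associations.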

Results:
The resulting rule base stored more than 6000 extracted rules, which differed in how frequently they occurred. The rule bases of the two stores were compared: the first store served as the reference for selecting associations, and their presence was then checked in the second store. Many associations were common to both rule bases, but because of space constraints only six were shown. The associations were classified as “Usual” and “Non-usual”. Usual products are those with a shared use or application, which justifies the consumer’s choice, whereas Non-usual products have no direct link to the associated product in terms of application or use. Both kinds of association showed different usage patterns and intensities in the two stores. For the Usual products, stability in both stores was recorded at around 100%, but the average confidence differed between the stores. This indicates that the associations, though stable in both stores, had different intensities. For the Non-usual products, both stability and confidence fluctuated considerably. In store A the stability of the Non-usual associations was greater than 95%, whereas in store B it was well below the 95% suggested for a stable association. This suggests that Non-usual associations are peculiar to the store in which they are found and may not appear in another store, while Usual associations can repeat across stores that share the same culture. It can thus be concluded that Usual associations tend to repeat among consumers with a similar profile, whereas Non-usual associations are specific to a store and to the particular characteristics of its consumers.
The results also revealed large fluctuations in the occurrence of both Usual and Non-usual associations in store B compared with store A, but no significant differences in confidence between Store A and Store B. This means that confidence cannot be used as a classification criterion.
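The store-to-store comparison can be sketched as follows, taking one store as the reference and checking each of its stable rules in the other. The data layout (rule mapped to a stability and mean-confidence pair), the function name, and every number in the example are invented for illustration; only the 95% stability threshold comes from the text.

```python
def compare_stores(reference, other, stable_at=0.95):
    """Report how each rule that is stable in the reference store fares elsewhere.

    Both arguments map a rule (lhs, rhs) to (stability, mean_confidence).
    A rule missing from `other` is reported with stability 0.0 and
    confidence None.
    """
    report = []
    for rule, (stab_ref, conf_ref) in sorted(reference.items()):
        if stab_ref < stable_at:
            continue  # only rules stable in the reference store are compared
        stab_other, conf_other = other.get(rule, (0.0, None))
        report.append({
            "rule": rule,
            "stable_in_both": stab_other >= stable_at,
            "stability": (stab_ref, stab_other),
            "confidence": (conf_ref, conf_other),
        })
    return report


# Invented figures, for illustration only (not results from the study).
store_a = {("item_x", "item_y"): (1.00, 0.60), ("item_p", "item_q"): (0.97, 0.40)}
store_b = {("item_x", "item_y"): (0.99, 0.45), ("item_p", "item_q"): (0.60, 0.38)}
report = compare_stores(store_a, store_b)
```

In this made-up example the first rule is stable in both stores but with different confidence, while the second is stable only in the reference store, which is the kind of contrast the text draws between Usual and Non-usual associations.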

Conclusion:
The case study concludes that association rules enable retail organizations to analyze the data collected from their transactions and make informed decisions. The technique helps find interesting associations in transaction data. It is also useful for cross-selling associated products, for planning product promotions carefully, and for setting the profit margins of associated products. However, it is difficult to use it to classify customers into specific classes or to assign a commercial value to customers. The technique also does not provide precise information, because the quantity purchased is not addressed. Including quantity in the analysis could lead to better promotions and even increased profits.
