Chapter 2: Segmenting Customer Transactions Using a Pattern-Based Clustering Approach
For each pattern Pa (Pa ∈ {P1, P2, …, Pm}) considered, we calculate the support of the pattern in cluster Ci and its support in cluster Cj, compute the relative difference between these two support values, and aggregate the relative differences across all patterns. The support of a pattern in a cluster is the proportion of transactions in that cluster that contain the pattern. The intuition behind this definition of difference is that if the underlying behavioral patterns of two clusters differ, then the support of the patterns in one cluster should differ from their support in the other. We use the relative rather than the absolute difference between two support values because we consider the difference between 3% and 9% to be greater than the difference between 90% and 96%, even though the absolute difference is the same (6 percentage points). The denominator in the definition is the average of the two support values, so the relative difference is 0.06/0.06 = 1 in the first case and 0.06/0.93 ≈ 0.065 in the second. In Appendix 1 we show that, under certain natural distributional assumptions, the difference metric above is maximized when the correct clusters are discovered.
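The computation described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes that each pattern is an itemset, that each transaction is a set of items, and that the relative differences are aggregated by summation. The function names and the choice to return 0 when both supports are zero are also assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set

Transaction = Set[str]
Pattern = FrozenSet[str]


def support(pattern: Pattern, cluster: List[Transaction]) -> float:
    """Proportion of transactions in the cluster that contain the pattern."""
    if not cluster:
        return 0.0
    hits = sum(1 for transaction in cluster if pattern <= transaction)
    return hits / len(cluster)


def relative_difference(s_i: float, s_j: float) -> float:
    """Absolute difference of two supports divided by their average."""
    mean = (s_i + s_j) / 2
    if mean == 0:
        # Assumption: a pattern absent from both clusters contributes nothing.
        return 0.0
    return abs(s_i - s_j) / mean


def difference(
    cluster_i: List[Transaction],
    cluster_j: List[Transaction],
    patterns: Iterable[Pattern],
) -> float:
    """Aggregate (here: sum, an assumption) the relative differences over all patterns."""
    return sum(
        relative_difference(support(p, cluster_i), support(p, cluster_j))
        for p in patterns
    )
```

With these helpers, `relative_difference(0.03, 0.09)` is about 1, whereas `relative_difference(0.90, 0.96)` is about 0.065, which matches the intuition that the first pair of support values differs more.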
The set of patterns we consider is the set of frequent patterns in the original data set. Let FIS be the set of all frequent itemsets based on the entire transaction data (before any clustering). Specifically, FIS = {is1, is2, …, isp}, where isi is an itemset in FIS. Given a clustering, we calculate the support for each itemset in FIS for each cluster and use Definition 1 to