Chapter 2: Segmenting Customer Transactions Using a Pattern-Based Clustering Approach
Figure 1. Algorithm GHIC

The main heuristic used in the hierarchical algorithm is as follows. Rather than searching for the best partition at each node in the tree, a procedure that would involve considering 2^|D| possible partitions (where D is the data to be divided at a specific level of the tree), we consider only |FIS| (<< 2^|D|) partitions, as described below. We take a single itemset in FIS and divide the data into two parts such that all records containing the itemset are in one cluster and the remaining records are in the other. There are |FIS| possible ways to do this, and we choose the partition for which the global objective M is maximized. This procedure is motivated by the assumption that, while many different patterns may differentiate two clusters, there is one dominating pattern that is present mostly in one of the clusters. As we show in the experiments below, the quality of the discovered clusters suggests that this assumption is a reasonable one.
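The single-node split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GHIC implementation: the names best_split and toy_objective are hypothetical, records and itemsets are modeled as frozensets of item labels, and the objective is passed in as a function of the two candidate parts only, whereas the text refers to a global objective M. The toy_objective shown is merely a stand-in (the summed absolute difference in single-item support between the two parts) and is not the chapter's M. Skipping splits that leave one part empty is also an assumption.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Record = FrozenSet[str]
Itemset = FrozenSet[str]
Partition = Tuple[List[Record], List[Record]]


def best_split(
    data: Sequence[Record],
    fis: Sequence[Itemset],
    objective: Callable[[List[Record], List[Record]], float],
) -> Optional[Tuple[Itemset, Partition, float]]:
    """Evaluate one candidate partition per itemset in FIS and keep the
    one with the highest objective value (|FIS| evaluations in total)."""
    best: Optional[Tuple[Itemset, Partition, float]] = None
    for itemset in fis:
        # Records containing the itemset form one cluster, the rest the other.
        inside = [r for r in data if itemset <= r]
        outside = [r for r in data if not itemset <= r]
        if not inside or not outside:
            continue  # assumption: ignore splits that leave one part empty
        score = objective(inside, outside)
        if best is None or score > best[2]:
            best = (itemset, (inside, outside), score)
    return best


def toy_objective(inside: List[Record], outside: List[Record]) -> float:
    """Hypothetical stand-in for M: summed absolute difference in
    single-item support between the two parts."""
    items = sorted(set().union(*inside, *outside))
    total = 0.0
    for item in items:
        support_in = sum(item in r for r in inside) / len(inside)
        support_out = sum(item in r for r in outside) / len(outside)
        total += abs(support_in - support_out)
    return total


if __name__ == "__main__":
    data = [frozenset("ab"), frozenset("ab"), frozenset("c"), frozenset("cd")]
    fis = [frozenset("a"), frozenset("ab"), frozenset("c"), frozenset("d")]
    print(best_split(data, fis, toy_objective))
```

Applying best_split recursively to each resulting part would yield a tree of clusters. The stopping rule and the exact form of M are left unspecified in this sketch.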
2.4 Segmentation-Based Modeling
Segments discovered cannot always be used directly without a proper description. Segments must first be profiled with descriptors. Predictive