Chapter 2: Segmenting Customer Transactions Using a Pattern-Based Clustering Approach
transactions from different users (without maintaining the user ID) – and examine how well pattern-based clustering separates the transactions that belong to individual users compared with traditional clustering techniques.
Motivated by the above argument, we investigate the utility of pattern-based clustering for grouping Web transactions. In particular, we argue that itemsets are a natural representation for patterns in Web transactions and present GHIC (Greedy Hierarchical Itemset-based Clustering), a pattern-based clustering algorithm for domains in which itemsets are the natural pattern representation.
After evaluating GHIC on 80 sub-datasets generated from a Web browsing data set, we further develop a modeling framework for building segment-level predictive models based on the pattern-based clustering approach and signature discovery techniques. Each category (cluster) of customer transactions discovered by the pattern-based clustering approach can be characterized by its own distinguishing patterns. After eliciting multiple categories of customer transactions, we build, for each category, one signature that captures its salient behavioral patterns and one predictive model. In the prediction stage, a new transaction is compared with all the signatures and the closest signature is chosen. The new transaction is then assigned to the category of that signature, and the model associated with the signature is used to make the prediction for the transaction (alternatively, the models are combined using a weighting scheme). We evaluate the modeling technique in experiments on online purchasing data and compare the proposed approach with other approaches from data mining and marketing (RFM, GLIMMIX, and k-means).
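As an illustration only, the prediction stage described above might be sketched as follows. The sketch rests on several assumptions that are not taken from this chapter: a signature is represented as a plain set of items, closeness is measured with Jaccard similarity (a stand-in for whatever signature-matching measure is actually used), each per-category model is a hypothetical callable returning a score, and the weighting scheme weights each model by its signature's similarity to the transaction. All names and data values are made up for the example.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Transaction = FrozenSet[str]
Model = Callable[[Transaction], float]  # hypothetical per-category model


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity; an assumed closeness measure, not the chapter's."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_signature(txn: Transaction,
                      signatures: Dict[str, FrozenSet[str]]) -> str:
    """Return the name of the category whose signature is closest to txn."""
    return max(signatures, key=lambda name: jaccard(txn, signatures[name]))


def predict(txn: Transaction,
            signatures: Dict[str, FrozenSet[str]],
            models: Dict[str, Model]) -> Tuple[str, float]:
    """Assign txn to the closest category and apply that category's model."""
    category = closest_signature(txn, signatures)
    return category, models[category](txn)


def predict_weighted(txn: Transaction,
                     signatures: Dict[str, FrozenSet[str]],
                     models: Dict[str, Model]) -> float:
    """Combine all models, weighting each by signature similarity (assumed scheme)."""
    weights = {name: jaccard(txn, sig) for name, sig in signatures.items()}
    total = sum(weights.values())
    if total == 0:
        # No signature overlaps the transaction: fall back to a plain average.
        return sum(m(txn) for m in models.values()) / len(models)
    return sum(weights[n] * models[n](txn) for n in models) / total


# Toy data, invented for illustration.
signatures = {
    "browsers": frozenset({"home", "search", "news"}),
    "buyers": frozenset({"cart", "checkout", "search"}),
}
models = {
    "browsers": lambda txn: 0.05,  # constant scores stand in for fitted models
    "buyers": lambda txn: 0.60,
}
new_txn = frozenset({"search", "cart", "checkout"})

category, score = predict(new_txn, signatures, models)
weighted_score = predict_weighted(new_txn, signatures, models)
```

In this toy setting the new transaction matches the "buyers" signature exactly, so the single-model prediction uses the "buyers" model, while the weighted variant also gives a small weight to the "browsers" model.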
The rest of this essay is structured as follows. In Section 2.2 we describe the domain and develop an objective function that will be used to generate pattern-based clusters. Section 2.3 describes GHIC, the technique used to generate the clusters. We present the segmentation-based modeling framework in Section 2.4. Experimental results for