The Online Customer:  New Data Mining and Marketing Approaches

Chapter 2:  Segmenting Customer Transactions Using a Pattern-Based Clustering Approach


pattern-based clustering and the modeling framework are presented in Section 2.5. Section 2.6 covers related work on both pattern-based clustering and segmentation-based modeling, and Section 2.7 presents conclusions.

2.2 Pattern-Based Clustering of Web Transactions

How pattern-based clustering works depends on the choice of pattern representation. In this research we focus on clustering Web transactions and argue that itemsets (Agrawal et al. 1995) are a good representation. Section 2.2.1 therefore first describes the domain in some detail to motivate the choice of itemsets as the pattern representation scheme, and Section 2.2.2 then presents the objective function of our pattern-based clustering.

2.2.1 Features of Web Transactions

In this essay we treat each user's Web transaction as a data record whose categorical and numeric attributes are derived from the raw data, a series of visited URLs. The terms session and transaction are used interchangeably. In our experiments we start with real session-level Web browsing data for users (a session contains a list of consecutive hits within a span of 30 minutes) and create 46 features describing each session. The specific features are listed in Table 2.
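To make the session construction concrete, the following is a minimal sketch, not the procedure used in the experiments. It assumes one reading of "within a span of 30 minutes": a session holds consecutive hits that fall within 30 minutes of the session's first hit (an inactivity-timeout rule would be another plausible reading). The function names, the site names, and the handful of features computed are illustrative assumptions; the actual 46 features are those listed in Table 2.

```python
from datetime import datetime, timedelta

# Assumed interpretation: all hits of a session lie within 30 minutes of its first hit.
SESSION_SPAN = timedelta(minutes=30)


def sessionize(hits):
    """Split one user's (timestamp, site) hits into sessions, in time order."""
    sessions, current = [], []
    for ts, site in sorted(hits):
        # Start a new session once a hit falls outside the 30-minute span.
        if current and ts - current[0][0] > SESSION_SPAN:
            sessions.append(current)
            current = []
        current.append((ts, site))
    if current:
        sessions.append(current)
    return sessions


def session_features(session):
    """Derive a few illustrative categorical and numeric features (hypothetical names)."""
    start, end = session[0][0], session[-1][0]
    total_minutes = (end - start).total_seconds() / 60
    return {
        "starting_time": "morning" if start.hour < 12 else "other",  # categorical
        "first_site": session[0][1],                                 # order-based
        "num_sites": len({site for _, site in session}),             # quantity
        "total_time": total_minutes,                                 # minutes, first to last hit
        # Crude estimate: time on the last page is unobserved.
        "average_time_page": total_minutes / len(session),
    }


if __name__ == "__main__":
    # Hypothetical hits for a single user.
    hits = [
        (datetime(2004, 1, 5, 9, 0), "mail.example"),
        (datetime(2004, 1, 5, 9, 10), "news.example"),
        (datetime(2004, 1, 5, 9, 25), "mail.example"),
        (datetime(2004, 1, 5, 10, 0), "shop.example"),
    ]
    for s in sessionize(hits):
        print(session_features(s))
```

Each resulting dictionary corresponds to one transaction record mixing categorical and numeric attributes, which is the form the clustering discussion below operates on.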

Note that the features include items concerning time (e.g., average time spent per page), quantity (e.g., number of sites visited), and order of pages visited (e.g., first site); these features include both categorical and numeric types. A conjunction of atomic conditions on these attributes is a good representation for common behavioral patterns in the transaction data. For example, {starting_time = morning, average_time_page < 2 minutes, num_categories = 3, total_time < 10 minutes} is a behavioral pattern that may capture a user’s specific “morning” pattern of Web usage that involves looking at multiple sites (e.g., work email,