The Online Customer: New Data Mining and Marketing Approaches By Yinghui Yang

Chapter 2:

Segmenting Customer Transactions Using a Pattern-Based Clustering Approach


For each pattern P_a (P_a ∈ {P₁, P₂, …, P_m}) under consideration, we calculate the support of the pattern in cluster C_i and its support in cluster C_j. We then compute the relative difference between these two support values and aggregate these relative differences across all patterns. The support of a pattern in a cluster is the proportion of the cluster's transactions that contain the pattern. The intuition behind this definition of difference is that if the behavioral patterns underlying two clusters differ, the supports of the patterns in one cluster should differ from their supports in the other. We use the relative difference between two support values rather than the absolute difference because we consider the difference between 3% and 9% to be greater than the difference between 90% and 96%, even though both pairs differ by the same 6 percentage points. The denominator in the definition is the average of the two support values, so the first pair has a relative difference of 6/6 = 1.0 while the second has only 6/93 ≈ 0.065. In Appendix 1 we show that, under certain natural distributional assumptions, the difference metric above is maximized when the correct clusters are discovered.
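The computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's reference implementation. Definition 1 itself is not reproduced in this excerpt, so the sketch assumes the following:

- The per-pattern relative differences are aggregated by a plain sum.
- A pattern with zero support in both clusters contributes nothing.

Transactions are modeled as Python sets of items. The function names `support`, `relative_difference`, and `difference` are hypothetical.

```python
def support(pattern, cluster):
    """Proportion of transactions in `cluster` that contain every item of `pattern`."""
    if not cluster:
        return 0.0
    items = frozenset(pattern)
    return sum(1 for t in cluster if items <= t) / len(cluster)


def relative_difference(s_i, s_j):
    """|s_i - s_j| divided by the average of the two supports."""
    avg = (s_i + s_j) / 2
    if avg == 0:
        # Assumption: a pattern absent from both clusters adds no difference.
        return 0.0
    return abs(s_i - s_j) / avg


def difference(cluster_i, cluster_j, patterns):
    """Aggregate relative support differences over all patterns.

    Assumption: the aggregation is a plain sum over the patterns considered.
    """
    return sum(
        relative_difference(support(p, cluster_i), support(p, cluster_j))
        for p in patterns
    )


if __name__ == "__main__":
    # Two toy clusters of transactions (illustrative data only).
    c1 = [{"milk", "bread"}, {"milk"}, {"milk", "eggs"}]
    c2 = [{"beer", "chips"}, {"beer"}, {"bread", "beer"}]
    patterns = [{"milk"}, {"beer"}, {"bread"}]
    print(difference(c1, c2, patterns))  # → 4.0

    # The 3% vs 9% and 90% vs 96% comparison from the text.
    print(relative_difference(0.03, 0.09), relative_difference(0.90, 0.96))
```

In the toy run, "milk" and "beer" each appear in every transaction of one cluster and in none of the other. Each therefore reaches the maximum relative difference of 2. "bread" has the same support (1/3) in both clusters and contributes 0. Dividing by the average rather than by either support alone keeps the measure symmetric in C_i and C_j.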

The patterns we consider are the frequent patterns in the original data set. Let FIS be the set of all frequent itemsets mined from the entire transaction data (before any clustering): FIS = {is₁, is₂, …, is_p}, where each is_i is a frequent itemset. Given a clustering, we calculate the support of each itemset in FIS within each cluster and use Definition 1 to