The Online Customer: New Data Mining and Marketing Approaches By Yinghui Yang

Chapter 2:

Segmenting Customer Transactions Using a Pattern-Based Clustering Approach

To generate balanced clusters, we introduce another component to M. Because each division creates two clusters, the revised M is specified as follows:

M(C₁, C₂) = K₁ · D(C₁, C₂) + K₂ · S(C₁) + K₃ · S(C₂) + ƒ_BALANCE(N₁, N₂)

K₁, K₂, and K₃ are user-specified weights that bring the difference and similarity components to comparable scales; their values can be chosen based on simulation. For example, suppose that in the simulation the difference component ranges between 1000 and 2000, and each of the two similarity components ranges between 100 and 200. If difference and similarity are to be weighted equally, we can set K₁ to 1 and set K₂ and K₃ to 10.
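As a quick check of the scaling in this example, the following sketch multiplies the quoted ranges by the quoted weights. The helper name `weighted_range` is hypothetical and used only for illustration; the ranges and weights are the ones given in the text.

```python
def weighted_range(weight, low, high):
    """Return the (low, high) range of a component after applying its weight."""
    return (weight * low, weight * high)


# Weights and ranges quoted in the text.
K1, K2, K3 = 1, 10, 10
difference = weighted_range(K1, 1000, 2000)
similarity_1 = weighted_range(K2, 100, 200)
similarity_2 = weighted_range(K3, 100, 200)

print(difference, similarity_1, similarity_2)
# prints: (1000, 2000) (1000, 2000) (1000, 2000)
```

With these weights, each weighted similarity term falls in the same range as the weighted difference term.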

D(C₁, C₂) represents the inter-cluster difference, S(C₁) and S(C₂) are the intra-cluster similarities of cluster-1 and cluster-2 respectively, and N₁ and N₂ are the numbers of transactions in cluster-1 and cluster-2 respectively. ƒ_BALANCE(N₁, N₂) can take one of two formats, (A) or (B).

ƒ_BALANCE(N₁, N₂) serves to balance the sizes of the two clusters generated. It can be either a linear function of the absolute difference between the sizes of the two clusters (A), or a discrete function of the relative difference between the sizes of the two clusters (B). In (B), if the size of cluster-1 is very different from the size of cluster-2 (case 2 in (B)), the balancing component drives M to −∞, meaning that this division will not be considered. K_l and K_h are the desired lower and upper bounds of N₁/N₂. For a given clustering, if N₁/N₂ falls outside these two bounds (that is, the clusters generated are unbalanced), the clustering will not be considered. GHIC is presented in Figure 1.
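The two formats of the balancing component and the revised M can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact definitions: the text does not give the formulas for (A) and (B), so the linear form with a negative sign, the weight `k4`, the value 0 inside the bounds in (B), and all function names are assumptions made for illustration. The numeric inputs in the usage lines are likewise illustrative.

```python
import math
from functools import partial


def balance_linear(n1, n2, k4=1.0):
    """Format (A), assumed form: a penalty linear in |N1 - N2|.

    k4 is a hypothetical weight; the negative sign assumes M is maximized.
    """
    return -k4 * abs(n1 - n2)


def balance_ratio(n1, n2, k_l, k_h):
    """Format (B), assumed form: 0 if K_l <= N1/N2 <= K_h, otherwise -infinity."""
    if n2 == 0:
        return -math.inf  # an empty cluster is treated as unbalanced
    ratio = n1 / n2
    return 0.0 if k_l <= ratio <= k_h else -math.inf


def revised_m(d, s1, s2, n1, n2, k1, k2, k3, balance):
    """M = K1*D + K2*S(C1) + K3*S(C2) + f_BALANCE(N1, N2)."""
    return k1 * d + k2 * s1 + k3 * s2 + balance(n1, n2)


# Illustrative bounds: accept divisions whose size ratio lies in [0.5, 2.0].
ratio_balance = partial(balance_ratio, k_l=0.5, k_h=2.0)

balanced = revised_m(1500, 150, 150, 60, 40, 1, 10, 10, ratio_balance)
unbalanced = revised_m(1500, 150, 150, 95, 5, 1, 10, 10, ratio_balance)

print(balanced)    # 4500.0
print(unbalanced)  # -inf
```

Under format (B), a division whose size ratio falls outside the bounds scores −∞ and can never be selected over a balanced one, whereas the assumed format (A) only lowers the score in proportion to the size gap.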