The Online Customer: New Data Mining and Marketing Approaches By Yinghui Yang

Chapter 2:

Segmenting Customer Transactions Using a Pattern-Based Clustering Approach

and that the functions defined below for the clusters are for the general case. However, in the implementation described in Section 4, we collapse the numeric attributes into categories so that frequent itemsets can be discovered from the data using existing algorithms such as Apriori (Agrawal et al. 1995). Below we describe the objective function of the pattern-based clustering method.

2.2.2 Objective Function for Pattern-Based Clustering

Consider a collection of transactions to be clustered {T₁, T₂, … , T_n}. Each transaction T_i contains a subset of a list of candidate items {i₁, i₂, … , i_m}. A clustering C is a partition {C₁, C₂, … , C_k} of {T₁, T₂, … , T_n} and each C_i is a cluster. The goal of this method is to maximize the difference between clusters and the similarity of transactions within clusters. We cluster to maximize a quantity M, where M is defined as follows:

Here we give a specific definition only for the difference between two clusters. This is sufficient because hierarchical clustering techniques can repeatedly split the transactions into two groups, and this process yields an arbitrary number of clusters (which is generally desirable because the number of clusters does not have to be specified up front).
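The repeated two-way splitting described above can be sketched as a short recursive routine. This is only an illustration under stated assumptions: `bisect_recursively` and `toy_split` are hypothetical names, and the splitting rule in `toy_split` is a placeholder that cuts a group in half by position. It is not the pattern-based criterion of this chapter, which would take its place.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# A splitter returns two non-empty groups, or None when it declines to split.
Splitter = Callable[[Sequence[Any]], Optional[Tuple[List[Any], List[Any]]]]


def bisect_recursively(transactions: Sequence[Any], split: Splitter) -> List[List[Any]]:
    """Split groups in two until `split` declines; return the leaf groups."""
    result = split(transactions)
    if result is None:
        return [list(transactions)]
    left, right = result
    if not left or not right:
        # A degenerate split would recurse forever, so treat it as a leaf.
        return [list(transactions)]
    return bisect_recursively(left, split) + bisect_recursively(right, split)


def toy_split(group: Sequence[Any]) -> Optional[Tuple[List[Any], List[Any]]]:
    """Placeholder rule (assumption): halve any group with more than two members."""
    if len(group) <= 2:
        return None
    mid = len(group) // 2
    return list(group[:mid]), list(group[mid:])


clusters = bisect_recursively(list(range(7)), toy_split)
print(clusters)  # [[0], [1, 2], [3, 4], [5, 6]]
```

The number of clusters (four here) is never passed in; it falls out of when the splitter stops, which is the property the paragraph above points to.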

Now, we define the difference between two clusters.

Definition 1 (Difference between two clusters):

Let S_ad = the number of transactions containing pattern P_a in cluster C_d.

|C_d| is the number of transactions in cluster C_d.

Let S(P_a, C_d) = S_ad / |C_d|. S(P_a, C_d) is called the support of pattern P_a in cluster C_d; it is the fraction of transactions in cluster C_d that contain pattern P_a.
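As a small worked illustration of the support of a pattern in a cluster, the sketch below counts the transactions that contain a pattern and divides by the cluster size. It assumes that a pattern is an itemset and that a transaction "contains" the pattern when every item of the pattern appears in it; the function name `support` and the sample items are hypothetical.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(pattern: Iterable[str], cluster: List[Transaction]) -> float:
    """Fraction of transactions in `cluster` that contain every item of `pattern`."""
    if not cluster:
        raise ValueError("support is undefined for an empty cluster")
    items = frozenset(pattern)
    # Number of transactions in the cluster that contain the pattern.
    count = sum(1 for transaction in cluster if items <= transaction)
    # Divide by the number of transactions in the cluster.
    return count / len(cluster)


# Made-up cluster of four transactions (illustrative data only).
cluster_d = [
    frozenset({"home", "search", "cart"}),
    frozenset({"home", "search"}),
    frozenset({"home", "checkout"}),
    frozenset({"search", "cart"}),
]

print(support({"home", "search"}, cluster_d))  # 2 of 4 transactions -> 0.5
```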