The Online Customer: New Data Mining and Marketing Approaches By Yinghui Yang

The Online Customer: New Data Mining and Marketing Approaches By ...

Chapter 2:

Segmenting Customer Transactions Using a Pattern-Based Clustering Approach

Read

ABOUT
- Data
- Description
- Reviews
- About Author
- Cover
RESEARCH
- Bookmarks
- Notes
FONT
- Font Size
  - x-small
  - small
  - medium
  - large
  - x-large
- Line Height
  - 1.00
  - 1.25
  - 1.50
  - 1.75
  - 2.00
- Letter Spacing
  - Default
  - 1 pixel
  - 2 pixels
  - 3 pixels
  - 4 pixels
DOWNLOAD
- Complete Edition
PAGE
- TOC
- Copyright
- Foreword
- Preface
- References
- Subject Index
- Author Index

In the next section we describe GHIC, an algorithm for pattern-based clustering.

2.3 The Clustering Algorithm

The ideal algorithm will be one that maximizes M (defined in previous section). However, for the objective function defined above, if there are n transactions and two clusters that we are interested in learning, the number of possible clustering schemes to examine is 2ⁿ. Hence, a heuristic approach is called for.

After we obtain FIS using association rule discovery algorithms, as described in Section 2, we convert the initial transactions into the format shown in Table 3, where rows represent transactions to be clustered and columns represent itemsets. A “1” in a cell indicates that a certain transaction contains a certain itemset. We use T′ = {T′₁, T′₂, … , T′_n} to represent this new transaction set. Instead of clustering the original transactions, we convert the problem into clustering these binary vectors and present a divisive hierarchical algorithm. The entire transaction set is first divided into two clusters. Each cluster is further divided into two clusters if it has more than a predefined number of transactions. This process is repeated until no cluster is big enough to be divided further. In addition to cluster size, we assume that the stopping conditions are user specified and can contain additional criteria such that if a cluster’s size is bigger than the threshold but its quality is very good (contains very similar transactions), we can stop dividing that cluster.

Table 3. Transactions Represented Initemsets

	is₁	is₂	…	is_p
T₁	1	0	…	0
T₂	1	0	…	1
…	…	…	…	…
T_n	0	1	…	0