- \( d(x,y) \ge 0, \forall x, y \)
- \( d(x,y) = 0 \Leftrightarrow x = y \)
- \( d(x,y) = d(y,x) \)
- \( d(x,y) \le d(x,z) + d(z,y) \)
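As a concrete illustration (my own hypothetical example, not one from the references below), squared Euclidean distance on the real line satisfies the first three properties but violates the triangle inequality, so it is a valid symmetric dissimilarity yet not a metric:

```python
# Squared Euclidean "distance": non-negative, zero iff the arguments are
# equal, and symmetric -- but it violates the triangle inequality.
def d(x, y):
    return (x - y) ** 2

d(0, 2)            # 4
d(0, 1) + d(1, 2)  # 2, so d(0,2) > d(0,1) + d(1,2)
```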
Fortunately, some clustering algorithms are applicable to pairwise dissimilarity matrices computed with (symmetric) non-metric dissimilarity functions, as well as with distance functions over arbitrary metric spaces. For example, [1] describes the SLINK algorithm for single-linkage hierarchical clustering and notes that it only requires a "symmetric non-negative dissimilarity coefficient" satisfying \(d(a,a)=0, \forall a \). Likewise, [2] presents the k-medoids algorithm, explicitly states that "such dissimilarities do not have to satisfy the triangle inequality", and gives an example application in such a scenario.
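To make the k-medoids point concrete, here is a minimal sketch of the k-medoids idea (not the PAM algorithm exactly as published in [2]: initialization is deterministic and empty clusters are not handled). It operates purely on a precomputed dissimilarity matrix `D`, which only needs to be symmetric with a zero diagonal; the example feeds it squared Euclidean dissimilarities, which violate the triangle inequality:

```python
def k_medoids(D, k, max_iter=100):
    """Naive k-medoids on a precomputed dissimilarity matrix D.

    D only needs to be symmetric with a zero diagonal; in particular,
    the triangle inequality is NOT required.
    """
    n = len(D)
    medoids = list(range(k))  # deterministic initialization, for simplicity
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        labels = [min(medoids, key=lambda m: D[i][m]) for i in range(n)]
        # Update step: within each cluster, the new medoid is the member
        # minimizing the total dissimilarity to the other members.
        new_medoids = [
            min((i for i in range(n) if labels[i] == m),
                key=lambda c: sum(D[c][j] for j in range(n) if labels[j] == m))
            for m in medoids
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, labels

# Squared Euclidean dissimilarities on 1-D points: symmetric, zero on the
# diagonal, but not a metric (the triangle inequality fails).
points = [0, 1, 2, 10, 11, 12]
D = [[(a - b) ** 2 for b in points] for a in points]
medoids, labels = k_medoids(D, k=2)  # points 0-2 and 3-5 form the two clusters
```

No vector-space representation of the objects is ever needed: the algorithm only reads entries of `D`.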
While other clustering algorithms, such as [3] and [4], also do not require a vector-space representation of the objects to be clustered, they do require that the function used to compute the dissimilarity matrix be an actual metric.
[1] R. Sibson. "SLINK: An optimally efficient algorithm for the single-link cluster method"
(http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf)
[2] L. Kaufman and P.J. Rousseeuw. "Clustering by means of medoids"
(ftp://ftp.win.ua.ac.be/pub/preprints/87/Clumea87.pdf)
[3] J. Zhou and J. Sander. "Data Bubbles for Non-Vector Data: Speeding-up Hierarchical
Clustering in Arbitrary Metric Spaces"
(http://www.vldb.org/conf/2003/papers/S14P03.pdf)
[4] V. Ganti et al. "Clustering Large Datasets in Arbitrary Metric Spaces"
(http://www.cs.virginia.edu/~cyberia/papers/ICDE99.pdf)