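Let $s^{(k)}_{ij}$ be the similarity between two objects $i$ and $j$ according to their $k$-th feature. Also let $\delta^{(k)}_{ij} \in \{0, 1\}$ indicate whether feature $k$ of $i$ and $j$ can be compared ($\delta^{(k)}_{ij} = 1$) or not ($\delta^{(k)}_{ij} = 0$). For example, when $i$ and/or $j$ are missing feature $k$, then $\delta^{(k)}_{ij} = 0$; in that case $s^{(k)}_{ij}$ is taken to be 0, so the feature contributes nothing to the sum. Accordingly, the overall similarity $S_{ij}$ between $i$ and $j$ can be written as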
$$S_{ij} = \frac{\sum_k w(x^{(k)}_i, x^{(k)}_j)\, s^{(k)}_{ij}}{\sum_k w(x^{(k)}_i, x^{(k)}_j)\, \delta^{(k)}_{ij}}$$
where $x^{(k)}_i$ and $x^{(k)}_j$ are the values of the $k$-th feature of $i$ and $j$, respectively, and the $w(x^{(k)}_i, x^{(k)}_j)$ are the weights assigned to the different features. Interestingly, the weights are expressed as a function of the feature values rather than being constant. Among other things, this makes it possible to handle hierarchical features elegantly (see [1], Section 4.1).
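To make the formula concrete, here is a minimal Python sketch of it. Several things in it are assumptions for illustration and not part of the text above: the function name `gower_similarity`, the use of `None` to mark a missing value (so that δ = 0), the per-feature scores (1 − |a − b| / range for numeric features, exact match for categorical ones), and the `weight(k, a, b)` callback, which defaults to a constant weight of 1.

```python
import math


def gower_similarity(x_i, x_j, ranges, weight=None):
    """Weighted similarity between two feature vectors (illustrative sketch).

    x_i, x_j : sequences of feature values; None marks a missing value.
    ranges   : per-feature range for numeric features (ignored for strings).
    weight   : hypothetical callback weight(k, a, b) -> float, so the weight
               may depend on the feature values; defaults to 1 everywhere.
    """
    if weight is None:
        weight = lambda k, a, b: 1.0

    numerator = 0.0
    denominator = 0.0
    for k, (a, b) in enumerate(zip(x_i, x_j)):
        # delta = 0: the feature cannot be compared, so it is skipped
        # in both the numerator and the denominator.
        if a is None or b is None:
            continue

        if isinstance(a, str) or isinstance(b, str):
            # Assumed categorical score: 1 on exact match, 0 otherwise.
            s = 1.0 if a == b else 0.0
        else:
            # Assumed numeric score: 1 - |a - b| / range of feature k.
            r = ranges[k]
            s = 1.0 - abs(a - b) / r if r else 1.0

        w = weight(k, a, b)
        numerator += w * s
        denominator += w  # delta = 1 here

    # If no feature is comparable, the similarity is undefined.
    return numerator / denominator if denominator > 0 else math.nan


a = (1.0, "red", None)
b = (3.0, "red", 5.0)
feature_ranges = [4.0, None, 10.0]

print(gower_similarity(a, b, feature_ranges))  # 0.75
```

In the example, the third feature is missing for `a`, so only the first two features count: (0.5 + 1.0) / 2 = 0.75. Passing a `weight` callback that inspects `a` and `b` is one way to mimic value-dependent weights; how [1] actually uses them for hierarchical features is not reproduced here.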
[1] J. C. Gower. "A General Coefficient of Similarity and Some of Its Properties." Biometrics, Vol. 27, No. 4. (Dec., 1971), pp. 857-871. (http://venus.unive.it/romanaz/modstat_ba/gowdis.pdf)