I recently read an interesting paper published in 1971 [1] that presents a well-organized discussion of what it means to measure the similarity between two objects (or individuals), and of how to deal with missing information and dichotomous features.
Let $s_{ij}^{(k)}$ be the similarity between two objects $i$ and $j$ according to their $k$-th feature. Also let $\delta_{ij}^{(k)} \in \{0, 1\}$ indicate whether feature $k$ of $i$ and $j$ can be compared ($\delta_{ij}^{(k)} = 1$) or not ($\delta_{ij}^{(k)} = 0$). For example, when $i$ and/or $j$ are missing feature $k$, then $\delta_{ij}^{(k)} = 0$. Accordingly, the overall similarity $S_{ij}$ between $i$ and $j$ can be written as
$$S_{ij} = \frac{\sum_k w\left(x_i^{(k)}, x_j^{(k)}\right) s_{ij}^{(k)}}{\sum_k w\left(x_i^{(k)}, x_j^{(k)}\right) \delta_{ij}^{(k)}}$$
where $x_i^{(k)}$ and $x_j^{(k)}$ are the values for the $k$-th feature of $i$ and $j$, respectively, and the $w(x_i^{(k)}, x_j^{(k)})$ are the weights assigned to the different features. Interestingly, the weights are expressed as a function of the feature values, rather than being constant. This allows for elegantly dealing with hierarchical features, among other things (see [1], Section 4.1).
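To make the formula above more concrete, here is a minimal Python sketch of how it could be computed. Everything named here is my own assumption for illustration, not something prescribed by this post: the function names (`gower_similarity`, `match_score`, `range_score`), the use of `None` to mark a missing value, and the two per-feature scores (exact match for categorical features, one minus the range-normalized absolute difference for quantitative ones). A pair with a missing value gets δ = 0 and is simply skipped, so it contributes to neither sum.

```python
import math


def match_score(a, b):
    # Assumed score for a categorical feature: 1 if the values agree, else 0.
    return 1.0 if a == b else 0.0


def range_score(value_range):
    # Assumed score for a quantitative feature: 1 - |a - b| / range,
    # where value_range is the (known) range of that feature.
    def score(a, b):
        return 1.0 - abs(a - b) / value_range
    return score


def gower_similarity(x_i, x_j, score_fns, weight_fn=None):
    """Weighted similarity S_ij between two feature vectors.

    None marks a missing value. weight_fn(k, a, b) returns the weight of
    feature k given both values; if omitted, every weight is 1.
    """
    num = 0.0  # sum_k w * s
    den = 0.0  # sum_k w * delta
    for k, (a, b) in enumerate(zip(x_i, x_j)):
        if a is None or b is None:
            continue  # delta = 0: the feature cannot be compared
        w = 1.0 if weight_fn is None else weight_fn(k, a, b)
        num += w * score_fns[k](a, b)
        den += w  # delta = 1 here
    # If no feature is comparable, the similarity is undefined.
    return num / den if den > 0 else math.nan


x_i = [1.0, "red", None]
x_j = [3.0, "red", 5.0]
score_fns = [range_score(4.0), match_score, range_score(10.0)]

uniform = gower_similarity(x_i, x_j, score_fns)
weighted = gower_similarity(
    x_i, x_j, score_fns,
    weight_fn=lambda k, a, b: 2.0 if k == 0 else 1.0,
)
print(uniform)   # 0.75
print(weighted)
```

In the uniform case the third feature is dropped because it is missing for `x_i`, leaving (0.5 + 1) / 2 = 0.75. Passing the weights as a function of `(k, a, b)` mirrors the value-dependent weights in the formula; a constant per-feature weight is just the special case that ignores `a` and `b`.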
[1] J. C. Gower. "A General Coefficient of Similarity and Some of Its Properties." Biometrics, Vol. 27, No. 4. (Dec., 1971), pp. 857-871. (http://venus.unive.it/romanaz/modstat_ba/gowdis.pdf)