The application of chosen similarity measures for binary data in multivariate analysis in molecular experiments

Dariusz R. Mańkowski
Pracownia Ekonomiki Nasiennictwa i Hodowli Roślin, Zakład Nasiennictwa i Nasionoznawstwa, Instytut Hodowli i Aklimatyzacji Roślin — Państwowy Instytut Badawczy w Radzikowie (Poland)

Zbigniew Laudański

Katedra Ekonometrii i Statystyki, Wydział Zastosowań Informatyki i Matematyki, Szkoła Główna Gospodarstwa Wiejskiego w Warszawie (Poland)

Monika Janaszek

Katedra Podstaw Inżynierii, Wydział Inżynierii Produkcji, Szkoła Główna Gospodarstwa Wiejskiego w Warszawie (Poland)


This article examines eight measures of genetic similarity for the analysis of binary data, which serve as a mathematical representation of the electrophoresis gels obtained in molecular studies. The measures characterized are: simple matching (Gower), Jaccard, Nei and Li (Dice), Hamann, Ochiai, Yule's Y coefficient, Yule's Q coefficient, and the binary (zero-one) equivalent of the Pearson correlation coefficient (the phi four-point correlation). A comparative analysis of 14 carrot (Daucus carota L.) varieties then illustrates the use of these measures in two multivariate methods: UPGMA cluster analysis and principal coordinates analysis (PCoA). The results obtained with each measure, and the differences between them, are presented and discussed. The similarity measures for molecular data found in the literature are compared in terms of the agreement of the results they yield in statistical analyses.
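As a rough illustration of the eight measures named above, the sketch below computes each of them for a pair of binary band profiles. It relies on the commonly cited textbook definitions based on the 2x2 table of matches and mismatches (a = band present in both, b = present only in the first, c = present only in the second, d = absent in both); the notation in the article itself may differ. The function names and the two example profiles are hypothetical and are not data from the carrot study.

```python
from math import sqrt


def contingency(x, y):
    """Count the four cells of the 2x2 table for two binary profiles."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # band in both
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # only in x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # only in y
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # band in neither
    return a, b, c, d


def similarities(x, y):
    """Return the eight coefficients as a dict (standard definitions).

    Assumption: no denominator is zero. Degenerate profiles (for example
    all zeros) would need special handling that this sketch omits.
    """
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return {
        "simple_matching": (a + d) / n,
        "jaccard": a / (a + b + c),
        "dice": 2 * a / (2 * a + b + c),
        "hamann": ((a + d) - (b + c)) / n,
        "ochiai": a / sqrt((a + b) * (a + c)),
        "yule_q": (a * d - b * c) / (a * d + b * c),
        "yule_y": (sqrt(a * d) - sqrt(b * c)) / (sqrt(a * d) + sqrt(b * c)),
        "phi": (a * d - b * c) / sqrt((a + b) * (a + c) * (b + d) * (c + d)),
    }


# Hypothetical band profiles for two varieties (1 = band present).
variety_1 = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
variety_2 = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1]

for name, value in similarities(variety_1, variety_2).items():
    print(f"{name:16s} {value:.3f}")
```

Jaccard, Dice and Ochiai ignore joint absences (d), whereas simple matching, Hamann, both Yule coefficients and phi take them into account. The first three take values in [0, 1] and so does simple matching, while the remaining four range over [-1, 1], so the measures are not directly comparable in scale before clustering or ordination.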


PCoA, binary data, molecular analysis, similarity measures, cluster analysis, carrot

Mańkowski, D. R., Laudański, Z. and Janaszek, M. (2011) "The application of chosen similarity measures for binary data in multivariate analysis in molecular experiments", Bulletin of Plant Breeding and Acclimatization Institute, (262), pp. 155–173. doi: 10.37317/biul-2011-0014.


