The application of chosen similarity measures for binary data in multivariate analysis in molecular experiments

Dariusz R. Mańkowski

d.mankowski@ihar.edu.pl
Pracownia Ekonomiki Nasiennictwa i Hodowli Roślin, Zakład Nasiennictwa i Nasionoznawstwa, Instytut Hodowli i Aklimatyzacji Roślin — Państwowy Instytut Badawczy w Radzikowie (Poland)
https://orcid.org/0000-0002-7499-8016

Zbigniew Laudański


Katedra Ekonometrii i Statystyki, Wydział Zastosowań Informatyki i Matematyki, Szkoła Główna Gospodarstwa Wiejskiego w Warszawie (Poland)

Monika Janaszek


Katedra Podstaw Inżynierii, Wydział Inżynierii Produkcji, Szkoła Główna Gospodarstwa Wiejskiego w Warszawie (Poland)

Abstract

This article examines eight measures of genetic similarity for the analysis of binary data that encode, in mathematical form, the electrophoresis gels obtained in molecular studies. The measures characterized are: simple matching (Gower), Jaccard, Nei and Li (Dice), Hamann, Ochiai, Yule's Y coefficient, Yule's Q coefficient, and the zero-one equivalent of the Pearson correlation coefficient (the phi, or fourfold point, correlation). A comparative analysis of 14 carrot (Daucus carota L.) varieties then illustrates the use of these measures in multivariate analysis, specifically UPGMA cluster analysis and principal coordinates analysis (PCoA). The results obtained with each measure, and the differences between them, are presented and discussed. Finally, the similarity measures for molecular data found in the literature are compared in terms of the agreement of the statistical results they produce.
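
As a rough illustration of the eight measures named above, the following minimal sketch computes them for two band profiles coded as 0/1 vectors. It assumes the textbook definitions based on the 2×2 table of joint presences and absences (a = band in both, b and c = band in one only, d = band in neither); the article's own notation is not shown on this page, so the function names, dictionary keys and example profiles are hypothetical.

```python
import math


def contingency(x, y):
    """Count the four joint outcomes for two equal-length 0/1 band profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # band present in both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # band only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # band only in y
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # band absent in both
    return a, b, c, d


def similarities(x, y):
    """Return the eight coefficients under their standard textbook definitions.

    Assumption: no guard against zero denominators (e.g. profiles with no
    bands at all), so degenerate inputs raise ZeroDivisionError.
    """
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    ad, bc = a * d, b * c
    return {
        "simple_matching": (a + d) / n,
        "jaccard": a / (a + b + c),
        "dice": 2 * a / (2 * a + b + c),  # Nei and Li
        "hamann": ((a + d) - (b + c)) / n,
        "ochiai": a / math.sqrt((a + b) * (a + c)),
        "yule_q": (ad - bc) / (ad + bc),
        "yule_y": (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc)),
        "phi": (ad - bc) / math.sqrt((a + b) * (a + c) * (b + d) * (c + d)),
    }


# Hypothetical band profiles for two varieties (1 = band present, 0 = absent).
profile_a = [1, 1, 1, 0, 0, 1, 0, 1]
profile_b = [1, 0, 1, 0, 1, 1, 0, 0]
result = similarities(profile_a, profile_b)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Simple matching, Jaccard, Dice and Ochiai range over [0, 1], whereas Hamann, Yule's Q, Yule's Y and phi range over [-1, 1]. Any conversion to a dissimilarity for UPGMA or PCoA therefore has to account for the range of the chosen coefficient.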


Keywords:

PCoA, binary data, molecular analysis, similarity measures, cluster analysis, carrot

References

Backhaus K., Erichson B., Plinke W., Weiber R. 2000. Multivariate Analysemethoden. Eine anwendungsorientierte Einführung. Springer, Berlin.

Caliński T., Harabasz J. S. 1974. A dendrite method for cluster analysis. Communications in Statistics, vol. 3: 1–27.

Chudzik H., Karoński M. 1979. Skupianie obserwacji metodą k-średnich. Roczniki AR w Poznaniu, Algorytmy Biomedyczne i Statystyczne, 78: 133–152.

Davis L. G., Dibner M. D., Battey J. F. 1986. Basic methods in molecular biology. Elsevier Sci. Publ., New York: 42–43.

Díaz-Perales A., Linacero R., Vázquez A. M. 2002. Analysis of genetic relationships among 22 European barley varieties based on two PCR markers. Euphytica, 129: 53–60.

Dice L. R. 1945. Measures of the amount of ecologic association between species. Ecology, 26: 297–302.

Duda R. O., Hart P. E. 1973. Pattern Classification and Scene Analysis. New York: John Wiley & Sons.

Goodman M. M. 1972. Distance analysis in biology. Syst. Zool.: 174–186.

Gower J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53: 325–338.

Gower J. C. 1971. A general coefficient of similarity and some of its properties. Biometrics, 27: 857–874.

Gower J. C. 1985. Measures of similarity, dissimilarity and distances. In: Kotz S. et al. (eds.), Encyclopedia of statistical sciences. Vol. 5. Wiley & Sons, New York, USA.

Gower J. C., Legendre P. 1986. Metric and Euclidean properties of dissimilarity coefficients. J. Classification, 3: 5–48.

Guilford J. 1936. Psychometric Methods. New York: McGraw-Hill Book Company, Inc.

Guthridge K. M., Dupal M. P., Kölliker R., Jones E. S., Smith K. F., Forster J. W. 2001. AFLP analysis of genetic diversity within and between populations of perennial ryegrass (Lolium perenne L.). Euphytica, 122: 191–201.

Hamann U. 1961. Merkmalsbestand und Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum System der Monokotyledonen. Willdenowia, 2: 639–768.

Harabasz J. S., Karoński M. 1977. Dendrytowa metoda analizy skupień. Roczniki AR w Poznaniu, Algorytmy Biomedyczne i Statystyczne, 57: 135–148.

Hotelling H. 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24: 417–441, 498–520.

Huang X.-Q., Wolf M., Ganal M. W., Orford S., Koebner R. M. D., Röder M. S. 2007. Did modern plant breeding lead to genetic erosion in European winter wheat varieties? Crop Sci., 47: 343–349.

Jaccard P. 1908. Nouvelles recherches sur la distribution florale. Bull. Soc. Vaud. Sci. Nat., 44: 223–270.

Janaszek M. 2008. Identyfikacja cech korzeni marchwi jadalnej z wykorzystaniem komputerowej analizy obrazów. SGGW, Warszawa, rozprawa doktorska.

Kaczmarek Z., Czajka S., Adamska E. 2008. Propozycja metody grupowania obiektów jedno i wielocechowych z zastosowaniem odległości Mahalanobisa i analizy skupień. Biuletyn IHAR, Nr 249: 9–18.

Karoński M. 1971. Algorytm grupowania populacji w rozkładach metodą krok po kroku. Roczniki AR w Poznaniu, Algorytmy Biomedyczne i Statystyczne, 4: 30–33.

Kenkel N. C. 2006. On selecting an appropriate multivariate analysis. Canadian Journal of Plant Science, 86: 663–676.

Krzanowski W. J. 2004. Biplots for multifactorial analysis of distance. Biometrics, 60: 517–524.

Lance G. N., Williams W. T. 1967. A general theory of classificatory sorting strategies. 1. Hierarchical systems. Computer Journal, 9: 373–380.

Laudański Z., Mańkowski D. R. 2007. Planowanie i wnioskowanie statystyczne w badaniach rolniczych. IHAR Radzików.

Lienert G. A., von Eye A. 1986. Yule-Coefficients for Second- and Higher-Order Associations. Biometrical Journal, 28: 539–545.

Liu F., von Bothmer R., Salomon B. 2000. Genetic diversity in European accessions of the barley core collection as detected by isozyme electrophoresis. Genetic Resources and Crop Evolution, 47: 571–581.

Manimekalai R., Nagarajan P. 2006. Interrelationships among coconut (Cocos nucifera L.) accessions using RAPD technique. Genetic Resources and Crop Evolution, 53: 1137–1144.

MacQueen J. B. 1966. Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability Theory. Berkeley, University of California Press, vol. 1: 281–287.

Moncada K. M., Ehlke N. J., Muehlbauer G. J., Sheaffer C. C., Wyse D. L., DeHaan L. R. 2007. Genetic variation in three native plant species across the State of Minnesota. Crop Sci., 47: 2379–2389.

Nei M. 1978. The theory of genetic distance and evolution of human races. Jpn. J. Hum. Genet., 23: 341–369.

Nei M., Li W. H. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA, 76: 5269–5273.

Ochiai A. 1957. Zoogeographic studies on the soleoid fishes found in Japan and its neighboring regions. Bull. Japan Soc. Sci. Fish., 22: 526–530.

Rafalski A. 2004. Semi-specyficzny PCR w badaniach genetyczno-hodowlanych roślin. Monografie i Rozprawy Naukowe IHAR, Nr 23.

Rao C. R. 1964. The use and interpretation of principal component analysis in applied research. Sankhyā, A26: 329–358.

Reif J. C., Melchinger A. E., Frisch M. 2005. Genetical and mathematical properties of similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Science, 45: 1–7.

Sarle W. S. 1983. Cubic Clustering Criterion. SAS Technical Report A-108, Cary, NC: SAS Institute Inc.

SAS Institute Inc. 2009. SAS/STAT 9.2 user’s guide. Second edition. SAS Institute Inc., Cary, NC, USA.

Siatkowski I., Goszczurna T., Szabelska A., Zyprych J. 2010. Coefficients of dissimilarity and similarity with application. Colloquium Biometricum, 40: 13–23.

Sieczko L. 2003. Kryteria wstępnego przecięcia dendrogramu w hierarchicznej analizie skupień. Colloquium Biometryczne, 33: 249–258.

Sneath P. H. A., Sokal R. R. 1973. Numerical taxonomy. Freeman, San Francisco.

Sokal R. R., Michener C. D. 1958. A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 38: 1409–1438.

Takezaki N., Nei M. 1996. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics, 144: 389–399.

Timm N. H. 2002. Applied multivariate analysis. New York, USA: Springer-Verlag Inc.


Published
2011-12-29

How to cite

Mańkowski, D. R., Laudański, Z. and Janaszek, M. (2011) “The application of chosen similarity measures for binary data in multivariate analysis in molecular experiments”, Bulletin of Plant Breeding and Acclimatization Institute, (262), pp. 155–173. doi: 10.37317/biul-2011-0014.

License

Copyright (c) 2011 Dariusz R. Mańkowski, Zbigniew Laudański, Monika Janaszek

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
