The application of chosen similarity measures for binary data in multivariate analysis in molecular experiments
Dariusz R. Mańkowski
d.mankowski@ihar.edu.pl
Pracownia Ekonomiki Nasiennictwa i Hodowli Roślin, Zakład Nasiennictwa i Nasionoznawstwa, Instytut Hodowli i Aklimatyzacji Roślin — Państwowy Instytut Badawczy w Radzikowie (Poland)
https://orcid.org/0000-0002-7499-8016
Zbigniew Laudański
Katedra Ekonometrii i Statystyki, Wydział Zastosowań Informatyki i Matematyki, Szkoła Główna Gospodarstwa Wiejskiego w Warszawie (Poland)
Monika Janaszek
Katedra Podstaw Inżynierii, Wydział Inżynierii Produkcji, Szkoła Główna Gospodarstwa Wiejskiego w Warszawie (Poland)
Abstract
This article examines eight measures of genetic similarity for the analysis of binary data, which are a mathematical representation of the electrophoresis gels obtained in molecular studies. We characterize the following similarity measures: simple matching (Gower), Jaccard, Nei and Li (Dice), Hamann, Ochiai, Yule's Y coefficient, Yule's Q coefficient, and the zero-one equivalent of the Pearson correlation coefficient (the phi four-point correlation). A comparative analysis of 14 carrot (Daucus carota L.) varieties then illustrates the use of these measures in two multivariate analyses: UPGMA cluster analysis and principal coordinates analysis (PCoA). The results of these analyses and the differences between them are presented and discussed. The similarity measures for molecular data found in the literature are compared in terms of the agreement of the results obtained from the statistical analyses.
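The eight measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the usual textbook definitions based on the 2×2 table of two band profiles, where a counts bands present in both, b and c count bands present in only one, and d counts bands absent in both; the abstract itself does not state the formulas, so the exact forms used in the article may differ. The function names `contingency` and `similarities` and the two example profiles are hypothetical.

```python
import math


def contingency(x, y):
    """Count a (1,1), b (1,0), c (0,1), d (0,0) for two equal-length 0/1 profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    pairs = list(zip(x, y))
    a = sum(1 for i, j in pairs if i == 1 and j == 1)
    b = sum(1 for i, j in pairs if i == 1 and j == 0)
    c = sum(1 for i, j in pairs if i == 0 and j == 1)
    d = sum(1 for i, j in pairs if i == 0 and j == 0)
    return a, b, c, d


def _ratio(num, den):
    # Several coefficients are undefined when their denominator is zero.
    return num / den if den != 0 else float("nan")


def similarities(x, y):
    """Return the eight coefficients (textbook forms, assumed) as a dict."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    return {
        "simple_matching": _ratio(a + d, n),
        "jaccard": _ratio(a, a + b + c),
        "dice": _ratio(2 * a, 2 * a + b + c),  # Nei and Li
        "hamann": _ratio((a + d) - (b + c), n),
        "ochiai": _ratio(a, math.sqrt((a + b) * (a + c))),
        "yule_q": _ratio(a * d - b * c, a * d + b * c),
        "yule_y": _ratio(root_ad - root_bc, root_ad + root_bc),
        "phi": _ratio(a * d - b * c,
                      math.sqrt((a + b) * (a + c) * (b + d) * (c + d))),
    }


# Hypothetical band profiles of two varieties (1 = band present, 0 = absent).
variety_1 = [1, 1, 1, 0, 0, 1, 0, 0]
variety_2 = [1, 1, 0, 0, 1, 1, 0, 0]
for name, value in similarities(variety_1, variety_2).items():
    print(f"{name}: {value:.3f}")
```

For these two profiles a = 3, b = 1, c = 1, d = 3, so the coefficients already disagree on a single pair (for example Jaccard gives 0.6 and Yule's Q gives 0.8). Note that Hamann, Yule's Q, Yule's Y and phi range from -1 to 1, while the other four range from 0 to 1, which matters when similarities are converted to distances for clustering or ordination.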
Keywords:
PCoA, binary data, molecular analysis, similarity measures, cluster analysis, carrot
References
Backhaus K., Erichson B., Plinke W., Weiber R. 2000. Multivariate Analysemethoden. Eine anwendungsorientierte Einführung. Springer, Berlin.
Caliński T., Harabasz J. S. 1974. A dendrite method for cluster analysis. Communications in Statistics, vol. 3: 1 — 27.
Chudzik H., Karoński M. 1979. Skupianie obserwacji metodą k-średnich. Roczniki AR w Poznaniu, Algorytmy Biomedyczne i Statystyczne, 78: 133 — 152.
Davis L. G., Dibner M. D., Battey J. F. 1986. Basic methods in molecular biology. Elsevier Sci. Publ., New York: 42 — 43.
Díaz-Perales A., Linacero R., Vázquez A. M. 2002. Analysis of genetic relationships among 22 European barley varieties based on two PCR markers. Euphytica, 129: 53 — 60.
Dice L. R. 1945. Measures of the amount of ecologic association between species. Ecology, 26: 297 — 302.
Duda R. O., Hart P. E. 1973. Pattern Classification and Scene Analysis. New York: John Wiley & Sons.
Goodman M. M. 1972. Distance analysis in biology. Syst. Zool.: 174 — 186.
Gower J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53: 325 — 338.
Gower J. C. 1971. A general coefficient of similarity and some of its properties. Biometrics, 27: 857 — 874.
Gower J. C. 1985. Measures of similarity, dissimilarity and distances. In: Kotz S. et al. (eds.), Encyclopedia of statistical sciences. Vol. 5. Wiley & Sons, New York, USA.
Gower J. C., Legendre P. 1986. Metric and Euclidean properties of dissimilarity coefficients. J. Classification, 3: 5 — 48.
Guilford J. 1936. Psychometric Methods. New York: McGraw–Hill Book Company, Inc.
Guthridge K. M., Dupal M. P., Kölliker R., Jones E. S., Smith K. F., Forster J. W. 2001. AFLP analysis of genetic diversity within and between populations of perennial ryegrass (Lolium perenne L.). Euphytica, 122: 191 — 201.
Hamann U. 1961. Merkmalsbestand und Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum System der Monokotyledonen. Willdenowia, 2: 639 — 768.
Harabasz J. S., Karoński M. 1977. Dendrytowa metoda analizy skupień. Roczniki AR w Poznaniu, Algorytmy Biomedyczne i Statystyczne, 57: 135 — 148.
Hotelling H. 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24: 417 — 441, 498 — 520.
Huang X.-Q., Wolf M., Ganal M. W., Orford S., Koebner R. M. D., Röder M. S. 2007. Did modern plant breeding lead to genetic erosion in European winter wheat varieties? Crop Sci., 47: 343 — 349.
Jaccard P. 1908. Nouvelles recherches sur la distribution florale. Bull. Soc. Vaud. Sci. Nat., 44: 223 — 270.
Janaszek M. 2008. Identyfikacja cech korzeni marchwi jadalnej z wykorzystaniem komputerowej analizy obrazów. SGGW, Warszawa, rozprawa doktorska.
Kaczmarek Z., Czajka S., Adamska E. 2008. Propozycja metody grupowania obiektów jedno i wielocechowych z zastosowaniem odległości Mahalanobisa i analizy skupień. Biuletyn IHAR, Nr 249: 9 — 18.
Karoński M. 1971. Algorytm grupowania populacji w rozkładach metodą krok po kroku. Roczniki AR w Poznaniu, Algorytmy Biomedyczne i Statystyczne, 4: 30 — 33.
Kenkel N. C. 2006. On selecting an appropriate multivariate analysis. Canadian Journal of Plant Science, 86: 663 — 676.
Krzanowski W. J. 2004. Biplots for multifactorial analysis of distance. Biometrics, 60: 517 — 524.
Lance G. M., Williams W. T. 1967. A general theory of classificatory sorting strategies. Hierarchical Systems, Computer Journal, 9: 373 — 380.
Laudański Z., Mańkowski D. R. 2007. Planowanie i wnioskowanie statystyczne w badaniach rolniczych. IHAR Radzików.
Lienert G. A., von Eye A. 1986. Yule-Coefficients for Second- and Higher-Order Associations. Biometrical Journal, 28: 539 — 545.
Liu F., von Bothmer R., Salomon B. 2000. Genetic diversity in European accessions of the barley core collection as detected by isozyme electrophoresis. Genetic Resources and Crop Evolution, 47: 571 — 581.
Manimekalai R., Nagarajan P. 2006. Interrelationships among coconut (Cocos nucifera L.) accessions using RAPD technique. Genetic Resources and Crop Evolution, 53: 1137 — 1144.
MacQueen J. B. 1966. Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability Theory. Berkeley, University of California Press, vol. 1: 281 — 287.
Moncada K. M., Ehlke N. J., Muehlbauer G. J., Sheaffer C. C., Wyse D. L., DeHaan L. R. 2007. Genetic variation in three native plant species across the State of Minnesota. Crop Sci., 47: 2379 — 2389.
Nei M. 1978. The theory of genetic distance and evolution of human races. Jpn. J. Hum. Genet., 23: 341 — 369.
Nei M., Li W. H. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA, 76: 5269 — 5273.
Ochiai A. 1957. Zoographic studies on the soleoid fishes found in Japan and its neighboring regions. Bull. Japan Soc. Sci. Fish., 22: 526 — 530.
Rafalski A. 2004. Semi-specyficzny PCR w badaniach genetyczno-hodowlanych roślin. Monografie i Rozprawy Naukowe IHAR, Nr 23.
Rao C. R. 1964. The use and interpretation of principal component analysis in applied research. Sankhyā, A26: 329 — 358.
Reif J. C., Melchinger A. E., Frisch M. 2005. Genetical and mathematical properties of similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Science, 45: 1 — 7.
Sarle W. S. 1983. Cubic Clustering Criterion. SAS Technical Report A-108, Cary, NC: SAS Institute Inc.
SAS Institute Inc. 2009. SAS/STAT 9.2 user’s guide. Second edition. SAS Institute Inc., Cary, NC, USA.
Siatkowski I., Goszczurna T., Szabelska A., Zyprych J. 2010. Coefficients of dissimilarity and similarity with application. Colloquium Biometricum, 40: 13 — 23.
Sieczko L. 2003. Kryteria wstępnego przecięcia dendrogramu w hierarchicznej analizie skupień. Colloquium Biometryczne, 33: 249 — 258.
Sneath P. H. A., Sokal R. R. 1973. Numerical taxonomy. Freeman, San Francisco.
Sokal R. R., Michener C. D. 1958. A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 38: 1409 — 1438.
Takezaki N., Nei M. 1996. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics, 144: 389 — 399.
Timm N. H. 2002. Applied multivariate analysis. New York, USA: Springer-Verlag Inc.
License
Copyright (c) 2011 Dariusz R. Mańkowski, Zbigniew Laudański, Monika Janaszek
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.