Population Genomics of Streptococcus agalactiae from Infected Bovine and Fish Sources

Richards, Vincent P
Cornell University

1. Describe the Streptococcus agalactiae population structure, genetic diversity, migration patterns, and demographic history using genome-wide single nucleotide polymorphism (SNP) data from a global strain collection from humans, bovines, and fish, and identify associations between phylogenetic lineages and isolation source.

2. Determine the core genome, unique core genes, and dispensable genome components of the distinct S. agalactiae lineages (delineated by genome-wide SNP genotyping) that are characteristic of different host species and disease states.
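The three gene-content categories in Objective 2 can be made concrete with set operations on a presence/absence table. The following is a minimal Python sketch, not the project's method: it assumes gene clusters have already been summarized as a set of cluster IDs per genome and that each genome carries a lineage label from the SNP-based analysis. All names are hypothetical, and "unique core" is read here as clusters present in every genome of one lineage and absent from all other genomes, which is one reasonable interpretation rather than a definition stated in the proposal.

```python
from typing import Dict, Set, Tuple


def partition_gene_content(
    presence: Dict[str, Set[str]],
    lineage_of: Dict[str, str],
) -> Tuple[Set[str], Dict[str, Set[str]], Set[str]]:
    """Split gene clusters into core, lineage-specific ("unique") core, and dispensable sets.

    presence:   genome ID -> set of gene-cluster IDs found in that genome (hypothetical input)
    lineage_of: genome ID -> lineage label, e.g. from SNP-based clustering (hypothetical input)
    """
    genomes = list(presence)
    all_clusters: Set[str] = set().union(*presence.values())
    # Core genome: clusters present in every genome.
    core = set.intersection(*(presence[g] for g in genomes))
    # Dispensable genome: clusters missing from at least one genome.
    dispensable = all_clusters - core

    unique_core: Dict[str, Set[str]] = {}
    for lineage in sorted(set(lineage_of.values())):
        members = [g for g in genomes if lineage_of[g] == lineage]
        outsiders = [g for g in genomes if lineage_of[g] != lineage]
        in_every_member = set.intersection(*(presence[g] for g in members))
        in_any_outsider: Set[str] = set().union(*(presence[g] for g in outsiders))
        # Core within this lineage and absent from all other lineages.
        unique_core[lineage] = in_every_member - in_any_outsider
    return core, unique_core, dispensable
```

Under this reading, every lineage-specific core cluster is also part of the dispensable set, because it is absent from genomes outside its lineage.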

More information

Streptococcus agalactiae (group B Streptococcus, GBS) is a member of the commensal microbiota of the human intestinal and genitourinary tracts, but it is also a leading cause of morbidity and mortality in newborns, pregnant women, and the elderly [1]. The other major reservoir of the bacterium is cattle, in which it has long been recognized as a common cause of mastitis, a major production-limiting disease in developed and developing countries around the world. S. agalactiae has also been identified as an aetiological agent of septicaemia and meningo-encephalitis in saltwater and freshwater fish species, and it is now considered an important threat to the aquaculture industry [2-4]. Although there is considerable evidence for host adaptation among strains of S. agalactiae, foodborne, human-to-animal, and animal-to-human transmission of the pathogen all remain possible [5-7]. In addition to anthroponotic or zoonotic transmission, the possible emergence of human-pathogenic clones from an animal reservoir has been raised [8], as has the suggestion that re-emergence of the pathogen in animal populations may be due to spill-over or adaptation of strains from humans [9]. There is therefore a clear need for a better understanding of the evolution and transmission dynamics of S. agalactiae both within and across host species, including humans and agricultural and aquaculture species. We take advantage of the rapidly decreasing cost of next-generation genome sequencing to examine, for the first time, the population genetic structure and dynamics of S. agalactiae on a global scale across different hosts and disease states. This study will provide valuable insight into the population structure and diversity of S. agalactiae, uncover emergent clones, and identify associations between specific lineages and isolation source.
Comparative genomics of multiple isolates from diseased bovine and fish sources, compared with human-sourced isolates, should (i) identify genes that are key to host adaptation and possibly linked to disease, and (ii) allow estimation of the direction and rate of bacterial migration among populations, thereby providing insight into transmission dynamics among hosts. This project is directly related to the Foundational Program's priority area of "Animal Health and Production" and the challenge area of "Keeping American agriculture competitive" through the identification of genes linked to bovine mastitis and fish septicaemia caused by S. agalactiae. It will also provide data on transmission dynamics between these sources of infection and is therefore also linked to the challenge area of "Improving food safety." Collectively, the information arising from this project could ultimately lead to more effective prevention and treatment programs. Findings will be disseminated via scientific conferences and peer-reviewed publications.

REFERENCES
1. Phares CR, Lynfield R, Farley MM, Mohle-Boetani J, Harrison LH, Petit S, Craig AS, Schaffner W, Zansky SM, Gershman K et al: Epidemiology of invasive group B streptococcal disease in the United States, 1999-2005. JAMA 2008, 299(17):2056-2065.
2. Mian GF, Godoy DT, Leal CA, Yuhara TY, Costa GM, Figueiredo HC: Aspects of the natural history and virulence of S. agalactiae infection in Nile tilapia. Veterinary Microbiology 2009, 136(1-2):180-183.
3. Suanyuk N, Kong F, Ko D, Gilbert GL, Supamattaya K: Occurrence of rare genotypes of Streptococcus agalactiae in cultured red tilapia Oreochromis sp. and Nile tilapia O. niloticus in Thailand--Relationship to human isolates? Aquaculture 2008, 284(1-4):35-40.
4. Ye X, Li J, Lu M, Deng G, Jiang X, Tian Y, Quan Y, Jian Q: Identification and molecular typing of Streptococcus agalactiae isolated from pond-cultured tilapia in China. Fisheries Science 2011, 77(4):623-632.
5. Dogan B, Schukken YH, Santisteban C, Boor KJ: Distribution of serotypes and antimicrobial resistance genes among Streptococcus agalactiae isolates from bovine and human hosts. J Clin Microbiol 2005, 43(12):5899-5906.
6. Evans JJ, Bohnsack JF, Klesius PH, Whiting AA, Garcia JC, Shoemaker CA, Takahashi S: Phylogenetic relationships among Streptococcus agalactiae isolated from piscine, dolphin, bovine and human sources: a dolphin and piscine lineage associated with a fish epidemic in Kuwait is also associated with human neonatal infections in Japan. Journal of Medical Microbiology 2008, 57(Pt 11):1369-1376.
7. Manning SD, Springman AC, Million AD, Milton NR, McNamara SE, Somsel PA, Bartlett P, Davies HD: Association of Group B Streptococcus colonization and bovine exposure: a prospective cross-sectional cohort study. PLoS One 2010, 5(1):e8795.
8. Bisharat N, Crook DW, Leigh J, Harding RM, Ward PN, Coffey TJ, Maiden MC, Peto T, Jones N: Hyperinvasive neonatal group B streptococcus has arisen from a bovine ancestor. J Clin Microbiol 2004, 42(5):2161-2167.
9. Zadoks RN, Middleton JR, McDougall S, Katholm J, Schukken YH: Molecular epidemiology of mastitis pathogens of dairy cattle and comparative relevance to humans. Journal of Mammary Gland Biology and Neoplasia 2011, 16(4):357-372.

Isolates and SNP discovery

This study takes advantage of genome sequence data available on GenBank for multiple isolates of S. agalactiae and supplements this dataset with new genome sequences acquired as part of this project. GenBank currently holds 162 genome sequences for human-sourced isolates of S. agalactiae, 37 for bovine isolates, and 15 for fish isolates. To this collection I will add 100 bovine sequences, most of them from cows with clinical or subclinical mastitis, and 50 fish sequences, most of them from fish with septicaemia. Genome sequencing will proceed as follows: (i) Nextera™ DNA library preparation and Illumina HiSeq® 2000 sequencing will be combined to produce 100 bp paired-end reads for each isolate, and (ii) reads will be assembled de novo using Velvet v0.7.55 [1] together with the script VelvetOptimizer v2.1.4. The MCL algorithm [2], as implemented in the MCLBLASTLINE pipeline, will be used to delineate homologous protein sequences among all genome sequences. Core homologous gene clusters (homologs shared by all genomes) will be delineated, and clusters containing multiple homolog copies in any genome will be removed. Genes within each cluster will be aligned using Probalign v1.1 [3]. Gene alignments will be tested for evidence of intragenic recombination using multiple approaches: (i) GARD [4], (ii) PHI and NSS [5, 6], (iii) Max χ² [7], and (iv) Profile [5]. Finally, the SNP sites in the core gene alignments will be concatenated to construct a contiguous genotype sequence for each strain.

Population genetic analysis of SNP data

The number of S. agalactiae populations (K) will be estimated using two separate approaches, as implemented in the programs STRUCTURE v2.3 [8] and GENELAND v3.1.4 [9]. Both STRUCTURE and GENELAND use Markov chain Monte Carlo (MCMC) algorithms within a Bayesian framework to estimate K.
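The core-gene and SNP discovery steps described under Isolates and SNP discovery can be sketched as a small pipeline. This is a minimal Python illustration, not the project's code: it assumes the MCL output has already been parsed into a dictionary mapping each (hypothetical) cluster ID to the genome IDs of its member genes, and that each Probalign alignment has been loaded as a dictionary from strain ID to aligned sequence. Skipping columns with gaps or ambiguity codes is one of several reasonable choices, assumed here for simplicity.

```python
from typing import Dict, List


def single_copy_core(clusters: Dict[str, List[str]], genomes: List[str]) -> List[str]:
    """Return IDs of clusters holding exactly one gene from every genome.

    `clusters` maps a (hypothetical) cluster ID to the genome ID of each member
    gene, so a genome with two paralogs in a cluster appears twice and the
    cluster is dropped.
    """
    wanted = sorted(genomes)
    return [cid for cid, members in clusters.items() if sorted(members) == wanted]


def concatenate_snps(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate SNP columns from per-gene alignments into one string per strain.

    Each alignment maps strain ID -> aligned sequence (all the same length).
    A column counts as a SNP here if it contains only A/C/G/T and at least two
    different bases; columns with gaps or ambiguity codes are skipped.
    """
    strains = sorted(alignments[0])
    snps: Dict[str, List[str]] = {s: [] for s in strains}
    for aln in alignments:
        # zip(*...) walks the alignment column by column.
        for column in zip(*(aln[s] for s in strains)):
            if set(column) <= set("ACGT") and len(set(column)) > 1:
                for strain, base in zip(strains, column):
                    snps[strain].append(base)
    return {s: "".join(bases) for s, bases in snps.items()}
```

Whether genes flagged by the recombination tests are excluded before concatenation is not stated in the proposal, so the sketch leaves that filtering to the caller.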
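Of the recombination tests listed, Maynard Smith's maximum chi-squared (Max χ²) [7] is simple enough to sketch. For a pair of sequences, each polymorphic site is scored as a match or a mismatch; every possible breakpoint splits the sites into a left and a right block, and a 2×2 chi-squared test asks whether mismatches are concentrated on one side, as expected if one sequence carries a recombinant segment. The maximum over breakpoints is then compared with the same statistic computed after shuffling the order of sites. This single-pair Python sketch is illustrative only (function names are hypothetical); published implementations differ in which sites they use and in how results are combined across all pairs in an alignment.

```python
import random
from typing import List, Sequence, Tuple


def _chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def _max_over_breakpoints(scores: Sequence[int]) -> Tuple[float, int]:
    """Largest chi-squared over all breakpoints in a 0/1 (match/mismatch) series."""
    n, total_mis = len(scores), sum(scores)
    best, best_k, left_mis = 0.0, 0, 0
    for k in range(1, n):  # breakpoint falls between scores[k - 1] and scores[k]
        left_mis += scores[k - 1]
        left_match = k - left_mis
        right_mis = total_mis - left_mis
        right_match = (n - k) - right_mis
        chi2 = _chi2_2x2(left_match, left_mis, right_match, right_mis)
        if chi2 > best:
            best, best_k = chi2, k
    return best, best_k


def max_chi2(seq_a: str, seq_b: str, sites: Sequence[int]) -> Tuple[float, int]:
    """Max chi-squared statistic and breakpoint index for one sequence pair."""
    scores = [int(seq_a[i] != seq_b[i]) for i in sites]
    return _max_over_breakpoints(scores)


def max_chi2_pvalue(seq_a: str, seq_b: str, sites: Sequence[int],
                    n_perm: int = 1000, seed: int = 0) -> float:
    """Permutation p-value: shuffle the order of sites and recompute the maximum."""
    observed, _ = max_chi2(seq_a, seq_b, sites)
    scores: List[int] = [int(seq_a[i] != seq_b[i]) for i in sites]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(scores)
        if _max_over_breakpoints(scores)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The permutation step matters because the maximum is taken over many breakpoints, so the largest chi-squared value cannot be read against a standard chi-squared table.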
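For STRUCTURE, K is chosen with the ΔK statistic of Evanno et al. [10], which looks for the sharpest change in the slope of the log-likelihood ln P(D) as K increases, scaled by the spread of ln P(D) across replicate runs. The following is a minimal Python sketch, assuming ln P(D) has been collected from at least two replicate runs at each K (the input layout and function name are hypothetical). Here the second difference is taken on the mean ln P(D) at each K, which is a simplification; Evanno et al. [10] average absolute second differences across runs.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K from replicate STRUCTURE runs.

    lnp maps each K to the ln P(D) values of its replicate runs (>= 2 per K).
    Delta K(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), with L the mean across
    runs, so it is defined only for K values that have both K-1 and K+1 runs.
    """
    means = {k: mean(v) for k, v in lnp.items()}
    delta: Dict[int, float] = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

The K with the largest ΔK is taken as the most likely number of populations; ΔK cannot evaluate the smallest or largest K tested, including K = 1.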
For STRUCTURE, K will be estimated by first evaluating genetic partitioning across a range of K values and then calculating the ad hoc statistic ΔK [10]. Following Falush et al. [11], two models of ancestry will be used: (i) the no-admixture model, which assumes that each individual's ancestry derives from only one population, and (ii) the linkage model, which accounts for mixed population ancestry due to recombination. Once K has been determined, the probability of assignment of each individual to each population will be calculated. For GENELAND, the spatial and haploid data models will be combined. Levels of differentiation among the delineated populations will be measured using analysis of molecular variance (AMOVA) as implemented in ARLEQUIN version 3.11 [12]. Complete genome data will be used to examine the distribution and diversity of genes implicated in virulence. Migration rates among populations will be estimated using the Bayesian framework implemented in the program MIGRATE v3.2 [13, 14].

Historical population dynamics

The diversity captured by the extensive sampling scheme and high-resolution SNP genotyping will be exploited to elucidate population demographic history. More specifically, accurate representation of the diversity within distinct populations will allow recovery of sufficient phylogenetic signal to generate skyline plots estimating changes in effective population size through time [15].

Evaluation Plan

This research will be carried out over two years. The first year will include assembly of the genome sequence data, genome-wide SNP genotyping, and the population genetic and phylogeographic analyses. The second year will include the historical demographic analysis and the comparative genomic work to determine gene content differences between hosts and any specific groups identified in the phylogeographic analyses. Results will be disseminated via scientific publications and conference presentations.

REFERENCES
1. Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008, 18(5):821-829.
2. van Dongen S: Graph clustering by flow simulation. PhD thesis, University of Utrecht; 2000.
3. Roshan U, Livesay DR: Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 2006, 22(22):2715-2721.
4. Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SD: GARD: a genetic algorithm for recombination detection. Bioinformatics 2006, 22(24):3096-3098.
5. Bruen TC, Philippe H, Bryant D: A simple and robust statistical test for detecting the presence of recombination. Genetics 2006, 172(4):2665-2681.
6. Jakobsen IB, Easteal S: A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Computer Applications in the Biosciences 1996, 12(4):291-295.
7. Maynard Smith J: Analyzing the mosaic structure of genes. Journal of Molecular Evolution 1992, 34(2):126-129.
8. Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics 2000, 155(2):945-959.
9. Guillot G, Estoup A, Mortier F, Cosson JF: A spatial statistical model for landscape genetics. Genetics 2005, 170(3):1261-1280.
10. Evanno G, Regnaut S, Goudet J: Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 2005, 14(8):2611-2620.
11. Falush D, Wirth T, Linz B, Pritchard JK, Stephens M, Kidd M, Blaser MJ, Graham DY, Vacher S, Perez-Perez GI et al: Traces of human migrations in Helicobacter pylori populations. Science 2003, 299(5612):1582-1585.
12. Excoffier L, Laval G, Schneider S: Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online 2005, 1:47-50.
13. Beerli P: Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations. Mol Ecol 2004, 13(4):827-836.
14. Beerli P, Felsenstein J: Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc Natl Acad Sci U S A 2001, 98(8):4563-4568.
15. Pybus OG, Rambaut A, Harvey PH: An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics 2000, 155(3):1429-1437.
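The AMOVA step in the population genetic analysis partitions genetic variation into among-population and within-population components [12]. As a rough illustration of the statistic ARLEQUIN reports, here is a minimal Python sketch of a one-level AMOVA Φ_ST for haploid SNP genotype strings. It assumes the number of pairwise differences serves as the squared distance, as is usual for haplotype data, and it omits the permutation test ARLEQUIN uses to assess significance. Function names and the input layout are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_differences(a: str, b: str) -> int:
    """Number of differing positions between two equal-length genotype strings."""
    return sum(x != y for x, y in zip(a, b))


def amova_phi_st(pops: Dict[str, List[str]]) -> float:
    """One-level AMOVA Phi_ST (needs >= 2 populations and more strains than populations)."""
    seqs = [s for group in pops.values() for s in group]
    n_total, n_pops = len(seqs), len(pops)
    # Sums of squared deviations: sum over pairs i < j of d_ij, divided by group size.
    ssd_total = sum(pairwise_differences(a, b) for a, b in combinations(seqs, 2)) / n_total
    ssd_within = sum(
        sum(pairwise_differences(a, b) for a, b in combinations(group, 2)) / len(group)
        for group in pops.values()
    )
    ssd_among = ssd_total - ssd_within
    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)
    # n0 corrects for unequal population sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in pops.values()) / n_total) / (n_pops - 1)
    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    total = var_among + var_within
    return var_among / total if total != 0 else 0.0
```

Φ_ST near 1 means almost all variation lies between populations; because the among-population component is estimated by subtraction, small negative values can occur when there is essentially no differentiation.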
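The skyline approach of Pybus et al. [15] turns the coalescent intervals of a genealogy into a piecewise-constant estimate of effective population size through time. The sketch below implements only the simplest, "classic" skyline estimate, under the assumptions that the genealogy is ultrametric with all strains sampled at the same time and that a haploid coalescent applies. The proposal does not say which skyline variant or software will be used, and the function name is hypothetical. With i lineages present for a duration g, coalescence occurs at rate i(i - 1)/2 per unit of effective size, so the interval's estimate is g · i(i - 1)/2.

```python
from typing import List, Tuple


def classic_skyline(coalescent_times: List[float], n_tips: int) -> List[Tuple[float, float, float]]:
    """Classic skyline estimates from an ultrametric genealogy.

    coalescent_times are the heights (time before present) of the n_tips - 1
    internal nodes. Returns (interval start, interval end, estimate) tuples; each
    estimate is effective size scaled by generation time, in the tree's time units.
    """
    times = sorted(coalescent_times)
    if len(times) != n_tips - 1:
        raise ValueError("expected n_tips - 1 coalescent times")
    steps = []
    start, lineages = 0.0, n_tips
    for end in times:
        duration = end - start
        # i lineages coalesce at rate i(i-1)/2 per unit of effective size.
        steps.append((start, end, duration * lineages * (lineages - 1) / 2))
        start, lineages = end, lineages - 1
    return steps
```

Because each interval is estimated from a single waiting time, classic skyline plots are noisy; smoothed or Bayesian variants are often preferred in practice.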

Funding Source
Nat'l. Inst. of Food and Agriculture
Prevention and Control
Viruses and Prions