Objectives:
Objective 1: determine the evolutionary history of REPEXH02 strains in the broader context of E. coli O157:H7 clade 2, and identify unique gene content that may affect strain persistence.
Objective 2: identify non-host reservoirs and environmental harborage sites of REPEXH02 strains using environmental sample collection, phenotypic assays, and genome-wide association analysis.
Objective 3: develop an AI-based model that predicts the persistence of REPEXH02 strains from the physicochemical properties of collected samples.
Abstract: Recent analysis of genomic and epidemiological data has shown that a specific subtype of E. coli O157:H7 has been associated with illnesses since 2016. The CDC has designated this subtype (REPEXH02) as reoccurring, emerging, or persistent, and outbreaks of this subtype have been linked to leafy greens grown in the Salinas and Santa Maria regions of California. Because the subtype has so far been found only in this limited geographic area, available data suggest that it may have environmental reservoirs. Comparative genomics has identified genetic differences among isolates within the REPEXH02 group, but these isolates have not yet been compared with a broader set of related E. coli O157:H7 strains. Detailed genomic characterization and comparison within the larger clade 2 group are needed to provide context and to identify genetic elements that are associated with the emergence and persistence of these isolates and that may be linked to specific environmental features. This study aims to 1) conduct a comparative genomic analysis of a broader set of E. coli O157:H7 genomes to determine the evolutionary history of the REPEXH02 subtype, 2) identify genes in REPEXH02 associated with persistence, and 3) link genomic, phenotypic, and environmental data to determine the features underlying the emergence and persistence of REPEXH02. Historical collections of E. coli O157:H7, together with newly isolated strains from the Salinas and Santa Maria regions, will be used for comparative genomics and to measure mutation, recombination, and horizontal gene transfer within REPEXH02. Soil and water samples from the Santa Maria region will be analyzed for physicochemical parameters, which will then be used to design assays that measure the stress-tolerance phenotypes of existing REPEXH02 isolates. Links among genes, phenotypes, and environmental parameters will be determined using bioinformatics and AI-based modeling tools.
A model to predict the persistence of REPEXH02 from soil physicochemical parameters will be developed. The results will be used to provide growers and producers with data-based mitigation strategies for this persistent subtype of E. coli O157:H7.