III: Medium: Adaptive Information Extraction from Social Media for Actionable Inferences in Public Health

Gravano, Luis; Hsu, Daniel
Columbia University
Start date
End date
Social media is a major source of uncurated, user-generated feedback on virtually all products and services. Users increasingly turn to social media, rather than official reporting channels, to disclose serious real-life incidents, such as a case of food poisoning at a restaurant. This user-generated information, if identified reliably, may have a dramatic positive impact on critical applications related to public health -- the family of applications of interest in this project -- and beyond. For example, a local health department might launch an investigation of a potential foodborne disease outbreak at a restaurant if compelling supporting evidence can be inferred from social media. This project will address fundamental research challenges in processing social media to produce actionable inferences, that is, inferences whose output leads to concrete actions in the real world. In addition to producing broadly applicable research results, the project will have as its centerpiece a critical public health application: detecting and acting on foodborne disease outbreaks in restaurants.

Overall, this project will develop (1) strategies for entity-centric modeling and selection of social media, to cover the vast volumes of user-produced content across sources; (2) non-traditional information extraction strategies over informal, noisy, and ungrammatical text, as well as learning-based approaches to produce actionable, entity-centric inferences for public health applications; and (3) methods for general online active learning and search that are tuned for detecting the rare occurrences required for actionable inferences. The project will center on an application for detecting and acting on foodborne disease outbreaks, developed in collaboration between Columbia University and the New York City Department of Health and Mental Hygiene (DOHMH). This collaboration will provide a robust, real-world platform for continuous, end-to-end evaluation of the novel research results as applied to a large-scale data science problem, a rare opportunity in the evaluation of Computer Science research. It will also include the development and deployment of a system with a direct impact on public health and society. A proof-of-concept prototype is already in use at DOHMH and has helped identify and act on several previously unknown outbreaks. The public health findings from the project will be shared across governmental agencies, following DOHMH's best practices. Developed code and annotated datasets will be shared with other researchers and agencies via the project web site (
Funding Source
United States National Science Foundation
Project number
Bacterial Pathogens
Natural Toxins
Prevention and Control