Project Description

SBIR: A Cloud-Based WGS Platform for Establishing Phylogeny within Epidemic Outbreaks at Hospitals

Agency: National Institute of Allergy and Infectious Diseases

Project Number: 1R43AI143267-01

Outbreaks have become a significant public health threat in hospitals, both endangering patient lives and creating a financial burden. With antibiotic resistance on the rise, new interventions are urgently needed to contain ongoing outbreaks. Recent studies have confirmed that whole genome sequencing (WGS) can identify unique mutations within each outbreak strain, which can then be used to establish transmission routes. However, significant bioinformatics challenges exist in using WGS for outbreak analyses. WGS reads are inherently noisy, and traditional read-mapping techniques require careful selection of quality criteria to identify and remove artefactual single-nucleotide polymorphisms (SNPs). This is difficult because often only a single SNP separates two outbreak isolates. The result is a high barrier to routine adoption of genomics-based interventions during outbreaks. In this proposal, we aim to develop a fully automated, cloud-based bioinformatics platform that can be rapidly deployed in the event of an outbreak. Rather than relying on current read-mapping approaches to curate erroneous assemblies, our platform will adopt a different methodology that uses new Amazon Web Services (AWS) cloud computing components such as AWS Lambda and DynamoDB.

Our Phase 1 aims are:

  • Develop an assembly module that assembles all the raw sequence data and then identifies artefactual SNPs within each outbreak assembly
  • Develop a biomarker module that establishes unique biomarker sequences by removing both artefactual SNPs and low-quality SNPs
  • Develop a control module that combines the results of the assembly and biomarker modules to establish phylogeny
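
As a rough illustration of how the three aims fit together, the sketch below filters SNP calls and then computes pairwise SNP distances between isolates, the raw input for a phylogeny. It is not the platform's actual method: the data layout, the function names, the quality threshold of 30, and the example isolates are all hypothetical, and positions missing from a profile are simply treated as matching the reference.

```python
from itertools import combinations

# Hypothetical quality cutoff; the proposal does not specify one.
MIN_QUALITY = 30


def filter_snps(calls, artefactual_positions, min_quality=MIN_QUALITY):
    """Keep SNP calls that are neither flagged as artefactual nor low quality.

    `calls` maps a genome position to a (base, quality) tuple.
    """
    return {
        pos: base
        for pos, (base, qual) in calls.items()
        if pos not in artefactual_positions and qual >= min_quality
    }


def snp_distance(profile_a, profile_b):
    """Count positions at which two filtered SNP profiles differ.

    A position absent from a profile is assumed to match the reference,
    which is a simplification made for this sketch.
    """
    positions = profile_a.keys() | profile_b.keys()
    return sum(1 for pos in positions if profile_a.get(pos) != profile_b.get(pos))


def pairwise_distances(isolates, artefactual_positions, min_quality=MIN_QUALITY):
    """Filter every isolate, then compare each pair of isolates."""
    filtered = {
        name: filter_snps(calls, artefactual_positions, min_quality)
        for name, calls in isolates.items()
    }
    return {
        (x, y): snp_distance(filtered[x], filtered[y])
        for x, y in combinations(sorted(filtered), 2)
    }


# Made-up example data: position -> (base, quality).
isolates = {
    "isolate_A": {1200: ("T", 60), 5310: ("G", 12), 8800: ("C", 55)},
    "isolate_B": {1200: ("T", 58), 8800: ("C", 50), 9100: ("A", 61)},
    "isolate_C": {1200: ("T", 59), 4400: ("G", 57), 9100: ("A", 60)},
}
artefactual = {8800}  # positions assumed to be flagged by the assembly step

distances = pairwise_distances(isolates, artefactual)
for pair, dist in distances.items():
    print(pair, dist)
```

In this toy data, position 8800 is discarded as artefactual and the call at 5310 is discarded for low quality, so the remaining differences are the ones that would inform a transmission chain. A real pipeline would feed such distances into a tree-building method rather than reading them off directly.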