Taiwan sets up AI tools to trace virus transmission with sequencing and multi-omics data

Abstract

The Taiwan Centers for Disease Control (CDC) worked with Taiwan AI Labs (AI Labs) to set up an AI transmission tracing system that uses virus sequencing data and unstructured multi-omics data. The system identifies transmission paths by combining virus sequence data with contact history. This study also demonstrates that transmission paths can be established for source-unknown clusters and cases by adding sequencing data. One case had no sequencing data, but its connection was established through correlated sequenced cases. The system also helps to understand international transmission paths and to confirm infection clusters. Taiwan identified the transmission path of all viruses with sequencing data.

Background

As next-generation sequencing (NGS) technology has developed rapidly in recent years, researchers can quickly obtain the whole genome sequence (WGS) of pathogens. Using WGS in epidemic transmission analysis facilitates rapid and accurate identification of the relatedness of samples, which can be used to identify the path of transmission within a population and provide information on the probable source.
Since January 22, 2020, when the CDC announced Taiwan's first case of SARS-CoV-2, the Taiwan Executive Yuan (EY) has subsidized the cost of virus sequencing for confirmed cases. The CDC works with hospitals and research institutes to accelerate data accumulation. These data provide perspectives for epidemic researchers and lay a foundation for subsequent infectious disease research.
To discover the virus transmission path for confirmed cases, Taiwan started using virus sequence data to trace virus transmission with a tool that branched out from NextStrain and was developed as an open-source tool: https://covirus.cc/phylogeny/COVID19-latest.

Taiwan CDC constructs virus transmission paths with an AI algorithm

AI Labs designed an AI algorithm that takes features from confirmed case information, including travel history, daily activities, and relationships between cases, together with features from sequencing data. These data are provided by the CDC. The algorithm is applied to establish a case-to-case propagation map. All confirmed cases in Taiwan are displayed in a clear association graph according to features including geographic whereabouts and phylogenetic relationships.
In the genetic analysis, the system automatically retrieves international virus data from GISAID and reconstructs the evolution and spreading history of the epidemic. The display model uses the tool branched from NextStrain for data visualization.
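
The source does not describe how the algorithm combines its features, so the following is only a minimal sketch of the general idea of a case-to-case association graph. The record fields (`contacts`, `travel`, `seq`), the toy sequences, and the `max_distance` threshold are all hypothetical, and a plain Hamming distance stands in for the real phylogenetic analysis.

```python
from itertools import combinations

# Hypothetical case records; field names and values are illustrative only.
cases = {
    "A": {"contacts": {"B"}, "travel": "Egypt", "seq": "ACGTACGT"},
    "B": {"contacts": {"A"}, "travel": "Egypt", "seq": "ACGTACGA"},
    "C": {"contacts": set(), "travel": "Turkey", "seq": "TCGAACGA"},
}


def hamming(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def build_association_graph(cases, max_distance=1):
    """Return (case1, case2, reasons) edges for pairs that share any feature.

    max_distance is an assumed threshold, not a value from the source.
    """
    edges = []
    for u, v in combinations(sorted(cases), 2):
        reasons = []
        if v in cases[u]["contacts"] or u in cases[v]["contacts"]:
            reasons.append("contact")
        if cases[u]["travel"] and cases[u]["travel"] == cases[v]["travel"]:
            reasons.append("travel")
        if hamming(cases[u]["seq"], cases[v]["seq"]) <= max_distance:
            reasons.append("genetic")
        if reasons:
            edges.append((u, v, reasons))
    return edges


edges = build_association_graph(cases)
for u, v, reasons in edges:
    print(u, v, reasons)
```

Keeping the reason labels on each edge lets a display layer distinguish links supported by contact history from links supported only by sequence similarity.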

Figure 1. The development flow of the transmission tracing system.

For cases with an unknown source, Taiwan CDC leverages the WGS phylogenetic tree to discover cases whose virus strains are closely related, and combines this with each sample's travel information, geographical location, and activity information to identify transmission routes (Figure 2).
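
As a rough illustration of this lookup, the sketch below ranks sequenced cases by their genetic distance to a source-unknown case and reports each candidate's travel history alongside it. The case identifiers, toy sequences, field names, and the `max_distance` cutoff are assumptions made for the example. The actual system works on a phylogenetic tree, not on raw pairwise distances.

```python
def hamming(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def candidate_sources(target_id, cases, max_distance=2):
    """Return (distance, case_id, travel) tuples for close cases, nearest first."""
    target_seq = cases[target_id]["seq"]
    ranked = []
    for case_id, record in cases.items():
        if case_id == target_id:
            continue
        distance = hamming(target_seq, record["seq"])
        if distance <= max_distance:
            ranked.append((distance, case_id, record["travel"]))
    # Case identifiers are unique, so sorting never compares travel values.
    ranked.sort()
    return ranked


# Hypothetical records: one source-unknown case and two sequenced travelers.
cases = {
    "unknown": {"seq": "ACGTACGTAC", "travel": None},
    "traveler": {"seq": "ACGTACGTAT", "travel": "United Kingdom"},
    "other": {"seq": "TCGAACGTTT", "travel": "Egypt"},
}

candidates = candidate_sources("unknown", cases)
print(candidates)
```

A close genetic match only suggests a route; as the text notes, it still has to be checked against travel, location, and activity information.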

Figure 2. The flow of finding the possible transmission path.

Case study: Identifying the transmission source for infection clusters

Taiwan identified infection clusters using the phylogenetic tree and tour group information. Two clusters, for example, returned from Egypt (Figure 3) and Turkey (Figure 4). The cases from each group are all positioned on the same branch of the phylogenetic tree. Based on this result, the cases in each group share the same source of infection and are confirmed as a cluster.
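
One way to read "positioned on the same branch" is that the group forms a single clade: the smallest subtree containing every group member contains no other samples. The sketch below checks that on a toy tree written as nested tuples. The tree, the sample names, and the strict clade criterion are all assumptions for illustration. A real analysis may tolerate closely related foreign samples in the same branch.

```python
def leaves(node):
    """Return the set of sample names under a node (a str leaf or a tuple)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def smallest_clade(node, group):
    """Return the leaf set of the smallest subtree containing all of group."""
    children = () if isinstance(node, str) else node
    for child in children:
        if group <= leaves(child):
            return smallest_clade(child, group)
    return leaves(node)


def on_same_branch(tree, group):
    """True if the group members form a clade with no other samples in it."""
    group = set(group)
    if not group <= leaves(tree):
        raise ValueError("every group member must be present in the tree")
    return smallest_clade(tree, group) == group


# Hypothetical tree: E* and T* stand for two tour groups, X1 for another case.
tree = ((("E1", "E2"), "E3"), (("T1", "T2"), "X1"))

print(on_same_branch(tree, ["E1", "E2", "E3"]))
print(on_same_branch(tree, ["T1", "X1"]))
```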

Figure 3. The phylogenetic tree of the Egypt infection cluster. From the perspective of case correlation, the system shows that the 6 genomic samples from the Egypt cluster are all positioned on the same branch of the phylogenetic tree.

Figure 4. The phylogenetic tree of the Turkey infection cluster. From the perspective of case correlation, the system shows that the 5 genomic samples from the Turkey cluster are all positioned on the same branch of the phylogenetic tree.

Case 34 study: Identifying a possible transmission source from sequencing data

Case 34 is a 50-year-old woman whose source of infection was unknown. Figure 5 shows that the samples relevant to this case are positioned on the same branch, which also includes Case 48. Case 48 traveled from the United Kingdom. Based on this information, Taiwan CDC can trace the possible source of Case 34 through its association with Case 48, who traveled from the UK.

Figure 5. Phylogenetic tree of Case 34 and relevant cases. The system shows that all relevant samples of Case 34 are positioned on the same branch of the phylogenetic tree, and that the branch also contains Case 48, who came back from the United Kingdom. (Upper: comparison of the phylogenetic relationship with the relevance of the cases; lower: use of genetic information to infer the possible geographic origin of the case.)

Case 336 study: Identifying a possible transmission source by sequencing an associated case

Case 336 is a confirmed case without sequencing data. To trace the transmission path of Case 336, we analyzed Case 347, who was infected by Case 336. Phylogenetic analysis shows that Case 347 is associated with Case 77, Case 143, and Case 144, whose transmission history traces back to Europe (Figure 6). Therefore, Taiwan CDC was able to establish the associated transmission path.
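
The reasoning in this case study can be sketched as a lookup that borrows phylogenetic associations from a directly linked sequenced case. The case numbers and relationships below are transcribed from the paragraph above. The data structures and the function name are hypothetical and are not taken from the actual system.

```python
# Relationships as described in the text; the structures are illustrative.
sequenced = {"347", "77", "143", "144"}
epi_links = {"336": ["347"]}  # Case 336 infected Case 347
phylo_neighbors = {"347": ["77", "143", "144"]}  # same branch as Case 347


def inferred_associations(case_id, sequenced, epi_links, phylo_neighbors):
    """Return cases phylogenetically associated with case_id.

    A sequenced case uses its own tree neighbors. An unsequenced case
    borrows the neighbors of the sequenced cases it is directly linked to.
    """
    if case_id in sequenced:
        return sorted(phylo_neighbors.get(case_id, []), key=int)
    found = set()
    for linked in epi_links.get(case_id, []):
        if linked in sequenced:
            found.update(phylo_neighbors.get(linked, []))
    return sorted(found, key=int)


associations = inferred_associations("336", sequenced, epi_links, phylo_neighbors)
print(associations)
```

The borrowed associations are only as reliable as the epidemiological link that connects the unsequenced case to the sequenced one.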

Figure 6. Phylogenetic tree of Case 336 and relevant cases. The analysis results confirm the relationship with Case 77, Case 143, and Case 144 from the position of Case 336 on the phylogenetic tree, and the cluster relationship with other cases can be observed from the case relationship diagram. (Upper: comparison of the phylogenetic relationship with the relevance of the cases; lower: use of genetic information to infer the possible geographic origin of the case.)

Conclusion

With the deployment of a virus transmission tracing tool with an AI algorithm, Taiwan CDC can identify the source for 100% of viruses with sequence data. Infection clusters can be confirmed with sequencing data. The source of a source-unknown case can be identified by sequencing the virus. Cases without sequencing data can also be traced through other sequenced cases.

References

  1. Gilchrist CA, Turner SD, Riley MF, Petri WA Jr, Hewlett EL. Whole-genome sequencing in outbreak analysis. Clinical Microbiology Reviews. 2015;28(3):541-563. doi:10.1128/CMR.00075-13
  2. Sagulenko P, Puller V, Neher RA. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018;4(1):vex042. doi:10.1093/ve/vex042
  3. Neher RA, Bedford T. nextflu: real-time tracking of seasonal influenza virus evolution in humans. Bioinformatics. 2015;31(21):3546-3548. doi:10.1093/bioinformatics/btv381
  4. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121-4123.

© 2020 Taiwan AI Labs
All Rights Reserved.
