In a recent study posted to the bioRxiv* preprint server, researchers critically evaluated the findings of a study by Pekar et al. that claimed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) entered humans via two separate zoonotic spillover events.
Background
Biologists worldwide are still struggling to understand how SARS-CoV-2 entered the human population. Learning about its origins is crucial to preventing another pandemic on the unprecedented scale of COVID-19.
About the study
In the present study, the researchers used outbreak simulations to argue that the findings of the Pekar et al. study were not robust. They also raised concerns about another study, by Worobey et al., which, they argued, excluded key data points and reached unjustified conclusions about the origin of SARS-CoV-2.
The Pekar et al. study was presented as strong evidence for the natural-origin hypothesis of SARS-CoV-2. However, according to the authors of the present study, it relied heavily on unrealistic phylodynamic models of SARS-CoV-2 and excluded SARS-CoV-2 genome data on dubious grounds.
Study findings
Pekar et al. used FAVITES, a computational tool for simulating viral transmission and evolution, to simulate the SARS-CoV-2 outbreak. Their simulations worked on the premise that superspreading occurs when patients with many secondary contacts fall ill and start infecting those contacts. Lythgoe et al., however, showed that superspreading events generate polytomies (nodes in a phylogenetic tree with more than two descendant branches) because within-host viral diversity is low at any given time. According to the researchers, the FAVITES simulations relied on a scale-free contact network and machinery originally built for human immunodeficiency virus (HIV), and they failed to capture real-world SARS-CoV-2 superspreading events.
Most SARS-CoV-2 superspreading events occurred over a short period, during which within-host diversity and the probability of within-host evolution were low. Consequently, real SARS-CoV-2 superspreading generated polytomies at a higher rate than the transmission and mutation processes simulated by Pekar et al. The researchers noted that this cast doubt on the Pekar et al. premise that the true early SARS-CoV-2 phylogeny comprised two basal polytomies. The mismatch also indicated that the Pekar et al. phylodynamic model was a poor fit for the timescale of SARS-CoV-2 evolution relative to the timescale of its superspreading.
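To make the timescale argument concrete, here is a toy Python simulation, a sketch under stated assumptions rather than the researchers' or Pekar et al.'s model. It counts how many contacts of a single superspreader end up carrying exactly the source's genome. The function names, the Poisson model of new mutations, the uniform infection times, and the illustrative rate of 0.07 new mutations per day are all assumptions made for this example. Contacts with no new mutations are genetically identical and would collapse into one polytomy.

```python
import math
import random


def poisson_draw(rng, lam):
    """Sample a Poisson(lam) integer with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def contacts_with_identical_genome(n_contacts, mutations_per_day, window_days, seed=0):
    """Hypothetical toy superspreading event.

    One source infects n_contacts people at times drawn uniformly within
    window_days (assumption). Each transmitted lineage independently carries
    Poisson(mutations_per_day * infection_time) new mutations (assumption).
    Returns how many contacts carry no new mutations, i.e. share the source
    genotype and would form a single polytomy.
    """
    rng = random.Random(seed)
    identical = 0
    for _ in range(n_contacts):
        infected_at = rng.uniform(0.0, window_days)
        if poisson_draw(rng, mutations_per_day * infected_at) == 0:
            identical += 1
    return identical


if __name__ == "__main__":
    # 0.07 new mutations per day is an illustrative, assumed value.
    for window in (2, 60):
        n = contacts_with_identical_genome(100, 0.07, window, seed=42)
        print(f"{window}-day window: {n}/100 contacts share the source genome")
```

Under these assumptions, a short superspreading window leaves most contacts with the source genome, whereas a long window scatters them into distinct genotypes. That is the qualitative contrast the researchers drew between real superspreading and the slower processes in the simulations.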
An early SARS-CoV-2 phylogeny comprising two basal polytomies is not an empirical fact but an estimate of phylogenetic structure that depends on the data used. Thus, the empirical premise behind the Pekar et al. testing procedure was flawed. Pekar et al. claimed that there must have been two spillover events at the Huanan Seafood Wholesale Market (HSM) in Wuhan, one generating lineage A and the other generating lineage B, two lineages that differ by only two mutations. However, it is possible that there were many sequences intermediate between lineages A and B, which Pekar et al. excluded from their analysis.
The researchers showed that Pekar et al. excluded multiple potential C/C intermediate genomes: 11 from Sichuan and one from Wuhan, China. Lin et al. sequenced multiple C/C intermediates, five of which had a genotype identical to the reference genome Wuhan-Hu-1. These five likely represented genuine C/C intermediates, providing further empirical evidence against the two-spillover hypothesis advanced in support of a natural origin.
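The lineage labels in this debate can be read off two genome positions. Lineage A carries T at position 8782 and C at position 28144, lineage B carries C and T, and the "C/C" and "T/T" genotypes mix the two. The short Python sketch below assumes genomes that are already aligned to Wuhan-Hu-1 coordinates and writes genotypes as "base at 8782 / base at 28144". `classify_ab` and the placeholder `toy_genome` helper are hypothetical names; a real analysis would also have to deal with ambiguous bases and sequencing quality.

```python
# Assumed convention: "<base at 8782>/<base at 28144>", Wuhan-Hu-1 coordinates.
LINEAGE_BY_GENOTYPE = {
    "T/C": "lineage A",
    "C/T": "lineage B",
    "C/C": "A/B intermediate",
    "T/T": "A/B intermediate",
}

WUHAN_HU_1_LENGTH = 29903  # genome length of the Wuhan-Hu-1 reference


def classify_ab(genome: str) -> tuple[str, str]:
    """Return (genotype, label) for a genome aligned to reference coordinates.

    Positions are 1-based, so they are shifted by one for Python indexing.
    Anything outside the four known genotypes (e.g. an 'N') is 'unresolved'.
    """
    genotype = f"{genome[8782 - 1].upper()}/{genome[28144 - 1].upper()}"
    return genotype, LINEAGE_BY_GENOTYPE.get(genotype, "unresolved")


def toy_genome(base_8782: str, base_28144: str, length: int = WUHAN_HU_1_LENGTH) -> str:
    """Hypothetical placeholder sequence: 'N' everywhere except the two marker sites."""
    seq = ["N"] * length
    seq[8782 - 1] = base_8782
    seq[28144 - 1] = base_28144
    return "".join(seq)


if __name__ == "__main__":
    for bases in (("T", "C"), ("C", "T"), ("C", "C")):
        print(classify_ab(toy_genome(*bases)))
```

Running it prints `('T/C', 'lineage A')`, `('C/T', 'lineage B')` and `('C/C', 'A/B intermediate')`, which is why genomes such as those reported by Lin et al. are discussed as candidate intermediates between the two lineages.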
Furthermore, the random sampling process used by Pekar et al. was unrealistic and strongly biased against polytomies. They seeded each simulated outbreak with one randomly infected individual, ran the transmission and mutation simulations described above until they had sampled 50,000 infected individuals, and then subsampled individuals at random. However, during the early SARS-CoV-2 outbreak in Wuhan, cases were not sampled at random from the general population but rather identified through contact tracing. Contact tracing biases case ascertainment toward epidemiologically linked, and therefore genetically similar, cases, which increases the likelihood of observing polytomous lineages. Although the effect of contact tracing is not easy to quantify, it must be accounted for when inferring something as important as the origin of SARS-CoV-2, the virus behind the unprecedented COVID-19 pandemic.
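The ascertainment argument can be illustrated with a deliberately simplified Python sketch. It assumes, hypothetically, that every case in a superspreading cluster shares one genotype, and it compares uniform random subsampling with a toy form of contact tracing that, once a cluster is found, ascertains all of its members. The largest group of identical genotypes in a sample serves as a rough proxy for the largest polytomy a tree built from that sample could show. All names and parameters are illustrative, not taken from either study.

```python
import random
from collections import Counter


def build_population(n_clusters, cluster_size):
    """Hypothetical outbreak: each cluster stands for one superspreading event,
    and (by assumption) every case in a cluster carries the same genotype."""
    return [
        {"cluster": c, "genotype": f"g{c}"}
        for c in range(n_clusters)
        for _ in range(cluster_size)
    ]


def random_sample(cases, n, rng):
    """Uniform random subsampling of n cases."""
    return rng.sample(cases, n)


def contact_traced_sample(cases, n, rng):
    """Toy contact tracing: visit clusters in random order and ascertain
    every case in each visited cluster until n cases have been collected."""
    by_cluster = {}
    for case in cases:
        by_cluster.setdefault(case["cluster"], []).append(case)
    cluster_order = list(by_cluster)
    rng.shuffle(cluster_order)
    sample = []
    for cluster in cluster_order:
        for case in by_cluster[cluster]:
            if len(sample) == n:
                return sample
            sample.append(case)
    return sample


def largest_identical_group(sample):
    """Size of the biggest set of sampled cases sharing one genotype."""
    return max(Counter(case["genotype"] for case in sample).values())


if __name__ == "__main__":
    rng = random.Random(7)
    cases = build_population(n_clusters=100, cluster_size=10)
    print("random:", largest_identical_group(random_sample(cases, 20, rng)))
    # Contact tracing of 20 cases collects two whole clusters, so this prints 10.
    print("contact-traced:", largest_identical_group(contact_traced_sample(cases, 20, rng)))
```

Under these assumptions, tracing 20 cases always recovers two complete clusters of 10 identical genomes, while a random sample of 20 from 1,000 cases rarely contains more than a few from any one cluster. Real ascertainment is far messier, which is consistent with the researchers' remark that its effect is hard to quantify.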
Conclusions
The Pekar et al. hypothesis tests underestimated the probability of observing two basal polytomies. Medical alerts and biased case ascertainment may also have confounded the inferences Pekar et al. (and Worobey et al.) attempted to draw from the phylogenetic and spatial patterns of early SARS-CoV-2 outbreak data. While their analysis contributed to efforts to make sense of early outbreak phylogenies, its results may have been sensitive to the models' underlying assumptions, and those assumptions conflicted with the empirical realities of case ascertainment early in the outbreak. Hence, the researchers argued, the conclusions of Pekar et al. do not appear justified, and the origin of SARS-CoV-2 remains unknown.
*Important notice
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.