Predicting the Emergence of SARS-CoV-2 Clades
Siddharth Jain, Xiongye Xiao, Paul Bogdan, Jehoshua Bruck
Received date: 4th September 2020
Evolution is a process of change in which mutations in the viral RNA are selected based on their fitness for replication and survival. Because current phylogenetic analysis of SARS-CoV-2 identifies new viral clades only after they exhibit evolutionary selection, one may ask whether we can identify viral selection and predict the emergence of new viral clades. Inspired by the concept of Kolmogorov complexity, we propose a generative complexity (algorithmic) framework capable of analyzing viral RNA sequences by mapping multiscale nucleotide dependencies onto a state machine, where states represent subsequences of nucleotides and state-transition probabilities encode the higher-order interactions between these states. We apply computational learning and classification techniques to identify the active state transitions and use them as features in clade classifiers to distinguish transient mutations (still evolving within a clade) from stable mutations (typical of a clade). Unlike current analysis tools, which rely on the edit distance between sequences and require sequence alignment, our method is computationally local, does not require sequence alignment, and is robust to random errors (substitutions, insertions, and deletions). Using the GISAID viral sequence database, we demonstrate that our method can predict clade emergence, potentially aiding the design of medications and vaccines.
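To make the state-machine idea concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. We assume that states are overlapping k-mers. We also assume that a transition joins the k-mer starting at position i to the one starting at position i + 1, and that transition probabilities are estimated by normalizing pair counts per source state. The function names `transition_probabilities` and `feature_vector`, the default `k=2`, and the rule of skipping k-mers that contain non-ACGT characters (such as `N`) are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

ALPHABET = "ACGT"


def transition_probabilities(seq, k=2):
    """Estimate P(next k-mer | current k-mer) from one sequence.

    Hypothetical sketch: states are overlapping k-mers, and a transition
    links the k-mer at position i to the k-mer at position i + 1.
    Pairs involving a k-mer with characters outside ACGT are skipped.
    """
    seq = seq.upper()
    valid = set(ALPHABET)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = defaultdict(Counter)
    for cur, nxt in zip(kmers, kmers[1:]):
        if set(cur) <= valid and set(nxt) <= valid:
            counts[cur][nxt] += 1
    probs = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        for nxt, c in nexts.items():
            probs[(cur, nxt)] = c / total  # row-normalized per source state
    return probs


def feature_vector(seq, k=2):
    """Map a sequence of any length to a fixed-length vector of 4**(k + 1)
    transition probabilities (0.0 for transitions never observed)."""
    probs = transition_probabilities(seq, k)
    states = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # An overlapping successor of state s is always s[1:] + one base.
    keys = [(s, s[1:] + b) for s in states for b in ALPHABET]
    return [probs.get(key, 0.0) for key in keys]
```

Because every sequence maps to the same fixed-length vector regardless of its length, sequences can be compared or fed to a classifier without first aligning them. A single substitution changes the counts of at most k + 1 transitions, which illustrates the locality property claimed above. The "active" transitions (for example, nonzero entries or entries above a chosen threshold) could then serve as classifier features. That selection rule is also an assumption of this sketch.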
This is an abstract of a preprint hosted on a preprint server, which is currently undergoing peer review at Scientific Reports. The findings have not yet been thoroughly evaluated, and no decision on publication has been made. Therefore, the reported results should not be considered conclusive, and these findings should not be used to inform clinical practice or public health policy, or be promoted as verified information.