The Effects of Mutations in SARS-CoV-2 Variants on Viral Infectivity
June 2021 – July 2021
Andrea Ortiz, Aryan Gala, Nicole Richani, Ranya Sevilleno, & Sonica Prakash
Mentor(s): Dr. Elizabeth Stroupe and Professor George McGuire
Florida State University Young Scholars Program
​
Abstract
Discovering the most influential measures of a virus variant's relative infectivity, and its optimal conformation for evolutionary survival, could allow us to develop better ways to inhibit the entry of the virus into the host cell. We predicted that if a mutation caused a conformational change in the spike protein of a SARS-CoV-2 variant that increased its infectivity, then factors such as shorter bond lengths, a greater difference in surface charge, and chemical complementarity of amino acids would be present at the binding site with the human ACE2 receptor. Using protein sequence alignment in Clustal Omega, bond length measurement and Coulombic surface charge analysis in ChimeraX, and isoelectric point and aromaticity analysis in BioPython, we ranked by infectivity the four variants of concern in the United States according to the CDC: Alpha, Beta, Gamma, and Delta. By our molecular criteria, the variants ranked from most to least infective as Delta, Alpha, Gamma, and Beta, which appears to closely resemble the ranking produced by epidemiological data in the United States. Therefore, we conclude from this study that bond length, surface charge, isoelectric point, aromaticity, and the complementarity of the amino acid substitutions introduced by mutations are accurate measures of viral infectivity in SARS-CoV-2.
​
Research Proposal
Introduction
The capacity of SARS-CoV-2 variants to transmit horizontally within the same generation, and their ability to interact with antibodies, have varied with differences in conformation caused by genetic mutations. Variants with a higher capacity to transmit (that is, a higher infectivity) and those with less ability to interact with antibodies are more evolutionarily fit to spread. This explains the rise of some SARS-CoV-2 variants, and our limited knowledge of the structural changes and behavior these mutations cause puts public health at greater risk. Discovering which mutations and structural changes affect the fitness of the virus, by finding its optimal conformation for infectivity and antibody interaction, could allow us to develop better ways to inhibit the virus.
Research suggests that the SARS-CoV-2 mutations that most affect infectivity are those that change the conformation of the spike protein, which interacts with the human ACE2 receptor. Therefore, determining how the interactions of the spike proteins of known SARS-CoV-2 variants and lab-created mutants with the human ACE2 receptor affect infectivity can help us predict which mutations would prevail. Regarding antibody interaction, examining how the responses produced by natural antibodies differ from those produced by vaccine-induced antibodies can provide more insight into the effectiveness of vaccines and be a step toward identifying the most efficient antibody. Another useful line of investigation is how the different SARS-CoV-2 variants compare in their interactions with a specific human antibody structure. If we can find commonalities, an antibody could be designed to interact with all variants, potentially providing defense against all or multiple variants.
​
Proposed question: How do the interactions between the different spike proteins of SARS-CoV-2 variants and the human ACE2 receptor affect the infectivity of each virus?
​
Motivation: After more than a year of the pandemic, many variants have emerged, and some are reported to be more infectious than others. Although we have heard that SARS-CoV-2 variants differ in virulence and immune evasion, we want to know specifically why these variants differ in their ability to infect host cells, not only by identifying differences in the spike protein but also by examining how those differences influence the actual interactions between the spike protein and the ACE2 receptor. Beyond being an interesting project for us, this information can be shared to help people understand why some of these variants may be more dangerous than the original virus.
​
How the question relates to the extant scientific literature: Many ongoing studies aim to compare the infectivity of the numerous variants prevalent around the world. Many of these studies have already been completed, and a common conclusion is that the stronger the attraction and interaction between the receptor-binding domain of a variant's spike protein and the ACE2 receptor, the more contagious or infectious the variant is. In other words, infectivity is thought to increase as the binding between virus and receptor strengthens. Our study aims not only to support this claim, which has not yet been definitively confirmed, but also to explore these spike-receptor interactions and discuss how and why some variants form stronger bonds with the receptor and thus become more infectious.
​
Methodology
First, we would need to perform further literature review to understand the general interactions between spike proteins and the human ACE2 receptor, analyzing factors such as common bond types and bond lengths. We should also review what properties make other respiratory viruses highly infective. To answer our research question of which properties affect the infectivity of the different SARS-CoV-2 variants, we will need to define a metric for infectivity. This can be a macro-scale metric, such as the reproduction number (R) of a SARS-CoV-2 variant, or a micro-scale metric, such as how quickly each variant binds to the ACE2 receptor (with quicker binding indicating higher infectivity). We should then use these metrics to sort SARS-CoV-2 variants into roughly three categories: high, medium, and low infectivity. Clustal Omega will be our first method of analysis, used to perform multiple sequence alignments of the spike proteins of the SARS-CoV-2 variants and to identify the point mutations that define each variant (Sievers et al., 2011). To further this analysis, we will overlay the multiple sequence alignment results onto ChimeraX three-dimensional structures of each variant and identify structural differences between the variants (Pettersen et al., 2021). Our group will interpret the three-dimensional structures together with the multiple sequence alignments to determine which amino acids, amino acid properties, and structural features (such as salt bridges) favor the open conformation of the spike protein, allowing it to bind the human ACE2 receptor more easily.
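Once Clustal Omega has aligned the spike sequences, reading off the point mutations that define a variant amounts to a column-by-column comparison against the reference sequence. The sketch below is a minimal illustration, assuming the reference and variant sequences have already been exported from the alignment as equal-length gapped strings; the function name call_point_mutations is hypothetical, numbering follows the reference, and insertions relative to the reference are skipped rather than reported.

```python
from __future__ import annotations


def call_point_mutations(ref_aln: str, var_aln: str, ref_start: int = 1) -> list[str]:
    """List differences from the reference in notation such as 'D614G'.

    ref_aln and var_aln are two rows of the same alignment ('-' marks a gap).
    Deletions in the variant are reported as e.g. 'H69del'.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    pos = ref_start - 1
    for ref_res, var_res in zip(ref_aln, var_aln):
        if ref_res == "-":
            continue  # insertion in the variant: no reference position to report
        pos += 1
        if var_res == ref_res:
            continue
        mutations.append(f"{ref_res}{pos}{'del' if var_res == '-' else var_res}")
    return mutations
```

For example, with a toy five-residue fragment numbered from 613, call_point_mutations("QDVNC", "QGVNC", ref_start=613) returns ["D614G"]. Running the same comparison for each variant row gives the mutation lists that the later structural and clustering steps can use.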
For the computational part of the analysis, we can perform K-means clustering on the spike protein sequences of the variants to sort them into subgroups. A similar analysis has been performed before (Wang et al., 2021), and our group should reference it when conducting this analysis. We will use the clustering to test the predictions we make using Clustal Omega and ChimeraX. Clustering is useful here because the SARS-CoV-2 variants do not all follow the same lineage of mutations; rather, each variant carries a different subset of mutations, or co-mutations, that allow it to be more infective. The subgroups of variants identified through clustering can then be compared with the subgroups of spike protein features we find through the Clustal Omega and ChimeraX analysis.
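One way to turn variants into something K-means can cluster is to encode each variant as a presence/absence vector over all spike mutations observed. The sketch below assumes that encoding (Wang et al., 2021 may have used a different one) and implements plain Lloyd's K-means with the first k variants as the starting centroids, a simplification; libraries such as scikit-learn default to k-means++ initialization instead. The function names and the four toy mutation sets are hypothetical.

```python
from __future__ import annotations


def to_vectors(mutation_sets: list[set[str]]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode each variant as presence (1) / absence (0) of every mutation seen."""
    vocab = sorted(set().union(*mutation_sets))
    vectors = [[1 if m in muts else 0 for m in vocab] for muts in mutation_sets]
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(map(float, p)) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical variants, each described only by the mutations it carries.
toy_variants = [
    {"N501Y", "D614G"},
    {"N501Y", "D614G", "P681H"},
    {"E484K", "K417N", "D614G"},
    {"E484K", "K417T", "D614G"},
]
vocab, vectors = to_vectors(toy_variants)
labels, _ = kmeans(vectors, k=2)
print(labels)  # the first two variants share one label, the last two the other
```

Because K-means is sensitive to its starting centroids, a real analysis would rerun it with several initializations and compare the resulting subgroups with those suggested by the Clustal Omega and ChimeraX analysis.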
​

Annotated Bibliography
Plante, J. A., Liu, Y., Liu, J., Xia, H., Johnson, B. A., Lokugamage, K. G., Zhang, X., Muruato, A. E., Zou, J., Fontes-Garfias, C. R., Mirchandani, D., Scharton, D., Bilello, J. P., Ku, Z., An, Z., Kalveram, B., Freiberg, A. N., Menachery, V. D., Xie, X., … Shi, P.-Y. (2020). Spike mutation D614G alters SARS-CoV-2 fitness. Nature, 592(7852), 116–121. https://doi.org/10.1038/s41586-020-2895-3
This study sought to determine how the spike protein substitution D614G in SARS-CoV-2 affects the virus's spread and the effectiveness of vaccines. By recreating the D614G substitution in the USA-WA1/2020 SARS-CoV-2 strain, the researchers determined that the mutation makes virions more infectious and therefore increases viral replication in human lung epithelial cells and airway tissues. In additional experiments in hamsters, sera from hamsters infected with the D614 virus showed modestly higher neutralizing antibody titers against G614 than against D614, suggesting that the mutation may not reduce vaccine efficacy.
Our project is about mutations of SARS-CoV-2, the virus that causes COVID-19. By describing the effects of the D614G spike protein mutation, this study gives us important insight into what could happen to cells attacked by the mutated virus, and it may help us predict the effects of other mutations. The hamster experiment, which analyzed viral titers in nasal washes, is helpful because it tells us to look for antibody-related data when comparing how infectious different SARS-CoV-2 strains are.
Gheblawi, M., Wang, K., Viveiros, A., Nguyen, Q., Zhong, J.-C., Turner, A. J., Raizada, M. K., Grant, M. B., & Oudit, G. Y. (2020). Angiotensin-Converting Enzyme 2: SARS-CoV-2 Receptor and Regulator of the Renin-Angiotensin System. Circulation Research, 126(10), 1456–1474. https://doi.org/10.1161/circresaha.120.317015
This article's purpose is to describe the several roles of ACE2, which is present in several tissues of the body and is important in protecting against conditions such as heart failure and heart attacks. However, ACE2 becomes ineffective once it has been bound by SARS-CoV-2, which contributes to the many possible ailments people with COVID-19 can experience. The article looks further into how ACE2 binds the virus by comparing its binding to SARS-CoV-2 with its binding to SARS-CoV. The most significant difference is that amino acid changes in SARS-CoV-2 lead to more hydrophobic interactions and salt bridges, which together produce a stronger bond and higher infectivity.
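Salt bridges like those described above can be flagged in a structure with a commonly used geometric criterion: an oxygen of an acidic side chain (Asp, Glu) within about 4 Å of a nitrogen of a basic side chain (Lys, Arg). The sketch below is a minimal version of that check, assuming atoms have already been extracted from a structure file as (chain, residue name, residue number, atom name, coordinates) tuples; the function name, the 4.0 Å default, and the toy coordinates are assumptions for illustration, and histidine is deliberately left out.

```python
import math

# Charged side-chain atoms, using standard PDB atom names.
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return residue pairs with an acidic O and a basic N within `cutoff` angstroms.

    atoms: iterable of (chain, residue name, residue number, atom name, (x, y, z)).
    """
    atoms = list(atoms)  # allow generators: the atoms are scanned twice
    acids = [a for a in atoms if (a[1], a[3]) in ACIDIC]
    bases = [a for a in atoms if (a[1], a[3]) in BASIC]
    bridges = set()  # a set, so several close atom pairs count as one residue pair
    for chain_a, res_a, num_a, _, xyz_a in acids:
        for chain_b, res_b, num_b, _, xyz_b in bases:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                bridges.add(((chain_a, res_a, num_a), (chain_b, res_b, num_b)))
    return sorted(bridges)


# Made-up coordinates for illustration only.
toy_atoms = [
    ("A", "ASP", 30, "OD1", (3.0, 0.0, 0.0)),
    ("E", "LYS", 417, "NZ", (0.0, 0.0, 0.0)),
    ("A", "GLU", 35, "OE1", (10.0, 0.0, 0.0)),
]
print(find_salt_bridges(toy_atoms))  # [(('A', 'ASP', 30), ('E', 'LYS', 417))]
```

Comparing the output for a reference structure and a variant structure would show which salt bridges a mutation adds or removes at the spike-ACE2 interface.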
This article provides many facts and considerable insight into the interaction between host receptors and SARS-CoV-2. By detailing the roles of ACE2 and how it responds to SARS-CoV and SARS-CoV-2, it lets us make predictions about how further amino acid changes will affect ACE2 binding. The article also describes where ACE2 is present in the body. We can correlate the organs and tissues with significant ACE2 expression to illnesses related to those organs or tissues, which could give us an idea of what symptoms someone might develop when SARS-CoV-2 binds ACE2.
Pettersen, E. F., Goddard, T. D., Huang, C. C., Meng, E. C., Couch, G. S., Croll, T. I., Morris, J. H., & Ferrin, T. E. (2021). UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Science, 30(1), 70–82.
This reference is an excellent supplement to the ChimeraX software. It gives a brief tutorial on how to use ChimeraX and explains its many features and how we can use them for our purposes. The article even describes how, in the context of the pandemic, ChimeraX can be used to analyze the 229 protein structures that had been released depicting the SARS-CoV-2 virus. It gives an overview of nearly all of the software's features, including the toolbar, log, command line, viewer, and more, and explains how to interact with structures using a mouse or trackpad. Lastly, it thoroughly covers what one can analyze in ChimeraX and how, such as bond types, color, hydrophobicity, shape, sequence, and volume.
This reference belongs in this annotated bibliography because it is extremely relevant to our project, whichever topic we choose. Through brainstorming, we have shortlisted a few topics, all of which use ChimeraX to analyze structures such as spike proteins or the ACE2 receptor. This tutorial and overview of ChimeraX will therefore help us understand our capabilities and what exactly we can analyze. For instance, we know we will use the bond length measurement and “swapaa” features frequently, so this article can help.
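ChimeraX's distance command is the natural tool for bond length measurements; as an independent cross-check, the same distance can be computed directly from atom coordinates. A minimal sketch, assuming the structure is saved in the legacy fixed-column PDB format (mmCIF files would need a different parser). The function names are hypothetical, and the two sample records use made-up coordinates.

```python
import math

# Two toy ATOM records in fixed-column PDB format (coordinates are made up).
SAMPLE_PDB = [
    "ATOM      1  NZ  LYS E 417       0.000   0.000   0.000",
    "ATOM      2  OD1 ASP A  30       3.000   4.000   0.000",
]


def parse_atoms(lines):
    """Map (chain, residue number, atom name) -> (x, y, z) using PDB column positions."""
    atoms = {}
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            key = (line[21], int(line[22:26]), line[12:16].strip())
            # Alternate locations and insertion codes are ignored: a later
            # record with the same key simply overwrites an earlier one.
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def atom_distance(atoms, key_a, key_b):
    """Euclidean distance in angstroms between two parsed atoms."""
    return math.dist(atoms[key_a], atoms[key_b])


atoms = parse_atoms(SAMPLE_PDB)
print(atom_distance(atoms, ("E", 417, "NZ"), ("A", 30, "OD1")))  # prints 5.0
```

With a real file, the lines would come from an open file handle instead of SAMPLE_PDB, and the keys would be chosen from residues at the spike-receptor interface.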
Liu, H., Zhang, Q., Wei, P., Chen, Z., Aviszus, K., Yang, J., ... & Zhang, G. (2021). The basis of a more contagious 501Y.V1 variant of SARS-CoV-2. Cell Research, 31(6), 720–722.
This study explores why the 501Y.V1 variant of SARS-CoV-2, which carries tyrosine at spike position 501 (Y501), appears to be more contagious. The researchers found that the receptor-binding domain carrying Y501 binds human ACE2 about 10 times more strongly than the receptor-binding domain carrying N501, the residue found in earlier strains. From the structure, they also observed that the Y501 receptor-binding domain forms more ring interactions and hydrogen bonds with ACE2. This leads them to conclude that the stronger the attraction between a spike protein and its receptor, the more contagious the variant may be.
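A fold change in binding affinity can be translated into a binding free energy difference, which makes "10 times more strongly" comparable with other interface changes. A minimal sketch, assuming that "10 times" refers to the ratio of dissociation constants (Kd) and using the standard relation ΔΔG = RT ln(Kd_mutant / Kd_reference); the Kd values below are relative placeholders, not measured data, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_ddg(kd_reference, kd_mutant, temperature_k=298.15):
    """Binding free energy change in kcal/mol: RT * ln(Kd_mutant / Kd_reference).

    Negative values mean the mutant binds more tightly than the reference.
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_mutant / kd_reference)


# Illustrative only: a 10-fold lower Kd, in relative units.
print(round(binding_ddg(kd_reference=1.0, kd_mutant=0.1), 2))  # prints -1.36
```

Under these assumptions, a 10-fold tighter binding corresponds to roughly 1.4 kcal/mol, a modest energy on the scale of a few extra non-covalent contacts.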
This reference could be the foundation of our project. One idea we are considering (and leaning toward) is comparing the infectivity of selected variants, perhaps five of the most prevalent, to see which are most contagious. This study gives us a foundation because it suggests that the stronger the binding between the spike protein and the human receptor, the more infectious the variant probably is. With this information, we can use ChimeraX to analyze the interactions between various spike proteins and the human ACE2 receptor and determine which have the strongest attractions based on bond type and length. We can then rank the variants by infectivity and compare the ranking with current epidemiological data to verify the trends.
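Comparing a structural ranking with an epidemiological ranking can be made quantitative with Spearman's rank correlation. A minimal sketch for two orderings of the same variants without ties; the molecular ordering is taken from the abstract, while the second ordering below is a hypothetical placeholder rather than real epidemiological data.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rho for two orderings (most to least) of the same distinct items.

    rho = 1 - 6 * sum(d_i ** 2) / (n * (n ** 2 - 1)),
    where d_i is the difference between item i's positions in the two orderings.
    """
    if sorted(ranking_a) != sorted(ranking_b) or len(set(ranking_a)) != len(ranking_a):
        raise ValueError("rankings must contain the same distinct items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    d_squared = sum((i - position_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


structural = ["Delta", "Alpha", "Gamma", "Beta"]  # molecular ranking from the abstract
hypothetical_epi = ["Delta", "Alpha", "Beta", "Gamma"]  # placeholder, not real data
print(round(spearman_rho(structural, hypothetical_epi), 2))  # prints 0.8
```

With only four variants, even a perfect agreement (rho = 1) is weak statistical evidence, so a comparison like this is best read as a consistency check rather than proof.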
Groves, D. C., Rowland-Jones, S. L., & Angyal, A. (2021). The D614G mutations in the SARS-CoV-2 spike protein: Implications for viral infectivity, disease severity and vaccine design. Biochemical and Biophysical Research Communications, 538, 104–107. https://doi.org/10.1016/j.bbrc.2020.10.109
Groves et al. (2021) investigated a point mutation in the spike protein of SARS-CoV-2 that had become pervasive internationally among patients infected with COVID-19. The authors used the GISAID (Global Initiative on Sharing All Influenza Data) database to monitor and track genomic variations of the virus over time. (This database might be useful for our group to look into further.) The authors identified this mutation, D614G, as the pervasive variant and found that it confers more infectivity than other experimentally generated variants of the virus. They conclude that the D614G mutation makes an open conformation of the spike protein more likely when binding the human ACE2 receptor. Interestingly, the D614G variant is also capable of infecting other mammalian species.
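Tracking a mutation over time in a sequence database such as GISAID can be reduced to counting, for each time window, what fraction of sampled sequences carry it. A minimal sketch, assuming the records have already been obtained (GISAID requires registered access) and reduced to (collection date, set of spike mutations) pairs; the record layout, the function name, and the toy records are hypothetical.

```python
from collections import defaultdict


def mutation_frequency_by_month(records, mutation):
    """Fraction of sequences carrying `mutation`, per collection month.

    records: iterable of (collection date as 'YYYY-MM-DD', set of mutations).
    Returns {'YYYY-MM': fraction}, ordered by month.
    """
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for date, mutations in records:
        month = date[:7]  # 'YYYY-MM'
        totals[month] += 1
        if mutation in mutations:
            carriers[month] += 1
    return {month: carriers[month] / totals[month] for month in sorted(totals)}


# Made-up records for illustration only.
toy_records = [
    ("2020-03-05", {"D614G"}),
    ("2020-03-20", set()),
    ("2020-04-02", {"D614G"}),
    ("2020-04-15", {"D614G"}),
]
print(mutation_frequency_by_month(toy_records, "D614G"))
# {'2020-03': 0.5, '2020-04': 1.0}
```

A rising fraction over successive months is the pattern that suggests a mutation is spreading, although sampling differences between countries and months can also shift these numbers.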
The study by Groves et al. (2021) helps our group understand why some variants of the virus are more infectious than others; in this case, the open conformation enabled by the spike protein mutation is critical to increased infectivity. Groves et al. (2021) rule out other explanations for the increased infectivity, such as the efficiency of spike protein synthesis, and we should take these potential factors into account in our project. If our group decides to add an epidemiological angle to the structural biology angle, we should focus on the D614G variant identified by Groves et al. (2021) when searching for SARS-CoV-2 sequence data online. We should also consider the GISAID database if we decide to compare different SARS-CoV-2 variants, since sequence data for variants are continuously being added to it.
Pandurangan, A. P., & Blundell, T. L. (2020). Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning. Protein Science, 29(1), 247–257. https://doi.org/10.1002/pro.3774
Pandurangan and Blundell (2020) use computational biology tools, combining statistical and machine learning approaches, to determine which missense mutations in the spike protein of SARS-CoV-2 are “drivers” of increased infectivity and which are merely “passengers” in a variant. They take this computational approach because of the cost of experimental structural biology work (which, in our case, we are unable to perform). Computational tools allow the effects of mutations to be predicted without direct data from a wet lab. The first tool Pandurangan and Blundell (2020) use is SDM (Site Directed Mutator), which uses environment-specific substitution tables (ESSTs) to model the local structural environment of residues in the spike protein and human receptor interaction. The SDM approach is conformationally restrictive, because ESST data are available only for protein conformations obtained from previous experimental data. Pandurangan and Blundell (2020) also use mCSM, a machine learning tool layered on top of SDM techniques, to predict how mutations affect protein-protein and protein-environment interactions.
The two methods (SDM and mCSM) rely on thermodynamic equations and graph-based methods. These techniques are worth exploring if we decide to complement our ChimeraX analysis with computational methods applied to structural biology. The predictions made by SDM and mCSM can help us decide whether to reject or fail to reject the conclusions about spike protein variant infectivity that we draw from ChimeraX. If we go down this route, we should check how accessible these tools are and whether the input data they require are publicly available.
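The "graph-based" idea can be illustrated by summarizing a residue's surroundings as counts of atom-class pairs found within increasing distance cutoffs, so that a mutation shows up as a change in these counts. The sketch below is a loose, simplified illustration of that idea only; the atom classes, cutoffs, and function name are assumptions, and the actual mCSM signatures and trained models are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def distance_signature(atoms, cutoffs=(2.0, 4.0, 6.0)):
    """Cumulative counts of atom-class pairs within each distance cutoff.

    atoms: list of (atom class, (x, y, z)), e.g. class 'pos', 'neg', 'hyd'.
    Returns a Counter keyed by (class_a, class_b, cutoff), with the class pair
    sorted so that ('neg', 'pos') and ('pos', 'neg') are counted together.
    """
    signature = Counter()
    for (class_a, xyz_a), (class_b, xyz_b) in combinations(atoms, 2):
        d = math.dist(xyz_a, xyz_b)
        pair = tuple(sorted((class_a, class_b)))
        for cutoff in cutoffs:
            if d <= cutoff:  # cumulative: a pair within 4 A also counts at 6 A
                signature[pair + (cutoff,)] += 1
    return signature


# Toy environment with made-up coordinates.
toy_env = [("pos", (0.0, 0.0, 0.0)), ("neg", (3.0, 0.0, 0.0)), ("hyd", (0.0, 5.0, 0.0))]
print(distance_signature(toy_env))
```

Computing the signature for the wild-type and mutant environments and taking the difference would give a feature vector of the kind a machine learning model could be trained on.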
Li, Q., Wu, J., Nie, J., Zhang, L., Hao, H., Liu, S., Zhao, C., Zhang, Q., Liu, H., Nie, L., Qin, H., Wang, M., Lu, Q., Li, X., Sun, Q., Liu, J., Zhang, L., Li, X., Huang, W., & Wang, Y. (2020). The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity. Cell, 182(5). https://doi.org/10.1016/j.cell.2020.07.012
This study examined how mutations at N-linked glycosylation sites in the spike protein, and amino acid changes in SARS-CoV-2 variants, affect viral infectivity and antigenicity. It describes the researchers' procedure for analyzing the infectivity of mutated versions of the virus and their reactivity with neutralizing antibodies, which is known to change significantly with amino acid changes in the surface protein. Both variants (viruses with naturally arising mutations) and mutants (viruses with experimentally introduced mutations) were found to change in antigenicity and infectivity. In all but one natural variant, the altered reactivity with neutralizing monoclonal antibodies involved the receptor-binding domain, since all of the antibodies tested targeted this region.
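Candidate N-linked glycosylation sites can be located in a protein sequence by searching for the N-X-S/T sequon, where X is any amino acid except proline. A minimal sketch, assuming an ungapped one-letter protein sequence; the function name is hypothetical, and a sequon only marks a potential site, since not every sequon is actually glycosylated. Comparing the output for a reference and a variant sequence shows whether a mutation creates or removes a site.

```python
import re

# N-X-S/T with X != P. The lookahead makes the match zero-width, so
# overlapping sequons (e.g. in "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein_seq):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon.

    Expects an ungapped one-letter sequence; gap characters would be
    treated as ordinary residues by the pattern.
    """
    return [match.start() + 1 for match in SEQUON.finditer(protein_seq.upper())]


print(find_sequons("ANASNPSNGT"))  # [2, 8]: NAS and NGT qualify, NPS does not
```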
This study presents a logical experimental design that is closely related to what we will investigate and offers many options for further experimentation. It also reviews basic biological concepts related to SARS-CoV-2 and its mutations. Because the study is extensive, it provides multiple possible routes for extending our project and shows what other conclusions we could draw from the data and reasoning.
Singh, P. K., Kulsum, U., Rufai, S. B., Mudliar, S. R., & Singh, S. (2020). Mutations in SARS-CoV-2 Leading to Antigenic Variations in Spike Protein: A Challenge in Vaccine Development. Journal of Laboratory Physicians, 12(2), 154–160. https://doi.org/10.1055/s-0040-1715790
Similar to the study we will conduct, this study predicted mutations in the spike protein from SARS-CoV-2 genomes and analyzed their impact on viral antigenicity. The researchers collected spike protein sequences, aligned them, and identified mutations arising from single nucleotide polymorphisms. They then mapped the predicted antigenicity of the mutations and the inferred epitopes onto the spike protein. The study showed that most epitopes with high antigenicity were located in the receptor-binding domain.
This source suggests that the mutations that increase antigenicity are those that cause a conformational change in the spike protein. This is relevant to our study because it provides more information on the nature of the spike protein, which will help us connect it to the behavior of its binding to the human receptor, ACE2. Which mutations would leave the virus unable to bind, and to what extent would they make it more antigenic?
Yadav, R., Bajpai, P. K., Srivastava, D. K., & Kumar, R. (2021). Epidemiological characteristics, reinfection possibilities and vaccine development of SARS CoV2: A global review. Journal of Family Medicine and Primary Care, 10(3), 1095–1101. https://doi.org/10.4103/jfmpc.jfmpc_2151_20
This study reviews the epidemiological characteristics of SARS-CoV-2 and its mutations. More specifically, it reviews the characteristics of, and control measures for, the different strains and mutations of the virus. The researchers examined data including the viral life cycle, intermediate hosts, viability on various surfaces, strains, and case fatality rates, and their implications for reducing SARS-CoV-2 transmission. After reviewing these data, most importantly the case fatality rates of COVID-19 in different countries, the researchers concluded that because case fatality varies so widely from country to country, research on SARS-CoV-2 should emphasize the molecular level of the virus, such as the mutations and virulence of the different viral strains.
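In its simplest form, the case fatality rate compared across countries is a ratio of deaths to confirmed cases. A minimal sketch of that naive calculation with hypothetical numbers; it shows one reason the ratio can differ between countries for reasons unrelated to the virus itself, since the denominator depends on how many cases are detected. The function name is hypothetical.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Naive case fatality rate in percent: deaths / confirmed cases * 100."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


# Hypothetical numbers: the same deaths with twice as many confirmed cases
# (for example, because of wider testing) halve the naive rate.
print(case_fatality_rate(30, 1000))  # 3.0
print(case_fatality_rate(30, 2000))  # 1.5
```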
Not only does this study use a data-analysis approach similar to what we are expected to do, it also addresses some of the questions we raised when discussing our own research project. Because we were considering researching differences in infectivity and virulence between SARS-CoV-2 strains, we were unsure whether to use molecular data, epidemiological data, or both. This study concludes that epidemiological data alone do not provide strong evidence for differences in virulence between strains and that studying these differences requires a stronger emphasis on the molecular aspects of the virus.
Begum, F., Mukherjee, D., Thagriki, D., Das, S., Tripathi, P. P., Banerjee, A. K., & Ray, U. (2020). Analyses of spike protein from first deposited sequences of SARS-CoV2 from West Bengal, India. F1000Research, 9, 371. https://doi.org/10.12688/f1000research.23805.1
This study analyzes the spike protein sequences of SARS-CoV-2. At the time of the study, sequencing of the new virus's genome had only begun, and few sequences were available. The researchers analyzed the protein sequences of the first five published isolates from West Bengal and compared them with other SARS-CoV-2 sequences. Unique mutations were found at positions 723 and 1124 in the S2 domain of the spike protein from West Bengal, along with one mutation downstream of the receptor-binding domain at position 614 in the S1 domain. The S2 mutations were found to change the secondary structure of the spike protein. Because these mutations matter for the receptor-binding domain, they help determine how effectively the spike protein binds its receptor.
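Placing a mutation in S1, the receptor-binding domain, or S2 is a lookup of its residue number against domain boundaries on the reference spike numbering (1273 residues). A minimal sketch, assuming approximate boundaries: the signal peptide ends at residue 13, S1 runs to the S1/S2 cleavage site after residue 685, and the receptor-binding domain is taken as 319–541, although papers define it slightly differently. The function names are hypothetical.

```python
import re

# ASSUMPTION: approximate ranges; adjust to the definitions your sources use.
DOMAINS = [
    ("RBD (within S1)", 319, 541),  # checked first so RBD hits are not labelled plain S1
    ("S1", 14, 685),                # after the signal peptide, up to the S1/S2 cleavage site
    ("S2", 686, 1273),
]


def position_of(mutation):
    """Extract the residue number from substitution notation such as 'D614G'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    return int(match.group(1))


def locate(position):
    """Name the first annotated domain containing the residue position."""
    for name, start, end in DOMAINS:
        if start <= position <= end:
            return name
    return "outside annotated domains"


print(locate(position_of("D614G")))  # S1
print(locate(723), locate(1124))     # S2 S2
```

Under these boundaries, position 614 falls in S1 downstream of the receptor-binding domain and positions 723 and 1124 fall in S2, matching the placements reported above.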
This entry has a slightly different structure from the previous one; however, I found it useful to pair the two because, while the previous study covers the epidemiological data of SARS-CoV-2, this study focuses more on the molecular features and mutations in different sequences of the virus.