Ask an expert: Finlay Maguire on using genomic data to better understand how COVID‑19 and its variants behave

- January 28, 2021

One of the ways in which scientists are able to detect and track variants of the COVID-19 virus and other viruses is through a technology known as genome sequencing. (Illustration from the Centers for Disease Control and Prevention via Unsplash)

It’s not uncommon for viruses to change during their life cycle. Understanding how, and how quickly, they evolve is incredibly complex, but it is extremely important for developing preventative medicines and treatments that mitigate transmission and any serious health impacts experienced by infected individuals.

In recent months, several more prominent variants of the SARS-CoV-2 virus (the cause of the ongoing global pandemic) have been confirmed in the United Kingdom, South Africa and Brazil, with two of those variants recently detected in Nova Scotia. As these new variants are believed to spread more quickly, it’s critical that scientists continue to study and document these mutations so that public health officials can control their spread and ensure that current COVID-19 tests can detect all variants of the virus and that vaccines can defend against them.

One of the ways in which scientists are able to detect and track variants of the COVID-19 virus, and other viruses, is through a technology known as genome sequencing. We asked Finlay Maguire, who specializes in the fields of public health, epidemiology and bioinformatics and is the Faculty of Computer Science's Donald Hill Family Fellow, to explain how genomic data can be used to help fight COVID-19.

What exactly is genome sequencing?

Genome sequencing is any of several methods that we use to work out the make-up of all the genetic material in an organism or virus. These methods, and the analysis of the data that they produce, are foundational to modern life sciences and medicine. Genome sequencing is being actively used to help understand the evolution and epidemiology of the SARS-CoV-2 virus, i.e., how the virus is changing and spreading over time. Led by Jalees Nasir in Prof. Andrew McArthur’s lab at McMaster University, we, along with many other groups, have done research comparing the strengths and weaknesses of different methods to sequence the genome of SARS-CoV-2. The raw data generated by genome sequencing requires a fair amount of computational processing to remove errors and identify any variants.

In an interesting bit of cross-discipline collaboration, I worked with a cosmologist from the Perimeter Institute (Prof. Kendrick Smith) to scale up the methods being used to do this in the McArthur lab. This has enabled hundreds to thousands of genomes to be processed at the same time and is now used by McMaster and the Public Health Agency of Canada for this particular type of sequencing.

How can genomic data be used to predict the evolution of SARS-CoV-2? How does this information help scientists develop vaccines that fight against current and future genetic variants of the virus?

Genomic data gives us the precise genetic fingerprint of the virus that infected the person or animal from which it was sampled. By comparing this fingerprint to the genomes of other SARS-CoV-2 viruses, we can investigate how the virus is spreading. For example, we can determine whether an outbreak stems from a single source and what that source is likely to be, such as travel from a specific country or region. The pattern of variations, or differences in genetic material, across these genomes can also be used to estimate things like how much SARS-CoV-2 is circulating in a population and how quickly it is being transmitted. Using genomic data to answer epidemiological questions like these is known as “Genomic Epidemiology.”
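To make the idea of comparing genetic fingerprints concrete, here is a minimal Python sketch that counts the positions at which pairs of already-aligned sequences differ. It is an illustration only: the sequences are short, made-up fragments (not real SARS-CoV-2 data), the sample names and the `snp_distance` and `pairwise_distances` helpers are hypothetical, and real analyses rely on full-genome alignments and phylogenetic tools rather than raw difference counts.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base (N) or a gap (-)
    are ignored, since they carry no evidence of a real difference.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def pairwise_distances(seqs: dict) -> dict:
    """Return the difference count for every pair of named sequences."""
    return {
        (p, q): snp_distance(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


# Toy, invented fragments used purely for illustration.
samples = {
    "case_A": "ATGGCTAACGTTAGC",
    "case_B": "ATGGCTAACGTTAGC",
    "case_C": "ATGGTTAACGTTAGC",
    "traveller_X": "ATGACTAACGCTAGT",
}

if __name__ == "__main__":
    for (p, q), d in pairwise_distances(samples).items():
        print(f"{p} vs {q}: {d} difference(s)")
```

In this toy data, cases A, B and C differ from one another at no more than one position, which would be consistent with a shared source, while the hypothetical traveller's sequence is more distant from all three.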

In terms of vaccine development, genomic data allows researchers to better understand how the virus infects human cells, how quickly specific components change over time, and thus which parts of the virus to target with a vaccine. Every single SARS-CoV-2 vaccine and vaccine candidate was developed using genomic data in some way. This link is particularly clear in the mRNA vaccines being deployed against SARS-CoV-2, as these vaccines are based on a copied portion of the viral genome. We also use genomic sequencing to monitor whether a virus has evolved to overcome the vaccine. By sequencing the virus from anyone who becomes infected after being vaccinated, we can determine how the virus has changed and whether the vaccine needs updating.

Given recent news that mutations of the COVID-19 virus have been found in areas like Great Britain and Canada, are researchers around the world collaborating in the study of these mutations? If so, how are they tracking and sharing this information?

The analysis of SARS-CoV-2 is a truly international and cross-discipline effort, with open and rapid sharing of data, tools, and preliminary findings outside of traditional (and slow) academic publishing channels. There are currently over 425,000 genomes deposited in databases such as the European Nucleotide Archive and GISAID’s EpiCoV through the work of individual research groups and national sequencing initiatives such as the Canadian COVID-19 Genomics Network (CanCOGeN).

However, genomes are only so useful without accompanying high-quality metadata (i.e., details about each genome, such as when and where the sample was collected). To this end, as part of the Public Health Alliance for Genomic Epidemiology, we (led by Dr. Emma Griffiths at the BCCDC/Simon Fraser University) have developed international consensus standards to try to ensure that everyone is generating consistent, specific, usable metadata with their sequencing. This allows us to more easily collaborate on a global scale to study this virus.
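As a rough illustration of why consistent metadata matters, the Python sketch below checks a single record against a small set of required fields. The field names (`sample_id`, `collection_date`, `location`), the `check_metadata` helper, and the rule that dates use the ISO 8601 `YYYY-MM-DD` form are assumptions made for this example; they are not taken from the actual consensus standard, which is far more detailed.

```python
from datetime import date

# Hypothetical required fields, chosen only for this illustration.
REQUIRED_FIELDS = ("sample_id", "collection_date", "location")


def check_metadata(record: dict) -> list[str]:
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = []

    # Every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")

    # If a collection date was supplied, it must parse as an ISO 8601 date.
    raw_date = record.get("collection_date")
    if raw_date is not None and str(raw_date).strip():
        try:
            date.fromisoformat(str(raw_date).strip())
        except ValueError:
            problems.append("collection_date is not an ISO 8601 date")

    return problems


if __name__ == "__main__":
    good = {"sample_id": "S1", "collection_date": "2021-01-15", "location": "Canada"}
    bad = {"sample_id": "S2", "collection_date": "15/01/2021", "location": ""}
    print(check_metadata(good))
    print(check_metadata(bad))
```

A check like this, run before genomes are shared, catches records whose dates or locations cannot be compared with those submitted by other groups.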

To enable all this data to be used effectively, open-source projects such as Nextstrain develop tools and facilitate the sharing of results. These projects allow researchers and public health officials to analyse data from their area of interest and automatically sample the global data to provide context for their results.  Through a collaboration with Ontario’s ONCoV Genomics Rapid Response Coalition and Prof. Andrew McArthur, I currently maintain and host a continually updated evolutionary analysis of SARS-CoV-2 in Canada using these tools.

How have insights gained from genome sequencing been used by governments and health officials to develop policy and protocols to manage the global pandemic and spread of the virus?

Historically, the process of sequencing and analyzing genomic data took too long to drive public health policy and interventions during an active outbreak. Thanks to work on improving open international standards and infrastructure through initiatives like the Public Health Alliance for Genomic Epidemiology, Nextstrain, and Canada’s Integrated Rapid Infectious Disease Analysis project, it is now possible to perform genomic epidemiological analyses in essentially real time.

This has allowed the genomic epidemiology of SARS-CoV-2 to inform public health interventions throughout the pandemic at every level, from the imposition of international travel restrictions to try to control the spread of new, rapidly transmitting variants, down to changes in cleaning and staff-screening policies within single hospitals or long-term care facilities to control outbreaks. Beyond this pandemic, genomic epidemiology is likely to become an increasingly important tool in the management of infectious diseases.

