Wednesday, 7 August 2013

Tracking MERS-CoV through time: a spikey problem

This morning on Twitter, Helen Branswell (@HelenBranswell) asked this question, with a comment...

So I thought a little perspective might be nice. 

The SARS epidemic had its origins around Nov 16th 2002, although the major activity started in Feb of 2003. 

  • 64 human SARS-CoV genomes had been produced by September 2003 ([UPDATED:] see Science paper). That is 317 days later, or 10 months and 13 days (perhaps less, given that the genomes were possibly sequenced well before the paper was submitted, e.g. late-phase genomes seem to have been submitted to GenBank by July 2003). 
  • For MERS-CoV we currently have 9 genomes at 505 days (give or take), or about 1 year and 4 months.
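For anyone who wants to check the arithmetic behind those two bullets, here is a minimal sketch. It assumes the SARS clock starts on 16 November 2002 and the MERS-CoV clock ends on the date of this post; the function name and the "genomes per 100 days" figure are my own illustration, not something taken from either outbreak's literature.

```python
from datetime import date, timedelta

# Dates taken from the post; the derived dates below are computed, not reported.
SARS_START = date(2002, 11, 16)  # approximate origin of the SARS epidemic
POST_DATE = date(2013, 8, 7)     # date of this post


def genomes_per_100_days(genomes: int, days: int) -> float:
    """Average number of genomes produced per 100 days elapsed."""
    return 100 * genomes / days


# End of the 317-day SARS window and implied start of the 505-day MERS-CoV window.
sars_end = SARS_START + timedelta(days=317)
mers_start = POST_DATE - timedelta(days=505)

sars_rate = genomes_per_100_days(64, 317)
mers_rate = genomes_per_100_days(9, 505)

print("SARS window ends:", sars_end)
print("MERS-CoV window starts:", mers_start)
print("SARS-CoV genomes per 100 days:", round(sars_rate, 1))
print("MERS-CoV genomes per 100 days:", round(mers_rate, 1))
```

The rate comparison is crude (it ignores case numbers and sequencing capacity), but it puts the two bullets on the same scale.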
Not that anyone needs to be reminded, but 80% of MERS-CoV cases come from the Kingdom of Saudi Arabia. The world is relying on them, or their collaborators, to turn the nucleic acid extracts used to define these cases (PCR positives, hopefully kept in a -80°C freezer) into templates for gene or genome sequencing.

I personally don't believe we need to have complete genomes right now in order to fulfil the fairly urgent public health need to monitor the virus and notice if it changes, or is changing, or is not changing. These changes tell us whether the virus is still adapting or has settled in - perhaps having done so prior to this outbreak's indicator, severe disease. 

What else to use to track adaptation?

Perhaps the 4,000nt Spike (S) gene, or some smaller but suitably variable portion of it, could be a target for sequencing? 

Zhang and colleagues have data showing it could be used to track an animal coronavirus's adaptation to humans, through its 3 pandemic phases. This was done using phylogeny (a way to show how one sequence relates to another through time and space) of nucleic acid sequences, and alignments of the translated versions of those sequences. All we need are primer sequences that can reliably amplify the S gene of the MERS-CoV. If anyone has those already, perhaps they could publish them...if they haven't already. A very brief look at the 9 MERS-CoV genomes already shows some variety. Perhaps unsurprisingly, there is very little change among the 4 Al-Ahsa genomes; their collection dates are separated by only 17 days.

This shows a schematic of the aligned Spike genes. The black lines within the grey boxes represent nucleotides that differ from the consensus. More differences are obvious in the earlier sequences. The oldest MERS-CoV isolate is at the bottom, the most recent, at the top (detailed below). See the full version here at VDU.
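The idea behind that schematic can be sketched in a few lines: build a consensus from the aligned sequences, then list the positions where each sequence departs from it. This is only an illustration of the principle, not how Geneious Pro does it; the isolate names and the 10-nucleotide sequences are made up, and ties in a column are simply resolved in favour of the base seen first.

```python
from collections import Counter


def consensus(aligned):
    """Most common character in each alignment column (ties: first seen wins)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def differences(seq, cons):
    """0-based alignment positions where seq differs from the consensus."""
    return [i for i, (a, b) in enumerate(zip(seq, cons)) if a != b]


# Toy, made-up aligned sequences; real Spike genes are about 4,000 nt long.
toy = {
    "isolate_A": "ATGCATGCAT",
    "isolate_B": "ATGCATGCAT",
    "isolate_C": "ATGAATGCAT",
    "isolate_D": "TTGCATGCGT",
}

cons = consensus(list(toy.values()))
diffs = {name: differences(seq, cons) for name, seq in toy.items()}

for name, positions in diffs.items():
    print(name, positions)
```

Each listed position corresponds to one black line in a schematic of this kind.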
Interestingly, the phylogeny of the complete Spike genes looks similar to that of the complete MERS-CoV genomes. However, it does not place the isolates in order of increasing time to the extent that the full genomes do. I also looked at a 900bp fragment from the 3' end of the Spike gene - easier to amplify, and it gives a very similar tree to that of the complete Spike.
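A rough way to see whether a 3' fragment carries a similar signal to the full gene is to compare pairwise distances from each. The sketch below uses the simple p-distance (the proportion of differing sites), which is a stand-in I have chosen for illustration and not the tree-building method used in MEGA. The sequences are made up, and slicing the last columns of an alignment only equals "the last N nucleotides" when there are no gaps in that region.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(seqs, last_n=None):
    """Pairwise p-distances; if last_n is given, use only the last last_n columns."""
    table = {}
    for n1, n2 in combinations(sorted(seqs), 2):
        s1, s2 = seqs[n1], seqs[n2]
        if last_n is not None:
            s1, s2 = s1[-last_n:], s2[-last_n:]
        table[(n1, n2)] = p_distance(s1, s2)
    return table


# Toy, made-up aligned sequences standing in for Spike genes.
toy = {
    "A": "ATGCATGCATGC",
    "B": "ATGCATGCATGA",
    "C": "TTGCATGCAAGA",
}

full = distance_table(toy)
fragment = distance_table(toy, last_n=4)  # for the real gene this would be 900

for pair in full:
    print(pair, round(full[pair], 3), round(fragment[pair], 3))
```

If the pairs rank in the same order by both measures, the fragment is at least telling a similar story to the full gene, which is what the trees suggested here.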

All 9 complete MERS-CoV spike protein genes (nt). Alignment in Geneious Pro, tree in MEGA 5.10.
Full version will be here at VDU.

All 9 complete MERS-CoV genomes (nt). The arrow indicates moving forward in time; the oldest MERS-CoV isolates at the bottom, the most recent at the top. Alignment in Geneious Pro, tree in MEGA 5.10.
Full version will be here at VDU.

So where does that leave us?

Adaptive pressures on the SARS-CoV drove its genome towards settling down in the late stage of the 3-phase outbreak (defined by the Chinese SARS Molecular Epidemiology Consortium), with changes in the Spike gene occurring before that. Complete genomes are clearly the gold standard - so I dial down that personal belief from earlier.

The Spike gene still seems a useful target for MERS-CoV too, although it was not as accurate at placing isolates in order of collection time as the complete MERS-CoV genomes were in my example above. Even so, it, or some part of it, is of use as an early-warning system to alert us to viral change, and it will prove easier to amplify for smaller or less genomics-focussed laboratories. That is something we need to consider in order to get some information, which is far better than none.

While we've seen predictive modelling for the age of MERS-CoV, we don't actually know when the virus came to be or when it started spilling over to humans. More full genome sequences would certainly help address that question, and help with finding its origin.

However, perhaps we should make the trade-off and use the 3' end of the Spike gene now, in an effort to keep some sort of eye on how the MERS-CoV is travelling? Does anyone else have a good region that fits the bill?