Friday, 14 March 2014

Google Flu Trends: not so perfectly predictive?

I'm no expert on the algorithms that go into the search giant's Google Flu Trends (GFT) predictive website, so take what follows as a very superficial opinion. It does not surprise me at all that a recent paper in Science [1], backing up previous chatter on this subject [2], finds GFT is not very accurate. Specifically, it has been overestimating peak influenza levels compared to more traditional measures: laboratory-confirmed cases (themselves only a subset of all cases) and influenza-like illness (ILI) presentations to doctors (a non-specific method of trying to identify influenza from a swarm of other ILI-causing viruses).

A note: this recent paper is more a look at big data and whether it deserves our complete trust yet (it doesn't, is the message) than it is an analysis of how best to predict influenza virus activity in the future.

It would be fantastic if we could have a predictive system that could work around the need for actual testing of sampled people and give us an informed guess as to what flu was doing, how long it would be doing it, how severely it would do it and when it might start and stop doing it. I just don't have a lot of faith in predictive things like this. Perhaps I've just entered into a grumpy middle-aged male phase of my life, but I think that if we want to find out what's happening, we don't need to look much beyond simply upping the level of testing and typing that we do (not so simple when it comes to lab capacity and funding, of course).

Even now, the current situation in Queensland, a 2-fold increase in influenza virus notifications compared to the mean of the past 5 years, does not really show up so clearly on GFT.
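For anyone wanting to see how a "fold increase over the five-year mean" figure is arrived at, here is a minimal sketch. The counts are made-up placeholders chosen to give a round answer, not actual Queensland notification data, and the function name `fold_change` is my own invention for illustration.

```python
from statistics import mean


def fold_change(current, previous_years):
    """Return the current count divided by the mean of earlier years' counts."""
    baseline = mean(previous_years)
    if baseline == 0:
        raise ValueError("baseline mean is zero; fold change is undefined")
    return current / baseline


# Hypothetical year-to-date notification counts, NOT real surveillance data.
past_five_years = [400, 500, 450, 550, 600]
this_year = 1000

ratio = fold_change(this_year, past_five_years)
print(f"{ratio:.1f}-fold the five-year mean")
```

The comparison only makes sense if each year's count covers the same part of the calendar, which is why the placeholder figures are described as year-to-date.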

Given that so many variables will contribute to a person's choice to search for "flu" (or whatever related text GFT includes in its algorithms), it makes perfect sense that a website showing flu activity in your area, based on that component of the results, will overestimate, especially during the peak times of flu activity. Why then? Imagine the impact on search when the media is most active in trying to get your pageviews using headline banners with "killer flu" or "early flu season" in them. People don't just chat over the back fence in response to those headlines any more; they go looking to the internet to provide their answers, news and sometimes poorly communicated facts. Those searches will not just indicate that they have the flu; they will also reflect concern that they may get it at some point in the future.

GFT also taps "real" flu data from real testing labs and doctors' clinics. This means its performance is probably not "off the rails" wrong, just overly influenced by non-infectious factors at peak times.

Are the inflated results positively affecting flu vaccine uptake, I wonder? That would be a good thing. It might even have an impact on the size of the peak season.

Of course, no one knows what the actual numbers of flu cases in the community are, because flu is not always a serious disease that leads us to get a sample collected and tested. Serious disease gets to a hospital and does get tested, and those cases get added to notifications. Sure, influenza virus can cause a more serious outcome, including death on occasion, than many other respiratory virus infections, especially in certain groups. But there are also many mild infections that fly "under the radar". Those numbers won't be accounted for anywhere except through modelling. Perhaps the overestimate isn't that much of an overestimate; it's very hard to actually know.

It's that damn iceberg tip again.