In case you can't tell - I'm a little sick of the number of electrons being wasted on writing the same thing about this paper in Science. I know, there is no shortage of electrons. Still, I hope to see this same degree of ire elicited by and directed toward other places, corporations and States who have trouble providing data to the public within the expected realms of accuracy. I'd also hope for more focus on what and how we test now and how representative that is of what a virus is doing; or what we might be missing.
I think Olson et al. said it well when noting GFT's earlier failure to predict the H1N1 2009 pandemic's influenza-like illness activity:
"Current internet search query data are no substitute for timely local clinical and laboratory surveillance, or national surveillance based on local data collection"
Okay. Google Flu Trends (GFT) was not 100% accurate. Wow. Who'd have thunk it? Who could possibly have guessed this would happen? The disappointment is clearly widespread. A predictive computer-based system set up for devising regulatory guidelines, choosing vaccine formulations, ensuring suitable laboratory testing capacity and preparing national surveillance guidelines failed. Wait. What? It wasn't set up for any of that! It’s really just a pretty thing you can go look at to get an estimate of flu activity near you; much easier to wade through than some country's public health efforts. Estimate. When did we expect an estimate to be perfect?
Come on people-interpreting-this-paper. GFT isn't a failure unless you were honestly expecting it to be 100% correct.
Of course it couldn't ever be that. THERE. WAS. NO. VIRUS. TESTING. Not done by GFT anyway. Some lab testing went into it apparently, but even that was a sliver of a slice of a shard. And if you know anything about respiratory virus testing, then you know that even the testing we do captures only a tiny fraction of the virus-positive cases out there; the rest is extrapolated from those. That testing also varies from place to place in type, quantity and extent of reporting. The choice of what to test (sampling) is itself biased in a number of ways, not the least of which is that we favour testing pretty sick people or those that feel crook enough to present to a doctor. We’re comparing GFT’s “fail” to an estimate. You’re all comfortable with using that to lambaste GFT? You’re comfortable to call that a total fail?
"The folks at Google figured that, with all their massive data, they could outsmart anyone."
Really? Is that what the folks thought? Did Google really get bitten by the flu bug? Can Google truly not track the flu? Certainly catchy headlines, one and all. I guess no-one would read something entitled "Google Flu Trends' estimates not in agreement with some national testing data, which also represent only a portion of those who get infected". I can see where that might not be a real mouse-wheel turner.
GFT was and could only ever be a predictive system. Just like that shiny App you have on your phone that predicts the weather. Let's drag "big weather" through the interwebs, flailing it at every turn so we can suitably express our righteous indignation at its failure to predict the rain we wanted on the weekend. It failed! OMG! Now I have to water my lawn to stop it from drying up. But that's all I have to do. No-one died when the clouds held their watery payload. My child was no more or less safe because the weather bug bit the Bureau of Meteorology here in Queensland. I didn’t have to get a new lawn because it is now 24 hours drier.
Does GFT's overestimate of the number of cases by 0.5- to 2-fold (depending on the story you read) really have a real-world impact on anyone? Seriously? Keep in mind that its estimates still followed the trend of flu activity pretty closely; they peaked when actual flu was peaking, just not (my other estimates) perfectly. But apparently someone expected 100% concordance between lab sampling data and GFT estimates.
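If you want to see how "too high" and "still tracking the trend" can both be true at once, here is a rough Python sketch. The weekly numbers are entirely made up by me for illustration (they are NOT real GFT or lab surveillance data), and the helper names (`mean_ratio`, `pearson`, `peak_week`) are just my own labels, not anything GFT or anyone else actually uses.

```python
from math import sqrt

# Made-up weekly values, purely illustrative (NOT real GFT or lab data).
lab_estimate = [1.0, 1.5, 2.5, 4.0, 6.0, 4.5, 3.0, 2.0]
search_estimate = [1.6, 2.4, 4.2, 7.0, 11.0, 8.0, 5.0, 3.2]


def mean_ratio(estimate, reference):
    """Average week-by-week ratio of one series to the other."""
    return sum(e / r for e, r in zip(estimate, reference)) / len(reference)


def pearson(xs, ys):
    """Pearson correlation: how closely two series move together."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def peak_week(series):
    """Index of the week with the highest value."""
    return max(range(len(series)), key=series.__getitem__)


ratio = mean_ratio(search_estimate, lab_estimate)
r = pearson(search_estimate, lab_estimate)
same_peak = peak_week(search_estimate) == peak_week(lab_estimate)

print(f"mean ratio (search / lab): {ratio:.2f}")
print(f"correlation between the two series: {r:.3f}")
print(f"both series peak in the same week: {same_peak}")
```

In this toy example the search-based series sits well above the lab-based one every single week, yet the two rise, peak and fall together. That's the distinction being glossed over: a level error is not the same thing as missing the trend.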
GFT has been doing a perfectly good job given what it is and what it could ever hope to be in its current setup. Perhaps centralizing and plotting the WORLD'S lab-based data alongside Google “flu”-related search-result data would be a useful next step for GFT. Then we could make up our own
In the meantime, keep it in context, people.
- Crof's blog post
- The Parable of Google Flu: Traps in Big Data Analysis