VDU's blog: Graphs


Sunday, 18 January 2015

Some changes to my Ebola virus disease (EVD) graphs...

To perhaps provide clearer info and to accommodate the changes in the epidemic, namely the reduction in cases and the focus on ridding Guinea, Liberia and Sierra Leone of any and all cases of EVD, I've made some tweaks to my Tableau data visualizations (or dashboards). Briefly...

The dots take their leave.

Was this.

Gone are the dots in my cumulative chart, to be replaced by a third "area under the curve" style graph.

This brings out the importance of the confirmed cases - more on why that matters later. This week Cedric Moro (@Moro_Cedric) asked why we seem to have a relatively large number of suspect and probable cases released in each World Health Organization situation report (WHO SitRep) or summary (SitSumm). I imagine this is due to the turnaround time once the sample arrives, occasions when tests may need to be repeated to confirm strange results, and the time between seeing a patient and sampling them for Ebola virus testing...but there are probably more obvious reasons. Chime in.

Is now this.

Plot the right data for now.

I'm not an epidemiologist - yes, I know you epidemiologists out there already know that. But I like to play with numbers and pretty colours. So this week I got some information that I didn't have before - the reason why use of cumulative curves was frowned upon by the excellent numbers communicator, Prof Hans Rosling (@HansRosling).

I had read previously that Prof Rosling was no fan of cumulative curves in graphically explaining progress in ridding west Africa of EVD. But I like them - I've even explained, in my epidemiologically unprofessional opinion, how a flat plateau on a cumulative curve clearly shows the stalling of an outbreak or epidemic. Turns out I either didn't read all of that quote, or the text I read didn't contain the key fact.

It's not really that cumulative curves are at fault; it's what they are plotting that can mislead. The important thing to plot, especially now that cases are fewer and laboratory capacity is in place, is the confirmed cases, not the total cases, which include suspected, probable and confirmed cases.

Confirmed cases are Ebola virus infections; unconfirmed cases may never be.

In the last WHO SitRep (14-Jan-15) it was noted:

All 54 EVD-affected districts (those that have ever reported a probable or confirmed case) have access to laboratory support within 24 hours of sample collection.

"Access" doesn't mean a result will appear 24 hours after sampling though. But even with this shorter access period, suspected and probable cases are in fact still making up a decent proportion of the total cases reported in even the most recent reports. For example...

In Guinea the numbers between 14-Jan-2015 and 15-Jan-2015 saw suspected cases rise by 3, probables stayed the same and confirmed cases lifted by 8; 27% of the total cases reported between this pair of reports were not confirmed to be Ebola virus infections, at the time of reporting.
In Liberia over this period, suspected cases rose by 29, probables by 2 and confirmed case numbers did not change - so none of the 31 cases were laboratory confirmed as an Ebola virus infection.
In Sierra Leone over this period, suspected cases rose by 10, probables remained the same and confirmed cases lifted by 16; 38% of the total cases were not confirmed to be Ebola virus infections.
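For anyone who wants to check those percentages, the arithmetic can be sketched in a few lines of Python. The counts are the changes quoted in the list above; the function name `unconfirmed_share` is made up for this illustration and is not something from the WHO or from Tableau.

```python
def unconfirmed_share(suspected, probable, confirmed):
    """Percentage of newly reported cases that were not laboratory confirmed."""
    total = suspected + probable + confirmed
    if total == 0:
        return None  # no new cases reported, so there is no proportion to give
    return 100 * (suspected + probable) / total


# Changes between the pair of reports: (suspected, probable, confirmed)
changes = {
    "Guinea": (3, 0, 8),
    "Liberia": (29, 2, 0),
    "Sierra Leone": (10, 0, 16),
}

shares = {country: unconfirmed_share(*counts) for country, counts in changes.items()}

for country, share in shares.items():
    print(f"{country}: {share:.0f}% of new cases not confirmed")
```

The same function could be pointed at any other pair of reports, as long as you do the subtraction between the two reports first.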

If we compare those figures to 2 SitReps from well before the WHO had declared the 24-hour laboratory support, dated 24-Sept-2014 and 26-Sept-2014, we find that Guinea had only 8% of its tally unable to be confirmed and Sierra Leone was at 12% not confirmed, while 87% of EVD-like cases added to Liberia's tally between reports were not confirmed as due to an Ebola virus infection.

This may not be a fair comparison of course and it's not one that accounts for every report - just the 2 pairs of reports I arbitrarily chose as being from 'now' and 'back then'. Nonetheless, I expected there to be a bigger and more obvious difference in the proportion of cases that were now being quickly confirmed - I thought that percentage would have gone up as the unconfirmed cases were less frequent. Instead, it seems that the proportion is not that much better. Perhaps this is an indication of the other diseases which mimic EVD early on, that normally emerge at this time of year or have emerged because of the state of healthcare in the countries blasted by the EVD epidemic. As I said above, it may also just be the time it takes to observe, collect a good history and make a clinical decision before a sample is collected. It may also be that laboratory turnaround times (including testing, verifying and reporting) take a bit longer than we naively expect from reading that quote from the WHO above.

More visualizations of confirmed case numbers.

So for the reasons above, I've added the changes I've mentioned and I've also duplicated some of the "total case" graphs by creating versions that only include confirmed cases.

In the example below I'm showing that it looked like Liberia was experiencing an uptick in cases for 2 consecutive reporting weeks (blue bar graph, right column). I tweeted about this during the week. In fact, those rises were due to unconfirmed cases. The confirmed case plots (green titles in the right-hand column of graphs) show the consistent decline in new EVD cases we had been hearing about.

Live and learn.

Graphs plotting total EVD cases (including suspected, probable and laboratory confirmed; brown title bars, left-hand column) versus graphs plotting only the laboratory confirmed cases (green title bars, right-hand column). Data are from WHO SitReps and SitSumms.

Saturday, 6 September 2014

Case number changes between Ebola virus disease reports...

This is one of my favourite charts for following the Ebola virus disease outbreak in West Africa because it shows how things are changing from report to report.

It plots the total number of suspected, probable and laboratory-confirmed cases added between reports - a measure of change over time that is not cumulative.
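If you want to see how a "change between reports" figure comes out of the running totals, here is a minimal Python sketch. The totals are invented purely for illustration and are not WHO data.

```python
# Hypothetical cumulative totals from four consecutive reports (NOT real data)
cumulative = [100, 130, 145, 190]

# New cases between reports = each total minus the total from the report before it
new_cases = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

print(new_cases)  # [30, 15, 45]
```

Four reports give three changes, because the first report has nothing before it to subtract.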

That's not to say that understanding this chart is easy for everyone...as with everything, what you take away from it may be heavily influenced by your own perspective and your background in reading graphs. I have written something about how to read some of the graphs on my blog here, which may be helpful too.

Uses World Health Organization data up to and including the Situation Report from the 5th-Sept, 2014.

I've marked up the last three periods between reports to highlight that the time between reports varies. You can see this for yourself if you look carefully at the horizontal or "x" axis (the one that has the dates) and look at where each dot lines up with its date. Some are further apart than others.

You can also mouse over the dots on the interactive version of the graph here. That will tell you the dates. The subtraction is up to you though!
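That subtraction is easy to hand to Python. This is only a sketch with made-up report dates and made-up case changes; it also divides each change by the length of its gap, which is one simple way (my assumption, not something the chart does) to allow for reports being unevenly spaced.

```python
from datetime import date

# Hypothetical report dates and the new cases counted at each later report (NOT real data)
report_dates = [date(2014, 8, 28), date(2014, 8, 31), date(2014, 9, 5)]
new_cases = [30, 45]  # cases added by the 2nd and 3rd reports

# Days between consecutive reports
gaps = [(later - earlier).days for earlier, later in zip(report_dates, report_dates[1:])]

# New cases per day over each gap
per_day = [cases / days for cases, days in zip(new_cases, gaps)]

print(gaps)     # [3, 5]
print(per_day)  # [10.0, 9.0]
```

In this invented example the second jump is bigger (45 versus 30) but it built up over a longer gap, so the daily rate is actually a little lower.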

The lines joining the dots here suggest what is happening between the WHO Reports, but the lines do not actually use any real collected values...because we don't have them to plot.

Technically, a bar graph would be more accurate, but I find a line graph easier to read at a glance. So do remember - we don't know what is happening between those dots. We're just presuming it.
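To make that presumption concrete: a straight line between two dots is just linear interpolation. The sketch below uses invented numbers and a made-up helper name, `implied_value`, to show the value a joined line suggests for a day on which nothing was actually reported.

```python
def implied_value(x0, y0, x1, y1, x):
    """Value that a straight line between (x0, y0) and (x1, y1) implies at position x."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical dots: 20 new cases plotted at day 0 and 40 plotted at day 4 (NOT real data)
midpoint = implied_value(0, 20, 4, 40, 2)

print(midpoint)  # 30.0 - a number the line suggests, but that nobody measured
```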

Sunday, 10 August 2014

How to read a VDU graph...

I'm a pretty simple guy. So the stuff that I put onto Virology Down Under's (VDU) blog is usually something I think can be understood by you - my yardstick is that if I can understand it, then I think you can. Sometimes it can get pretty technical though and with things always done in a rush, I don't stop and explain as much as I could. Which is why I value feedback. And I've had some good stuff from @DeclanButlerNat, @JorgeCastillaE and @Moro_Cedric this week.

People with different levels of experience read this blog and my posts on Twitter, so sometimes I direct my graphs towards a particular group. But I do understand that we scientists can be easily carried away by our interests and forget that we're quite used to interpreting our own presentation styles in a certain and speedy way. We've had lots of experience doing it that way. I can change a tyre (as I was reminded a couple of nights ago, at midnight) but I couldn't fix my engine.

At the heart of reading a graph is this fact: you have to look at the axes to understand what the lines or bars or areas mean. Once you know the style, you can understand it at a glance - but first time, examine it with care. If it's one of mine, feel free to ask me what I'm trying to show if it is not immediately obvious. I very well may have failed to make it clear.

So this is a little overview of how to read some of the graphs which I use to communicate what I consider to be otherwise yawn-inducing tables of numbers about viral infection and disease numbers.

A picture is worth a thousand words...

This is a good thing because with my lack of typing skills, if I had to type 1,000 words all the time, that would be at least 200 typos. Graphs plot those tabular numbers in a more colourful and visual way. Once you know how to read them, graphs can become a powerful way to get a quick update on the state of play. On VDU the game seems to be about outbreak data. That's just the way things have evolved for me since I first blogged on 28-March 2013. This includes graphing the number of people with disease (cases), changes in the number of cases, numbers that are suspected versus the number that are actually laboratory confirmed (my currency), dates of illness onset (my favoured piece of data and the hardest to come by publicly), the numbers who die, the proportion (%) of all cases/detections who die, dates when disease was reported, sex and age. All of that can be plotted on graphs by day, week, month or year.

Interpreting a basic graph on VDU...

The graph below (Graph 1) comes from following Middle East respiratory syndrome (MERS) public data. It shows the key parts of the structure of the graphs - the axes (the horizontal and vertical lines that are the key to reading the plotted numbers).

A basic graph has a bottom horizontal line called the x-axis and it has a vertical line on the side called the y-axis. These are used to tell you what the numbers plotted on the graph mean; they are a key to the placement of each point on a graph, according to at least 2 different values.
Each point on a graph represents a coordinate. It's made up of an x-axis value (abscissa) and a y-axis value (ordinate). For example, we plot 50 cases reported on Thursday as 50 on the y-axis and Thursday on the x-axis: (x, y) = (Thursday, 50).
The points that we plot as pairs of x and y data can be joined up and shown as a line (the area underneath the line can also be coloured in, which looks like a mountain that may have peaks and troughs) or they can be plotted as bars. There are other ways too - but I keep it simple. Joining up these dots is not always accurate - we may have no idea what is really happening to the numbers between any 2 points. In that case a bar graph may be more realistic as it shows the numbers at a distinct point in time. Sometimes bar graphs don't work from a formatting perspective (eg bars get so skinny you can't see them). Other times, joining the dots reveals the trends (the general direction that events are heading even if we don't know the values). Trends are useful in infectious disease as they show what has happened and what the latest data mean in the context of what has come before - so not too unrealistic. Some of this is about being accurate while not being overly obsessive.

The particular example graph I've included below (Graph 1) is a little trickier than some because it has 2 y-axes (vertical lines) - a primary (left-hand side) and a secondary (right-hand side). Some of the numbers are plotted against the primary y-axis (left vertical line) and some against the secondary y-axis (the right-hand vertical line). This lets me "double-dip" on shared x-axis numbers, in this case, dates. I'm graphing the course of 2 different things: the number of actual cases by day of illness onset, and the number of detections by the date they were reported. These are 2 different things that have dates in common.

This graph lets us compare, using the same x-axis, what the MERS case numbers look like when they are plotted by the day the people were reported to have become ill compared to the date of public reporting of the cases. There are differences that become more clear when you can run the 2 lines on the same graph, that may be a bit harder to see when they are plotted on 2 separate graphs. This graph highlights that when cases become ill and when they are reported are different things. It also shows that there were a bunch of cases (113) reported in 1 day that have never been given dates of illness onset (or hospitalization or the date they were each reported to the Ministry of Health). It also makes use of the 2 y-axes to have different scales. The primary or left-hand y-axis goes up to 35 while the secondary or right-hand y-axis maxes out at 120. If the same axis values were used, the illness onset cases would mostly be hard to see.

Graph 1. The basics of a graph.

What about cumulative graphs? What are they and how do I interpret those?

The next graph is made to show cases piling up over time (Graph 2). This is the graph that sparked this blog. It plots numbers as a line graph but instead of showing the value at that timepoint (day, week, month, year), it adds the new number to the sum of all the previous numbers. It is plotting a cumulative tally, so it will always be a hill with an upwards (left-to-right, bottom to top) slope except when there are no new cases to add, when the curve becomes parallel to the x-axis - a flat line. How steep that line is can tell us how rapidly cases are piling up. That can also be fudged if you present the chart with a very short or long x-axis.
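For the curious, a cumulative tally is just a running sum. Here is a small Python sketch using invented numbers (not real outbreak data) that shows both the climbing and the flat-line behaviour described above.

```python
from itertools import accumulate

# Hypothetical new cases in five consecutive reports (NOT real data)
new_cases = [5, 12, 0, 0, 8]

# Each point on the cumulative curve = this report's new cases + everything before it
cumulative = list(accumulate(new_cases))

print(cumulative)  # [5, 17, 17, 17, 25]
```

The two reports with zero new cases leave the tally sitting at 17, which is the flat stretch you would see on the curve. A tally of new cases can never go down, only up or sideways.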

In the case of the Zaire ebolavirus outbreak in West Africa, we have the unusual ability to compare numbers from multiple countries at the same time, and use the same x-axis. Here, we show the date when the World Health Organization's Disease Outbreak News update was released. Sadly for us graph addicts, this doesn't include any illness onset dates, but the WHO do have those data and plot them themselves here (1).
A steep slope indicates a rapid rise in cases and this results from a lot of new cases being added in a short period of time.
A near flat or horizontal slope to the line shows that there are not many new cases being added.
In this graph we also show multiple lines plotted using the primary (left) y-axis to present how much and at what rate the total suspect, probable and laboratory confirmed case numbers are piling up (pink) as well as how the deaths from among that number are changing (blue line) and how many of the cases are being laboratory confirmed (green line) as due to the virus suspected of being the cause. This last one is important as it gives a glimpse of how the laboratory network is coping, perhaps how specimen access is going and how much faith to put in the other two totals. Why are we worried about the result totals? Because many other things can look like Ebola virus disease (EVD) early on, and even later in the disease course. A laboratory test is the only way to be certain that the patient had that virus.
Nigeria's numbers look to be rising alarmingly fast. Relative to each other they are, but compared to the dozens of new EVD cases being added between reports in other countries, it is still a small (although still very bad for Nigeria!) increase. This highlights that care is needed when reading charts. Perhaps also an understanding that between different outbreaks, the rate of new cases being added is disease specific. Lots of influenzavirus detections during flu season is what we expect, any ebolavirus cases are not what we expect nor what we want to see. Context. A hard thing to account for and probably a matter of experience.

Graph 2. The cumulative case graph. Adding new numbers to the sum of all the numbers that came before.

Graph 3. Changing the scale. Raising the primary y-axis (left) scale to 750, the level of the other country graphs, makes Nigeria's case numbers look tiny. But it underestimates the impact of the localised spread of Zaire ebolavirus in an area that was not part of the outbreak until a case flew in and spread it. Changing the scale is not just whimsical decision making; it can highlight the importance of events that may otherwise go unnoticed.

Take care when interpreting a graph - look at the axes and also use your noodle

Finally, I'm going to look at the way in which I present the numbers I plot on a graph. I'm using the cumulative case chart for Liberia as my example (Graph 4 collection). It's the same one used in Graph 3 - the only thing different is that I've dragged the x-axis to the left (shrunk) or to the right (stretched) to see what that does.

The line plots look more or less steep when you shrink or stretch the x-axis, respectively. But the numbers have not changed. Possibly, our interpretation of them has, as a result of seeing the slope change. Remember though, check the axes. If you look at the x-axis, the shrunken version shows that those cases have climbed over a longer period than the slope suggests. Always check the denominator (the x of y/x) when you think about slope. Equally, the flatter curves of the stretched out x-axis, at the bottom of the Graph 4 collection, have to be looked at in context with time. The dates have been dragged out to what may be an unreasonable length, which makes the slopes look shallower; but they are still steeper in July than they were in April. Look around the graph for comparison.
As I said above, the current multi-country outbreak lets us compare and so we can see that some areas are adding new cases very rapidly between each report (Liberia and Sierra Leone) while others (Guinea) are not adding as many as quickly. Nigeria looks to have jumped quickly but that is also because of the altered scale (discussed above).
On VDU I get around this by also adding charts that plot total numbers per day or week or month or year. This shows a more discrete series of data that grow or shrink as the outbreak peaks or resolves. The 2 peaks of influenza A(H7N9) virus outbreaks illustrate this nicely - especially when combined with a cumulative case chart (Graph 5)!
There is no real right or wrong here (although there are pixel width constraints) - but don't let your perceptions fool you when looking at someone's graphs for the first time. Take some time to really look at the graphs.
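The difference between how steep a line looks and how fast cases are really rising can be sketched in Python. Slope is rise over run, so the real rate is cases divided by days, while the steepness on screen also depends on how many pixels each day is given. All numbers and both function names below are invented for illustration.

```python
def rate(cases_start, cases_end, days):
    """Real rate of increase: new cases per day (rise over run)."""
    return (cases_end - cases_start) / days


def drawn_steepness(cases_start, cases_end, days, pixels_per_day, pixels_per_case=1.0):
    """How steep the line looks on screen: vertical pixels per horizontal pixel."""
    rise_px = (cases_end - cases_start) * pixels_per_case
    run_px = days * pixels_per_day
    return rise_px / run_px


# Hypothetical: a tally climbing from 100 to 400 cases over 30 days (NOT real data)
real = rate(100, 400, 30)

# Same data drawn on a shrunken x-axis (2 px per day) and a stretched one (20 px per day)
shrunk = drawn_steepness(100, 400, 30, pixels_per_day=2)
stretched = drawn_steepness(100, 400, 30, pixels_per_day=20)

print(real)       # 10.0 cases per day, whatever the chart width
print(shrunk)     # 5.0
print(stretched)  # 0.5
```

The drawn line is ten times steeper on the shrunken axis, yet the real rate of 10 cases per day has not moved. That is why reading the dates off the x-axis matters more than eyeballing the slope.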

Graph 4 collection. Stretching the x-axis can seem like stretching the truth. But carefully read the axes. Some experience is needed here and ultimately you are at the mercy of the person presenting the data.

Graph 5. Influenza A(H7N9) virus outbreak in China during 2013 and 2014. Plotting the numbers discretely (by week) clearly shows the two outbreak peaks (darker blue lines joining the data point dots) and gives valuable context to the cumulative graph in the background (pale blue mountain). This is probably my favourite style of disease numbers graph.

I hope that has helped make sense of my graphs, and perhaps those of others too. I'm always on Twitter so hit me up with questions about this or requests for more posts like this, or to tell me whether it was helpful.

References

1. http://www.who.int/csr/disease/ebola/EVD_WestAfrica_WHO_RiskAssessment_20140624.pdf?ua=1
