In this post, I’d like to share some thoughts I’ve accumulated over the past few years about how to draw better graphs.
To get straight to the point, I have two concrete recommendations:
- normalised data should usually be plotted on a logarithmic scale, and
- scatter plots can be easier to understand than bar charts.
I’ll now elaborate on both of these points, drawing upon 31 examples of graphs I found in the proceedings of PLDI 2019.
Logarithmic scales for normalised data
I believe that normalised data should be plotted on a logarithmic scale. By “normalised data”, I mean data that is the ratio between two measurements that have the same dimension. For example: the ratio between the execution time of a program before a proposed compiler optimisation has been applied and the execution time of that program afterwards.
I will illustrate my reasons with reference to the two graphs below, which both show some sort of “speedup” that has been obtained on four benchmark programs, A, B, C, and D. The left graph uses a linear scale on the y-axis, while the right one plots the same data on a logarithmic scale.
There are four reasons why the logarithmic scale is better:
- The natural origin for a speedup ratio is 1, not 0. That is, we are primarily interested in seeing whether a data point lies above 1 (which indicates a speedup) or below 1 (which indicates a slowdown). This fits nicely with logarithmic scales, which can’t go down to 0. In the right-hand graph above, it is immediately obvious that A and B experience a slowdown; this is slightly less obvious in the left-hand graph.
- Going from a 1x speedup to a 2x speedup is surely more impressive than going from a 3x speedup to a 4x speedup: the former halves the running time, whereas the latter only cuts it by a quarter. But on the linear y-axis in the left-hand graph above, the distance between 1 and 2 is the same as the distance between 3 and 4, so these feats would appear equally impressive.
- Often, it is just as good to get a 2x speedup as it is bad to get a 2x slowdown. But on the linear y-axis in the left-hand graph above, the distance from 1 to 0.5 is much smaller than the distance from 1 to 2, so the speedup experienced by benchmark C is emphasised over the slowdown experienced by benchmark B, even though both have the same magnitude.
- On a linear scale, the “centre of gravity” to which the eye is drawn lies at the arithmetic mean, while on a logarithmic scale, the centre of gravity is at the geometric mean. When averaging dimensionless ratios, most authors tend to use the geometric mean, so a logarithmic scale draws the eye towards the same average that the accompanying text is likely to report.
One caveat: the logarithmic scale doesn’t work well if the normalised data can be zero. This is quite rare though – I don’t think I saw this in any of the PLDI 2019 papers I scanned through when writing this post.
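To make the arithmetic behind these points concrete, here is a minimal Python sketch using only the standard library. The benchmark values are hypothetical: only B = 0.5 and C = 2.0 are suggested by the discussion above, and the helper name `log_distance` is my own invention for illustration.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical speedups (time before / time after). Only B = 0.5 and C = 2.0
# are suggested by the text; the values for A and D are made up.
speedups = {"A": 0.8, "B": 0.5, "C": 2.0, "D": 3.0}


def log_distance(a, b):
    """Distance between ratios a and b as drawn on a log2 axis."""
    return abs(math.log2(b) - math.log2(a))


# Second reason: 1x -> 2x covers more of a log axis than 3x -> 4x.
step_1_to_2 = log_distance(1, 2)
step_3_to_4 = log_distance(3, 4)

# Third reason: a 2x speedup and a 2x slowdown are equally far from 1.
up = log_distance(1, speedups["C"])
down = log_distance(1, speedups["B"])

# Fourth reason: for B and C alone, the geometric mean says "no net change",
# while the arithmetic mean suggests a 1.25x speedup.
geo = geometric_mean([speedups["B"], speedups["C"]])
arith = mean([speedups["B"], speedups["C"]])

# Caveat: a ratio of zero has no position on a logarithmic axis.
try:
    math.log2(0)
    zero_is_plottable = True
except ValueError:
    zero_is_plottable = False

print(step_1_to_2, step_3_to_4, up, down, geo, arith, zero_is_plottable)
```

The choice of base 2 is arbitrary; any other base rescales all the distances by the same constant factor, so the comparisons come out the same.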
Here are some examples of graphs from PLDI 2019 that I believe would be improved by using a logarithmic scale on their y-axes.
In praise of scatter plots
The problem with bar charts is that the x-axis tends to be underemployed. It has the capacity to represent some sort of quantity, but often is used simply to spread out a collection of benchmarks in some arbitrary order. I think that information can often be conveyed more effectively to the reader by using a scatter plot instead.
As an example, here is a classic bar chart, showing the performance of a “new” technique compared to an “old” technique over a range of benchmarks.
The graph is rather straightforward to read: when the green bar is higher, the old technique is faster, and when the red bar is higher, the new technique is faster. This graph provides all the information needed to compare the two techniques. But it is not easy to see “at a glance” which technique is better.
Here is an analogous scatter plot.
I reckon that scatter plots take longer to read than bar charts, but less time to understand. What I mean is: it’s immediately obvious from the bar chart that the height of a bar represents the time taken for that benchmark, but it takes a few moments to work out that in the scatter plot, the points above the diagonal represent benchmarks where the old technique is faster, and those below the diagonal represent benchmarks where the new technique is faster.
However, once this has been established, it becomes straightforward to compare the two techniques. One can immediately make observations like “the new technique seems to win on the shorter-running benchmarks, but to lose on the longer ones”.
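The reading rule for such a scatter plot can be spelled out in a few lines of Python. This is only a sketch: it assumes the old technique's time is on the x-axis and the new technique's time is on the y-axis (which is consistent with the description above), and the timings and the name `side_of_diagonal` are hypothetical.

```python
# Hypothetical timings in seconds: (old technique, new technique) per benchmark.
timings = [(1.0, 0.6), (2.0, 1.5), (8.0, 8.0), (20.0, 26.0), (50.0, 70.0)]


def side_of_diagonal(old, new):
    """Classify a point plotted at x = old time, y = new time."""
    if new > old:
        return "above"  # new technique took longer, so the old one is faster
    if new < old:
        return "below"  # new technique took less time, so it is faster
    return "on"         # both techniques took exactly the same time


sides = [side_of_diagonal(old, new) for old, new in timings]

# Running times (under the old technique) of the benchmarks each side wins.
new_wins = [old for (old, _), side in zip(timings, sides) if side == "below"]
old_wins = [old for (old, _), side in zip(timings, sides) if side == "above"]

# With this made-up data, every benchmark the new technique wins is shorter
# than every benchmark it loses, which is the kind of trend the plot reveals.
new_wins_only_on_short = max(new_wins) < min(old_wins)
print(sides, new_wins_only_on_short)
```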
It’s worth pointing out that my scatter plot shows less information than my bar chart, because it does not identify the individual benchmarks. Perhaps, if there are not too many benchmarks, it would be possible to label the points individually, or to use a different colour for each point. Of course, this risks overcomplicating things, and we are often more concerned with general trends than with the performance of particular benchmarks. A reasonable compromise might be to colour a handful of the particularly interesting points so that they can be referred to in the surrounding text.
By the way, I have used semi-transparent markers in my scatter plot. I find this quite an attractive way to deal with multiple points being almost or exactly on top of each other. With opaque markers, coincident points could get lost.
Scatter plots can cope with more adventurous situations too. For instance, here is a scatter plot that compares two variants of the “new” technique against the “old” one.
And here’s a scatter plot that conveys the uncertainty surrounding each data point using ellipses. The width of the ellipse corresponds to the uncertainty in the x-value, and the height of the ellipse corresponds to the uncertainty in the y-value. Ellipses that cross the y=x diagonal represent benchmarks where we’re not sure which technique is better.
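Whether an ellipse crosses the diagonal can also be checked numerically. A point on an axis-aligned ellipse centred at (cx, cy) with semi-axes a and b is (cx + a cos t, cy + b sin t), and the quantity b sin t - a cos t ranges over plus or minus sqrt(a² + b²), so the ellipse meets y = x exactly when |cy - cx| is at most sqrt(a² + b²). The sketch below assumes linear axes and takes the semi-axes to be half the ellipse's width and height; the function name and the sample numbers are hypothetical.

```python
import math


def crosses_diagonal(cx, cy, x_err, y_err):
    """Return True if an axis-aligned ellipse touches or crosses y = x.

    (cx, cy) is the centre; x_err and y_err are the semi-axes, i.e. half
    the ellipse's width and half its height.
    """
    return abs(cy - cx) <= math.hypot(x_err, y_err)


# Hypothetical benchmarks: (old time, new time, x uncertainty, y uncertainty).
benchmarks = {
    "close call": (10.0, 11.0, 1.0, 1.0),
    "clear loss": (10.0, 15.0, 1.0, 1.0),
}

unsure = {name: crosses_diagonal(*point) for name, point in benchmarks.items()}
print(unsure)
```

A benchmark flagged as `True` here is one where the measurements alone do not settle which technique is better.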
Here are some examples of graphs from PLDI 2019 that might be better as scatter plots.
- The LaTeX code for the graphs I drew is available.
- A 1983 article “On graphing rate ratios” in the American Journal of Epidemiology argues that relative rates should be plotted on logarithmic rather than linear scales. A counterpoint is provided by James R. Hebert and Donald R. Miller’s 1989 article “Plotting and discussion of rate ratios and relative risk estimates” in the Journal of Clinical Epidemiology, which argues that relative rates should actually be plotted not on logarithmic scales but on reciprocal scales!
- The SIGPLAN Empirical Evaluation Checklist suggests logarithmic scales for speedups, among several other recommendations for drawing good graphs.