# A plot twist! Drawing better graphs in PL papers

In this post, I’d like to share some thoughts I’ve accumulated over the past few years about how to draw better graphs.

To get straight to the point, I have two concrete recommendations:

• normalised data should usually be plotted on a logarithmic scale, and
• scatter plots can be easier to understand than bar charts.

I’ll now elaborate on both of these points, drawing upon 31 examples of graphs that I found in the proceedings of PLDI 2019.

# Logarithmic scales for normalised data

I believe that normalised data should be plotted on a logarithmic scale. By “normalised data”, I mean data that is the ratio between two measurements that have the same dimension. For example: the ratio between the execution time of a program before a proposed compiler optimisation has been applied and the execution time of that program afterwards.

I will illustrate my reasons with reference to the two graphs below, which both show some sort of “speedup” that has been obtained on four benchmark programs, A, B, C, and D. The left graph uses a linear scale on the y-axis, while the right one plots the same data on a logarithmic scale.

There are four reasons why the logarithmic scale is better:

1. The natural origin for a speedup ratio is 1, not 0. That is, we are primarily interested in seeing whether a data point lies above 1 (which indicates a speedup) or below 1 (which indicates a slowdown). This fits nicely with logarithmic scales, which can’t go down to 0. In the right-hand graph above, it is immediately obvious that A and B experience a slowdown; this is slightly less obvious in the left-hand graph.
2. Going from a 1x speedup to a 2x speedup is surely more impressive than going from a 3x speedup to a 4x speedup. But on the linear y-axis in the left-hand graph above, the distance between 1 and 2 is the same as the distance between 3 and 4, so these feats would appear equally impressive.
3. Often, it is just as good to get a 2x speedup as it is bad to get a 2x slowdown. But on the linear y-axis in the left-hand graph above, the distance from 1 to 0.5 is much smaller than the distance from 1 to 2, so the speedup experienced by benchmark C is emphasised over the slowdown experienced by benchmark B, even though both have the same magnitude.
4. On a linear scale, the “centre of gravity” to which the eye is drawn lies at the arithmetic mean, while on a logarithmic scale, the centre of gravity is at the geometric mean. When averaging dimensionless ratios, most authors tend to use the geometric mean.
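The points above can be illustrated with a minimal matplotlib sketch that plots the same speedup ratios on a linear and a logarithmic scale side by side, and computes their geometric mean. The benchmark names and values here are invented for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Invented speedup ratios for four hypothetical benchmarks.
benchmarks = ["A", "B", "C", "D"]
speedups = np.array([0.8, 0.5, 2.0, 3.0])

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))
for ax, scale in ((ax_lin, "linear"), (ax_log, "log")):
    ax.scatter(benchmarks, speedups)
    ax.set_yscale(scale)
    ax.axhline(1, color="black", linewidth=0.8)  # the natural origin, y=1
    ax.set_title(f"{scale} scale")
    ax.set_ylabel("speedup")

# On the log scale, a 2x speedup and a 2x slowdown sit at equal distances
# from the y=1 line, and the visual "centre of gravity" of the points is
# their geometric mean rather than their arithmetic mean.
geo_mean = np.exp(np.mean(np.log(speedups)))
fig.savefig("speedups.png")
```

Note that the geometric mean is just the arithmetic mean computed in log-space, which is exactly why it matches the eye’s impression of a logarithmic axis.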

One caveat: the logarithmic scale doesn’t work well if the normalised data can be zero. This is quite rare though – I don’t think I saw this in any of the PLDI 2019 papers I scanned through when writing this post.

Here are some examples of graphs from PLDI 2019 that I believe would be improved by using a logarithmic scale on their y-axes.

This first graph demonstrates why y=0 is not the best origin – if the bars started at y=1 (and then stretched upwards for speedups or downwards for slowdowns) then there would be no need for the thick black line at y=1.

This graph would also be clearer if its bars emanated from y=1. Moreover, this would serve to remove the unnecessary orange bars, which represent the values against which the other bars are normalised.

These two examples have the unfortunate property that some bars extend beyond the top of the scale. Another advantage of the logarithmic scale is that it handles unusually large (and unusually small) values gracefully.

This example is particularly unusual. It has the nice property of treating speedups and slowdowns symmetrically, but has a disconcerting discontinuity between y=-1 and y=1, and requires the reader to grapple with strange concepts like negative speedup factors! (From reading the surrounding text, I understand that a speedup factor of -2 should be interpreted as a 0.5x speedup, or a 2x slowdown.)

I’m a little torn about these two examples, both of which plot speedup against the degree of parallelism. In both cases, the advantage of plotting the speedup on a linear scale is that we can compare against the ideal situation of “speedup factor equals parallelism degree”, which is depicted as a straight line at y=x. Perhaps a reasonable compromise in these situations would be to use logarithmic scales for both axes.

Here is an assortment of other graphs from PLDI 2019 that plot normalised values on a linear scale.

I’d like to give an “honourable mention” to these two graphs, which do use logarithmic scales, but could be improved further by starting the bars at y=1 rather than at an arbitrary small number like y=0.01.
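As a sketch of the “bars emanating from y=1” idea: in matplotlib, a bar of height value − 1 with bottom=1 spans from the natural origin to the value, so speedups stretch upwards and slowdowns downwards, with no need for a separate line at y=1 to mark the break-even point. The values below are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Invented normalised times for four hypothetical benchmarks (1.0 = no change).
benchmarks = ["A", "B", "C", "D"]
values = [0.8, 0.5, 2.0, 3.0]

fig, ax = plt.subplots()
# Each bar spans from y=1 to y=value: a height of (value - 1) with bottom=1
# stretches upwards for values above 1 and downwards for values below 1.
bars = ax.bar(benchmarks, [v - 1 for v in values], bottom=1)
ax.set_yscale("log")
ax.set_ylabel("normalised time")
fig.savefig("bars_from_one.png")
```

Since rectangle coordinates are interpreted in data space, the same bars render correctly whether the axis is linear or logarithmic.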

# In praise of scatter plots

The problem with bar charts is that the x-axis tends to be underemployed. It has the capacity to represent some sort of quantity, but often is used simply to spread out a collection of benchmarks in some arbitrary order. I think that information can often be conveyed more effectively to the reader by using a scatter plot instead.

As an example, here is a classic bar chart, showing the performance of a “new” technique compared to an “old” technique over a range of benchmarks.

The graph is rather straightforward to read: when the green bar is higher, the old technique is faster, and when the red bar is higher, the new technique is faster. This graph provides all the information needed to compare the two techniques. But it is not easy to see “at a glance” which technique is better.

Here is an analogous scatter plot.

I reckon that scatter plots take longer to read than bar charts, but less time to understand. What I mean is: it’s immediately obvious from the bar chart that the height of a bar represents the time taken for that benchmark, but it takes a few moments to work out that in the scatter plot, the points above the diagonal represent benchmarks where the old technique is faster, and those below the diagonal represent benchmarks where the new technique is faster.

However, once this has been established, it becomes straightforward to compare the two techniques. One can immediately make observations like “the new technique seems to win on the shorter-running benchmarks, but to lose on the longer ones”.

It’s worth pointing out that my scatter plot shows less information than my bar chart, because it does not identify the individual benchmarks. Perhaps, if there are not too many benchmarks, it would be possible to label the points individually, or to use a different colour for each point. Of course, this risks overcomplicating things, and we are often more concerned with general trends than with the performance of particular benchmarks. A reasonable compromise might be to colour a handful of the particularly interesting points so that they can be referred to in the surrounding text.

By the way, I have used semi-transparent markers in my scatter plot. I find this quite an attractive way to deal with multiple points being almost or exactly on top of each other. With opaque markers, coincident points could get lost.
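Putting these ideas together, here is a hypothetical matplotlib sketch of such a scatter plot: semi-transparent markers, logarithmic axes, matching limits on both axes, and a y=x diagonal running corner to corner. The timing data is randomly generated purely for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Invented running times (seconds) for the "old" and "new" techniques.
old_times = rng.lognormal(mean=0.0, sigma=1.0, size=40)
new_times = old_times * rng.lognormal(mean=0.1, sigma=0.3, size=40)

fig, ax = plt.subplots()
# Semi-transparent markers, so coincident points darken rather than vanish.
ax.scatter(old_times, new_times, alpha=0.4)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("old technique (s)")
ax.set_ylabel("new technique (s)")

# Identical limits on both axes make the y=x diagonal cross from the
# bottom-left corner to the top-right corner.
lo = min(ax.get_xlim()[0], ax.get_ylim()[0])
hi = max(ax.get_xlim()[1], ax.get_ylim()[1])
ax.plot([lo, hi], [lo, hi], color="black", linewidth=0.8)  # y = x
ax.set_xlim(lo, hi)
ax.set_ylim(lo, hi)
fig.savefig("scatter.png")
```

Points below the diagonal are benchmarks where the new technique is faster; points above it are benchmarks where the old technique is faster.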

Scatter plots can cope with more adventurous situations too. For instance, here is a scatter plot that compares two variants of the “new” technique against the “old” one.

And here’s a scatter plot that conveys the uncertainty surrounding each data point using ellipses. The width of the ellipse corresponds to the uncertainty in the x-value, and the height of the ellipse corresponds to the uncertainty in the y-value. Ellipses that cross the y=x diagonal represent benchmarks where we’re not sure which technique is better.
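One way to draw such ellipses is with matplotlib.patches.Ellipse, whose width and height parameters are full extents, so each should be twice the uncertainty in the corresponding direction. The measurements and uncertainties below are invented for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

# Invented mean times (seconds) and uncertainties (e.g. one standard
# deviation over repeated runs) for three hypothetical benchmarks.
old_mean = np.array([1.0, 2.5, 4.0])
old_err = np.array([0.2, 0.5, 0.3])
new_mean = np.array([0.8, 2.8, 3.1])
new_err = np.array([0.1, 0.6, 0.4])

fig, ax = plt.subplots()
for x, dx, y, dy in zip(old_mean, old_err, new_mean, new_err):
    # Width and height are full extents, hence the factor of 2.
    ax.add_patch(Ellipse((x, y), width=2 * dx, height=2 * dy, alpha=0.4))
ax.plot([0, 5], [0, 5], color="black", linewidth=0.8)  # y = x
ax.set_xlim(0, 5)
ax.set_ylim(0, 5)
ax.set_xlabel("old technique (s)")
ax.set_ylabel("new technique (s)")
fig.savefig("ellipses.png")
```

An ellipse that straddles the diagonal is one whose benchmark could plausibly go either way.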

Here are some examples of graphs from PLDI 2019 that might be better as scatter plots.

These three graphs all use a logarithmic scale for normalised data, thus meeting the first criterion in this post, but all have under-employed x-axes that are only being used to spread out benchmarks in an arbitrary order.

This example is quite interesting – here, the benchmarks are not in an arbitrary order along the x-axis; they are in descending order of their y-values. However, I think showing “time taken by str.KLEE” against “time taken by vanilla.KLEE” on a scatter plot would be more informative.

These four examples all use linear scales, but this is fine because they’re not plotting normalised values.

These four examples all use scatter plots, and appropriate scales. My only suggestion here is that they could all benefit from semi-transparent data points, so that overlapping points don’t get lost.

This final example is a scatter plot, with appropriate scales, and with no need for semi-transparent data points. There’s not much to fault here; my only suggestion is to use exactly the same scales for the x-axis and the y-axis, so that the y=x diagonal crosses from the bottom-left corner to the top-right corner.