A lot of papers include a graph that benchmarks the performance of a new technique against a technique from previous work.
Such a graph might look like this:
The graph is rather straightforward to read. The height of each bar is the time taken for that benchmark, so when the green bar is higher, the old technique is faster, and when the red bar is higher, the new technique is faster. This graph provides all the information needed to compare the two techniques. However, it is not easy to see “at a glance” which technique is better.
The fundamental problem, as I see it, is that the x-axis is being underutilised. Where a bar is placed along the x-axis does not matter. As long as the bars are spaced out enough, we do not mind where they sit. We could put the benchmarks in any order, and the graph would be fundamentally the same. This is in stark contrast to the y-axis, where the position of the top of each bar is very important: it relates directly to the time taken to run that benchmark.
Enter the scatter plot
We can make better use of the x-axis by drawing a scatter plot:
I reckon that scatter plots take longer to read than bar charts, but less time to understand. What I mean is: it’s immediately obvious from the bar chart that the height of a bar represents the time taken for that benchmark, but it takes a few moments to work out how to read the scatter plot. With the old technique’s time on the x-axis and the new technique’s time on the y-axis, the points above the diagonal represent benchmarks where the old technique is faster, and those below the diagonal represent benchmarks where the new technique is faster.
However, once this has been established, it becomes straightforward to compare the two techniques. One can immediately make observations like “the new technique seems to win on the shorter-running benchmarks, but to lose on the longer ones”.
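To make the reading of the diagonal concrete, here is a minimal Python sketch. The benchmark names and timings are made up for illustration (they are not the data behind my figures), and I am assuming the old technique's time is the x-coordinate and the new technique's time is the y-coordinate.

```python
# Hypothetical timings in seconds, as (old, new) pairs. Illustrative only.
timings = {
    "bench_a": (0.8, 0.5),
    "bench_b": (2.0, 1.4),
    "bench_c": (5.0, 5.0),
    "bench_d": (12.0, 15.0),
    "bench_e": (30.0, 41.0),
}


def classify(old, new):
    """Say which side of the diagonal y = x the point (old, new) falls on."""
    if new > old:
        return "above"  # the new technique took longer, so the old one is faster
    if new < old:
        return "below"  # the new technique took less time, so it is faster
    return "on"         # a tie


sides = {name: classify(old, new) for name, (old, new) in timings.items()}

# Walk along the x-axis, from the shortest-running benchmark to the longest.
by_old_time = sorted(timings, key=lambda name: timings[name][0])
for name in by_old_time:
    print(name, timings[name], sides[name])
```

With this made-up data, the points sit below the diagonal for the short benchmarks and above it for the long ones, which is exactly the kind of trend that the scatter plot makes visible and the bar chart hides.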
It’s worth pointing out that my scatter plot shows less information than my bar chart, because it does not identify the individual benchmarks. Perhaps, if there are not too many benchmarks, it would be possible to label the points individually, or to use a different colour for each point. Of course, this risks overcomplicating things, and we are often more concerned with general trends than with the performance of particular benchmarks. A reasonable compromise might be to colour a handful of the particularly interesting points so that they can be referred to in the surrounding text.
By the way, I have used semi-transparent markers in my scatter plot. I find this quite an attractive way to deal with multiple points being almost or exactly on top of each other. With opaque markers, coincident points could get lost.
Benchmarking three or more techniques
A lot of papers do not have such a straightforward comparison between one “old technique” and one “new technique”. There may be several “old techniques” with different strengths that are worth comparing against. And the paper may be proposing multiple “new techniques” that all need evaluating.
To represent this kind of situation, here is a bar chart that compares two new techniques against one old technique.
The problems of the first bar chart have been compounded. It is now even more difficult to work out which of the three techniques is best overall.
Happily, scatter plots can come to our rescue again.
Here, we compare the old technique against both the first new technique (green circles) and the second new technique (blue squares).
As for comparing the two new techniques, this is not completely straightforward – one can observe that the green circles extend both a bit above and a bit below the blue squares, but little else.
Plotting relative values
Often, we do not care too much about the absolute time taken for each benchmark, only the relative time taken compared to the old technique. So, one quite commonly sees graphs like this one, where all the times are divided by the “old” time:
Indeed, here is an example of just such a graph, from a paper published in ASPLOS 2016:
Of course, my graph (and the ASPLOS graph too) is rather redundant because all the red bars have a height of exactly 1! Better would be to use a log scale on the y-axis, and then to draw the green/blue bars as ascending or descending from the y=1 line. A log scale is more appropriate than a linear scale for relative values anyway, as I’ve argued in a previous blog post. This leads to the following graph:
It’s now straightforward to spot which benchmarks have been slowed down by our new techniques, and which have been sped up. But the general trend is still not easy to see. So we turn to another scatter plot:
Here, the x-axis shows the relative time taken by the first new technique, and the y-axis shows the relative time taken by the second new technique. The diagonal line compares the first new technique against the second new technique (points above the diagonal represent benchmarks where the first new technique is faster). The vertical line at x=1 compares the first new technique against the old technique (points to the left of the line represent benchmarks where the first new technique is faster). The horizontal line at y=1 compares the second new technique against the old technique (points below the line represent benchmarks where the second new technique is faster).
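The three comparisons packed into this plot can be spelled out in a few lines of Python. This is only a sketch: the timings are hypothetical, and the function and key names are my own invention for illustration.

```python
import math

# Hypothetical timings in seconds, as (old, new1, new2). Illustrative only.
timings = {
    "bench_a": (10.0, 5.0, 20.0),
    "bench_b": (4.0, 2.0, 1.0),
    "bench_c": (8.0, 16.0, 4.0),
}


def relative_point(old, new1, new2):
    """Divide both new times by the old time to get the (x, y) position."""
    return new1 / old, new2 / old


def describe(x, y):
    """Read off the three comparisons from the position of one point."""
    return {
        "new1_beats_old": x < 1,   # left of the vertical line x = 1
        "new2_beats_old": y < 1,   # below the horizontal line y = 1
        "new1_beats_new2": y > x,  # above the diagonal
    }


for name, (old, new1, new2) in timings.items():
    x, y = relative_point(old, new1, new2)
    # On a log scale, a halving and a doubling sit equally far from 1.
    print(name, (x, y), describe(x, y), (math.log2(x), math.log2(y)))
```

The `math.log2` values hint at why a log scale suits relative times: a relative time of 0.5 and one of 2.0 end up the same distance from the line at 1, whereas on a linear scale the slowdown would look twice as large as the speedup.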
Including error bars
There may be some uncertainty in the values that we wish to plot. On a bar chart, this uncertainty is typically shown using error bars. Error bars can be drawn on scatter plots too, but once we have error bars stretching horizontally and vertically from each point (to show the uncertainty along both dimensions), the picture quickly becomes unreadable.
An alternative way to show uncertainty on a scatter plot is to vary the size of each data point, so that the larger the data point, the more uncertainty we have about its position. More precisely: we can draw each data point as an ellipse whose width corresponds to the uncertainty along the x-axis, and whose height corresponds to the uncertainty along the y-axis. As an example, the first scatter plot in this post, with additional uncertainty-ellipses, might look like this:
Here, we see that the uncertainty in the measurements means that for two of the benchmarks, we’re not sure which of the two techniques wins.
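Whether an ellipse overlaps the diagonal can also be checked numerically. The sketch below assumes an axis-aligned ellipse on linear axes, centred on the measured point, whose half-width and half-height are the uncertainties in the old and new times. The measurements are invented for illustration, and treating "the ellipse touches the diagonal" as "we're not sure who wins" is a visual rule of thumb, not a statistical test.

```python
import math


def straddles_diagonal(old, new, old_err, new_err):
    """True if the ellipse centred on (old, new), with semi-axes old_err
    and new_err, touches or crosses the diagonal y = x.

    Over the ellipse, y - x ranges over (new - old) +/- hypot(old_err, new_err),
    so the diagonal is reached exactly when that range contains zero.
    """
    return abs(new - old) <= math.hypot(old_err, new_err)


def verdict(old, new, old_err, new_err):
    if straddles_diagonal(old, new, old_err, new_err):
        return "unsure"
    return "new faster" if new < old else "old faster"


# Hypothetical measurements: (old, new, old_err, new_err), in seconds.
measurements = {
    "bench_a": (2.0, 1.0, 0.2, 0.1),
    "bench_b": (5.0, 5.3, 0.4, 0.3),
    "bench_c": (9.0, 12.0, 0.5, 0.5),
}

for name, m in measurements.items():
    print(name, verdict(*m))
```

With these made-up numbers, only the middle benchmark has an ellipse that reaches the diagonal, so it is the only one where the winner is in doubt.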
In conclusion, when you are planning your next “performance comparison” figure, and especially if you are using an unordered set of benchmarks, why not consider using a scatter plot?
NB: LaTeX code for all of the figures above is available.