I believe that normalised data should be plotted on a logarithmic scale.
(By “normalised data”, I refer to data that is the ratio between two measurements that have the same dimension. Examples of such data include
- the ratio between the execution time of a program before a proposed compiler optimisation has been applied and the execution time of that program afterwards, and
- the power consumption of a proposed computer divided by the power consumption of a reference computer.)
In the vast majority of research papers published in computer architecture in the last year, when this data is plotted on a graph, a linear scale is used. I believe that a logarithmic scale would be more appropriate most of the time.
Why a logarithmic scale is appropriate for normalised data
I will illustrate my reasons with reference to the two graphs below, which show some sort of “speedup” that has been obtained on four benchmark programs, A, B, C, and D. The left graph uses a linear y-axis, while the right one plots the same data on a logarithmic y-axis.
- The natural origin for a speedup ratio is 1, not 0. That is, we are primarily interested in seeing whether a data point lies above or below 1 (i.e., whether a speedup or a slowdown was observed) rather than whether it lies above or below 0. In the right-hand graph above, it is immediately obvious that A and B experience a slowdown; this is slightly less obvious in the left-hand graph.
- Going from a 1x speedup to a 2x speedup is surely more impressive than going from a 3x speedup to a 4x speedup. But on the linear y-axis in the left-hand graph above, the distance between 1 and 2 is the same as the distance between 3 and 4, so these feats appear equally impressive.
- Obtaining a 2x speedup is likely to be considered just as good as obtaining a 2x slowdown (i.e., a 0.5x speedup) is bad. But on the linear y-axis in the left-hand graph above, the distance from 1 to 0.5 is much smaller than the distance from 1 to 2, so the speedup experienced by benchmark C is emphasised over the slowdown experienced by benchmark B, even though both have the same magnitude.
- On a linear scale, the “centre of gravity” to which the eye is drawn lies at the arithmetic mean, while on a logarithmic scale, the centre of gravity is at the geometric mean. When averaging dimensionless ratios, most authors tend to use the geometric mean.
Caveat. One should be careful when the ratio being plotted could be zero, for then the geometric mean will degenerate to zero (it being the product of the individual ratios). The log scale will also fail to handle such cases because the logarithm of zero is negative infinity (which tends not to fit on the page).
Remark. There is considerable debate on the relative merits of the arithmetic mean, geometric mean, and harmonic mean when summarising speedup figures. (The debate is well summarised by Citron et al. and by Eeckhout.) I argue that whenever the geometric mean is deemed a suitable average – and it appears to be the most prevalent among the three means when summarising speedup figures – then a logarithmic scale should be used for the same reasons.
Examples from recent research papers
In what follows, I consider sixteen papers that have been published this year in three of the top conferences in programming languages and computer architecture: PLDI 2016, POPL 2016, and ASPLOS 2016. Each paper includes a graph of some normalised data (typically some sort of “speedup” metric) that uses a linear scale. I argue that in each case, a logarithmic scale would be more appropriate.
- The following graph is from Yizhou Zhang et al. Accepting Blame for Safe Tunneled Exceptions in PLDI 2016. The authors report that the graph demonstrates an average speedup of 2.4% speedup, and by examination, I deduce that this refers to the geometric mean. The y=1 line has been added by the authors to help the reader distinguish speedups from slowdowns; such an addition would be unnecessary if a logarithmic y-axis were used instead.
- The following graph is from Michael D. Adams et al. On the Complexity and Performance of Parsing with Derivatives in PLDI 2016. The average speedup is given as 2.04, but it’s not clear whether this is the arithmetic or geometric mean.
- The following graph is from James Bornholt et al. Optimizing Synthesis with Metasketches in POPL 2016.
- The following graph is from Sergi Abadal et al. WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication in ASPLOS 2016. This graph reports both the arithmetic and geometric mean. Note that if a logarithmic y-axis had been used, it would (probably) not be necessary to have several bars extending beyond the top of the graph.
- The following graph is from Xiaodong Wang and José F. Martínez ReBudget: Trading Off Efficiency vs. Fairness in Market-Based Multicore Resource Allocation via Runtime Budget Reassignment in ASPLOS 2016. It reports the geometric mean.
- The following graph is from Haishan Zhu and Mattan Erez Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems [Paywall] in ASPLOS 2016. Their “FG” figures represent time ratios, and are averaged using the arithmetic mean, while their “BG” figures represent rate ratios, and are averaged using the harmonic mean.
- The following graph is from Anurag Mukkara et al. Whirlpool: Improving Dynamic Cache Management with Static Data Classification in ASPLOS 2016. No mean speedup is calculated for these measurements in the paper; only the ranges are reported.
The following graphs are from the same paper. For these graphs, the authors quote the geometric mean speedups.
- The following graph is from Myeongjae Jeon et al. TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services [Paywall] in ASPLOS 2016.
- The following graph is from Tong Zhang et al. TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory [Paywall] in ASPLOS 2016. It gives the geometric mean.
- The following graph is from Nils Asmussen et al. M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores in ASPLOS 2016.
- The following graph is from Daniyal Liaqat et al. Sidewinder: An Energy Efficient and Developer Friendly Heterogeneous Architecture for Continuous Mobile Sensing in ASPLOS 2016. No averages are given, only ranges.
- The following graph is from Jonathan Balkind et al. OpenPiton: An Open Source Manycore Research Framework in ASPLOS 2016. No averages are given.
- The following graphs are from Alex Markuze et al. True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy in ASPLOS 2016.
- The following graph is from Amro Awad et al. Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers in ASPLOS 2016. This graph is quite interesting because the quoted 3.3x speedup is actually the arithmetic mean of all the per-benchmark speedups. The geometric mean would give an average speedup of 2.4x, and is probably the more appropriate measure in this case.
The same paper also reports a reduction in the number of writes to main memory (see the graph below). This graph could also use a logarithmic axis, but there is a problem with the H264 benchmark, which requires 0x as many writes to main memory when “Silent Shredder” is used – i.e., none at all. On a log scale, this data point would need to stretch to negative infinity. It also means that although calculating the arithmetic mean of these ratios feels suspect to me, the geometric mean does not work at all in this case.
- The following graph is from Youngjin Kwon et al. Sego: Pervasive Trusted Metadata for Efficiently Verified Untrusted System Services in ASPLOS 2016. No overall averages are given.
- The following graphs are from Saurav Muralidharan et al. Architecture-Adaptive Code Variant Tuning in ASPLOS 2016.
- Beth C. Gladen and Walter J. Rogan On graphing rate ratios in the American Journal of Epidemiology, 1983. This article argues that relative rates should be plotted on logarithmic rather than linear scales.
- James R. Hebert and Donald R. Miller Plotting and discussion of rate ratios and relative risk estimates in the Journal of Clinical Epidemiology, 1989. This article argues that relative rates should actually be plotted not on logarithmic scales but on reciprocal scales!