System to avoid data center inefficiencies earns Best Paper

Called Whisper, the technique mitigates mispredictions in application control flow via efficient profiling.
Group photo of the winning researchers
The winning co-authors of the MICRO Best Paper Award: PhD candidate Tanvir Ahmed Khan, SUGS alum Muhammed Ugur, Prof. Baris Kasikci, Daniel A. Jiménez (Texas A&M University), and Heiner Litz (University of California, Santa Cruz).

A new method for eliminating costly stalls in modern datacenter applications was awarded a Best Paper Award at the IEEE/ACM International Symposium on Microarchitecture (MICRO). The paper, titled “Whisper: Profile-Guided Branch Misprediction Elimination for Data Center Applications,” details a series of innovations that make datacenter microarchitecture less prone to mispredictions, resulting in a 2.8% speedup that could save millions of dollars. The research was led by doctoral candidate Tanvir Ahmed Khan and assistant professor Baris Kasikci.

The project targets inefficiencies in the microarchitectural components responsible for predicting which instructions the processor will need next. These components, such as branch predictors, save an enormous amount of time in large applications by anticipating future instructions and moving them into the processor’s cache from much slower memory or disk. Branch predictors in particular specialize in predicting the outcomes of branching instructions, or control-flow logic, which tends to be quite repetitive in big applications. Despite this repetitiveness, mispredictions still occur frequently.
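
For a rough sense of what the predictor is up against, here is a toy example (hypothetical, not from the paper): the loop-back branch below behaves the same way on nearly every iteration and is trivial to predict, while the data-dependent branch inside it flips unpredictably.

```python
import random

# Hypothetical illustration of two kinds of branches (not from the paper).
data = [random.randint(0, 1) for _ in range(1000)]

total = 0
for x in data:      # loop-back branch: taken on every iteration but the last, easy to predict
    if x:           # data-dependent branch: taken ~50% of the time, hard to predict
        total += 1
```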

Modern data center applications execute enormous volumes of instructions, so these mispredictions pile up and lead to “performance losses worth millions of dollars.”

“These applications’ hot code footprints range from tens to hundreds of megabytes, which overwhelm on-chip cache structures, whose sizes are only hundreds of kilobytes,” the researchers say.

The stalls that arise from these wrong predictions add enough to a datacenter’s cost of ownership that even a single-digit percentage reduction can slash management and energy costs by millions of dollars. Existing methods to mitigate datacenter stalls can be limited by the accuracy of the branch predictor, or they address different types of frontend stalls altogether, with minimal impact on incorrect predictions.

The researchers began by measuring these losses, quantifying the performance impact of branch mispredictions on 12 modern datacenter applications. They showed that the large code footprints triggered frequent mispredictions, rendering current state-of-the-art techniques ineffective.

“Even a highly advanced predictor experiences an average of three mispredictions for every 1000 instructions,” the authors say. The high miss rate is a result of the applications’ large instruction footprints overwhelming the predictor’s capacity.

To combat this issue, the team devised Whisper. The technique efficiently profiles an application to identify which of its branch instructions cause the most frequent mispredictions. It then correlates the direction of these branches with many prior branch directions (the application’s history), and encodes this correlation using Boolean formulas to improve the branch predictor’s performance over time.
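
As a rough illustration of that idea, consider the minimal sketch below (made-up names and data, not the authors’ implementation): profiling yields, for each hard-to-predict branch, a set of observed history/outcome pairs, and the goal is to find a Boolean formula over those history bits that matches the recorded outcomes.

```python
# Minimal sketch: score how well a candidate Boolean formula over prior branch
# directions predicts a profiled branch's outcome. All data here is made up.

def formula_accuracy(profile, formula):
    """profile: list of (history_bits, outcome) pairs collected by profiling.
    formula: function mapping a tuple of history bits to a predicted outcome (0/1)."""
    hits = sum(1 for bits, outcome in profile if formula(bits) == outcome)
    return hits / len(profile)

# Toy profile: this branch is taken exactly when the 2nd and 5th most recent
# branches were both taken.
profile = [((1, 1, 0, 1, 1, 0, 0, 1), 1),
           ((0, 0, 1, 1, 0, 1, 1, 0), 0),
           ((1, 1, 1, 0, 1, 0, 1, 1), 1)]

candidate = lambda bits: bits[1] & bits[4]      # the formula "h1 AND h4"
print(formula_accuracy(profile, candidate))     # 1.0 on this toy profile
```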

Whisper introduces three techniques to improve profile-guided branch prediction, together eliminating 16.8% of all branch mispredictions.

First, the team uses an approach they call hashed history correlation to make predictions efficiently. Existing profile-guided techniques tend to either keep track of a huge program history, requiring kilobytes of storage for every static application branch, or make use of a history that’s too small to be helpful. Whisper instead profiles the precise length of history needed to accurately predict a given branch, comparing a series of lengths and choosing the one with the highest correlation. It then encodes that history into a fixed-length, 8-bit hashed history.
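
A minimal sketch of that idea follows, with assumed data structures, a deliberately simple XOR-folding hash, and a crude correlation score (the paper’s actual hash and metric may differ): try several history lengths, score how consistently each hashed history maps to a single outcome, and keep the best length.

```python
def hash_history(bits):
    """Fold an arbitrary-length history into 8 bits by XOR-ing each bit into
    one of 8 positions (a stand-in for the paper's hashing scheme)."""
    h = 0
    for i, b in enumerate(bits):
        h ^= b << (i % 8)
    return h

def best_history_length(profile, lengths=(8, 16, 32, 64)):
    """profile: list of (full_history, outcome) pairs for one branch.
    Returns the history length whose 8-bit hash best separates outcomes."""
    best_len, best_score = None, -1.0
    for n in lengths:
        buckets = {}
        for hist, outcome in profile:
            buckets.setdefault(hash_history(hist[-n:]), []).append(outcome)
        # Fraction of samples explained by the majority outcome of their bucket.
        score = sum(max(v.count(0), v.count(1)) for v in buckets.values()) / len(profile)
        if score > best_score:
            best_len, best_score = n, score
    return best_len

# Toy usage with a made-up profile of (history, outcome) pairs.
toy = [([1, 0] * 32, 1), ([0, 1] * 32, 0), ([1, 0] * 32, 1)]
print(best_history_length(toy))
```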

Whisper also implements randomized formula testing. Finding the best Boolean formula to predict a branch outcome scales terribly as the size of the application increases, since every additional history bit makes the search space exponentially larger. By trying a randomized, uniformly distributed set of candidate formulas, Whisper searches only 0.1% of the options yet matches the performance of an exhaustive search 88% of the time.
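
The sketch below illustrates the principle under strong simplifications (a formula over a k-bit hashed history is treated as a plain truth table, and the sampling budget is arbitrary); it is not the paper’s search procedure.

```python
import random

def randomized_formula_search(profile, k=4, samples=200, seed=0):
    """profile: list of (hashed_history, outcome) pairs, where hashed_history
    is an integer in [0, 2**k). Samples candidate formulas (truth tables over
    the k-bit hash) uniformly at random and keeps the most accurate one."""
    rng = random.Random(seed)
    best_table, best_acc = None, -1.0
    for _ in range(samples):                      # a tiny fraction of all 2**(2**k) formulas
        table = [rng.randint(0, 1) for _ in range(2 ** k)]
        acc = sum(table[h] == o for h, o in profile) / len(profile)
        if acc > best_acc:
            best_table, best_acc = table, acc
    return best_table, best_acc

# Toy usage: a made-up profile of (hashed_history, outcome) pairs.
toy_profile = [(5, 1), (12, 0), (5, 1), (7, 1)]
table, acc = randomized_formula_search(toy_profile)
print(acc)
```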

Finally, they improve on the accuracy of typical Boolean formulas by introducing additional operations: contradiction, tautology, AND, OR, implication, and converse nonimplication. This richer operator set enables more powerful Boolean formulas, improving branch prediction accuracy while increasing storage only linearly.
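
Written out as two-input Boolean functions over history bits a and b, the operations named above look like this (a plain-Python illustration of the operator set, not the hardware encoding):

```python
# The extended operator set over two history bits a and b.
OPS = {
    "contradiction":           lambda a, b: 0,             # always predict not-taken
    "tautology":               lambda a, b: 1,             # always predict taken
    "and":                     lambda a, b: a & b,
    "or":                      lambda a, b: a | b,
    "implication":             lambda a, b: (1 - a) | b,   # a -> b
    "converse nonimplication": lambda a, b: (1 - a) & b,   # (not a) and b
}

# Example: "taken only when the older branch fell through but the newer one
# was taken" is expressible directly with converse nonimplication.
print(OPS["converse nonimplication"](0, 1))   # 1
print(OPS["converse nonimplication"](1, 1))   # 0
```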

The team evaluated Whisper with 12 popular datacenter applications that suffer from frequent frontend and misprediction stalls and showed that, on average, their technique eliminated 16.8% of all branch mispredictions over the state-of-the-art baseline, creating an average speedup of 2.8%.

The project was co-authored by Tanvir Ahmed Khan, SUGS alum Muhammed Ugur (now a PhD student at Yale), Krishnendra Nathella (ARM), Dam Sunwoo (ARM), Heiner Litz (University of California, Santa Cruz), Daniel A. Jiménez (Texas A&M University), and Baris Kasikci.