Landmark microprocessor reliability paper recognized for enduring impact

Published in 1999, Todd Austin’s paper turned out to be a major contribution to the field, cited over 870 times

Prof. Todd Austin

At the turn of the century, Moore’s Law was still in full swing – the future of microprocessor development was on track for massive growth, unhindered by the physical limitations designers are working to overcome today. But while it may seem like a simpler time in hindsight, the era was beset by many challenges as chip designers and fabricators worked to shrink transistors to an unheard-of new scale. Their innovations set the stage for the technical revolutions we’ve seen in the years since.

During that period of rapid development, Prof. Todd Austin authored a paper that addressed the critical issue of reliability tests for microprocessors. The work presented dynamic verification, a microarchitectural technique that significantly reduced the burden of correctness in microprocessor designs, relieving some of the pressure in the testing stage.

Published in 1999, “DIVA: a reliable substrate for deep submicron microarchitecture design” turned out to be a major contribution to the field. The paper has been cited more than 870 times, including 20 times in 2018, and has been referenced in 11 patents. Now, to recognize the work’s lasting relevance, Austin has earned the IEEE/ACM MICRO Test of Time Award, given each year to an influential MICRO paper whose impact is still felt 18-22 years after its initial publication.

Along with performance and cost, reliability is one of the most important characteristics of any computer system, Austin wrote. Users need to be able to trust that when a processor is put to a task, its results are correct. If this is not the case, there can be serious repercussions, ranging from disgruntled users to financial damage to loss of life – and this has only become more true with time. Faulty parts have resulted in many cases of bad press, lawsuits, costly replacements, and reduced customer confidence.

To avoid these reliability issues and their consequences, chip designers devote a great deal of time and resources during design and fabrication to verifying the correct operation of all parts. They do this by applying functional and electrical verification to their designs, a process that has to ensure the correctness of large, complex systems. When transistors entered deep submicron territory, on the scale of less than one millionth of a meter, reliability testing grew even more complicated. Components this small presented new reliability challenges in the form of degraded signal quality and logic failures caused by natural radiation interference.

Austin helped streamline this arduous testing phase and eliminate the need for strict correctness across the board with the development of dynamic implementation verification architecture (DIVA). This architecture was a major shift in common design wisdom of the time. It split a traditional processor into two parts, the DIVA core and DIVA checker, and allowed fabricators to reliably verify the entire processor by just testing the checker.

The DIVA core is composed of the entire microprocessor design except the retirement stage, the stage in which the results of execution are written to the processor’s registers or the computer’s main memory. The core fetches, decodes, and executes instructions, and then sends their inputs and results in program order to the DIVA checker. The DIVA checker verifies the correctness of all core computation, only permitting correct results to pass through to retirement and storage. If the checker detects an error in the core computation, it fixes the error, flushes the processor pipeline, and then restarts the processor at the next instruction.
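The division of labor described above can be illustrated with a small software model. This is only a rough sketch, not the hardware design from the paper: the three-operation instruction set, the register names, and the functions `core_execute`, `checker_retire`, and `run` are all hypothetical, and a pipeline flush is modeled simply as a counter. The point it illustrates is that only the checker ever writes the committed state, so a fault in the core costs a recovery but never corrupts the result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Reference semantics for a hypothetical three-operation instruction set.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    src1: str
    src2: str


def core_execute(ins: Instr, regs: Dict[str, int],
                 inject_fault: bool) -> Tuple[int, int, int]:
    """Stand-in for the complex core: returns the operands and result.

    When inject_fault is set, the low bit of the result is flipped to
    imitate a design bug or a particle strike (an assumption of this model).
    """
    a, b = regs[ins.src1], regs[ins.src2]
    result = OPS[ins.op](a, b)
    if inject_fault:
        result ^= 1
    return a, b, result


def checker_retire(ins: Instr, a: int, b: int, core_result: int,
                   arch: Dict[str, int]) -> bool:
    """Simple checker: re-read operands, recompute, commit the checked value.

    Returns True if the core's operands and result were correct.
    """
    good_a, good_b = arch[ins.src1], arch[ins.src2]
    good = OPS[ins.op](good_a, good_b)
    ok = (a, b, core_result) == (good_a, good_b, good)
    arch[ins.dst] = good  # only the checker writes committed state
    return ok


def run(program: List[Instr], initial: Dict[str, int],
        faults: FrozenSet[int] = frozenset()) -> Tuple[Dict[str, int], int]:
    """Run the program in order; count how often the checker had to recover."""
    arch = dict(initial)
    recoveries = 0
    for i, ins in enumerate(program):
        # In this model the core restarts from committed state after each
        # instruction, so it reads its operands from arch directly.
        a, b, result = core_execute(ins, arch, i in faults)
        if not checker_retire(ins, a, b, result, arch):
            recoveries += 1  # models a pipeline flush and restart
    return arch, recoveries


program = [
    Instr("add", "r3", "r1", "r2"),
    Instr("mul", "r4", "r3", "r2"),
    Instr("sub", "r5", "r4", "r1"),
]
clean, n_clean = run(program, {"r1": 2, "r2": 3})
faulty, n_faulty = run(program, {"r1": 2, "r2": 3}, faults=frozenset({1}))
print(clean == faulty, n_clean, n_faulty)  # prints "True 0 1"
```

In this toy run, corrupting the second instruction in the core leaves the final register values unchanged and shows up only as one recovery, which mirrors the trade the article describes: a small performance cost in exchange for not having to trust the core.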

This technique helped alleviate the growing pressure surrounding transistor verification as the components grew smaller and smaller. Transistors outside of the checker unit can scale to smaller sizes without fear of natural radiation interference. If they experience an energetic particle strike that leads to incorrect results, the checker will detect and fix the incorrect computations.

The time saved by only having to verify the checker offers a dramatically reduced overall design cost. Austin’s paper detailed the DIVA checker architecture, which was optimized for simplicity and low cost. He was able to demonstrate that the checkers had little impact on core processor performance, and argued that the DIVA checker should lend itself to functional and electrical verification better than a complex core processor. He also looked to the future at applications that could leverage dynamic verification to increase processor performance and availability.

This is the fifth annual MICRO Test of Time Award. It was presented at the International Symposium on Microarchitecture, held on October 14, 2018, in Fukuoka, Japan.