CSE faculty bring significant showing to major systems conference

Researchers designed three new systems to speed up code at several key bottlenecks.

Work by CSE researchers, including Profs. Peter Chen, Jason Flinn, Jean-Baptiste Jeannin (AERO), Manos Kapritsos, Baris Kasikci, Karem Sakallah, and their students, represents 3 of the 38 papers to be presented at the 2019 ACM Symposium on Operating Systems Principles (SOSP) in October. SOSP, held biannually, is one of the leading forums for researchers, developers, programmers, and teachers of computer systems technology.

The U-M papers cover three systems, Aegean, I4, and ShortCut, designed to tackle replication of common services, formal verification of distributed systems, and cutting down on repeat computations. The team behind I4 drew on the combined expertise of systems and automated reasoning researchers.

Aegean: Replication beyond the client-server model
Remzi Can Aksoy (CSE alum) and Prof. Manos Kapritsos

Many of the services you access frequently on the web are actually sets of multiple services that are woven together into large, complex systems. When you make a request to an online store, the storefront has to then make a request to a credit company to verify your credentials and process the payment. Such nested requests exist all over on extremely large scales.

These interacting services are often mission-critical and need to be replicated to handle many, many requests. Replicating these complex services, however, turns out to be very inefficient and often incorrect. Current replication protocols were designed for the simple setup of a client and a server – one person making a request, and one service responding to it. In a world where services interact on a huge scale, though, it’s not enough to account for this simple setup.

To answer this limitation, Aksoy and Kapritsos designed Aegean, a new approach that allows fault-tolerant replication to be implemented beyond the confines of the client-server model. The system offers a new approach to the design of replication protocols in the presence of interactions between services and introduces new techniques that accommodate these interactions safely and efficiently. The researchers’ evaluation shows that a prototype implementation of Aegean not only ensures correctness in the presence of service interactions, but can further improve throughput by an order of magnitude.

I4: Incremental Inference of Inductive Invariants for Verification of Distributed Protocols
Haojun Ma (CSE PhD student), Aman Goel (CSE PhD student), Profs. Jean-Baptiste Jeannin (AERO), Manos Kapritsos, Baris Kasikci, and Karem A. Sakallah

For more than 50 years, systems researchers and industry have relied on testing to increase confidence in the correctness of software. While demands rise, however, testing can fall short – it’s impractical to exhaustively test any program, let alone massive industrial systems. This leads to bugs during production, as well as a loss of availability, revenue, and product reputation.

In answer to this, the field of formal verification has grown in parallel. These techniques provide a means to build complex software that can be verified mathematically, without the need for testing as many cases as possible. But existing approaches to formally verifying complex systems have a major scalability bottleneck. These techniques use certain automated tools to simplify the proofs that make up the verification process, with the major exception of one critical process: finding an inductive invariant.

Inductive invariants are provable conditions that hold for every state the system can take on, as well as all states that might come next. Designers have to essentially prove that, with a certain starting point where a condition is true, that condition will remain true for any execution the software makes. Finding this condition, the inductive invariant, is the most difficult part of any proof, and current proof techniques require them to be found manually – and painstakingly – by the developer.

In this paper the researchers present a new approach, called Incremental Inference of Inductive Invariants (I4). This tool can automatically generate inductive invariants for distributed protocols. The main idea behind the system is simple: the inductive invariant of a finite instance of the protocol can be used to infer a general inductive invariant for the infinite distributed protocol. In I4, the team created a finite instance of the protocol, used a model checking tool to automatically derive the inductive invariant for this finite instance, and generalized this invariant to an inductive invariant for the infinite protocol.

Their experiments show that I4 can prove the correctness of several distributed protocols like Chord, 2PC and Transaction Chains with little to no human effort.

ShortCut: Accelerating Mostly-Deterministic Code Regions
Xianzheng Dou (CSE Phd student), Profs. Peter Chen and Jason Flinn

Applications often perform repeated computations that are mostly, but not exactly, similar. Every time you launch an application it performs similar initialization and setup, and accessing dynamic content on a website requires similar actions from the web server. These mostly-similar computations consume resources, increase latency, and decrease throughput.

Currently, systems use generic approaches to speed up computation that’s completely identical and deterministic. Unfortunately, most code regions of reasonable size meet neither of those qualifications – but many come close. To handle these cases, application-specific solutions are often deployed to meet performance requirements. These are, of course, tailor-made for a certain setting, costly, and time consuming to develop. Generic performance optimizations are far more useful than hand-crafted solutions.

This paper presents ShortCut, which generically accelerates mostly-deterministic computation without requiring application modification or source code. To operate, the system breaks code into sequences of related instructions called regions. For each accelerated region, ShortCut captures a diff, or a record of changes made to memory and register values and the system calls executed. ShortCut then generates a program that modifies the diff to account for non-determinism and variant inputs that arise in subsequent executions of the region. This program, called a slice, consists of the instructions in the region that depend on any input that may change; typically, the slice executes many fewer instructions than the original execution

Assuming only likely values rather than allowing all possible values makes it possible to generate complex slices and execute them much faster. Slices are self-verifying; they include predicates that verify all assumptions made during a subsequent execution. When these verifications succeed, the slice is guaranteed to produce a correct modification. If a verification fails, ShortCut transparently rolls back the slice execution and runs the original computation instead. Users see no difference between normal execution and those sped up by ShortCut.