Automated software repair paper recognized for 10 years of influence
Ten years ago, Prof. Westley Weimer set out to shave off time and expenditure sunk into patch fixes.
Patch fixes will always be a fact of life in software development – no program is perfect, and some of the largest industrial software products have millions of use cases to account for that no amount of testing can cover. Most developers spend more time reading and fixing code than they do writing it for the first time – as much as 0.5% of the US GDP goes to fixing software errors and problems associated with bad testing every year.
Ten years ago, Prof. Westley Weimer set out to shave off some of that time and expenditure. His solution was controversial, applying a novel approach to software development called genetic programming to generate automated patches to even large, complicated programs. The paper, called “Automatically finding patches using genetic programming,” has since been cited over 500 times and served as groundwork for tools in use by companies like Facebook. Now, the International Conference on Software Engineering has recognized the work with a Ten-Year Most Influential Paper.
When it comes to large industrial software, small bugs are easy to ignore or work around. Companies use a technique known as triage to determine which bugs are prominent enough to be priorities, and which can safely ship without major consequences. Programs like certain versions of Windows have shipped with tens of thousands of known defects, because Microsoft judged that they just weren’t worth the time to fix.
“So, rather than trying to come up with more bug reports,” says Weimer, “I switched over about 10 years ago to saying, what if we could reduce the costs associated with fixing the ones we already have?”
This was a controversial concept at the time, thanks to the prevailing view that software is fragile, and humans are the best we can get when it comes to fixing it. In sensitive cases like an airplane’s autopilot, a radiation dosing machine, or Facebook’s privacy settings, human eyes are seen as mandatory to make sure all necessary functions are working properly.
Unfortunately, there often aren’t humans to do the job completely, and even when there are those human patches are incorrect or reverted as much as 25-35% of the time. Humans are wrong more often than we would like to admit. So, according to Weimer, there’s room for helping people out.
The core of Weimer’s automated solution is to help, rather than completely replace, human developers. As Weimer puts it, there’s a creative problem in software development, similar to writing a short story or a poem – both writers face the “tyranny of the blank page.” If you’re given a problem and there’s nothing there, it’s really hard to get started. But it’s often easier to offer criticism of somebody else’s work.
In the work just before this paper, Weimer and his team did a human study in which they showed tool-generated patches to developers, and it turned out that seeing the patch, even if it was wrong or created by a machine, meant that it took less time for developers to come up with a correct patch.
This meant that even if these auto-generated patches were only used to point human developers in the right direction, it could still reduce the total cost associated with fixing these bugs. The hope was, by automatically covering or helping along the easy bugs, that would free up developer time to tackle the harder bugs that really matter.
To actually generate the patches, Weimer worked with Michigan CSE alumna Stephanie Forrest at the University of Mexico on an implementation that used a paradigm called genetic programming. Genetic programming is an approach in computer science that takes analogs from biological evolution, like selection, mutation, or fitness functions, and uses them to solve programming problems. It poses questions like, can we evolve a population of solutions the same way a population of turtles might evolve on an island over a long period of time?
The initial plan was to search through the space of all possible patches to the bugged program until the algorithm found a good one. It was the “monkeys banging on typewriters” approach – and it didn’t work in any reasonable time.
What they did instead was gain insights from students in entry-level CS courses. When early students are faced with programming problems that don’t function correctly, what they’ll often do is scatter print statements throughout their program to find the bug and figure out what line of code needs to be changed. Students then semi-randomly make changes around that line and resubmit their program until they get the result they want.
Weimer’s approach to automated program repair is basically just that – simulating the semi-random, but directed, changes of a CS-101 student. The program focuses attention on where the bug likely is, a technique called fault localization, and then makes changes that are likely to be realistic.
In the field of genetic programming, these changes are called mutations. The changes themselves are determined in one of three general ways – either through removing code, shuffling code, or copying code written by humans elsewhere in the program in question.
When a combination of changes successfully compiles, the newly patched program is run on test cases that ensure two things: first, that the program’s existing functionality is unchanged (called regression testing), and then that the problem bug has been resolved. As soon as that is achieved, the patch can be shown to developers for further tidying up – a big time-saver over writing a patch from scratch.
Another issue solved by this approach is security deployments. In the event of a major product crash or security leak, hotfixes are needed at a moment’s notice to keep the product running without more catastrophic issues. Even a bug that needs improvement later but gets the job done in a pinch can often be an improvement over ceasing operations entirely.
This solution came to inspire a number of iterations that expanded on the idea and made it more practical for real-world solutions. Techniques for large cloud-powered or online companies to determine suspicious or irregular behavior in real time meant the automatic fixes could be applied to problems as they arose, completely removing developers from the immediate response process. This is exactly the workload described by Facebook in their implementation of automated repair, in which they explicitly cite Weimer’s work.
This award marks the third such ten-year award Weimer’s research group has earned.