Refactoring

Software Entropy

A program spends most of its life in the maintenance phase. During this phase new features are added to the program, bugs are repaired, and the program is adapted to work with new devices, servers, and platforms. Often, these modifications are not made by the original developers of the program, so it's not surprising to see the integrity of the original design deteriorate over time. Methods become longer and their logic more convoluted. Some classes become bloated with additional responsibilities, while other classes become irrelevant. Replicated code appears. Type tags multiply in number and significance. Architectural flourishes added to accommodate some permanently futuristic generalization mock us like a derelict "House of Tomorrow". Parallel inheritance hierarchies develop. Comments pile up like red flags. Let's call this tendency to disorder software entropy:

software entropy = (design integrity)^(-1)

As in a physical system, entropy tends to increase over time.

(It might be possible to quantify software entropy as the inverse of some combination of favorite design metrics: cohesion degree, coupling degree, etc.)
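As a rough illustration of that idea, here is a minimal sketch. It assumes cohesion and coupling have each been normalized to a score between 0 and 1, and it combines them by simple multiplication. Both the normalization and the combining rule are assumptions made for this example, not an established metric; the function names are hypothetical too.

```python
def design_integrity(cohesion: float, coupling: float) -> float:
    """Hypothetical integrity score in [0, 1].

    High cohesion and low coupling score well. The product rule is an
    assumption made for illustration only.
    """
    if not (0.0 <= cohesion <= 1.0 and 0.0 <= coupling <= 1.0):
        raise ValueError("cohesion and coupling must lie in [0, 1]")
    return cohesion * (1.0 - coupling)


def software_entropy(cohesion: float, coupling: float) -> float:
    """Entropy as the inverse of design integrity."""
    integrity = design_integrity(cohesion, coupling)
    # A design with no integrity at all has unbounded entropy.
    return float("inf") if integrity == 0.0 else 1.0 / integrity
```

Under these assumptions a perfectly cohesive, uncoupled design has entropy 1, and the value grows without bound as integrity falls toward 0.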

Anti-Patterns

A design pattern is meant to summarize some fragment of good design: Publisher-Subscriber, Decorator, Strategy, etc. A pattern catalog is an organized collection of design patterns. In a similar spirit, an anti-pattern summarizes some fragment of bad design, and an anti-pattern catalog is an organized collection of anti-patterns. Another way to look at software entropy is that over time patterns turn into anti-patterns. (Kent Beck calls anti-patterns "bad smells". In this sense, a program is like a beautiful banquet that has been left out on the table too long.)

(Technically, an anti-pattern is a bad solution that may have looked good at some point, a corrective action gone awry. Some of my anti-patterns are perhaps more like kluges: bandages that were never seen as an optimal solution.)
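To make the idea concrete, here is a small sketch of one anti-pattern mentioned earlier, the type tag. The Shape class and the area function are hypothetical names invented for this example. Every function that works with shapes must inspect the tag, so each new kind of shape forces another branch in every such function.

```python
import math


class Shape:
    def __init__(self, kind: str, size: float):
        self.kind = kind  # the type tag: "circle" or "square"
        self.size = size  # radius for a circle, side length for a square


def area(shape: Shape) -> float:
    # Behavior depends on the tag rather than on the object's class.
    if shape.kind == "circle":
        return math.pi * shape.size ** 2
    elif shape.kind == "square":
        return shape.size ** 2
    raise ValueError("unknown shape kind: " + shape.kind)
```

The code works, which is part of the problem: nothing forces anyone to clean it up until the chain of branches has been copied into many functions.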

The Refactoring Process

Fortunately, we can combat software entropy. The antidote is refactoring. Refactoring means improving the design of existing code. The idea is to gradually apply refactoring transformations to the original program, P_0, producing a sequence of programs P_0, P_1, P_2, and so on.

Each transformation preserves behavior and decreases entropy:

behavior(P_{i+1}) = behavior(P_i)
entropy(P_{i+1}) < entropy(P_i)

If we think of the behavior of a program as the function it implements, then guaranteeing that behavior is preserved means guaranteeing that the function implemented before the transformation is identical to the function implemented after it. Establishing this may require a difficult mathematical proof. Instead, we verify only that a small finite fragment of the function is unchanged by the transformation. This can be done quickly if we associate a suite of tests with the original program. Running the test suite after each transformation compares the outputs produced by the program with the expected outputs of a number of predefined test cases, producing a pass or fail result. Thus, each transformation should guarantee:

pass(P_i, tests) => pass(P_{i+1}, tests)

This verification step can be facilitated by using a testing framework.
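The check can be sketched in a few lines. In this sketch the "program" is a single hypothetical function, shown before and after an Extract Method transformation, and the test suite is a list of inputs paired with expected outputs. All names and test cases are invented for illustration. A real project would more likely express the cases in a testing framework such as unittest.

```python
def invoice_total_before(prices, tax_rate):
    # Before: one method computes the subtotal and the tax inline.
    subtotal = 0
    for price in prices:
        subtotal += price
    return subtotal + subtotal * tax_rate


def subtotal_of(prices):
    # Extracted helper: sum of the item prices.
    return sum(prices)


def tax_on(amount, tax_rate):
    # Extracted helper: tax owed on an amount.
    return amount * tax_rate


def invoice_total_after(prices, tax_rate):
    # After: the same computation, composed from named helpers.
    base = subtotal_of(prices)
    return base + tax_on(base, tax_rate)


# Each case pairs the arguments with the expected result.
TESTS = [
    (([10, 20], 0.5), 45.0),
    (([], 0.1), 0),
    (([100], 0.0), 100.0),
]


def passes(program, tests):
    """True if the program returns the expected output on every case."""
    return all(program(*args) == expected for args, expected in tests)


# The transformation is accepted only if the refactored version still passes.
refactoring_is_safe = (not passes(invoice_total_before, TESTS)
                       or passes(invoice_total_after, TESTS))
```

Note that passing the suite only shows agreement on the listed cases. It is evidence that behavior was preserved, not a proof.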

There are a number of well-known refactoring transformations. We call these transformations refactoring patterns or simply refactorings. A refactoring catalog is an organized collection of refactoring transformations. We present some of the refactorings from [Fowler]:

Volume I: Class-Level Refactorings

Volume II: Method-Level Refactorings

Volume III: Block-Level Refactorings

Refactoring Table

A refactoring table matches anti-patterns to the refactoring transformations that can often eliminate them.
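In code, such a table is just a lookup from anti-pattern to candidate refactorings. The sketch below is an illustrative fragment only, not a reproduction of the table from these notes. The pairings follow the "bad smells" discussion in [Fowler], and the suggest function is a hypothetical helper.

```python
# Anti-pattern -> refactorings that can often eliminate it.
REFACTORING_TABLE = {
    "Duplicated Code": ["Extract Method", "Pull Up Method", "Form Template Method"],
    "Long Method": ["Extract Method", "Replace Temp with Query", "Decompose Conditional"],
    "Large Class": ["Extract Class", "Extract Subclass"],
    "Switch Statements": ["Replace Conditional with Polymorphism",
                          "Replace Type Code with Subclasses"],
    "Parallel Inheritance Hierarchies": ["Move Method", "Move Field"],
    "Comments": ["Extract Method", "Introduce Assertion"],
}


def suggest(anti_pattern: str) -> list:
    """Return the candidate refactorings for an anti-pattern, or an empty list."""
    return REFACTORING_TABLE.get(anti_pattern, [])
```

The word "often" matters: the table proposes candidates, and the programmer still has to judge which one fits the code at hand.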

eXtreme Programming (XP)

Refactoring doesn't have to be confined to the maintenance phase. Programmers may face new or changing requirements in any phase of development. Despite best efforts, the original design of the program may not have adequately anticipated the new requirements, and refactoring transformations may be needed to accommodate them. Of course this assumes that a test suite for the program already exists. It also requires that new test cases be written to verify that the new requirements have been properly implemented.

A cynic might say that the entire design enterprise, the art of anticipating changing requirements, is doomed from the start. Why not abandon design altogether? Write test cases for each new requirement that comes along, add code to the program that implements the new requirement, then run all of the tests (old and new). If they all pass, repeat the cycle for the next requirement; otherwise debug and refactor the program until it passes all of the tests.

This style of programming is associated with Extreme Programming (XP), Agile Development, and Test-Driven Development. It is the opposite of up-front design.

Pair Programming

Extreme Programming advocates pair programming: two programmers sit in front of one computer; one types, while the other looks over the typist's shoulder and kibitzes. Occasionally the two programmers switch places. The idea is simple: two heads are better than one, like Lennon and McCartney.

Hacking and Refactoring

Eric Raymond posted an interesting essay on what he sees as the commonality between XP, the development of Unix, the development of the Open Source movement, and hacker philosophy. See http://www.artima.com/weblogs/viewpost.jsp?thread=5342.

Refactoring Tools

Raymond Lee wrote an excellent MS thesis on this topic called "Automated Refactorings for Java Programs" (2002) which included an extensible tool called DART (Design Analysis and Repair Tool). The thesis can be found online at:

http://raymondwlee.europe.webmatrixhosting.net/mscs.mscsthesis.html

Web Sites

Fowler's Refactoring Home Page is at http://www.refactoring.com/

Don Wells has a nice site on eXtreme Programming at http://www.extremeprogramming.org/

There's also plenty of stuff on the web about Anti-Patterns. See http://c2.com/cgi/wiki?AntiPatterns for example.

There have been other attempts to formally define software entropy. See http://serlab2.di.uniba.it/serlab/sw_entropy.htm for a start.

References

[Fowler] Martin Fowler; Refactoring: Improving the Design of Existing Code; Addison-Wesley; 1999.

[Beck] Kent Beck; Test-Driven Development: By Example; Addison-Wesley; 2003.

[Bruegge] Bernd Bruegge and Allen H. Dutoit; Object-Oriented Software Engineering Using UML, Patterns, and Java, 2nd ed.; Prentice Hall; 2004.