What Refactoring Is and Is Not

I bet you’ve heard something like this before.  Some developer somewhere says something like, “We should just take three dedicated months and completely refactor the entire customer support module, soup to nuts.  Switch to a better database, overhaul the GUI, and fix everything in between.”

This invocation of refactoring might slip by, completely unnoticed.

You’re not focusing on the particulars or on the meaning of the word “refactoring.”  Instead, you’re probably picturing context and backstory.  What got the team, and this developer in particular, to the point of saying this?

Most likely, their code has been in a state of slow rot.  With each new upcoming (and eventually missed) deadline, the team scrambles more and more to get software out the door.  The business declares that it needs functionality no matter what, and the team lurches toward obliging them.  Leadership notes but ultimately dismisses the objections of developers on the team.  They listen soberly and somewhat earnestly to impassioned pleas that involve terms like “technical debt.”  But ultimately, what’s anyone to do?

And so you hear those words I mentioned in the first paragraph.  That developer has lived and breathed the customer support module for a couple of years, presiding reluctantly over its descent into unsustainable chaos.  He wants to hit the pause button—wants to get in there, clean up all the cruft, modernize everything and really do it right.  He wants to refactor until he’s got a clean slate.

It’s a completely understandable sentiment.  And it’s also categorically not refactoring.

What Is the Problem With This Use of Refactoring?

Why not?  What’s wrong with this usage?

Well, I’ll offer two main issues with the notion of refactoring as implicitly defined in this story.

  1. The developer is describing a series of changes that would affect input to and output of the application (using a different database and changing the GUI).
  2. He’s also describing a massive change that will result only in extremely coarse feedback.

Refactoring: An Official Definition

Let’s look to software luminary Martin Fowler for an “official” definition of refactoring:

Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.
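
To make that definition concrete, here is a minimal sketch of a change that fits it.  The function names and the priority rules are hypothetical, invented purely for illustration; nothing here comes from a real customer support module.  The point is only that both versions return the same output for every input, while the second is easier to read.

```python
# Hypothetical example: the names and business rules are assumptions
# made up for illustration.

def priority_before(is_premium, hours_open):
    # Original structure: nested conditionals that are hard to scan.
    if is_premium:
        if hours_open > 24:
            return "urgent"
        else:
            return "high"
    else:
        if hours_open > 24:
            return "high"
        else:
            return "normal"


def priority_after(is_premium, hours_open):
    # Restructured: same inputs, same outputs, flatter control flow.
    overdue = hours_open > 24
    if is_premium and overdue:
        return "urgent"
    if is_premium or overdue:
        return "high"
    return "normal"
```

A caller cannot tell which version it is talking to, and that is the whole test of whether the change counts.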

Given that definition, you can understand my first objection.  But what about the second one?  There’s nothing about the scale of the change in Martin Fowler’s definition at all.  Am I just making stuff up?

No, I’d argue.  I’m not just making stuff up.  Instead, I’m taking things to their logical conclusion, given the story in question.  For the rest of the post, I’ll explain how I arrive at my narrower definition by delving more into what refactoring is not.

Refactoring Is Not Massive Changes

Based on the official definition alone, we can establish that our developer’s desire to “refactor” does not actually count as refactoring.  A new GUI and a new database both ipso facto change the external behavior of the software.  But the sheer size of the proposed effort also disqualifies it.

To understand why, consider the scope being bandied about.  He talks about “three dedicated months.”  Let’s set aside the idea of GUI and database changes and pretend he was just talking about a three-month effort to restructure the code for better maintainability.  He has no intention of changing any external-facing component or behavior of the software at all.

Does this now count as refactoring?

Well, it would, if he executed flawlessly on this insanely difficult goal.  The developer or his entire team might intend to hack away at a nasty legacy module for three months without changing its behavior one iota.  But do you expect them to succeed?  Do you think there’s even a remote possibility that, in at least 520 man-hours of banging away at that module, not the slightest thing about its behavior will change?

There’s no chance.  Any effort this large will affect the external behavior, without question.  Call it an overhaul, a reworking, or a legacy rescue.  But if you want to be accurate, do not call it refactoring.

Refactoring Is Not Change Without Automated Tests

This raises a question.  Say we take the scope of the effort down considerably.  Instead of three months, call it three weeks or three days.  Heck, call it three hours.

Even then, do you know you’re not altering the external behavior of the system as you make your changes?

The answer is no, you don’t.  Now, one might make the epistemological argument that you could never truly know that.  But leaving philosophy out of the equation, we have practical means at our disposal for assuring ourselves that changes do not affect our software’s behavior.  They’re called automated tests.

While you can never truly know whether a change you’ve made results in the software always behaving exactly the same in all runtime contexts, you can get an awful lot closer when you have a robust test suite detecting behavior changes.
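
As a rough illustration of what that safety net can look like, here is a small sketch using Python’s built-in unittest module.  The function format_ticket_id and its ID format are assumptions invented for this example.  The tests simply pin down what the code does today, so that a later structural change that alters the output shows up as a failure.

```python
import unittest


def format_ticket_id(customer_id, sequence):
    # Hypothetical legacy function whose current behavior we want to pin.
    return "CS-" + str(customer_id).zfill(5) + "-" + str(sequence)


class FormatTicketIdCharacterization(unittest.TestCase):
    # These tests record existing behavior; they do not judge it.

    def test_pads_customer_id_to_five_digits(self):
        self.assertEqual(format_ticket_id(42, 7), "CS-00042-7")

    def test_leaves_longer_customer_ids_unpadded(self):
        self.assertEqual(format_ticket_id(1234567, 1), "CS-1234567-1")
```

Saved in a file, this can be run with python -m unittest before and after each small change.  If both runs pass, you have at least some evidence that the behavior these tests cover did not move.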

And with that, we’re narrowing things down considerably.  Refactoring means you’re not intentionally changing your system’s behavior.  But it also means that you’re not doing so unintentionally by making massive changes or changes without the benefit of a test suite.

What Refactoring Is: Practical Implications of the Rigorous Definition

Having defined some tangible boundaries and talked about what refactoring isn’t, let’s get at what refactoring really, truly is.

As the official definition suggests, refactoring is the act of changing the software’s structure without changing external behavior.  And by logical implication from what it isn’t, refactoring is making these changes in very bite-sized chunks while covered nicely by a suite of automated tests.  In fact, it starts to hew pretty closely to the red-green-refactor cycle of TDD.

And this brings us to what I think should truly define refactoring.  Refactoring is not a project, and it’s not an effort—not any more than writing or compiling code is a project or an effort.  Refactoring is a constant, lightweight-but-persistent, improvement of the code that you’re working with.  You get your code working, you prove that it works with tests, and then you refactor it to make it clean.
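
Here is one way that working, proven, then clean rhythm might look at the smallest possible scale.  Everything in this sketch is hypothetical, including the function names and the response-time scenario; it is meant only to show the shape of a single pass through the cycle, not a prescribed technique.

```python
# Step 1 (red): write the check first. It fails until an implementation exists.
def check_average_response_time(average):
    assert average([30, 60, 90]) == 60
    assert average([]) == 0


# Step 2 (green): the quickest code that makes the check pass.
def average_v1(minutes):
    total = 0
    count = 0
    for m in minutes:
        total = total + m
        count = count + 1
    if count == 0:
        return 0
    return total / count


# Step 3 (refactor): clean up the structure; the same check still guards it.
def average_v2(minutes):
    if not minutes:
        return 0
    return sum(minutes) / len(minutes)


# Both versions satisfy the same check, so the cleanup changed nothing visible.
check_average_response_time(average_v1)
check_average_response_time(average_v2)
```

The whole loop takes a few minutes at most, which is why it can happen continuously instead of being saved up for a dedicated project.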

When you’re doing refactoring right, it’s something that you’re doing more or less every minute that you’re coding.

Why Does This Matter?

Why spend an entire blog post talking about this?  You might understandably wonder if I’m just pedantically arguing semantics or narrowing an accepted definition to suit an agenda.  But the reason for this definition and this line of argument goes back to the story I opened the post with.

This developer—the one that we’ve all encountered—views refactoring as a project.  He’s spent years systematically making a piece of software worse because of deadlines, lack of time, technical debt, and a dozen other reasons that he’ll cite.  His plight is sympathetic but, frankly, self-imposed.  He’s in the position that he is precisely because he thinks of refactoring as a project and not as little changes you make continuously, with passing tests, to keep your code in good shape.

Thinking of refactoring as a project is a self-fulfilling prophecy.  And so is thinking of refactoring as a continuous effort that goes hand in hand with normal development.  So I encourage you to adopt the definition we’ve established here, for your own sake.

Source of Featured Image: Photo Courtesy of Wiki Commons: Sistine Chapel, the prophet Daniel before and after Restoration