Refactoring Conditional Logic

As programmers we strive to write code that is readable and maintainable. Complex conditional logic, however, can easily become confusing and difficult to grasp quickly.

Martin Folwer's book Refactoring offers a number of techniques for making conditional logic simpler that I have found to be very effective. In this post I'll demonstrate how to apply some of these techniques. While the following code example is written in PHP, the lessons will apply to just about any language you use.

Let's say we have an Emailer class with a method on it for validating a list of addresses to which an email will be sent:

    public function validateEmailAddressList($emailAddresses)
    {
        if (is_string($emailAddresses)) {
            $emailAddresses = explode(",", $emailAddresses);
            foreach ($emailAddresses as $email) {
                if (! filter_var(trim($email), FILTER_VALIDATE_EMAIL)) {
                    $valid = false;
                    break;
                } else {
                    if (! preg_match("/uchicago.edu$/", $email)) {
                        $valid = false;
                        break;
                    } else {
                        $valid = true;
                    }
                }
            }
        } else {
            throw new InvalidArgumentException('The address list must be a string.');
        }

        return $valid;
    }

The body of this method is a nested conditional expression with a return at the end. It enforces two rules about what constitutes a valid email address list, namely, that each item in the list is a properly formatted email address (according to RFC 822) and each address ends in "uchicago.edu". While this code is relatively simple, it's still much more complicated than it needs to be. For example, if we needed to add a new rule, it's not immediately clear where we'd modify the code. We'd have to carefully read through it to figure out where in the control stucture to place a new condition.

We can apply a series of refactorings to improve this code: Replace nested conditional with guard clauses, Remove control flag, Consolidate conditional expression, and Extract method.

Replace nested conditional with guard clause

Whenever I come across a nested conditional structure, the first thing I always do is look to see if any part of it can be replaced by one or more guard clauses. A guard clause is a block of code at the beginning of a function or method that ensures any preconditions have been met. The body of the guard clause should only return or throw an exception. In our example, we can eliminate the outermost level of nesting by using a guard clause. We'll apply this refactoring where the type of the $emailAddresses argument is verified to be a string. We can replace this:

    if (is_string($emailAddresses)) {
        // body of the then clause
    } else {
        throw new InvalidArgumentException('The address list must be a string.');
    }

    return $result;

with this:

    if (! is_string($emailAddresses)) {
        throw new InvalidArgumentException('The address list must be a string.');
    }

    // body of the then clause

    return $result;

When reading the first version I have to make a mental note of the initial condition, is_string($emailAddress), as I'm reading the body of the then clause and recall this condition when I get to the else clause where an exception is thrown.

The second version clearly expresses a fact: the method expects $emailAddress to be a string, and if it is not, throw an exception. I no longer have a condition to keep track of as I'm reading through the rest of the function body.

Remove control flag

Now we can focus on the body of the method, which now looks like this:

    foreach ($emailAddresses as $email) {
        if (! filter_var(trim($email), FILTER_VALIDATE_EMAIL)) {
            $valid = false;
            break;
        } else {
            if (! preg_match("/uchicago.edu$/", $email)) {
                $valid = false;
                break;
            } else {
                $valid = true;
            }
        }
    }

    return $valid;

In the foreach, we're checking each email address in the list and setting a variable, $valid, to true or false depending on whether or not it meets our validation rules. After the loop exits or we break from it, we return the value of $valid. This variable $valid is a control flag. Control flags are mainly a legacy of the structured programming rule that a routine should have only one exit. This rule is less necessary in modern programming languages and leads to code that is more convoluted than it needs to be. A control flag can usually be replaced with a return:

    foreach ($emailAddresses as $email) {
        if (! filter_var(trim($email), FILTER_VALIDATE_EMAIL)) {
            return false;
        } else {
            if (! preg_match("/uchicago.edu$/", $email)) {
                return false;
            } else {
                return true;
            }
        }
    }

In this case, however, we'd always return from the loop after the first item. To make sure we correctly check each item in our list of email addresses, we need to take a different approach. Instead of looping over the list, we can filter our list using a callback function to give us a new list that contains only the valid email addresses. Then we can compare our filtered list of valid email addresses to our original list. If they're the same, then our method should return true. To implement this, we move our conditional logic into an anonymous function, which we pass as the callback to PHP's array_filter() function:

	
	$validAddresses = array_filter($emailAddresses, function ($email) {
		if (! filter_var(trim($email), FILTER_VALIDATE_EMAIL)) {
			return false;
		} else {
			if (! preg_match("/uchicago.edu$/", $email)) {
				return false;
			} else {
				return true;
			}
		}
	});

	return $validAddresses == $emailAddresses;

We've now successfully removed the control flag by replacing the assignment/breaks with returns and the loop with array_filter(). By itself this refactoring may not seem to add much clarity, at least in this example. What it does do, however, is put us in a position to remove the conditional expressions altogether by consolidating them, which we'll do next.

Consolidate conditional expression

We should notice that there are two conditions in our example which return false. What may not be immediately obvious is that these conditions can be consolidated using an or expression. Let's see why this is the case. First, we can safely remove the else clauses, creating two ifs that are no longer nested:

    if (! filter_var(trim($email), FILTER_VALIDATE_EMAIL)) {
        return false;
    }
    if (! preg_match("/uchicago.edu$/", $email)) {
        return false;
    }
    return true;

We can do this because when the if condition is met, we return. If it's not met, then execution simply continues; we don't need to branch with an else.

Now it should be clear that the two conditionals can be grouped together:

    if (! filter_var(trim($email), FILTER_VALIDATE_EMAIL) ||
        ! preg_match("/uchicago.edu$/", $email)) {
        return false;
    }
    return true;

We can simplify this code even further by recognizing that the or expression is being stated negatively but can be restated positively. As we have it, the conditional is saying: return false if the email is not a valid email address or it does not end in "uchicago.edu". We can say the same thing positively: return true if the email address is a valid address and it ends in "uchicago.edu". Stated this way, we can simply return the result of the compound expression without the conditional at all:

    return filter_var(trim($email), FILTER_VALIDATE_EMAIL) &&
           preg_match("/uchicago.edu$/", $email);    

Extract method

Let's review how far we've come. After applying the three refactorings above, our method now looks like this:


	public function validateEmailAddressList($emailAddresses)
	{
		if (! is_string($emailAddresses)) {
			throw new InvalidArgumentException('The address list must be a string.');
		}
		
		$emailAddresses = explode(",", $emailAddresses);
		
		$validAddresses = array_filter($emailAddresses, function ($email) {
			return filter_var(trim($email), FILTER_VALIDATE_EMAIL) &&
		               preg_match("/uchicago.edu$/", $email);
		});

		return $validAddresses == $emailAddresses;
	}

It's quite a bit simpler than when we started, but we can still add some clarity. What's happening inside of the array_filter callback? It is validating an individual email address in the given list. We could add a comment to that effect. Better yet, we can extract the body of the callback into a method whose name explains its purpose:

    public function validateEmailAddressList($emailAddresses)
    {
        if (! is_string($emailAddresses)) {
            throw new InvalidArgumentException('The address list must be a string.');
        }

        $emailAddresses = explode(",", $emailAddresses);
        
<code>        $validAddresses = array_filter($emailAddresses, function ($email) {
            return $this->validateEmailAddress($email);
        });

        <code>return $validAddresses == $emailAddresses;
    }

    public function validateEmailAddress($email)
    {
        return filter_var(trim($email), FILTER_VALIDATE_EMAIL) &&
               preg_match("/uchicago.edu$/", $email); 
    }

You could even take this another step further and extract each part of the compound expression into its own method to clarify the rule being tested, e.g., isValidEmailFormat() and isUChicagoAddress().

Conclusion

At this point we have greatly increased the readablility of the code. There is no more nested conditional logic, so we're no longer forced to keep track of conditions as we read through the code. In fact, aside from the guard clause, we've completely eliminated the conditional logic. We've also made the code more maintainable by keeping our methods short and focused on a single purpose. Now it is clear where to look in order to add a new validation rule: we'd modify the expression in the validateEmailAddress() method. 

The next time you come across a complicated conditional structure, try applying some or all of these refactorings to simplify it:

  • use guard clauses to ensure preconditions;
  • use a return instead of a control flag;
  • consolidate conditions that have the same consequent;
  • convert negative conditions to positive ones;
  • extract unclear units of code to a method whose name explains its purpose.

Keeping each of these in mind can also help you avoid writing complex conditional structures in the first place.