An Introduction to Automated Testing with Behat

Behat is based on the principles of Behavior-Driven Development.

We do many kinds of testing in Web Services.  In my role as Quality Assurance Analyst, I have been most directly involved with what we call Functional Testing. This type of testing has included elements of several different kinds of testing:

  • Acceptance Testing: Verifying that the project's requirements have been met

  • Data Integrity Testing: Verifying that database operations are functioning properly

  • User Experience Testing: Gathering information about how people interact with an application

Historically, our Functional Testing process has looked something like this:

  1. Team members create project documentation (requirements lists, to-do lists, functional specs, wireframes, etc.).

  2. QA Analyst reviews project documentation.

  3. QA Analyst writes test script (MS Word doc and/or MS Excel spreadsheet).

  4. Staff volunteers manually execute test script.

  5. Staff volunteers record feedback in Basecamp to-do list.

  6. Team members review volunteer feedback.

There are two main inefficiencies in this process. First, we're recording functional requirements in at least two elaborate documents, often more. Second, staff volunteers are using their valuable human time to do relatively menial verifications.

Automated Testing: One Failed Approach

We had heard about automated testing and had investigated some tools like Selenium, which can operate a browser's UI automatically. Our difficulty with this kind of tool was that the instructions had to be so specific that the test plans were very brittle. If you made a cosmetic change to the site (e.g., moving a button or changing some content), it could cause your test to fail, even though the site worked perfectly well from a client's or a user's perspective.  Selenium is a perfectly good tool, but it wasn't a good fit for what we were trying to do.

Automated Testing Using Behat

My colleague Rocco Palladino introduced me to (and patiently taught me everything I know about) Behat.  Behat is an automated testing tool based on the principles of Behavior-Driven Development, or BDD. In BDD, the actions your client needs their application to facilitate become your units of work. The process of testing with Behat looks something like this:

  1. Team members work with the client to document required actions by role.

  2. Programmer codes automated tests to verify required actions.

  3. Programmer codes application based on the tests and runs them through the command line. When the tests pass, development is complete.

Let's look at the first of these steps in detail:

1. Documenting Requirements for Use in Behat

Behat requirement documents are written in a language called Gherkin. Gherkin is plain English formatted in a specific way so that the Behat software can parse it. My favorite thing about the entire Behat testing process is the way this documentation format helps me think through the requirements and rules of the application. I have tried to come up with templates for functional specifications in the past and the best I could do was to make a list of vague best practices. Writing in the Gherkin language is better than a template because it is infinitely flexible while still demanding concrete answers to the important questions.

A. What Not to Say: A Technology-Agnostic Approach

One of the ideas behind BDD is that of a "Ubiquitous Language," or common vocabulary. We have all worked on projects where we refer to parts of the site one way internally and another way with the client. Sometimes, even within the team, we have trouble settling on shared terms. Sometimes we can't even seem to figure out what to call the project itself. This can be not only frustrating, but dangerous, because important details can get lost in translation between dialects.

Client: "Is Phase One open?"
You: "Which Phase is Phase One? You mean the Application Phase?"
Client: "I don't know."
You: "Well, if you want something to be open, you must mean the application."
Client: "I want people to be able to log in, but not submit applications. Does that mean it's open?"
You: "They can submit in the Application Phase, but not in the Public Phase."
Client: "I hate you."

BDD addresses this problem by encouraging requirements gatherers to use "the language of the domain." This means that we should adopt the vocabulary native to the business of the application we're building. We should use the client's names for things and describe functions in terms of their business purpose. This might seem like common sense, but the implications are profound. If we are writing about a business process in the language of the domain, we should not be describing any particular technology or mechanism. That is, even though we are web developers building software applications, we should never express a client's requirements in terms of pages, buttons, checkboxes, drop-downs, database fields, or clicking. This is crucial for three reasons:

  1. It helps us think clearly about the business problems before we apply our filter of technology solutions and the assumptions that go with it. "If all you have is a hammer, everything looks like a nail."

  2. It creates a more versatile document: We would like to use these requirements to do automated testing through a browser and its UI, but also on a more primordial level, by checking the logic of the code in memory. If the requirements describe the UI, they can't be used both ways.

  3. It creates a more durable document that is easier to maintain: Recall the fragility of the Selenium tests I mentioned earlier. If your requirements state that the button is on the left, and it gets moved, a test of those requirements will fail, even if the form isn't broken. If, however, you scope your requirements to the language of the domain, a test of those requirements will fail only if a critical business process fails. A happy consequence of this is that you only need to update your document when a core business need changes.

Do you know how many times I've had to update a test plan because someone moved a button? Too many times. The client doesn't actually care if there's a button; they care that someone can provide them with the information they need. The button was our idea—it was probably a good idea and the right tool for the job, but I don't mention buttons in test plans anymore.

B. What to Say and How: Gherkin Specs, or "Features"

A feature captures all the information about an action that the application must facilitate. My colleague and I chose to organize our feature files by role. In the code repository for the application, we have a Features folder, which contains a subfolder for each role.  Here is an example of a feature:

Feature: Recommender accesses recommendation form

  In order to recommend an applicant
  As a recommender
  I need to access the recommendation form

  Scenario: Successfully access recommendation form
    Given the recommendation phase is open
    And I have received a recommendation request for an applicant
    When I try to access the recommendation form
    Then I should see the recommendation form
    And I should see the current cycle year
    And I should see the applicant's first and last name
    And I should see the applicant's first-ranked OAide type
    And I should see the recommendation submission deadline

Let's look at each of the components of a feature file.


Feature: Recommender accesses recommendation form

Each feature has a title. We title our features in the form [Role Does Action], for example, "Applicant Accesses Application" or "Administrator Views Application."


In order to recommend an applicant
As a recommender
I need to access the recommendation form

These three lines are the story. The format does not technically require a story to run the test, but the story is one of the most important parts of the document for definition purposes. We write stories in three lines as follows:

  • In order to [benefit]
  • As a [role from title]
  • I need to [action from title]

If you can't express a feature of the application in these terms, you know something is amiss. If you and the client together can't think of the benefit of a feature, you don't need it.  If you find yourself listing the same benefit for more than one feature, chances are you need to be more specific in describing your benefits.


Scenario: Successfully access recommendation form

Scenarios are the “body” of a feature. A scenario describes the result of an attempt to use the feature under certain circumstances. Scenarios for the feature "Applicant accesses form" might be:

Successfully access form
Cannot access form because the application is closed
Cannot access form because applicant has not logged in
Cannot access form because applicant has already submitted the form

We have found it helpful to write the success scenario first (since there has to be one), and then to write a separate scenario for each different reason one might fail when attempting to use the feature.

Each scenario should be self-contained and independently verifiable. This took a little while for me to wrap my head around because I was used to writing very long, complex, linear paths through an application that visited all of its different conditional cases. The main problem with writing tests this way is that each step depends upon the one before. If a tester got stuck three rows into one of my test plans, they had to stop, and the rest of the test plan became useless. Writing scenarios correctly avoids that problem. If one scenario fails, that failure doesn't prevent you from testing the other scenarios in the feature.

Scenarios are the acceptance criteria for the feature.  If you read the names of the scenarios, you’ll have a concise list of what the feature must do.
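
The idea of self-contained scenarios can be sketched in plain PHP, without Behat itself. This is only an illustration under stated assumptions: the RecommendationForm class, its canBeAccessed() method, and the wording of the second scenario are invented for the sketch and are not part of the project described here. Each scenario builds its own starting state, so a failure in one cannot block the others.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the application under test.
final class RecommendationForm
{
    private bool $phaseOpen;

    public function __construct(bool $phaseOpen)
    {
        $this->phaseOpen = $phaseOpen;
    }

    public function canBeAccessed(): bool
    {
        return $this->phaseOpen;
    }
}

// Each scenario is a closure that sets up its own state (the "Given")
// and returns true when its expectation (the "Then") holds.
$scenarios = [
    'Successfully access form' => function (): bool {
        $form = new RecommendationForm(true);
        return $form->canBeAccessed() === true;
    },
    'Cannot access form because the recommendation phase is closed' => function (): bool {
        $form = new RecommendationForm(false);
        return $form->canBeAccessed() === false;
    },
];

// Run every scenario; a failure or exception in one does not stop the rest.
$results = [];
foreach ($scenarios as $name => $scenario) {
    try {
        $results[$name] = $scenario() ? 'passed' : 'failed';
    } catch (Throwable $e) {
        $results[$name] = 'failed';
    }
}

foreach ($results as $name => $status) {
    echo $status . ': ' . $name . PHP_EOL;
}
```

Because no scenario reads state left behind by another, the scenarios could be run in any order, or one at a time, with the same results.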


Given the recommendation phase is open
And I have received a recommendation request for an applicant
When I try to access the recommendation form
Then I should see the recommendation form
And I should see the current cycle year
And I should see the applicant's first and last name
And I should see the applicant's first-ranked OAide type
And I should see the recommendation submission deadline

A scenario is made up of lines called steps. We generally write steps in the first person to make them clear and immediate. There are three main kinds of steps:

  • GIVEN steps set the scene for the scenario. They describe a current state of affairs, or an action that has already been done. Each scenario should have a different "Given" statement.
  • WHEN steps are the key action in the scenario. The main "When" step in each scenario should mirror the title of the feature.
  • THEN steps describe the results we expect from the scenario. We write "Then" steps in the form, "should...."  These are the only steps written in the conditional mood.

We can also use "And" or "But" steps to add or clarify conditions under "Given," "When," or "Then" steps.

2. Step Definitions

Once you've written and revised your Features with your client, you can hand them over to your developer. Our programmers use the Feature files to write tests which will guide their work coding the application itself. The Gherkin language provides a template for the code of these tests, so the programmer can fill in the blanks by indicating which parts of the (as-yet-unwritten, ideally) code should run at each step.
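
To give a feel for what "filling in the blanks" means, here is a toy step runner in plain PHP. It is a rough sketch, not how Behat is implemented: the RecommendationContext class, the $definitions map, and the runSteps() function are all hypothetical names made up for this example, and the context tracks state in memory rather than driving a real site. It shows only the core idea that each line of Gherkin text is matched to one method.

```php
<?php
declare(strict_types=1);

// Toy context: keeps state in memory instead of talking to a real application.
final class RecommendationContext
{
    private bool $phaseOpen = false;
    private bool $formShown = false;

    public function theRecommendationPhaseIsOpen(): void
    {
        $this->phaseOpen = true;
    }

    public function iTryToAccessTheRecommendationForm(): void
    {
        $this->formShown = $this->phaseOpen;
    }

    public function iShouldSeeTheRecommendationForm(): void
    {
        if (!$this->formShown) {
            throw new RuntimeException('Expected the recommendation form to be shown.');
        }
    }
}

// Map of step text (without its keyword) to the method that implements it.
$definitions = [
    'the recommendation phase is open' => 'theRecommendationPhaseIsOpen',
    'I try to access the recommendation form' => 'iTryToAccessTheRecommendationForm',
    'I should see the recommendation form' => 'iShouldSeeTheRecommendationForm',
];

/**
 * Runs each step in order and returns the names of the methods it called.
 * Throws if a step has no definition or if a definition throws.
 */
function runSteps(array $steps, object $context, array $definitions): array
{
    $called = [];
    foreach ($steps as $step) {
        // Drop the leading Gherkin keyword so only the step text remains.
        $text = preg_replace('/^(Given|When|Then|And|But)\s+/', '', trim($step));
        if (!isset($definitions[$text])) {
            throw new RuntimeException('No step definition for: ' . $text);
        }
        $method = $definitions[$text];
        $context->$method();
        $called[] = $method;
    }
    return $called;
}

$called = runSteps(
    [
        'Given the recommendation phase is open',
        'When I try to access the recommendation form',
        'Then I should see the recommendation form',
    ],
    new RecommendationContext(),
    $definitions
);
```

In this sketch a "Then" step signals failure by throwing an exception, so a scenario passes only if every step runs to completion.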

The step definitions are crucial because they help bring the ubiquitous language of the client's process right into the code of the application.  For example, given the requirement: "Nurses administer flu vaccines to patients in standard doses," we should expect to see code like:

$nurse->administerFluVaccine($patient, FluVaccine::standardAdultDose());

This code is easy for developers to read and understand because it is a direct translation of the client's requirements.  Another option would be to translate the database structure required to support this requirement instead, which would look something like this:


This will work, of course, but is harder to read and maintain because it is not structured with the core functions in mind.
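
To make the contrast concrete, here is a minimal, self-contained PHP sketch. Everything in it is an assumption made for illustration: the Nurse, Patient, and FluVaccine classes, their methods, and the array standing in for a database table (including its column names and values) are invented and are not taken from any real project.

```php
<?php
declare(strict_types=1);

// Hypothetical domain classes, named after the client's own vocabulary.
final class FluVaccine
{
    private string $doseName;

    private function __construct(string $doseName)
    {
        $this->doseName = $doseName;
    }

    public static function standardAdultDose(): self
    {
        return new self('standard adult dose');
    }

    public function doseName(): string
    {
        return $this->doseName;
    }
}

final class Patient
{
    /** @var FluVaccine[] vaccines this patient has received */
    private array $vaccinesReceived = [];

    public function receive(FluVaccine $vaccine): void
    {
        $this->vaccinesReceived[] = $vaccine;
    }

    public function hasReceivedFluVaccine(): bool
    {
        return count($this->vaccinesReceived) > 0;
    }
}

final class Nurse
{
    public function administerFluVaccine(Patient $patient, FluVaccine $vaccine): void
    {
        $patient->receive($vaccine);
    }
}

// Domain-shaped version: reads like the requirement itself.
$nurse = new Nurse();
$patient = new Patient();
$nurse->administerFluVaccine($patient, FluVaccine::standardAdultDose());

// Data-shaped version: the same event written as a row in an assumed
// "vaccinations" table, simulated here with a plain array.
$vaccinations = [];
$vaccinations[] = [
    'nurse_id' => 3,
    'patient_id' => 7,
    'vaccine_code' => 'FLU',
    'dose_code' => 'STD_ADULT',
];
```

Both halves record the same fact, but only the first can be read aloud to the client and recognized as their requirement.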

Here is an example of some step definition code:

    /**
     * @Given the recommendation phase is open
     */
    public function theRecommendationPhaseIsOpen()
    {
        // Open the phase a week ago and close it two months from now.
        $open = new DateTime("-1 week");
        $close = new DateTime("+2 months");
        $this->setPhase("RecommendationActual", $open, $close);
    }

    /**
     * @When I (try to) access the recommendation form
     */
    public function iAccessTheRecommendationForm()
    {
        $this->visitPath(sprintf('/recommender/index.php?key=%s', $this->accessKey));
    }

    /**
     * @Then I should see the recommendation form
     */
    public function iShouldSeeTheRecommendationForm()
    {
        $this->assertSession()->elementExists('css', "div.content h3:contains('Recommendation Form')");
        $this->assertSession()->elementExists('xpath', '//form[starts-with(@action, "recommendation.php")]');
    }

3. Development and Testing

Once the programmer has coded the tests, they write the code for each feature, running the tests as often as they like until the tests pass.

Some Limitations of Behat

  1. Behat is excellent for automating acceptance testing, that is, testing that the required features of the application are working properly.  It can also be useful for testing data integrity.  This covers most of what we've called "Functional Testing", but not everything.  When used as described above, Behat would not test browser or platform compatibility, and could not catch rendering issues or layout problems.  These details are still vitally important, and are best verified by a real human being.  We expect to continue to use staff volunteers to review our applications to some degree, but with a more exploratory, less transactional focus.

  2. Because it requires step definitions, Behat requires programmers to write more code (in the form of tests) than would be strictly necessary to create the application over the short term.

Some Benefits of Behat

  1. Automates acceptance testing: Under our old process, it can take several people hours or days to test the core functions of our more complex applications.  With Behat, that work can be done in minutes, if not seconds.
  2. Makes requirements easier to write, and easier to read: The Gherkin-language feature is a delightfully elegant format, especially compared to other functional specification formats (arbitrarily organized MS Word outlines ten levels deep) and test plan formats (MS Excel workbooks with endless rows and tabs) I have used in the past.
  3. Provides foundation for ubiquitous language: The integral relationship between the features, the step definitions, and the code itself reinforces the use of the client's own language for their process, and does not incentivize us to invent proprietary terms.
  4. Reduces the types of documents needed for complex projects and clarifies their scope: A set of features with step definitions achieves the combined aims of our functional specification, functional test script, client acceptance test script, and data integrity test plan.  Other sorts of documentation, like wireframes, process maps, and state diagrams, can of course be employed as the team sees fit.  With the requirements captured in features, these sorts of specialized documents are now free to play to their respective strengths.

Gabe McElwain