Reference Testing Exercise 1 (unittest flavour)

Posted on Mon 28 October 2019 in TDDA • Tagged with reference test, exercise, screencast, video, unittest

This exercise (video 8m 53s) shows how to migrate a test from using unittest directly to exploiting the referencetest capabilities of the TDDA library. (If you use pytest for writing tests, you might prefer the pytest-flavoured version of this exercise.)

We will see how even simple use of referencetest

  • makes it much easier to see how tests have failed when complex outputs are generated
  • helps us to update reference outputs (the expected values) when we have verified that a new behaviour is correct
  • allows us easily to write tests of code whose outputs are not identical from run to run. We do this by specifying exclusions from the comparisons used in assertions.


★ You need to have the TDDA Python library (version 1.0.31 or newer) installed; see the installation instructions. Use

tdda version

to check the version that you have.

Step 1: Copy the exercises

You need to change to some directory in which you're happy to create three new directories with data. We use ~/tmp for this. Then copy the example code.

$ cd ~/tmp
$ tdda examples    # copy the example code

Step 2: Go to the exercise files and examine them:

$ cd referencetest_examples/exercises-unittest/exercise1  # Go to exercise1

You should have at least the following three files:

$ ls
  • generators.py contains a function called generate_string that, when called, returns HTML text suitable for viewing as a web page.

  • expected.html is the result of calling that function, saved to file.

  • a test script contains a single unittest-based test of that function's output.

It's probably useful to look at the web page expected.html in a browser, either by navigating to it in a file browser and double clicking it, or by using

open expected.html

if your OS supports this. As you can see, it's just some text and an image. The image is an inline SVG vector image, generated along with the text.

Also have a look at the test code. The core part of it is very short:

import unittest

from generators import generate_string

class TestFileGeneration(unittest.TestCase):
    def testExampleStringGeneration(self):
        actual = generate_string()
        with open('expected.html') as f:
            expected = f.read()
        self.assertEqual(actual, expected)

if __name__ == '__main__':
    unittest.main()

The code

  • calls generate_string() to create the content
  • stores its output in the variable actual
  • reads the expected content into the variable expected
  • asserts that the two strings are the same.

Step 3. Run the test, which should fail

$ python   # This will work with Python 3 or Python 2

You should get a failure, but it will probably be quite hard to see exactly what the differences are.

We'll convert the test to use the TDDA library's referencetest module and see how that helps.

Step 4. Change the code to use referencetest.

First we need our test to use ReferenceTestCase from tdda.referencetest instead of unittest.TestCase. ReferenceTestCase is a subclass of unittest.TestCase.

  • Change the import statement to from tdda.referencetest import ReferenceTestCase
  • Replace unittest.TestCase with ReferenceTestCase in the class declaration
  • Replace unittest.main() with ReferenceTestCase.main()

The result is:

from tdda.referencetest import ReferenceTestCase

from generators import generate_string

class TestFileGeneration(ReferenceTestCase):
    def testExampleStringGeneration(self):
        actual = generate_string()
        with open('expected.html') as f:
            expected = f.read()
        self.assertEqual(actual, expected)

if __name__ == '__main__':
    ReferenceTestCase.main()

If you run this, its behaviour should be exactly the same, because we haven't used any of the extra features of tdda.referencetest yet.

Step 5. Change the assertion to use assertStringCorrect

TDDA's ReferenceTestCase provides the assertStringCorrect method, which expects an actual string as its first positional argument and the path to a file containing the expected result as its second. So:

  • Change assertEqual to assertStringCorrect
  • Change expected to 'expected.html' (a file path) as the second argument to the assertion
  • Delete the two lines reading the file and assigning to expected, as we no longer need them.

The result is:

    def testExampleStringGeneration(self):
        actual = generate_string()
        self.assertStringCorrect(actual, 'expected.html')

Step 6. Run the modified test

$ python

You should see very different output that includes, near the end, something like this:

Expected file expected.html
Compare raw with:
    diff /var/folders/zv/3xvhmvpj0216687_pk__2f5h0000gn/T/actual-raw-expected.html expected.html

Compare post-processed with:
    diff /var/folders/zv/3xvhmvpj0216687_pk__2f5h0000gn/T/actual-expected.html /var/folders/zv/3xvhmvpj0216687_pk__2f5h0000gn/T/expected-expected.html

Because the test failed, the TDDA library has written a copy of the actual output to a file, to make it easy for us to examine it and to use diff commands to see how it actually differs from what we expected. (In fact, it has written out two copies, a "raw" one and a "post-processed" one, but we haven't used any processing, so they will be the same in our case; we can ignore the second suggested diff command for now.)

It's also given us the precise diff command we need to see the differences between our actual and expected output.

Step 6a. Copy the first diff command and run it. You should see something similar to this:

$ diff /var/folders/zv/3xvhmvpj0216687_pk__2f5h0000gn/T/actual-raw-expected.html expected.html
<     Copyright (c) Stochastic Solutions, 2016
<     Version 1.0.0
>     Copyright (c) Stochastic Solutions Limited, 2016
>     Version 0.0.0
< </html>
\ No newline at end of file
> </html>

(If you have a visual diff tool, you can also use that. For example, on a Mac, if you have Xcode installed, you should have the opendiff command available.)
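Similarly, if you want to inspect differences between two in-memory strings from Python itself, the standard library's difflib module can produce a unified diff. This is just a stdlib sketch for debugging, not how the TDDA library generates its output:

```python
import difflib

def show_diff(actual, expected):
    # Return unified-diff lines for two multi-line strings.
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile='expected', tofile='actual', lineterm=''))

actual = 'Hello\nVersion 1.0.0\n'
expected = 'Hello\nVersion 0.0.0\n'
for line in show_diff(actual, expected):
    print(line)
```

Lines prefixed with `-` come from the expected text, and lines prefixed with `+` from the actual text, just as in the command-line diff above.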

The diff makes it clear that there are three differences:

  • The copyright notice has changed slightly
  • The version number has changed
  • The string doesn't have a newline at the end, whereas the file does.

The copyright and version-number lines are both in comments in the HTML, so they don't affect the rendering at all. You can confirm this by opening the actual file the library saved (/var/folders/zv/3xvhmvpj0216687_pk__2f5h0000gn/T/actual-raw-expected.html, the first file in the diff command) in a browser: it should look identical.

In this case, therefore, we might now feel that we should simply update expected.html with what generate_string() is now producing. It would be (by design) extremely easy to change the diff in the command it gave us to cp to achieve that.

However, there's a better thing we can do in this case.

Step 7. Specify exclusions

Standing back, it seems likely that the version number and copyright line written to comments in the HTML will change periodically. If those are the only differences between our expected output and what we actually generate, we'd probably prefer the test not to fail.

The assertStringCorrect method from referencetest gives us several mechanisms for specifying changes that can be ignored when checking whether a string is correct. The simplest one, which will be enough for our example, is just to specify strings which, if they occur on a line in the output, cause differences in those lines to be ignored, so that the assertion doesn't fail.
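To make the semantics concrete, here is a tiny pure-Python sketch of the idea (an illustration of the concept, not tdda's actual implementation): lines containing any of the given substrings are excluded on both sides before comparing.

```python
def equal_ignoring(actual, expected, ignore_substrings):
    # Drop any line containing one of the substrings, then compare
    # what remains line by line.
    def keep(line):
        return not any(s in line for s in ignore_substrings)
    kept_actual = [ln for ln in actual.splitlines() if keep(ln)]
    kept_expected = [ln for ln in expected.splitlines() if keep(ln)]
    return kept_actual == kept_expected

old = '<!-- Version 0.0.0 -->\n<p>Hello</p>'
new = '<!-- Version 1.0.0 -->\n<p>Hello</p>'
print(equal_ignoring(new, old, ['Version']))  # only the Version line differs
print(equal_ignoring(new, old, []))
```

With 'Version' ignored the two strings compare equal; with no exclusions, they don't.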

Step 7a. Add the ignore_substrings parameter to assertStringCorrect as follows:

        self.assertStringCorrect(actual, 'expected.html',
                                 ignore_substrings=['Copyright', 'Version'])

Step 7b. Run the test again. It should now pass:

$ python3
Ran 1 test in 0.002s

OK

Recap: What we have seen

We've seen

  1. Converting unittest-based tests to use ReferenceTestCase is straightforward.

  2. When we do that, we gain access to powerful new assert methods such as assertStringCorrect. Among the immediate benefits:

    • When there is a failure, this method saves the failing output to a temporary file
    • It tells you the exact diff command you need to be able to see the differences
    • This also makes it very easy to copy the new "known good" answer into place if you've verified that the new answer is now correct. (In fact, the library also has a more powerful way to do this, as we'll see in a later exercise).
  3. The assertStringCorrect method also has a number of mechanisms for allowing specific expected differences to occur without causing the test to fail. The simplest of these mechanisms is the ignore_substrings keyword argument we used here.

Screencasts and Exercises

Posted on Fri 25 October 2019 in TDDA • Tagged with tests, screencast, video, exercises

We've started producing a series of exercises for various aspects of TDDA, available on the blog, with follow-along screencasts.

There will be a series of posts about these, starting on Monday (28th October). There's a YouTube channel as well, if you want to subscribe.

The goal has been for each exercise to be as short and simple as it can reasonably be while still covering useful aspects.

The first set of exercises will cover the reference testing capabilities of TDDA, and at least some of them will be available both as unittest-flavoured versions and pytest variants. If you don't currently use either, you probably want to follow the unittest variants, since unittest is part of Python's standard library.

There's a page for the exercises at:

which we'll try to keep up-to-date as we add more.

Please note: if you want to do the exercises, you'll need the latest TDDA release, and (unfortunately) you'll probably need to upgrade each time we add new exercises, with something like

pip install -U tdda

or

python3 -m pip install -U tdda

depending on your setup. See the installation instructions for details.


Posted on Thu 24 October 2019 in TDDA • Tagged with tdda, python, installation

This post is a standing post that we plan to try to keep up to date, describing options for obtaining the open-source Python TDDA library that we maintain.

Using pip from PyPI

If you don't need source, and have Python installed, the easiest way to get the TDDA library is from the Python package index PyPI using the pip utility.

Assuming you have a working pip setup, you should be able to install the tdda library by typing:

pip install tdda

or, if your permissions don't allow installation in this mode:

sudo pip install tdda

If pip isn't working, or is associated with a different Python from the one you are using, try:

python -m pip install tdda

or

sudo python -m pip install tdda

The tdda library supports both Python 3 (tested with 3.6 and 3.7) and Python 2 (tested with 2.7). (We'll start testing against 3.8 real soon!)


If you have a version of the tdda library installed and want to upgrade it with pip, add -U to one of the commands above, i.e. use whichever of the following you need for your setup:

pip install -U tdda
sudo pip install -U tdda
python -m pip install -U tdda
sudo python -m pip install -U tdda

Installing from Source

The source for the tdda library is available from Github and can be cloned with

git clone

or

git clone

When installing from source, if you want the command line tdda utility to be available, you need to run

python install

from the top-level tdda directory after downloading it.


Documentation

The main documentation for the tdda library is available on Read the Docs.

You can also build it yourself if you have downloaded the source from Github. To do this, you will need an installation of Sphinx. The HTML documentation is built by running the following from the top-level tdda directory:

cd doc
make html

Running TDDA's tests

Once you have installed TDDA (whether using pip or from source), you can run its tests by typing

tdda test

If you have all the dependencies, including optional dependencies, installed, you should get a line of dots and the message OK at the end, something like this:

$ tdda test
Ran 122 tests in 3.251s

OK

If you don't have some of the optional dependencies installed, some of the dots will be replaced by the letter 's'. For example:

$ tdda test
Ran 120 tests in 3.221s

OK (skipped=2)

This does not indicate a problem; it simply means some functionality (usually one or more database types) will be unavailable.

Using the TDDA examples

The tdda library includes three sets of examples, covering reference testing, automatic constraint discovery and verification, and Rexpy (discovery of regular expressions from examples, outside the context of constraints).

The tdda command line can be used to copy the relevant files into place. To get the examples, first change to a directory where you would like them to be placed, and then use the command:

tdda examples

This should produce the following output:

Copied example files for tdda.referencetest to ./referencetest-examples
Copied example files for tdda.constraints to ./constraints-examples
Copied example files for tdda.rexpy to ./rexpy-examples

Quick Reference Guides

There are quick reference guides available for the TDDA library. These are often a little behind the current release, but are usually still quite helpful.

These are available from here.

Online Tutorials

Various videos of tutorials, and accompanying slides, are available online. Exercises with screencasts are under development, and we hope to begin to release these shortly.

Rexpy for Generating Regular Expressions: Postcodes

Posted on Wed 20 February 2019 in TDDA • Tagged with regular expressions, rexpy, tdda

Rexpy is a powerful tool we created that generates regular expressions from examples. It's available online and forms part of our open-source TDDA library.

Miró users can use the built-in rex command.

This post illustrates using Rexpy to find regular expressions for UK postcodes.

A regular expression for Postcodes

If someone asked you what a UK postcode looks like, and you don't live in London, you'd probably say something like:

A couple of letters, then a number then a space, then a number then a couple of letters.

About the simplest way to get Rexpy to generate a regular expression is to give it at least two examples. You can do this online or using the open-source TDDA library.

If you give it EH1 3LH and BB2 5NR, Rexpy generates [A-Z]{2}\d \d[A-Z]{2}, as illustrated here, using the online version of rexpy:

Rexpy online, with EH1 3LH and BB2 5NR as inputs, produces [A-Z]{2}\d \d[A-Z]{2}

This is the regular-expression equivalent of what we said:

  • [A-Z]{2} means exactly two ({2}) characters from the range [A-Z], i.e. two capital letters
  • \d means a digit, the same as [0-9] (a character from the range 0 to 9)
  • the gap in the middle is a literal space character
  • \d is another digit
  • [A-Z]{2} is two more letters.

This doesn't cover all postcodes, but it's a good start.
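You can check the generated pattern without grep by using Python's re module; this is just a quick sanity check, not part of the original exercise:

```python
import re

pattern = re.compile(r'[A-Z]{2}\d \d[A-Z]{2}')

# fullmatch requires the whole string to match the pattern.
assert pattern.fullmatch('EH1 3LH')
assert pattern.fullmatch('BB2 5NR')
assert not pattern.fullmatch('G1 9PU')    # only one leading letter
assert not pattern.fullmatch('RG22 4EX')  # two digits after the letters
```

The two examples we gave Rexpy match; postcodes with a different shape do not, which is exactly the problem explored below.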

Other cases

An easy way to try out the regular expression we generated is to use the grep command1. This is built into all Unix and Linux systems, and is available on Windows if you install a Linux distribution under WSL.

If we try matching a few postcodes using this regular expression, we'll see that many—but not all—postcodes match the pattern.

  • On Linux, the particular variant of grep we need is grep -P, to tell it we're using Perl-style regular expressions.
  • On Unix (e.g. Macintosh), we need to use grep -E (or egrep) to tell it we're using "extended" regular expressions

If we write a few postcodes to a file:

$ cat > postcodes
EH1 3LH
G1 9PU
RG22 4EX
EC1A 1AB
OL14 8DQ

we can then use grep to find the lines that match:

$ grep -E '[A-Z]{2}\d \d[A-Z]{2}' postcodes

(Use -P instead of -E on Linux.)

More relevantly, for present purposes, we can also add the -v flag, to ask the match to be "inVerted", i.e. to show lines that fail to match:

$ grep -v -E '[A-Z]{2}\d \d[A-Z]{2}' postcodes
G1 9PU
RG22 4EX
EC1A 1AB
OL14 8DQ
  • The first of these, a Glasgow postcode, fails because it only has a single letter at the start.

  • The second and fourth fail because they have two digits after the letters.

  • The third fails because it's a London postcode with an extra letter, A after the EC1.

Let's add an example of each in turn:

If we first add the Glasgow postcode, Rexpy generates ^[A-Z]{1,2}\d \d[A-Z]{2}$.

Rexpy online, adding G1 9PU, produces [A-Z]{1,2}\d \d[A-Z]{2}

Here [A-Z]{1,2} means one or two capital letters, and we've checked the anchor checkbox to get it to add ^ at the start and $ at the end of the regular expression.2 If we use this with our grep command, we get:

$ grep -v -E '^[A-Z]{1,2}\d \d[A-Z]{2}$' postcodes
RG22 4EX
EC1A 1AB
OL14 8DQ

If we now add in an example with two digits in the first part of the postcode—say RG22 4EX—Rexpy further refines the expression to ^[A-Z]{1,2}\d{1,2} \d[A-Z]{2}$, which is good for all(?) non-London postcodes. If we repeat the grep with this new pattern:

$ grep -v -E '^[A-Z]{1,2}\d{1,2} \d[A-Z]{2}$' postcodes
EC1A 1AB
only the London example now fails.

In a perfect world, just by adding EC1A 1AB, Rexpy would produce our ideal regular expression: something like ^[A-Z]{1,2}\d{1,2}[A-Z]? \d[A-Z]{2}$. (Here, ? is equivalent to {0,1}, meaning that the term before it can occur zero times or once, i.e. it is optional.)
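A hand-crafted pattern of that kind is easy to verify with Python's re module; here we use \d{1,2} in the first part so that two-digit districts such as RG22 also match:

```python
import re

# Optional extra letter ([A-Z]?) handles Central London outward
# codes such as EC1A; \d{1,2} allows one- or two-digit districts.
ideal = re.compile(r'^[A-Z]{1,2}\d{1,2}[A-Z]? \d[A-Z]{2}$')

for postcode in ('EH1 3LH', 'G1 9PU', 'RG22 4EX', 'OL14 8DQ', 'EC1A 1AB'):
    assert ideal.match(postcode), postcode

assert not ideal.match('not a postcode')
```

All five sample postcodes, including the Central London one, match this pattern.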

Unfortunately, that's not what happens. Instead, Rexpy produces:

^[A-Z0-9]{2,4} \d[A-Z]{2}$

Rexpy has concluded that the first part is just a jumble of capital letters and numbers, and is saying that it can be any mixture of 2-4 letters and digits.

In this case, we'd probably fix up the regular expression by hand, or pass in the special Central London postcodes separately from all the rest. If we feed in a few London postcodes on their own, we get:

^[A-Z]{2}\d[A-Z] \d[A-Z]{2}$

which is also a useful start.

Have fun with Rexpy!

By the way: if you're in easy reach of Edinburgh, we're running a training course on the TDDA library as part of the Fringe of the Edinburgh DataFest, on 20th March. This will include use of Rexpy. You should come!

Training Course on Testing Data and Data Processes

  1. grep stands for global regular expression print, and the e in egrep stands for extended.

  2. Sometimes, regular expressions match any line that contains the pattern anywhere in them, rather than requiring the pattern to match the whole line. In such cases, using the anchored form of the regular expression, ^[A-Z]{2}\d \d[A-Z]{2}$, means that matching lines must not contain anything before or after the text that matches the regular expression. (You can think of ^ as matching the start of the string, or line, and $ as matching the end.) 
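The effect of anchoring described in this footnote is easy to demonstrate with Python's re module:

```python
import re

loose = re.compile(r'[A-Z]{2}\d \d[A-Z]{2}')
anchored = re.compile(r'^[A-Z]{2}\d \d[A-Z]{2}$')

line = 'Deliver to EH1 3LH by Friday'
assert loose.search(line)          # finds the embedded postcode
assert not anchored.search(line)   # the whole line must be a postcode
assert anchored.search('EH1 3LH')  # matches when nothing else is present
```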

Tagging PyTest Tests

Posted on Tue 22 May 2018 in TDDA • Tagged with tests, tagging

A recent post described the new ability to run a subset of ReferenceTest tests from the tdda library by tagging tests or test classes with the @tag decorator. Initially, this ability was only available for unittest-based tests. From version 1.0 of the tdda library, now available, we have extended this capability to work with pytest.

This post is very similar to the previous one on tagging unittest-based tests, but adapted for pytest.


  • A decorator called tag can be imported and used to decorate individual tests or whole test classes (by preceding the test function or class with @tag).

  • When pytest is run using the --tagged option, only tagged tests and tests from tagged test classes will be run.

  • There is a second new option, --istagged. When this is used, the software will report which test classes are tagged, or contain tests that are tagged, but will not actually run any tests. This is helpful if you have a lot of test classes, spread across different files, and want to change the set of tagged tests.


The situations where we find this particularly helpful are:

  • Fixing a broken test or working on a new feature or dataset. We often find ourselves with a small subset of tests failing (perhaps, a single test) either because we're adding a new feature, or because something has changed, or because we are working with data that has slightly different characteristics. If the tests of interest run in a few seconds, but the whole test suite takes minutes or hours to run, we can iterate dramatically faster if we have an easy way to run only the subset of tests currently failing.

  • Re-writing test output. The tdda library provides the ability to re-write the expected ("reference") output from tests with the actual result from the code, using the --write-all command-line flag. If it's only a subset of the tests that have failed, there is real benefit in re-writing only their output. This is particularly true if the reference outputs contain some differences each time (version numbers, dates etc.) that are being ignored using the ignore-lines or ignore-patterns options provided by the library. If we regenerate all the test outputs, and then look at which files have changed, we might see differences in many reference files. In contrast, if we only regenerate the tests that need to be updated, we avoid committing unnecessary changes and reduce the likelihood of overlooking changes that may actually be incorrect.


In order to use the reference test functionality with pytest, you have always needed to add some boilerplate code in the directory from which you are running pytest. To use the tagging capability, you need to add one more function definition, pytest_collection_modifyitems.

The recommended imports are now:

from tdda.referencetest.pytestconfig import (pytest_addoption,
                                             pytest_collection_modifyitems,
                                             ref)

This file is also a good place to set the reference file location, if you want to do so, using set_default_data_location.
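Putting those pieces together, the boilerplate file (conventionally conftest.py in pytest) might look something like the minimal sketch below. The 'reference' directory name is purely illustrative, and the referencepytest module shown for set_default_data_location is our assumption about where that function lives; check the tdda documentation for your version.

```python
# Sketch of pytest boilerplate for tdda's referencetest with tagging.
# The 'reference' path and the referencepytest module location are
# assumptions; consult the tdda documentation for your release.
from tdda.referencetest.pytestconfig import (pytest_addoption,
                                             pytest_collection_modifyitems,
                                             ref)
from tdda.referencetest import referencepytest

# Resolve reference filenames relative to a 'reference' directory.
referencepytest.set_default_data_location('reference')
```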


We'll illustrate this with a simple example. The code below implements four trivial tests, two in a class and two as plain functions.

Note the import of the tag decorator function near the top, and that test_a and the class TestClassA are decorated with the @tag decorator.


from tdda.referencetest import tag

@tag
def test_a(ref):
    assert 'a' == 'a'

def test_b(ref):
    assert 'b' == 'b'

@tag
class TestClassA:
    def test_x(self):
        assert 'x' * 2 == 'x' + 'x'

    def test_y(self):
        assert 'y' > 'Y'

If we run this as normal, all four tests run and pass:

$ pytest
============================= test session starts ==============================
platform darwin -- Python 3.5.1, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /Users/njr/tmp/referencetest_examples/pytest, inifile:
plugins: hypothesis-3.4.2
collected 4 items

....

=========================== 4 passed in 0.02 seconds ===========================

But if we add the --tagged flag, only three tests run:

$ pytest --tagged
============================= test session starts ==============================
platform darwin -- Python 3.5.1, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /Users/njr/tmp/referencetest_examples/pytest, inifile:
plugins: hypothesis-3.4.2
collected 4 items

...

=========================== 3 passed in 0.02 seconds ===========================

Adding the --verbose flag confirms that these three are the tagged test and the tests in the tagged class, as expected:

$ pytest --tagged --verbose
============================= test session starts ==============================
platform darwin -- Python 3.5.1, pytest-3.2.1, py-1.4.34, pluggy-0.4.0 -- /usr/local/Cellar/python/3.5.1/bin/python3.5
cachedir: .cache
rootdir: /Users/njr/tmp/referencetest_examples/pytest, inifile:
plugins: hypothesis-3.4.2
collected 4 items

PASSED
PASSED
PASSED

=========================== 3 passed in 0.01 seconds ===========================

Finally, if we want to find out which classes include tagged tests, we can use the --istagged flag:

$ pytest --istagged
============================= test session starts ==============================
platform darwin -- Python 3.5.1, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /Users/njr/tmp/referencetest_examples/pytest, inifile:
plugins: hypothesis-3.4.2
collected 4 items


========================= no tests ran in 0.01 seconds =========================

This is particularly helpful when our tests are spread across multiple files, as the filenames are then shown as well as the class names.


Information about installing the library is available in this post.

Other Features

Other features of the ReferenceTest capabilities of the tdda library are described in this post. Its capabilities in the area of constraint discovery and verification are discussed in this post, and this post.