The New ReferenceTest class for TDDA

Posted on Thu 26 January 2017 in TDDA

Since the last post, we have extended the reference test functionality in the Python tdda library. Major changes (as of version 0.2.5, at the time of writing) include:

  • Introduction of a new ReferenceTest class that has significantly more functionality than the previous (now deprecated) WritableTestCase.
  • Support for pytest as well as unittest.
  • Available from PyPI with pip install tdda, as well as from GitHub.
  • Support for comparing CSV files.
  • Support for comparing pandas DataFrames.
  • Support for preprocessing results before comparison (beyond simply dropping lines) in reference tests.
  • Greater consistency between parameters and options for all comparison methods.
  • Support for categorizing kinds of reference data and rewriting only nominated categories (with -w or --write).
  • More (meta) tests of the reference test functionality.

Background: reference tests and WritableTestCase

We previously introduced the idea of a reference test with an example in the post First Test, and then when describing the WritableTestCase library. A reference test is essentially the TDDA equivalent of a software system test (or integration test), and is characterized by:

  • normally testing a relatively large unit of analysis functionality (up to and including whole analytical processes)
  • normally generating one or more large or complex outputs that are hard to verify (e.g. datasets, tables, graphs, files etc.)
  • sometimes featuring unimportant run-to-run differences that mean testing equality of actual and expected output will often fail (e.g. files may contain date stamps, version numbers, or random identifiers)
  • often being impractical to generate by hand
  • often needing to be regenerated automatically (after verification!) when formats change, when bugs are fixed or when understanding changes.

The old WritableTestCase class that we made available in the tdda library supported writing, running and updating such reference tests in Python by extending the unittest.TestCase class with assertion methods for writing reference tests and command-line options for running them (and, where necessary, updating the reference results they use).

Deprecating WritableTestCase in Favour of ReferenceTest

Through our use of reference testing in various contexts and projects at Stochastic Solutions, we have ended up producing three different implementations of reference-test libraries, each with different capabilities. We also became aware that an increasing number of Python developers have a marked preference for pytest over unittest, and wanted to support that more naturally. The new ReferenceTest class brings together all the capabilities we have developed, standardizes them and fills in missing combinations, while providing idiomatic patterns for use with both unittest and pytest.

We have no immediate plans to remove WritableTestCase from the tdda library (and, indeed, continue to use it extensively ourselves), but encourage people to adopt ReferenceTest instead as we believe it is superior in all respects.

Availability and Installation

You can now install the tdda module with pip:

pip install tdda

and the source remains available under an MIT licence from GitHub with

git clone https://github.com/tdda/tdda.git

The tdda library works under Python 2.7 and Python 3 and includes the reference test functionality mentioned above, constraint discovery and verification (including a Pandas version), and automatic discovery of regular expressions from examples (rexpy, which is also available online).
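As a quick taste of rexpy, here is a minimal sketch using its documented extract function (the corpus here is ours, and the exact regular expression produced may vary between versions):

from tdda import rexpy

# Four example strings sharing a common shape: digits, letters, digits.
corpus = ['123-AA-971', '12-DQ-802', '198-AA-045', '1-BA-834']
for regex in rexpy.extract(corpus):
    print(regex)

This prints a single covering expression, something like ^\d{1,3}\-[A-Z]{2}\-\d{3}$.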

After installation, you can run TDDA's tests as follows:

$ python -m tdda.testtdda
............................................................................
----------------------------------------------------------------------
Ran 76 tests in 0.279s

Getting example code

However you have obtained the tdda module, you can get a copy of its reference test examples by running the command

$ python -m tdda.referencetest.examples

which will place them in a referencetest-examples subdirectory of your current directory. Alternatively, you can specify that you want them in a particular place with

$ python -m tdda.referencetest.examples /path/to/particular/place

There are variations for getting examples for the constraint generation and validation functionality, and for regular expression extraction (rexpy):

$ python -m tdda.constraints.examples [/path/to/particular/place]
$ python -m tdda.rexpy.examples [/path/to/particular/place]

Example use of ReferenceTest from unittest

Here is a cut-down example of how to use the ReferenceTest class with Python's unittest, based on the example in referencetest-examples/unittest. For those who prefer pytest, there is a similar pytest-ready example in referencetest-examples/pytest.

from __future__ import unicode_literals

import os
import sys
import tempfile

from tdda.referencetest import ReferenceTestCase

# ensure we can import the generators module in the directory above,
# wherever that happens to be
FILE_DIR = os.path.abspath(os.path.dirname(__file__))
PARENT_DIR = os.path.dirname(FILE_DIR)
sys.path.append(PARENT_DIR)

from generators import generate_string, generate_file

class TestExample(ReferenceTestCase):
    def testExampleStringGeneration(self):
        actual = generate_string()
        self.assertStringCorrect(actual, 'string_result.html',
                                 ignore_substrings=['Copyright', 'Version'])

    def testExampleFileGeneration(self):
        outdir = tempfile.gettempdir()
        outpath = os.path.join(outdir, 'file_result.html')
        generate_file(outpath)
        self.assertFileCorrect(outpath, 'file_result.html',
                               ignore_patterns=['Copy(lef|righ)t',
                                                'Version'])

TestExample.set_default_data_location(os.path.join(PARENT_DIR, 'reference'))


if __name__ == '__main__':
    ReferenceTestCase.main()

Notes

  1. These tests illustrate comparing a string generated by some code to a reference string (stored in a file), and testing a file generated by code to a reference file.

  2. We need to tell the ReferenceTest class where to find reference files used for comparison. The call to set_default_data_location, straight after defining the class, does this.

  3. The first test generates the string actual and compares it to the contents of the file string_result.html in the data location specified (../reference). The ignore_substrings parameter specifies strings which, when encountered, cause the lines containing them to be omitted from the comparison.

  4. The second test instead writes a file to a temporary directory (but using the same name as the reference file). In this case, rather than ignore_substrings we have used an ignore_patterns parameter to specify regular expressions which, when matched, cause lines to be disregarded in comparisons (see the sketch after these notes).

  5. There are a number of other parameters that can be added to the various assert... methods to allow other kinds of discrepancies between actual and generated files to be disregarded.
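To make the behaviour of ignore_patterns concrete, here is a minimal, hypothetical model of the idea. This is not the library's actual implementation (which is more careful), but it captures the essence: lines matching any of the regular expressions are left out of the comparison.

import re

def lines_to_compare(lines, ignore_patterns):
    # Hypothetical helper, purely illustrative: drop any line matching
    # one of the regular expressions before comparing.
    regexes = [re.compile(p) for p in ignore_patterns]
    return [line for line in lines
            if not any(r.search(line) for r in regexes)]

# 'Copy(lef|righ)t' matches both Copyright and Copyleft lines:
print(lines_to_compare(['<html>', 'Copyright (c) 2016', '</html>'],
                       ['Copy(lef|righ)t']))
# -> ['<html>', '</html>']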

Running the reference tests: Success, Failure and Rewriting

If you just run the code above, or the file in the examples, you should see output like this:

$ python test_using_referencetestcase.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.003s

OK

or, if you use pytest, like this:

$ pytest
============================= test session starts ==============================
platform darwin -- Python 2.7.11, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/njr/tmp/referencetest-examples/pytest, inifile:
plugins: hypothesis-3.4.2
collected 2 items

test_using_referencepytest.py ..

=========================== 2 passed in 0.01 seconds ===========================

If you then edit generators.py in the directory above and make some change to the HTML in the generate_string and generate_file functions (preferably non-semantic changes, like adding an extra space before the > in </html>) and then rerun the tests, you should get failures. Changing just the generate_string function:

$ python unittest/test_using_referencetestcase.py
.1 line is different, starting at line 33
Expected file /Users/njr/tmp/referencetest-examples/reference/string_result.html
Note exclusions:
    Copyright
    Version
Compare with "diff /var/folders/w7/lhtph66x7h33t9pns0616qk00000gn/T/actual-string_result.html
/Users/njr/tmp/referencetest-examples/reference/string_result.html".
F
======================================================================
FAIL: testExampleStringGeneration (__main__.TestExample)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest/test_using_referencetestcase.py", line 62, in testExampleStringGeneration
    ignore_substrings=['Copyright', 'Version'])
  File "/Users/njr/python/tdda/tdda/referencetest/referencetest.py", line 527, in assertStringCorrect
    self.check_failures(failures, msgs)
  File "/Users/njr/python/tdda/tdda/referencetest/referencetest.py", line 709, in check_failures
    self.assert_fn(failures == 0, '\n'.join(msgs))
AssertionError: 1 line is different, starting at line 33
Expected file /Users/njr/tmp/referencetest-examples/reference/string_result.html
Note exclusions:
    Copyright
    Version
Compare with "diff /var/folders/w7/lhtph66x7h33t9pns0616qk00000gn/T/actual-string_result.html
/Users/njr/tmp/referencetest-examples/reference/string_result.html".


----------------------------------------------------------------------
Ran 2 tests in 0.005s

As expected, the string test now fails, and the ReferenceTest library suggests a command you can use to diff the output: because the test failed, it wrote the actual output to a temporary file. (It reports the failure twice, once as it occurs and once at the end. This is deliberate as it's convenient to see it when it happens if the tests take any non-trivial amount of time to run, and convenient to collect together all the failures at the end too.)

Because these are HTML files, I would probably instead open them both (using the open command on Mac OS) and visually inspect them. In this case, the pages look identical, and diff will confirm that the changes are only those we expect:

$ diff /var/folders/w7/lhtph66x7h33t9pns0616qk00000gn/T/actual-string_result.html
/Users/njr/tmp/referencetest-examples/reference/string_result.html
5,6c5,6
<     Copyright (c) Stochastic Solutions, 2016
<     Version 1.0.0
---
>     Copyright (c) Stochastic Solutions Limited, 2016
>     Version 0.0.0
33c33
< </html >
---
> </html>

In this case

  • We see that the copyright and version lines are different, but we used ignore_substrings to say that this is OK.
  • It shows us the extra space before the close tag.

If we are happy that the new output is correct and should replace the previous reference output, we can rerun with the -W (or --write-all) flag:

$ python unittest/test_using_referencetestcase.py -W
Written /Users/njr/tmp/referencetest-examples/reference/file_result.html
.Written /Users/njr/tmp/referencetest-examples/reference/string_result.html
.
----------------------------------------------------------------------
Ran 2 tests in 0.003s

OK

Now if you run them again without -W, the tests should all pass.

You can do the same with pytest, except that in this case you need to use

pytest --write-all

because pytest does not allow short flags.
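For completeness, the pytest variant wires these options in through a conftest.py. The sketch below is modelled on the shipped example in referencetest-examples/pytest; treat the hook and fixture names here (addoption, ref, set_default_data_location) as our recollection rather than a definitive API, and consult the shipped example as authoritative.

# conftest.py (sketch)
import pytest
from tdda.referencetest import referencepytest

def pytest_addoption(parser):
    # Adds the --write and --write-all options to pytest's command line.
    referencepytest.addoption(parser)

@pytest.fixture(scope='module')
def ref(request):
    # Provides a ReferenceTest object exposing assertStringCorrect etc.
    return referencepytest.ref(request)

referencepytest.set_default_data_location('../reference')

A test then takes ref as a fixture and calls assertion methods on it:

# test_using_referencepytest.py (sketch)
from generators import generate_string

def test_example_string_generation(ref):
    actual = generate_string()
    ref.assertStringCorrect(actual, 'string_result.html',
                            ignore_substrings=['Copyright', 'Version'])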

Other kinds of tests

Since this post is already quite long, we won't go through all the other options, parameters and kinds of tests in detail, but will mention a few other points:

  • In addition to assertStringCorrect and assertFileCorrect, there are various other methods available:

    • assertFilesCorrect for checking that multiple files are as expected
    • assertCSVFileCorrect for checking a single CSV file
    • assertCSVFilesCorrect for checking multiple CSV files
    • assertDataFramesEqual to check equality of Pandas DataFrames
    • assertDataFrameCorrect to check that a DataFrame matches the data in a CSV file.
  • Where appropriate, assert methods accept various optional parameters (a sketch combining some of these follows this list), including:

    • lstrip — ignore whitespace at start of files/strings (default False)
    • rstrip — ignore whitespace at end of files/strings (default False)
    • kind — optional category for test; these can be used to allow selective rewriting of test output with the -w/--write flag
    • preprocess — function to call on expected and actual data before comparing results
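
To show how these parameters fit together, here is a hedged sketch (not from the shipped examples): make_sales_report and strip_run_ids are hypothetical, and we assume preprocess receives and returns the text being compared.

import re

from tdda.referencetest import ReferenceTestCase

def make_sales_report():
    # Stand-in for real analysis code under test.
    return 'Sales Report run-20170126-1234\nTotal: 42\n'

def strip_run_ids(text):
    # Hypothetical preprocessor: normalize run identifiers such as
    # 'run-20170126-1234' so they never cause spurious differences.
    return re.sub(r'run-\d{8}-\d+', 'run-XXXXXXXX-0', text)

class TestSalesReport(ReferenceTestCase):
    def testSalesReport(self):
        actual = make_sales_report()
        self.assertStringCorrect(actual, 'sales_report.txt',
                                 kind='report',  # rewrite with -w report
                                 lstrip=True,    # ignore leading whitespace
                                 rstrip=True,    # ignore trailing whitespace
                                 preprocess=strip_run_ids)

TestSalesReport.set_default_data_location('reference')

if __name__ == '__main__':
    ReferenceTestCase.main()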

We'll add more detail in future posts.

If you'd like to join the slack group where we discuss TDDA and related topics, DM your email address to @tdda on Twitter and we'll send you an invitation.