Sharing Tests across Implementations by Externalizing Test Data
Posted on Sun 30 August 2020 in TDDA
I've been dabbling in Swift—Apple's new-ish programming language—recently. One of the things I often do when learning a new language is either to take an existing project in a language I know (usually Python) and translate it to the new one, or (better) to try a new project, first writing it in Python and then translating it. This lets me separate debugging the algorithm from debugging my understanding of the new language, and also gives me something to test against.
I have a partially finished Python project for analysing chords that I've been starting to translate, and this has led me to begin to experiment with some new extensions to the TDDA library (not yet pushed/published).
It's a bit fragmented and embryonic, but this is what I'm thinking about.
Sharing test data between languages
Many tests boil down to "check that passing these inputs to this function1 produces this result". There would be some benefits in sharing the inputs and expected outputs between implementations:
- DRY principle (don't repeat yourself);
- reducing the chances of things getting out of sync;
- more confidence that the two implementations really do the same thing;
- less typing / less code.
Looping over test cases
Standard unit-testing dogma tends to focus on the idea of testing small units using many tests, each containing a single assertion, usually as the last statement in the test.2 The benefit of using a single assertion is that when there's a failure it's very clear what it was, and an earlier failure doesn't prevent a later check (assertion) from being carried out: you get all your failures in one go. Less importantly, it also means that the number of tests executed is the same as the number of assertions tested, which might be useful and psychologically satisfying.
On the other hand, it is extremely common to want to test multiple input-output pairs and it is natural and convenient to collect those together and loop over them. I do this all the time, and the reference testing capability in the TDDA library already helps mitigate some downsides of this approach in some situations.
A common way I do this is to loop over a dictionary or a list of tuples specifying input-output pairs. For example, if I were testing a function that did string slicing from the left in Python (string[:n]), I might use something like
cases = {
    ('Catherine', 4): 'Cath',
    ('Catherine', -6): 'Cath',    # deliberately wrong, for illustration
    ('', 7): '',
    ('Miró forever', 4): 'Miró',
    ('Miró forever', 0): ' '      # also deliberately wrong
}

for (text, n), expected in cases.items():
    self.assertEqual(left_string(text, n), expected)
In Python this is fine, because tuples, being hashable, can be used as dictionary keys, and there's something quite intuitive and satisfying about the cases being presented as lines of the form input: expected output. But I also often just use nested tuples or lists, partly as a hangover from older versions of Python in which dictionaries weren't ordered.3 Here's a full example using tuples:
from tdda.referencetest import ReferenceTestCase

def left_string(s, n):
    return s[:n]

class TestLeft(ReferenceTestCase):
    def testLeft(self):
        cases = (
            (('Catherine', 4), 'Cath'),
            (('Catherine', -6), 'Cath'),    # deliberately wrong, for illustration
            (('', 7), ''),
            (('Miró forever', 4), 'Miró'),
            (('Miró forever', 0), ' ')      # also deliberately wrong
        )
        for (text, n), expected in cases:
            self.assertEqual(left_string(text, n), expected)

if __name__ == '__main__':
    ReferenceTestCase.main()
As noted above, two problems with this are:
- if one test case fails, it's not necessarily easy to figure out which one it was, especially if expected values (e.g. 'Cath') are repeated;
- an earlier failure prevents later cases from running.
We can see both of these problems if we run this:
$ python3 looptest.py
F
======================================================================
FAIL: testLeft (__main__.TestLeft)
----------------------------------------------------------------------
Traceback (most recent call last):
File "looptest.py", line 18, in testLeft
self.assertEqual(left_string(text, n), expected)
AssertionError: 'Cat' != 'Cath'
- Cat
+ Cath
? +
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (failures=1)
It's actually the second case that failed, and the fifth case would also fail if it ran (since it should produce an empty string, not a space).
A technique I've long used to address the first problem is to include the test case in the equality assertion, replacing
self.assertEqual(actual, expected)
with
self.assertEqual((case, actual), (case, expected))
like so:
def testLeft(self):
    cases = (
        (('Catherine', 4), 'Cath'),
        (('Catherine', -6), 'Cath'),
        (('', 7), ''),
        (('Miró forever', 4), 'Miró'),
        (('Miró forever', 0), ' ')
    )
    for case, expected in cases:
        (text, n) = case
        self.assertEqual((case, left_string(text, n)),
                         (case, expected))
Now when a case fails, we can see much more easily which one it was:
$ python3 looptest2.py
F
======================================================================
FAIL: testLeft (__main__.TestLeft)
----------------------------------------------------------------------
Traceback (most recent call last):
File "looptest2.py", line 20, in testLeft
(case, expected))
AssertionError: Tuples differ: (('Catherine', -6), 'Cat') != (('Catherine', -6), 'Cath')
First differing element 1:
'Cat'
'Cath'
- (('Catherine', -6), 'Cat')
+ (('Catherine', -6), 'Cath')
? +
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (failures=1)
I wouldn't call it beautiful, but it does the job, at least when the inputs and outputs are of a manageable size.
This still leaves the problem that the failure of an earlier case prevents later cases from running. The TDDA library already addresses this in the case of file checks, by providing the assertFilesCorrect (plural) assertion in addition to assertFileCorrect (singular); we'll come back to this later.
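As an aside, plain unittest also offers a partial remedy for this: wrapping each iteration in self.subTest makes every case report independently, so an early failure no longer stops the later ones, and the failing inputs are shown in the report. A minimal sketch, reusing the left_string example from above:

```python
import unittest

def left_string(s, n):
    return s[:n]

class TestLeft(unittest.TestCase):
    def testLeft(self):
        cases = (
            (('Catherine', 4), 'Cath'),
            (('', 7), ''),
            (('Miró forever', 4), 'Miró'),
        )
        for (text, n), expected in cases:
            # Each subTest is reported independently, so a failure
            # here does not stop the remaining cases from running.
            with self.subTest(text=text, n=n):
                self.assertEqual(left_string(text, n), expected)

if __name__ == '__main__':
    unittest.main()
```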
Externalizing Test Data
Returning to the main theme of this post, when there are multiple implementations of software, potentially in different languages, there is some attraction to being able to share the test data—ideally, both the inputs being tested and the expected results.
The project I'm translating is a chord analysis tool focused on jazz guitar chords, especially moveable ones with no root. It has various classes, functions and structures concerned with musical notes, scales, abstract chords, tunings, chord shapes, chord names and so forth. It includes an easy-to-type text format that uses # as the sharp sign and b as the flat sign, though on output these are usually translated to ♯ and ♭. Below are two simple tests from the Python code.
For those interested, the first tests a function transpose that transposes a note by a number of semitones. There's an optional key parameter which, when provided, is used to decide whether to express the result as a sharp or flat note (when appropriate).
def testTranspose(self):
    self.assertEqual(transpose('C', 0), 'C')
    self.assertEqual(transpose('C', 1), 'C#')
    self.assertEqual(transpose('C', 1, key='F'), 'Db')
    self.assertEqual(transpose('C#', -1), 'C')
    self.assertEqual(transpose('Db', -1), 'C')
    self.assertEqual(transpose('C', 2), 'D')
    self.assertEqual(transpose('D', -2), 'C')
    self.assertEqual(transpose('C', 3, key='A'), 'D#')
    self.assertEqual(transpose('C', 3), 'Eb')
    self.assertEqual(transpose('C', 3, key='Bb'), 'Eb')
    self.assertEqual(transpose('D#', -3), 'C')
    self.assertEqual(transpose('Eb', -3), 'C')
    self.assertEqual(transpose('C', -1), 'B')
    self.assertEqual(transpose('B', 1), 'C')
    self.assertEqual(transpose('C', -2), 'Bb')
    self.assertEqual(transpose('C', -2, 'E'), 'A#')
    self.assertEqual(transpose('Bb', 2), 'C')
    self.assertEqual(transpose('A#', 2), 'C')
    self.assertEqual(transpose('C', -3), 'A')
    self.assertEqual(transpose('A', 3), 'C')
    self.assertEqual(transpose('G', 4), 'B')
    self.assertEqual(transpose('B', -4), 'G')
    self.assertEqual(transpose('F#', 4), 'Bb')
    self.assertEqual(transpose('F#', 4, 'E'), 'A#')
    self.assertEqual(transpose('Bb', -4), 'F#')
    self.assertEqual(transpose('A#', -4), 'F#')
    self.assertEqual(transpose('Bb', -4, 'F'), 'Gb')
    self.assertEqual(transpose('A#', -4, 'Eb'), 'Gb')
    self.assertEqual(transpose('G', 4), 'B')
    self.assertEqual(transpose('F#', 4), 'Bb')
    self.assertEqual(transpose('F#', 4, 'E'), 'A#')
    self.assertEqual(transpose('B', -4), 'G')
    self.assertEqual(transpose('Bb', -4), 'F#')
    self.assertEqual(transpose('A#', -4), 'F#')
    self.assertEqual(transpose('Bb', -4, 'F'), 'Gb')
    self.assertEqual(transpose('A#', -4, 'F'), 'Gb')
Clearly, this test does not use looping, but does combine some 36 test cases in a single test (dogma be damned!)
A second test is for a function to_flat_equiv, which (again, for those interested) accepts chord names (in various forms) and—where the chord's key is sharp, as written—converts them to the equivalent flat form. (Here, o is one of the ways to indicate a diminished chord (e.g. Dº) and M is one of the ways of describing a major chord (also maj or Δ).) The function also accepts None as an input (returned unmodified) and R as an abstract chord with no key specified (also unmodified).4
def test_to_flat_equiv(self):
    cases = (
        ('C', 'C'),
        ('C#m', 'Dbm'),
        ('Db7', 'Db7'),
        ('C#M7', 'DbM7'),
        ('Do', 'Do'),
        ('D#M', 'EbM'),
        ('E9', 'E9'),
        ('FmM7', 'FmM7'),
        ('F#mM7', 'GbmM7'),
        ('G', 'G'),
        ('G#11', 'Ab11'),
        ('Ab11', 'Ab11'),
        ('Am11', 'Am11'),
        ('A#+', 'Bb+'),
        ('Bb+', 'Bb+'),
        ('A♯+', 'B♭+'),
        ('B♭+', 'B♭+'),
        (None, None),
        ('R', 'R'),
        ('R#', 'R#'),
        ('Rm', 'Rm'),
    )
    for k, v in cases:
        self.assertEqual(to_flat_equiv(k), v)
    for letter in 'BEPQaz@':
        self.assertRaises(NoteError, to_flat_equiv, letter + '#')
This test uses two loops, one for the good cases and another for seven illegal inputs that should raise exceptions. The looping has a clear benefit, but there's no reason to have combined the good and bad test cases in a single test function other than laziness.
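If one did want to separate them, the split is mechanical. Here's a sketch with toy stand-ins for to_flat_equiv and NoteError (the real implementations are part of the chord project and aren't shown in this post):

```python
import unittest

class NoteError(Exception):
    """Stand-in for the project's NoteError."""

# Toy stand-in for to_flat_equiv, handling just the notes used below;
# the real function is considerably more capable.
SHARP_TO_FLAT = {'C#': 'Db', 'D#': 'Eb', 'F#': 'Gb', 'G#': 'Ab', 'A#': 'Bb'}

def to_flat_equiv(chord):
    if chord is None or chord.startswith('R'):
        return chord                        # None and abstract R chords pass through
    key, rest = chord[:2], chord[2:]
    if key in SHARP_TO_FLAT:
        return SHARP_TO_FLAT[key] + rest    # e.g. C#m -> Dbm
    if len(chord) > 1 and chord[1] == '#':
        raise NoteError(chord)              # B#, E# and non-notes are rejected
    return chord

class TestFlatEquiv(unittest.TestCase):
    def test_to_flat_equiv(self):           # the good cases, on their own
        cases = (('C', 'C'), ('C#m', 'Dbm'), ('A#+', 'Bb+'), (None, None))
        for k, v in cases:
            self.assertEqual(to_flat_equiv(k), v)

    def test_to_flat_equiv_bads(self):      # the illegal inputs, separately
        for letter in 'BEPQaz@':
            self.assertRaises(NoteError, to_flat_equiv, letter + '#')
```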
In 2020, if we're going to share test data between implementations, it's hard to look beyond JSON. Here's an extract from a file scale-tests.json that encapsulates the inputs and expected outputs for all the tests above:
{
    "transpose": [
        [["C", 0], "C"],
        [["C", 1], "C#"],
        [["C", 1, "F"], "Db"],
        [["C#", -1], "C"],
        [["Db", -1], "C"],
        [["C", 2], "D"],
        [["D", -2], "C"],
        [["C", 3, "A"], "D#"],
        [["C", 3], "Eb"],
        [["C", 3, "Bb"], "Eb"],
        [["D#", -3], "C"],
        [["Eb", -3], "C"],
        [["C", -1], "B"],
        [["B", 1], "C"],
        [["C", -2], "Bb"],
        [["C", -2, "E"], "A#"],
        [["Bb", 2], "C"],
        [["A#", 2], "C"],
        [["C", -3], "A"],
        [["A", 3], "C"],
        [["G", 4], "B"],
        [["B", -4], "G"],
        [["F#", 4], "Bb"],
        [["F#", 4, "E"], "A#"],
        [["Bb", -4], "F#"],
        [["A#", -4], "F#"],
        [["Bb", -4, "F"], "Gb"],
        [["A#", -4, "Eb"], "Gb"],
        [["G", 4], "B"],
        [["F#", 4], "Bb"],
        [["F#", 4, "E"], "A#"],
        [["B", -4], "G"],
        [["Bb", -4], "F#"],
        [["A#", -4], "F#"],
        [["Bb", -4, "F"], "Gb"],
        [["A#", -4, "F"], "Gb"]
    ],
    "flat_equivs": [
        ["C", "C"],
        ["C#m", "Dbm"],
        ["Db7", "Db7"],
        ["C#M7", "DbM7"],
        ["Do", "Do"],
        ["D#M", "EbM"],
        ["E9", "E9"],
        ["FmM7", "FmM7"],
        ["F#mM7", "GbmM7"],
        ["G", "G"],
        ["G#11", "Ab11"],
        ["Ab11", "Ab11"],
        ["Am11", "Am11"],
        ["A#+", "Bb+"],
        ["Bb+", "Bb+"],
        ["A♯+", "B♭+"],
        ["B♭+", "B♭+"],
        [null, null],
        ["R", "R"],
        ["R#", "R#"],
        ["Rm", "Rm"]
    ],
    "flat_equiv_bads": "BEPQaz@"
}
I have a function that uses json.load to read this and other test data, storing the results in an object with a .scale attribute, like so:
>>> from moveablechords.utils import ReadJSONTestData
>>> from pprint import pprint
>>> TestData = ReadJSONTestData()
>>> pprint(TestData.scale['transpose'])
[[['C', 0], 'C'],
[['C', 1], 'C#'],
[['C', 1, 'F'], 'Db'],
[['C#', -1], 'C'],
[['Db', -1], 'C'],
[['C', 2], 'D'],
[['D', -2], 'C'],
[['C', 3, 'A'], 'D#'],
[['C', 3], 'Eb'],
[['C', 3, 'Bb'], 'Eb'],
[['D#', -3], 'C'],
[['Eb', -3], 'C'],
[['C', -1], 'B'],
[['B', 1], 'C'],
[['C', -2], 'Bb'],
[['C', -2, 'E'], 'A#'],
[['Bb', 2], 'C'],
[['A#', 2], 'C'],
[['C', -3], 'A'],
[['A', 3], 'C'],
[['G', 4], 'B'],
[['B', -4], 'G'],
[['F#', 4], 'Bb'],
[['F#', 4, 'E'], 'A#'],
[['Bb', -4], 'F#'],
[['A#', -4], 'F#'],
[['Bb', -4, 'F'], 'Gb'],
[['A#', -4, 'Eb'], 'Gb'],
[['G', 4], 'B'],
[['F#', 4], 'Bb'],
[['F#', 4, 'E'], 'A#'],
[['B', -4], 'G'],
[['Bb', -4], 'F#'],
[['A#', -4], 'F#'],
[['Bb', -4, 'F'], 'Gb'],
[['A#', -4, 'F'], 'Gb']]
I have also made it so you can get entries using attribute lookup on the objects, i.e. TestData.scale.transpose rather than TestData.scale['transpose'], just because it looks more elegant and readable to me.
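The reading function itself isn't shown here, but the idea is easy to sketch: subclass dict to allow attribute access, and pass that class as json.load's object_hook so every nested JSON object gets the same treatment. This is a hypothetical, minimal version; the real ReadJSONTestData may well differ:

```python
import json

class AttrDict(dict):
    """A dict whose keys can also be read as attributes.

    (Keys that clash with real dict attributes, such as 'items',
    would still need the ['...'] form.)
    """
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

def read_json_test_data(path):
    # object_hook is applied to every JSON object as it is decoded,
    # so nested lookups like data.scale.transpose also work.
    with open(path) as f:
        return json.load(f, object_hook=AttrDict)
```

With the file above in place, read_json_test_data('scale-tests.json') would return an object on which both ['transpose'] and .transpose lookups work.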
A straightforward refactoring of the testTranspose function to use the JSON-loaded data in TestData.scale would be

def testTranspose2(self):
    for (case, expected) in TestData.scale.transpose:
        if len(case) == 2:
            (note, offset) = case
            self.assertEqual((case, transpose(note, offset)),
                             (case, expected))
        else:
            (note, offset, key) = case
            self.assertEqual((case, transpose(note, offset, key=key)),
                             (case, expected))
In case this isn't self-explanatory:
- The loop runs over the cases and expected values, so on the first iteration case is ["C", 0] and expected is 'C';
- The assignments set the note and offset variables; if the list is of length three, the key variable is also set;
- As discussed above, rather than just using things like self.assertEqual(transpose(note, offset), expected), we're including the case (the list of input parameters) on both sides of the assertion so that if there's a failure, we can see which case is failing.
We can simplify this further, since the transpose function has only one optional (keyword) argument, key, which can also be provided as a third positional argument. Assuming we don't specifically need to test the handling of key as a keyword argument, we can combine the two branches as follows:
def testTranspose3(self):
    for (case, expected) in TestData.scale.transpose:
        self.assertEqual((case, transpose(*case)),
                         (case, expected))
Here, we're using the * operator to unpack5 case into an argument list for the transpose function.
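To spell out the unpacking: *case works whether case has two or three elements, which is exactly what lets the two branches collapse into one. A tiny illustration, with a hypothetical stand-in for transpose:

```python
def transpose_stub(note, offset, key=None):
    # Hypothetical stand-in: just reports the arguments it received.
    return (note, offset, key)

case2 = ['C', 1]         # two elements: key takes its default
case3 = ['C', 1, 'F']    # three elements: the third fills key

assert transpose_stub(*case2) == ('C', 1, None)
assert transpose_stub(*case3) == ('C', 1, 'F')
```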
Adding TDDA Support
It probably hasn't escaped your attention that this third version of testTranspose is rather generic: the same structure would work for any function f and list of input-output pairs Pairs:

def testAnyOldFunction_f(self):
    for (case, expected) in Pairs:
        self.assertEqual((case, f(*case)), (case, expected))
This makes it fairly easy to add TDDA support. I have added prototype support that allows us to use an even shorter version of the test:
def testTranspose4(self):
    self.checkFunctionByArgs(transpose, TestData.scale.transpose)
This new checkFunctionByArgs method takes a function to test and a list of input-output pairs, and runs a slightly fancier version of testAnyOldFunction. I'll go into extensions in another post, but the most important difference is that it will report all failures rather than stopping at the first one.
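The core mechanism behind reporting all failures is simple enough to sketch: collect failures in a list instead of asserting immediately, then make a single assertion at the end. A simplified, hypothetical version (not the actual TDDA implementation):

```python
def check_function_by_args(testcase, f, pairs):
    """Simplified, hypothetical sketch of a checkFunctionByArgs-style
    helper: run f over every (args, expected) pair, collecting all
    failures, and assert once at the end."""
    failures = []
    for (case, expected) in pairs:
        actual = f(*case)
        if actual != expected:
            args = ', '.join(repr(a) for a in case)
            failures.append('Case %s(%s): failure.\n'
                            '  Actual: %r\n'
                            '  Expected: %r'
                            % (f.__name__, args, actual, expected))
    # A single assertion at the end, so every failing case is
    # reported together rather than only the first.
    testcase.assertTrue(failures == [], '\n'.join(failures))
```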
We can illustrate this by changing the first and last cases in TestData.scale['transpose'] to be incorrect, say:
[[['C', 0], 'Z'],
...
[["A#", -4, "F"], "Zb"]
If we run testTranspose2 using this modified test data, we get only the first failing case, and although the failing case is listed in the output, the output isn't particularly easy to grok.
$ python3 testscale.py
.....F.......
======================================================================
FAIL: testTranspose2 (__main__.TestScale)
----------------------------------------------------------------------
Traceback (most recent call last):
File "testscale.py", line 22, in testTranspose2
(case, expected))
AssertionError: Tuples differ: (['C', 0], 'C') != (['C', 0], 'Z')
First differing element 1:
'C'
'Z'
- (['C', 0], 'C')
? ^
+ (['C', 0], 'Z')
? ^
----------------------------------------------------------------------
Ran 13 tests in 0.002s
FAILED (failures=1)
But if we use TDDA's prototype checkFunctionByArgs functionality, we see both failures, and it shows them in a more digestible format:
$ python3 testscale.py
.....
Case transpose('C', 0): failure.
Actual: 'C'
Expected: 'Z'
Case transpose('A#', -4, 'F'): failure.
Actual: 'Gb'
Expected: 'Zb'
F.......
======================================================================
FAIL: testTranspose4 (__main__.TestScale)
----------------------------------------------------------------------
Traceback (most recent call last):
File "testscale.py", line 15, in testTranspose4
self.checkFunctionByArgs(transpose, TestData.scale.transpose)
File "/Users/njr/python/tdda/tdda/referencetest/referencetest.py", line 899, in checkFunctionByArgs
self._check_failures(failures, msgs)
File "/Users/njr/python/tdda/tdda/referencetest/referencetest.py", line 919, in _check_failures
self.assert_fn(failures == 0, msgs.message())
AssertionError: False is not true :
Case transpose('C', 0): failure.
Actual: 'C'
Expected: 'Z'
Case transpose('A#', -4, 'F'): failure.
Actual: 'Gb'
Expected: 'Zb'
----------------------------------------------------------------------
Ran 13 tests in 0.001s
FAILED (failures=1)
The failures currently get shown twice, once during execution of the tests and again at the end in the summary, and the test just counts this as a single failure, though these are both things that could be changed.
There are variant forms of the prototype checking function above to handle keyword arguments only, and mixed positional and keyword arguments. There's also a version specifically for single-argument functions, where it's natural to write each input not as a tuple but as a simple value.
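The keyword-argument variant follows the same pattern, with each case stored as a JSON object rather than an array and unpacked with ** rather than *. A hypothetical sketch of the case-running core, again with a stand-in for the real transpose:

```python
def run_kwargs_case(f, case, expected):
    # case is a dict such as {"note": "C", "offset": 1, "key": "F"},
    # as it would arrive from a JSON object; ** unpacks it into
    # keyword arguments for f.
    actual = f(**case)
    return actual == expected, actual

def transpose_stub(note, offset, key=None):
    # Hypothetical stand-in for the real transpose function.
    return '%s%+d%s' % (note, offset, '/' + key if key else '')

ok, actual = run_kwargs_case(transpose_stub,
                             {'note': 'C', 'offset': 1, 'key': 'F'},
                             'C+1/F')
assert ok and actual == 'C+1/F'
```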
Is this a Good Idea?
I think the potential benefits of sharing data between different implementations of the same project are pretty clear. I haven't actually modified the Swift implementation to use the JSON, but I'm sure doing so will be easy and a clear win. I hope the example above also illustrates that good support from testing frameworks can significantly mitigate the downsides of looping over test cases within a single test function. But there are other potential downsides.
The most obvious problem, to me, is that separating the test data from the test makes it harder to see what's being tested (and perhaps means you have to trust the framework more, though that is quite easy to check). Arguably, this is even more true when the test is reduced to the one-line form in testTranspose4, rather than the longer form in testTranspose2, where the function arguments are unpacked and named, so that you can see a bit more of what is actually being passed into the function.
There's a broader point about the utility of tests as a form of documentation. A web search for externalizing test data uncovered this post from Arvind Patil in 2005, in which he proposes something like the scheme here for Java (with XML taking the place of JSON, this being 2005). Three replies to the post are quite hostile, including the first, from Irakli Nadareishvili, who says:
sorry, but this is a quite dangerous anti-pattern. Unit-tests are not simply for testing a piece of code. They carry several, additional, very important roles. One of them is - documentation.
In a well-tested code, unit-tests are the first examples of API usage (API that they test). A TDD-experienced developer can learn a lot about the API, looking at its unit-tests. For the readability and clarity of what unit-test tests, it is very important that test data is in the code and the reader does not have to consistently hop from a configuration file to the test code.
Also, usually boundary conditions for a code (which is what test data commonly is) almost never change, so there is more harm in this "pattern" than gain, indeed.
This is definitely a reasonable concern. Even if code has good documentation, it is all too common for it to become out of date, whereas (passing) tests, almost by definition, tend to stay up to date with API changes.
We could mitigate this issue quite a lot by hooking into verbose mode (-v or --verbose) and having it show each call as well as the test function being run, which seems like a good idea anyway. At the moment, running the scale tests on my chord project with -v produces output like this:
$ python3 testscale.py -v
testAsSmallestIntervals (__main__.TestScale) ... ok
testDeMinorMajors (__main__.TestScale) ... ok
testFretForNoteOnString (__main__.TestScale) ... ok
testNotePairIntervals (__main__.TestScale) ... ok
testRelMajor (__main__.TestScale) ... ok
testTranspose (__main__.TestScale) ... ok
test_are_not_same (__main__.TestScale) ... ok
test_are_same (__main__.TestScale) ... ok
test_are_same_invalids (__main__.TestScale) ... ok
test_flat_equiv (__main__.TestScale) ... ok
test_flat_equiv_bads (__main__.TestScale) ... ok
test_preferred_equiv (__main__.TestScale) ... ok
test_preferred_equiv_bads (__main__.TestScale) ... ok
----------------------------------------------------------------------
Ran 13 tests in 0.002s
OK
but we could (probably) extend this to something more like:6
$ python3 testscale.py -v
testAsSmallestIntervals (__main__.TestScale) ... ok
testDeMinorMajors (__main__.TestScale) ... ok
testFretForNoteOnString (__main__.TestScale) ... ok
testNotePairIntervals (__main__.TestScale) ... ok
testRelMajor (__main__.TestScale) ... ok
testTranspose (__main__.TestScale) ...
transpose('C', 0): OK
transpose('C', 1): OK
transpose('C', 1, 'F'): OK
transpose('C#', -1): OK
transpose('Db', -1): OK
transpose('C', 2): OK
transpose('D', -2): OK
transpose('C', 3, 'A'): OK
transpose('C', 3): OK
transpose('C', 3, 'Bb'): OK
transpose('D#', -3): OK
transpose('Eb', -3): OK
transpose('C', -1): OK
transpose('B', 1): OK
transpose('C', -2): OK
transpose('C', -2, 'E'): OK
transpose('Bb', 2): OK
transpose('A#', 2): OK
transpose('C', -3): OK
transpose('A', 3): OK
transpose('G', 4): OK
transpose('B', -4): OK
transpose('F#', 4): OK
transpose('F#', 4, 'E'): OK
transpose('Bb', -4): OK
transpose('A#', -4): OK
transpose('Bb', -4, 'F'): OK
transpose('A#', -4, 'Eb'): OK
transpose('G', 4): OK
transpose('F#', 4): OK
transpose('F#', 4, 'E'): OK
transpose('B', -4): OK
transpose('Bb', -4): OK
transpose('A#', -4): OK
transpose('Bb', -4, 'F'): OK
transpose('A#', -4, 'F'): OK
... testTranspose (__main__.TestScale): 36 tests: ... ok
test_are_not_same (__main__.TestScale) ... ok
test_are_same (__main__.TestScale) ... ok
test_are_same_invalids (__main__.TestScale) ... ok
test_flat_equiv (__main__.TestScale) ... ok
test_flat_equiv_bads (__main__.TestScale) ... ok
test_preferred_equiv (__main__.TestScale) ... ok
test_preferred_equiv_bads (__main__.TestScale) ... ok
----------------------------------------------------------------------
Ran 49 test cases across 13 tests in 0.002s
OK
I also found this post from Jeremy Wadhams in 2015 on the subject of Sharing unit tests between several language implementations of one spec. It discusses JsonLogic:
JsonLogic is a data format (built on top of JSON) for storing and sharing rules between front-end and back-end code. It's essential that the same rule returns the same result whether executed by the JavaScript client or the PHP client.
Currently the JavaScript client has tests in QUnit, and the PHP client has tests in PHPunit. The vast majority of tests are "given these inputs (rule and data), assert the output equals the expected result."
Jeremy also suggests something very like the scheme above, again using JSON.
Conclusion
I think this has been quite a promising experiment.
It reduced the length of testscale.py
from 223 lines to 75, which wasn't
an aim (and carries the potential issues noted above), but which does
make the scope and structure of the tests easier to understand.
It also achieved the primary goal of allowing test data to be shared
between implementations, which seems like a valuable prize.
Eventually, the project might gain a command line in both implementations, and that will potentially enable my favourite mode of testing—pairs of input command lines and expected output. But this is a useful start.
Meanwhile, I will probably refine (and document and test!) the prototype implementations a bit more and then release it.
If you have thoughts, do get in touch.
1. Or, more generally, this callable. ↩
2. Other than, perhaps, any manual teardown in a try...finally block. ↩
3. From Python 3.7 on, all Python dictionaries are ordered. This is also the case in CPython implementations from 3.6 onwards. ↩
4. The function does not accept B# or E#, even though musically these can be used as alternatives to C and F respectively. That is outside the scope of this function. ↩
5. This operation is sometimes called splatting, and sometimes unsplatting or desplatting. ↩
6. Would I seem like a very old fuddy-duddy if I asked "who writes 'ok' in lower case anyway?" ↩