# Solving puzzles with hypothesis
I nerd-sniped myself into playing some foobar.withgoogle.com and got stuck. This is a coding game where you get a task, write a function to implement it, and hope it is correct when you issue `verify solution`.
My solutions are seldom correct on the first try, but this time I was passing 4 out of 5 tests and my intuition was that I was hitting a timeout. This was actually certain because the solution I coded was O(2^n), oopsie!
So I set out to reimplement it to be O(n) and passed 3 out of 5 tests. To an untrained eye this may seem like a step back, but it is a step forward. Let’s see the test result matrix and game our way out of this mess.
| test | O(2^n) | O(n) |
|------|--------|------|
| 1    | pass   | pass |
| 2    | pass   | pass |
| 3    | fail   | pass |
| 4    | pass   | fail |
| 5    | pass   | fail |
The tests confirm our intuition that test 3 was failing because of a timeout, but now we have a regression: the O(n) solution fails tests 4 and 5.
As the two solutions fail on different tests, we can use the O(2^n) implementation as an oracle when testing the O(n) one and find inputs that trigger the bug, because the two results will differ.
Instead of trying to come up with the input myself, I decided it was the right time to try property testing. Property testing is a strategy where you come up with a property that must hold for all inputs and then generate inputs until the property does not hold.
The property that we want to prove is that `previous_solution` and `solution` are functionally equivalent. This condition is also known as: `solution` is a refinement of `previous_solution`.
The test case we want to use is the following.
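A minimal sketch of that test; the `puzzle` module is a placeholder for wherever the two implementations live:

```python
import unittest

# Hypothetical module: previous_solution is the O(2^n) implementation
# used as the oracle, solution is the O(n) rewrite under test.
from puzzle import previous_solution, solution


class TestRefinement(unittest.TestCase):
    def test_refinement(self, xs):
        # The oracle and the rewrite must agree on every input.
        self.assertEqual(previous_solution(xs), solution(xs))
```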
Now this test is useless by itself. The property we want to test, that the two implementations are functionally equivalent, is easy to express, but the `test_refinement` method takes a parameter `xs` which is not defined anywhere, and therefore we cannot really use it as it is.
This is where the hypothesis library comes to save the day.
By adding three lines of code we are now able to do property testing.
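A sketch of that change against the test case above, again assuming the hypothetical `puzzle` module; the three added lines are the two imports and the decorator:

```diff
 import unittest
+from hypothesis import given
+import hypothesis.strategies as st

 # Hypothetical module: previous_solution is the O(2^n) implementation
 # used as the oracle, solution is the O(n) rewrite under test.
 from puzzle import previous_solution, solution


 class TestRefinement(unittest.TestCase):
+    @given(xs=st.lists(st.integers()))
     def test_refinement(self, xs):
         # The oracle and the rewrite must agree on every input.
         self.assertEqual(previous_solution(xs), solution(xs))
```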
The previous diff “upgrades” our test case by interposing the hypothesis library between the unittest runtime and the test method. Hypothesis will generate possible inputs `xs` following a recipe and feed them to our method repeatedly. In our case the recipe is simple: we generate a list of integers by composing two strategies that the library exposes.
The decorator `@given(xs=st.lists(st.integers()))` is all we need to specify all possible inputs to our test.
In a matter of seconds hypothesis came up with an example list, `[0, -1]`, where the results were not equal, and I was able to correct the O(n) solution immediately.
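When the property fails, hypothesis shrinks the input down to a small counterexample and prints it; the report looks roughly like this (exact formatting depends on the hypothesis version):

```
Falsifying example: test_refinement(xs=[0, -1])
```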
It is also very important to notice that what we did is just scratching the surface. I could have taken a totally different route by writing more test methods covering different properties, e.g. consing a positive integer to the list does bla, but this time I had an oracle to use and I was sure that I was missing a simple edge case.
It can be very difficult to use hypothesis well and some tasks are not a good fit for property testing. Having an oracle ready can be a huge productivity bonus because the alternative is having to think really hard about the properties you want.