# Solving puzzles with hypothesis
I nerd-sniped myself into playing some foobar.withgoogle.com and got stuck. This is a coding game where you get a task, write a function to implement it, and hope it is correct when you issue `verify solution`.
My solutions are seldom correct on the first try, but this time I was passing 4 out of 5 tests and my intuition was that I was hitting a timeout. This was actually certain because the solution I coded was O(2^n), oopsie!
So I set out to reimplement it to be O(n) and passed 3 out of 5 tests. To an untrained eye this may seem like a step back, but it is a step forward. Let’s see the test result matrix and game our way out of this mess.
| test | O(2^n) | O(n) |
|------|--------|------|
| 1    | pass   | pass |
| 2    | pass   | pass |
| 3    | fail   | pass |
| 4    | pass   | fail |
| 5    | pass   | fail |
The tests confirm our intuition that test 3 was failing because of a timeout, but now we have a regression: the O(n) solution fails tests 4 and 5.
As the two solutions fail on different tests, we can use the O(2^n) implementation as an oracle when testing the O(n) one and find inputs that trigger the bug, because the two results will differ.
Instead of trying to come up with the input myself, I decided it was the right time to try property testing. Property testing is a strategy where you come up with a property that must hold for all inputs and then generate inputs until the property does not hold.
The property that we want to prove is that `previous_solution` and `solution` are functionally equivalent. This condition is also known as: `solution` is a refinement of `previous_solution`.
The test case we want to use is the following.
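A minimal sketch of that test; the `puzzle` module is a placeholder for wherever the two implementations live:

```python
import unittest

# Hypothetical module: previous_solution is the O(2^n) implementation
# used as the oracle, solution is the O(n) rewrite under test.
from puzzle import previous_solution, solution


class TestRefinement(unittest.TestCase):
    def test_refinement(self, xs):
        # The oracle and the rewrite must agree on every input.
        self.assertEqual(previous_solution(xs), solution(xs))
```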
Now this test is useless by itself. The property we want to test, that the two implementations are functionally equivalent, is easy to express, but the `test_refinement` method takes a parameter `xs` which is not defined anywhere, and therefore we cannot really use it as it is.
This is where the hypothesis library comes to save the day.
By adding three lines of code we are now able to do property testing.
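A sketch of that change against the test case above, again assuming the hypothetical `puzzle` module; the three added lines are the two imports and the decorator:

```diff
 import unittest
+from hypothesis import given
+import hypothesis.strategies as st

 # Hypothetical module: previous_solution is the O(2^n) implementation
 # used as the oracle, solution is the O(n) rewrite under test.
 from puzzle import previous_solution, solution


 class TestRefinement(unittest.TestCase):
+    @given(xs=st.lists(st.integers()))
     def test_refinement(self, xs):
         # The oracle and the rewrite must agree on every input.
         self.assertEqual(previous_solution(xs), solution(xs))
```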
The previous diff “upgrades” our test case by interposing the hypothesis library between the unittest runtime and the test method. Hypothesis will generate possible inputs `xs` following a recipe and feed them to our method repeatedly. In our case the recipe is simple: we generate a list of integers by composing two strategies that the library exposes.
The decorator `@given(xs=st.lists(st.integers()))` is all we need to specify all possible inputs to our test.
In a matter of seconds hypothesis came up with an example list, `[0, -1]`, where the results were not equal, and I was able to correct the O(n) solution immediately.
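When the property fails, hypothesis shrinks the input down to a small counterexample and prints it; the report looks roughly like this (exact formatting depends on the hypothesis version):

```
Falsifying example: test_refinement(xs=[0, -1])
```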
It is also very important to notice that what we did is just scratching the surface. I could have taken a totally different route by writing more test methods covering different properties, e.g. consing a positive integer to the list does bla, but this time I had an oracle to use and I was sure that I was missing a simple edge case.
It can be very difficult to use hypothesis well and some tasks are not a good fit for property testing. Having an oracle ready can be a huge productivity bonus because the alternative is having to think really hard about the properties you want.