Iskandar Setiadi

Flag #22 - Writing Tests are Hard


Hope everyone is doing well & staying home during this COVID period! Right now, it's a 5-day consecutive holiday in Japan (Golden Week), and since I have a bit of free time, let's have a discussion about a topic that scares most developers: writing tests.

Let's be honest, most developers, including me, are too lazy to write tests for the code they write on a daily basis. Nevertheless, a good piece of code should come with some written tests. The real problem here is that writing tests is actually hard. Really? Yes, writing a test by itself is kinda easy, but writing a good test is difficult. This topic is not limited to unit tests; it also applies to all other kinds of tests: functional tests, integration tests, and so on.

There are no tests at all

In the early stage, a code base usually doesn't have any written tests at all. Is this really a bad thing? In my view, it's not a good thing, but it can definitely be worse. Have you ever read documentation or a code comment that describes behavior completely different from the actual code? A test is a self-documenting piece of code. If a test doesn't reflect the actual behavior of the code, it will create more confusion than it resolves. Writing tests is not as simple as writing the code itself, because we need to understand the real behavior of our own code before we can write a good test for it.

Some simple code can survive without a single test, e.g. a statistics reporter that posts results to an internal Slack channel every week. When it breaks, we can clearly see that it has stopped working, and the adverse effect is little or none since it's only an internal statistics reporting tool. However, most code bases need proper tests, and that's why we're discussing this topic here.

The code doesn't have 100% coverage

Most code bases out there fall into this category: the code doesn't have 100% coverage. If you're not building a critical system, this is quite acceptable as long as some precautions are taken.
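
Before answering the questions below, it helps to know which lines are actually untested. Assuming a pytest-based project (like the example later in this post) with the pytest-cov plugin installed, a report of missing lines can be generated with a single command, where "app" stands for your own package:

# assumes pytest and pytest-cov are installed; "app" is your package or module
pytest --cov=app --cov-report=term-missing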

If some parts of the code don't have any tests, there are several questions that we need to ask ourselves:
- Do we really need these lines of code? This commonly occurs in error / exception handling, where we often overthink things and those lines are never actually called.
- If we believe we still need those lines of code, why don't we have a test for them? One of the top answers is third-party / external dependencies. Depending on how important our external dependencies are and how susceptible they are to breaking changes, we can decide on the next step. The ideal situation is to simulate our external dependencies locally, e.g. running a database or an nginx instance with Docker in the automated tests, as sketched below. If we cannot run it locally, then we can mock it with some caveats (we'll discuss them later below).
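
As an illustration of the "simulate it locally" option, here is a minimal sketch of a pytest fixture that starts a throwaway PostgreSQL container for the test session. The image, port, password, and the psycopg2 readiness check are assumptions made for this example; swap them for whatever external dependency your code actually talks to.

# contents of conftest.py, a sketch of a session-scoped Docker-backed fixture
# (assumes Docker, pytest, and the psycopg2 package are available locally)
import subprocess
import time

import psycopg2
import pytest

@pytest.fixture(scope="session")
def postgres_url():
    # start a disposable PostgreSQL container for the whole test session
    container_id = subprocess.check_output(
        ["docker", "run", "-d", "--rm",
         "-e", "POSTGRES_PASSWORD=test",
         "-p", "55432:5432", "postgres:13"],
        text=True,
    ).strip()
    url = "postgresql://postgres:test@localhost:55432/postgres"
    try:
        # wait until the database actually accepts connections
        for _ in range(30):
            try:
                psycopg2.connect(url).close()
                break
            except psycopg2.OperationalError:
                time.sleep(1)
        yield url
    finally:
        # --rm removes the container once it is stopped
        subprocess.run(["docker", "stop", container_id], check=False)

Tests can then request postgres_url like any other fixture and run real queries against it, which keeps them much closer to production behavior than a mock.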

How do we know that we have enough tests? If your system hasn't stopped working during the past year because of missing tests, you most likely don't need 100% coverage, as maintaining tests also increases your technical debt. To add more context to that, let's proceed to the next category.

There are too many tests

Software development never has enough manpower or time. Maintainability is costly, and adding too many tests also increases the technical debt in the code base. There are several ways to improve this situation:
- Ensure that a specific test is still needed. Sometimes, after we change a function, a test becomes obsolete because the scenario it covers can never happen in the real world. Imagine you have a specific test for an exception in a web server. In the past, the function returned 500: Internal Server Error and you wrote a test for it. Nowadays, you have added a default fallback to the function and it will never trigger 500: Internal Server Error any longer. Unfortunately, because your test mocks a specific part of the function, it will still trigger the exception path during the test (see the sketch after this list).
- Ensure that a test is not redundant. As a code base is managed by multiple people, this tends to happen from time to time, for example when someone writes a test that is already covered by a larger test. This is especially important for tests which take a long time to execute (e.g. Selenium for web UI tests, where a single test can take 15-30 seconds to run).
- If you still feel that build time has become very slow, it's time to identify the most important components of your system. This is debatable, as some people strive for 100% coverage, but personally I also value the business opportunity cost: the Pareto principle (80-20 rule) implies that the required effort increases tremendously when going from 80% to 100% coverage. For example, we can start by identifying the main features from the business requirements. For each of those main features, we should have at least one successful test and several failure-handling tests that cover the most common errors. As a final note, while writing tests is definitely important, it should not hinder the actual purpose of your code.
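
To make the first point above concrete, here is a minimal sketch of an obsolete test that mocking keeps alive. The module settings.py, read_timeout(), handler(), and the test itself are hypothetical names invented for this illustration.

# contents of settings.py (hypothetical)
CONFIG = {}

def read_timeout():
    # old version: CONFIG["timeout"], which could raise KeyError;
    # new version: falls back to a default and never raises
    return CONFIG.get("timeout", 30)

def handler():
    try:
        return {"status": 200, "timeout": read_timeout()}
    except KeyError:
        return {"status": 500}  # unreachable in production after the fallback

# contents of test_settings.py (hypothetical)
import settings

def test_handler_returns_500_on_error(monkeypatch):
    def broken_read_timeout():
        raise KeyError("timeout")

    # the mock forces the exception branch even though read_timeout() can
    # no longer raise in the real world, so this obsolete test stays green
    monkeypatch.setattr(settings, "read_timeout", broken_read_timeout)
    assert settings.handler()["status"] == 500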

The code has ~100% coverage and there are no excessive tests

The real question comes here: are you sure that you're testing the correct things? I have seen several code bases with a high coverage percentage that still break occasionally when changes are made in production. A badly written test provides false assurance that your code is bug free.

For the simplest example, imagine that you have the following test (taken from https://docs.pytest.org/en/latest/monkeypatch.html):

# contents of test_app.py, a simple test for our API retrieval
# import requests for the purposes of monkeypatching
import requests

# our app.py that includes the get_json() function
# this is the previous code block example
import app

# custom class to be the mock return value
# will override the requests.Response returned from requests.get
class MockResponse:
    # mock json() method always returns a specific testing dictionary
    @staticmethod
    def json():
        return {"mock_key": "mock_response"}

def test_get_json(monkeypatch):
    # Any arguments may be passed and mock_get() will always return our
    # mocked object, which only has the .json() method.
    def mock_get(*args, **kwargs):
        return MockResponse()

    # apply the monkeypatch for requests.get to mock_get
    monkeypatch.setattr(requests, "get", mock_get)

    # app.get_json, which contains requests.get, uses the monkeypatch
    result = app.get_json("https://fakeurl")
    assert result["mock_key"] == "mock_response"

Personally, I'm not a fan of mocking as it often overfits the real-world situation. For example, the test above depends on an external dependency, https://fakeurl. What will happen if mock_key is no longer returned by the external dependency? In this case, our test will still pass with 100% coverage while the code fails in production. If we still want to use a mock here, we need to have tests for failures as well. We might also need to handle the following cases (sketched after this list):
- https://fakeurl returns 5xx code because of server-side failures
- https://fakeurl no longer returns mock_key in the response
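
Here is a minimal sketch of what those failure tests could look like, still using monkeypatch. It assumes that app.get_json() simply returns requests.get(url).json(), as in the pytest documentation example; the mock classes and the expected behaviors are my own assumptions for illustration, and the second test will only pass once get_json() actually inspects the status code.

# contents of test_app_failures.py, a sketch of failure tests for get_json();
# the mock classes and expected behaviors are assumptions for illustration
import pytest
import requests

import app

class MockMissingKeyResponse:
    @staticmethod
    def json():
        # the external dependency stopped returning "mock_key"
        return {"something_else": "value"}

class MockServerErrorResponse:
    status_code = 500

    @staticmethod
    def json():
        return {"error": "internal server error"}

def test_get_json_missing_key(monkeypatch):
    monkeypatch.setattr(requests, "get", lambda *args, **kwargs: MockMissingKeyResponse())
    result = app.get_json("https://fakeurl")
    # documents that callers cannot assume "mock_key" is always present
    assert "mock_key" not in result

def test_get_json_server_error(monkeypatch):
    monkeypatch.setattr(requests, "get", lambda *args, **kwargs: MockServerErrorResponse())
    # encodes the behavior we want: get_json() should surface 5xx errors
    # instead of silently returning a body; this test will fail until
    # get_json() actually checks the status code
    with pytest.raises(requests.HTTPError):
        app.get_json("https://fakeurl")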

The key takeaway here is that we need to understand how our code behaves. At times, mocking is abused to the point that the tests never cover any failing scenario. If we cannot provide a counterexample to our successful test, the test is probably overfitted. To avoid accidentally overfitted tests, we cannot simply rely on unit tests alone; we also need system-level tests and actual integration tests.

Closing

Thanks for reading till the end! As a caveat, this opinion is purely subjective and different projects might have different testing requirements. If you are interested, here are several interesting articles that I found while writing this post:
- Unit testing, you’re doing it wrong by Cyrille Dupuydauby
- Careless mocking considered harmful by Philippe Bourgau

See you next time!

Author: Iskandar Setiadi - Type: Computer Science - Date: 6th May, 2020

