
Simple software engineering: Mocks are a last resort

Most tests that rely on automatic mocking frameworks should be rewritten to use either the real dependencies or manually-written fakes.

Wait, let’s back up. Tests have a few moving parts. First, there is some code being tested. This is commonly called the unit. The unit might have dependencies. The dependencies are not under test, but they can help determine whether the unit behaves correctly. Ideally, they would be passed into the unit. But dependencies can be many things: static data, global data, files on the filesystem, etc.

Dependencies interact with tests in a few ways. The unit can introduce side effects on the dependencies and vice versa. Automatic mocking frameworks are designed to aid this process. Mock assertions can validate that expected method calls happened and that the correct parameters were passed; mocks can also override return values and execute arbitrary replacement logic. Mocks have almost absolute power to override the behavior of dependencies (within the confines of what the language allows).

But mocks aren’t the only way to write tests that involve dependencies. Real objects can be used directly. This isn’t always possible: the real object might be nondeterministic. It might provide random numbers, make a call on the network, etc. Nondeterminism is difficult to test, since there’s not necessarily an expected output. Nondeterministic failures decrease confidence in tests, since it’s difficult to know whether a failure is real. Accordingly, nondeterminism should be avoided in tests.

Statue of Leif Erikson in Reykjavik, Iceland.
Leif Erikson discovered automatic mocking frameworks in the year 998

“Test fakes” are an alternative: hand-written fake implementations of a real dependency, for example a trivial implementation of an interface that the real object also implements. Here’s an example from a side project of mine. It allows a clock to be simulated and advanced for testing. Test fakes have a maintenance cost. The tradeoff is that the fake can be reused everywhere.
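As a minimal sketch (the ClockInterface name and its methods are illustrative here, not the side project’s actual API), such a fake might look like this:

// Hypothetical interface implemented by both the real and fake clocks
interface ClockInterface {
    public function now(): DateTimeImmutable;
}

// Test fake: time only moves when the test says so
final class FakeClock implements ClockInterface {
    private DateTimeImmutable $now;

    public function __construct(DateTimeImmutable $start) {
        $this->now = $start;
    }

    public function now(): DateTimeImmutable {
        return $this->now;
    }

    // e.g. advance('PT5M') moves the clock forward five minutes
    public function advance(string $duration): void {
        $this->now = $this->now->add(new DateInterval($duration));
    }
}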

How should you pick which one to use?

How I select a dependency to use for testing

  1. Use the real object, if possible.
  2. Use a fake implementation, if possible.
  3. Use a mock.

I try to get as close to the production configuration as possible. Why? When a test fails with a real dependency, it’s likely a real problem. The more differences there are between a test object and a production object, the less likely it is that the failure is real, and the more likely it is that the failure involves the test configuration itself.

OK, so, where am I going with this? In the next section, I will explain common issues with automatic mocking. Then I will describe the tradeoffs that real objects and test fakes have. I will finish by explaining a few situations where mocking is preferable to the other alternatives.

Automatic mocks are very manual

Consider a unit that uses Redis as a key/value store. Talking to Redis involves I/O. So we mock the return values of Redis anywhere it’s used in tests.

The first mocked test isn’t so bad. It reads one value and writes one value. The second test reads a few values. The third test reads a bunch of objects, but it doesn’t modify the values at all. And so on.

Imagine this Redis class spreading through a codebase. Dozens of usages. After all, everyone loves Redis. Every call must be mocked in the test.

But this requires that every test author behaves like a human key/value store. Why hand-craft return values for all of these tests? It is simpler to put the Redis key/value store behind an interface and use an in-memory implementation. This would save time per test and would make tests easier to write. The fake would save time the way any code does – by automating a task.

The tests become easier to write because it becomes trivial to assert both the effects and the side effects of the test. Did the unit return the correct value? Did the fake end up in the correct final state? Great!
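As a sketch, assuming the Redis usage is hidden behind a small key/value interface, the fake can be little more than a PHP array:

interface KeyValueInterface {
    public function get(string $key): ?string;
    public function set(string $key, string $value): void;
}

// In-memory fake: written once, reused by every test that needs a key/value store
final class InMemoryKeyValueStore implements KeyValueInterface {
    /** @var array<string, string> */
    private array $data = [];

    public function get(string $key): ?string {
        return $this->data[$key] ?? null;
    }

    public function set(string $key, string $value): void {
        $this->data[$key] = $value;
    }

    // Test helper for asserting the final state of the store
    public function all(): array {
        return $this->data;
    }
}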

I find that the break-even point for this approach is n=1. As in, implementing the fake often takes roughly the same amount of time as implementing the first mock. And then the fake can be reused, but the mock can’t. There are exceptions to this that are discussed at the end of this post.

Automatic mocks don’t have to behave correctly

Mocks can behave absolutely incorrectly with no consequences. Can one plus one equal three? Sure, why not:

when(mockInstance.addTwoNumbers(1, 1)).thenReturn(3);

A human must simulate the return value for every mocked call. This leads to situations where the bug and the test both have errors that mask each other. In fact, mocks can be written by watching the test fail and seeing what value would make the assertions pass. Then the engineer simply enters the expected values into the mock. People really do this. I’ve done it. I’ve watched other people do it. These errors get past code review.

Granted, this can happen with both real objects and fake objects in testing. But since real/fake objects are not customizable per-test, the overall error rate will be lower with these approaches.

Automatic mocks can silently break during a refactoring

This is more insidious. Let’s say there is a widely-used dependency, and one of its methods provides the path of a URL. It needs to be changed to provide the full URL string as part of a project to support multiple domains. And it’s being renamed from providePath to provideFullURL or something.

So you rename the method. You change the behavior. The full URL is returned instead of the path. The tests pass. Hooray 🎉  But that method is called in 50ish places, and each of those call sites has tests that are written using mocks. Furthermore, some of those call sites are within code that is itself mocked in other tests. Are you confident that nothing is wrong?

I’d be confident in the opposite: something broke somewhere. The mocks silently hid the problem because the return value was simulated. Imagine the developers of each of those call sites. If even one had a tight deadline and needed the full URL, they’re gonna prepend the server name they expect. They won’t think twice. It could even take days for these errors to appear – when the next nightly big data job runs, when the next weekly marketing email is sent, etc.

A real object would have a better chance of exposing these errors in tests. A fake object would be changed from providing a path to providing a URL, which would also allow the error to be caught across the codebase with a single change. The change would need the same level of scrutiny and QA testing. But with a reasonably complete test suite, it’s less likely that it would lead to real problems.

Tradeoffs

Using a real object has a philosophical tradeoff. Strictly speaking, the test stops being a unit test. It becomes an integration test of the unit and its dependencies. That’s fine. If a test can be written more quickly and increases confidence, then it’s a reasonable tradeoff. If the simplest and most maintainable test is an integration test, then write an integration test. Life’s too short for ideological purity.

There are more tradeoffs. A breakage in a real object can cause dozens of failures throughout the codebase. This often makes it easier to debug the failure (since there are lots of examples to debug), but it can also obscure the failure. Conversely, a real object with many call sites can cause failures in just one or two tests. This is often difficult to diagnose. Is the test subtly wrong? Is there a subtle bug in the real object? Is there a subtle bug in how they interact?

Fakes add a maintenance cost. They need to be written and maintained along with the real object or interface. Plus, since they simulate the behavior of an object without being the full implementation, they can easily introduce incorrect behavior that is then reused everywhere. There is also an art to writing them that has to be learned.

A few situations where mocks are the best approach

There are definitely situations where mocks should be used. Here are some common “last resort” cases that I’ve discovered over the years.

Faking complex behavior, like SQL

At a certain point, a fake would be so complicated that an in-memory solution is totally infeasible. It’s implausible to expect an in-memory reimplementation of a SQL server to match all of the syntactic quirks and features of MySQL. In this situation, using a mock dramatically reduces the maintenance required for the test.

Preventing a method from being called multiple times

Sometimes, calling a method twice is REALLY BAD – maybe it causes a deadlock, maybe a buggy device driver would cause a kernel panic, etc. Code review and instrumentation aren’t enough, and it’s desirable to assert that it can never happen. Mocks excel at this type of assertion.
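For example, inside a PHPUnit test case, a mock can assert that a method is invoked exactly once (DeviceDriver, FirmwareUpdater, and their methods are hypothetical names for illustration):

// Fails the test if reset() is called more than once (or never)
$driver = $this->createMock(DeviceDriver::class);
$driver->expects($this->once())
       ->method('reset');

$updater = new FirmwareUpdater($driver);
$updater->applyUpdate();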

Legacy code is poorly structured and there ain’t time to fix it

Sometimes, you have to parachute into a codebase, make a fix, and then get extracted. Sometimes it’s just not reasonable to spend three weeks refactoring to make a one-day change more testable.

Determining whether a delegate is being invoked

A delegate wraps a second object, and is responsible for calling methods on that second object. An automatic mocking library is an easy solution for ensuring that these calls happen as expected.
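A sketch of that, again with PHPUnit mocks (the Logger interface and AuditLoggingDelegate class are illustrative):

// Verify that the delegate forwards the call, arguments included
$inner = $this->createMock(Logger::class);
$inner->expects($this->once())
      ->method('log')
      ->with('shop_disabled', ['shop_id' => 42]);

$delegate = new AuditLoggingDelegate($inner);
$delegate->log('shop_disabled', ['shop_id' => 42]);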

Thank you for attending my Jake Talk

Automatic mocking frameworks are a last resort. Mocks have uses. But real objects and fake objects should be preferred, in that order.

Simple software engineering: Inject dependencies when possible

Why should dependencies be injected?

Code should not instantiate or otherwise access its own dependencies. Instead, prefer to pass in dependencies as arguments.

This should be done when code becomes important enough to unit test. Dependency injection makes it easier for tests to provide dependencies with different configurations. It also makes it easier to inspect the side effects introduced onto dependencies. It will also make maintenance easier because it will increase flexibility at little cost.

When working with I/O, pass in interfaces instead of instantiating classes

// Avoid new()ing the dependency
public function getDatabaseConnectionInfo(): ConnectionInfo {
    $redis = new Redis();
    return new ConnectionInfo(
        $redis->get('host'),
        $redis->get('port')
    );
}

// Prefer to pass the dependency in
public function getDatabaseConnectionInfo(
    Redis $redis
): ConnectionInfo {
    return new ConnectionInfo(
        $redis->get('host'),
        $redis->get('port')
    );
}

// Passing in an interface is even better
public function getDatabaseConnectionInfo(
    KeyValueInterface $key_value_store
): ConnectionInfo {
    return new ConnectionInfo(
        $key_value_store->get('host'),
        $key_value_store->get('port')
    );
}

Dependency injection makes testing easier.

The unit test can take a trivial in-memory store instead of either wrangling a Redis instance or mocking a constructor.
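For example, assuming a trivial in-memory implementation of KeyValueInterface (call it InMemoryKeyValueStore) and simple accessors on ConnectionInfo, the test reduces to a few lines:

public function testGetDatabaseConnectionInfo(): void {
    // No Redis instance to wrangle, no constructor to mock
    $store = new InMemoryKeyValueStore();
    $store->set('host', 'db.example.com');
    $store->set('port', '3306');

    $info = getDatabaseConnectionInfo($store);

    $this->assertSame('db.example.com', $info->getHost());
    $this->assertSame('3306', $info->getPort());
}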

Additionally, code that manages its own dependencies can become nested deep within other code. If the first example above became deeply nested, it would be unclear that the corresponding test must manage a key/value store. This could lead to bad surprises like tests attempting to connect to Redis.

Dependency injection increases flexibility at little cost.

In the example above, it may become necessary to introduce a caching layer in front of the key/value store. This is easy with dependency injection. Just make a new class that inherits the interface and delegates to the I/O layer on cache misses. Without dependency injection, it becomes a project to find and fix all usages.
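A sketch of that decorator, reusing the KeyValueInterface from the example above (and assuming it also declares the set method used here):

// Decorator: serves reads from a local cache, delegating to the
// I/O-backed store on misses; no call sites need to change
final class CachingKeyValueStore implements KeyValueInterface {
    private KeyValueInterface $inner;
    /** @var array<string, string> */
    private array $cache = [];

    public function __construct(KeyValueInterface $inner) {
        $this->inner = $inner;
    }

    public function get(string $key): ?string {
        if (!array_key_exists($key, $this->cache)) {
            $value = $this->inner->get($key);
            if ($value === null) {
                return null;
            }
            $this->cache[$key] = $value;
        }
        return $this->cache[$key];
    }

    public function set(string $key, string $value): void {
        $this->cache[$key] = $value;
        $this->inner->set($key, $value);
    }
}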

Dependency injection makes it easier to make application-wide changes.

In the above example, it may become necessary to stop using the default Redis database and configure which Redis database should be used. If the Redis class is instantiated in many places, this becomes a sizable effort. Compare this to changing the single invocation that is injected throughout the application. The latter will usually be much easier.
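As a sketch, the single wiring point might look like this (the $config accessors and the RedisKeyValueStore adapter are hypothetical):

// One place in the application constructs and configures the dependency...
$redis = new Redis();
$redis->connect($config->redisHost());
$redis->select($config->redisDatabase()); // no longer the default database

// ...and everything downstream receives it as an argument
$connection_info = getDatabaseConnectionInfo(new RedisKeyValueStore($redis));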

Pass the result of I/O into business and presentation logic

// Avoid performing I/O in business logic
public function isShopTemporarilyClosed(
    ORM $orm, int $shop_id
): bool {
    // All shops manually turned off by the owner
    // are called temporarily closed.
    $shop = $orm->getFinder('Shop')->findById($shop_id);
    return $shop->is_off
        && $shop->owner->id === $shop->disabled_by_user->id;
}

// Prefer passing the result of I/O into business logic
public function isShopTemporarilyClosed(
    Shop $shop
): bool {
    // All shops manually turned off by the owner
    // are called temporarily closed.
    return $shop->is_off 
        && $shop->owner->id === $shop->disabled_by_user->id;
}

Business/presentation logic should not be overly opinionated

Applications often have several choices about where they can read equivalent data. This function shouldn’t care that the data came from the ORM. Why couldn’t the shop be passed in the POST data of a request? Or be fetched from a REST API? Ideally, logic that acts on a shop model should work anywhere.
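For instance, the same function can be fed from either source (Shop::fromArray is a hypothetical hydrator):

// From the ORM, as in the example above
$shop = $orm->getFinder('Shop')->findById($shop_id);
$is_closed = isShopTemporarilyClosed($shop);

// From POST data, with no changes to the business logic
$shop = Shop::fromArray($_POST['shop']);
$is_closed = isShopTemporarilyClosed($shop);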

Doing I/O in application logic makes its callers difficult to refactor

As a codebase grows, helper functions may acquire dozens, hundreds, or thousands of call sites. They may become nested deep within the application call stack and used within business-critical logic that runs into scaling problems. Manually managing dependencies makes it difficult to perform some optimizations. For example, it’s difficult to ensure that the program never makes redundant I/O calls when the object is accessed via I/O dozens of times in a codebase. This is true even with caching! It’s often the case that calling two different I/O entry points (or the same entry point with different arguments) can produce the same results. This can be difficult or impossible to detect programmatically, even though it may be obvious to the application developer.

Passing in the result of I/O makes it easier to share the result of I/O among different callers.

I/O introduces nondeterministic behavior

It would be surprising to see a DatabaseReadException when calculating whether a shop is closed. But introducing I/O into a call increases the risk that code can throw exceptions for nondeterministic reasons like service availability.

I/O also dramatically affects timing metrics. Let’s say that I/O calls are cached, and the shop is fetched in two places: once while deciding which views to render, and once while rendering the view. Later, a programmer realizes that they don’t need to perform the first fetch. They remove it. This moves the I/O call from the application logic into the view logic, causing the timings reported by the view logic’s instrumentation to increase. This is because a former cache hit is now a cache miss that performs I/O. No regression happened, but the application performance graphs make it seem like one did.

This could also cause tests to become flaky if they actually perform the I/O and sometimes fail.

Instead, prefer to centralize or share logic related to I/O. The details of this will depend on which languages and libraries are used.
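One common shape for this is a memoizing repository that performs each fetch at most once and shares the result (the class is illustrative, built on the ORM from the earlier example):

// Request-scoped repository: fetches each shop at most once per request
final class ShopRepository {
    private ORM $orm;
    /** @var array<int, Shop> */
    private array $loaded = [];

    public function __construct(ORM $orm) {
        $this->orm = $orm;
    }

    public function findById(int $shop_id): Shop {
        if (!isset($this->loaded[$shop_id])) {
            $this->loaded[$shop_id] = $this->orm
                ->getFinder('Shop')
                ->findById($shop_id);
        }
        return $this->loaded[$shop_id];
    }
}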

Don’t access static or global state within business or presentation logic

// Avoid accessing global or static state in business logic
public function isLocaleEnUs(): bool {
    return strtolower($_REQUEST['locale']) === 'en-us';
}

// Pass global or static state into business logic
public function isLocaleEnUs(string $locale): bool {
    return strtolower($locale) === 'en-us';
}
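
The global read then happens once, at the boundary, and every other layer stays oblivious (the $job_options array is illustrative):

// At the HTTP boundary: read the global once, then pass it down
$is_en_us = isLocaleEnUs($_REQUEST['locale'] ?? 'en-US');

// A CLI batch job can reuse the same logic without faking an HTTP request
$is_en_us = isLocaleEnUs($job_options['locale']);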

Accessing static and global state is unsafe across machines

A helper that accesses static or global state makes strong assumptions about what happened on the machine before the code executes. Note: this refers to static state – reliance on information from the execution environment, or on calculated data that is stored statically. Merely accessing static data or calling static functions isn’t included in this.

If code lives long enough, it will eventually execute in several layers of the same application stack. Think about all the different application architectures that can exist in the same company at the same time. Reverse proxies in front of long-lived application servers, CGI scripts, batch processing jobs, monoliths, microservices, single-page applications, mobile apps, serverless lambdas, server-side rendering, etc. And to add another dimension, there are quite a few transport mechanisms available: HTTP, RPC, IPC, etc.

As code becomes longer-lived, it will eventually live within several layers at the same time. This introduces unnecessary complexity in each of the additional layers. If some logic directly reads the request parameters to determine the locale, then it (and everything that ever depends on it) will always need to execute within an HTTP request. Or it must fake the HTTP request environment when it is included in a layer without HTTP. Or, if it’s proxied within HTTP, the proxied call will also need to forward the request parameters, even if that doesn’t make semantic sense.

How to move an existing codebase towards dependency injection

This can be done incrementally. For each commit that touches code with a hard-wired dependency, refactor that dependency to come from one layer higher in the stack. This is a good opportunity to introduce tests for untested code, or to simplify tests for existing code.

Over time, frequently-modified code will become fully implemented using dependency injection. It may be necessary to do a special project to modify, replace, or delete code that hasn’t been touched in years. But maybe it’s fine to just leave it. After all, it hasn’t been modified in years.