
2019 year in review

Here is my 2019 year in review! I lost weight, struggled with sleep, grew professionally, struggled with some personal challenges, took some much-needed vacations, and worked hard on side projects.

Health

I lost over 20 pounds in 2019. I started the year at 223.2 lbs and ended at 201.0 lbs, dropping as low as 189 at one point. But I achieved this with a strict diet: mostly fruits, vegetables, cheese, nuts, and lean meats. These are delicious, but I missed the satisfaction of eating a bagel or a slice of pizza every now and then. When I started enjoying food again, I slowly gained 10 pounds back.

I’m looking for a sustainable solution. At the moment, I’m trying 16/8 intermittent fasting with calorie counting. I have failed to do each of these individually. Unrestricted intermittent fasting didn’t limit my calories enough. I also feel hungry when I count calories across a full day. But I’m optimistic about the combination. I can eat a huge breakfast and still have enough calorie budget for a healthy and filling dinner. Alternatively, I can stick to salads, nuts, and cheeses in the afternoon so that I can enjoy myself later.

"Memoriae Majorum" at the entrance to the ossuary of the Paris Catacombs
The entrance to the ossuary part of the Paris Catacombs

Sleep was rough. I had a prolonged bout of insomnia that stretched from February into July. I would fall asleep easily at midnight, then wake up between 4:30 and 5:30. Insomnia sucks. It affects everything. I produced less at work, I was meaner, I was less happy, and my chores fell by the wayside. I saw a sleep therapist, and she provided a list of recommendations: wear orange-colored goggles at night to limit blue light exposure. Limit caffeine. Get as much sunlight as possible before 8:30am. Have a consistent sleep schedule. Don’t eat, exercise, or shower within 3 hours of going to sleep.

I applied these techniques and shifted my wake time from 8 to 7, and this helped. I still wake in the middle of the night, but I can go back to sleep now. I can also be less militant about sleep hygiene by picking my battles: I limit caffeine, keep a strict schedule, and go for a walk before 8:00 every morning. The other changes didn’t matter as much.

This required sacrifices. My friends prefer hanging out late, and for years I had considered moving my sleep schedule earlier. But I can’t “just make it up” on the weekends. The opposite is true: staying out once or twice a week can ruin a whole week of regular sleep for me. So I had always tended to keep my schedule later to match that of my friends. The severity of my insomnia bout made me reconsider this. Now I leave our plans early on a weekly basis, which is awkward. I’m not sure how this will play out in the long run.

Career

I had a productive year.

My time was split between three things: API stabilization, API support, and GraphQL.

I spent the year working on the API platform team. We worked with a more senior engineer who had helped write broad swaths of our product stack. He successfully pitched the existence of the API platform team and spent time ramping us up. This was mostly positive. He used his experience to select high-impact projects that helped me learn. We spent a bunch of time cleaning up the user experience and the code, and we ended up with a more stable, cleaner, and faster API. His feedback was invaluable, and he also made a major contribution to the internals during this time.

On the other hand, the downsides to working under him were unusual. He wasn’t officially on the team, but he wasn’t NOT on the team, which created lots of coordination problems. I did discuss this with my manager, but in retrospect I could have handled the situation better by getting everyone to agree to fill out a RACI matrix (or similar). He left Etsy halfway through the year. By that point we knew enough to stand on our own legs, and the team had a fairly successful year.

Supporting the internal API took a lot of my team’s time. Etsy is a PHP shop, and PHP is single-threaded, so an easy way to parallelize computation is to distribute I/O among concurrent cURL requests. This system is called the “internal API” at Etsy. Hundreds of engineers use it, which means lots of people need help. This version of the API has lasted 5+ years and has advanced users throughout the company, so the problems that we hear about are often complex multi-system problems. It’s amazing how many different ways the same metaproblem emerges: “This code was written on the assumption that it would only run on one machine, and now it’s running on two.”

I spent the year planning and prototyping GraphQL at Etsy. This was more solitary than I would have liked, but it was driven by necessity. The surface area of our team is broad: the whole company runs on APIs, and there were just 3 of us for much of the year. The only way we could move forward was to divide and conquer. A second engineer, Kaley Sullivan, joined the work once I started prototyping. She kicked ass – she is a powerhouse engineer and gave me some great feedback on designs. The prototype we wrote was a success, and we’re moving forward with introducing it at the beginning of next year. My challenge next year will be growing and evolving GraphQL at Etsy so that it doesn’t rely on me anymore.

Jet lag advantage: a 6:15 AM run past the Ferry Building in the touristy part of SF. Peaceful and quiet.

I was promoted to Staff Engineer at the beginning of the year. This was an important milestone for me to hit. I didn’t care about the title, but I wanted to prove to myself that I was growing as an engineer. A major reason I joined Etsy was because I love the mission: lots of sellers from all around the world making money. It’s far more palatable than padding a tech billionaire’s pockets by making $cog a little more efficient. Reaching Staff Engineer meant that I had been growing my engineering effectiveness, which means I had been more effective at helping our sellers. Now I have access to opportunities that I didn’t have as a senior engineer: I emceed our engineering all-hands, I’m leading a working group, and I’m in a regular meeting with directors and the CTO. It almost feels like a virtuous cycle: once I reached a certain threshold, it became easier to keep getting high-impact, high-visibility work. In 2020, I’d like to use my position to effect positive change; I did some of that in 2019, but not enough.

I attended a software engineering conference for the first time: GraphQL Summit in San Francisco. I was pleasantly surprised by how useful it was. Reading the presentations was helpful (I had already done a deep dive on industry writing about GraphQL), but some of the most valuable lessons came from the smaller talks. One fact amazed me: literally half of the attendees didn’t even use GraphQL yet. Their companies sent them to this conference to learn more. It stands to reason: of course companies would speculatively spend thousands of dollars to avoid mistakes that would cost them millions. But it hadn’t occurred to me that I could seek out these opportunities myself, and I don’t have many more-senior mentors in my life who can point me in these directions.

Personal life

My year had some ups and downs.

Rupert running through a field, mouth open and smiling. Derp on, you crazy diamond.

I had a dog at the beginning of the year, Rupert. I don’t have him anymore. The rescue gave him to me because he was high-energy, but I didn’t fully understand what that meant. He could run for an hour in the morning and be back to 100% by the end of the day. I also didn’t understand that active dogs have active minds. He’d get bored and destructive when I left him alone for a normal work day. He destroyed thousands of dollars worth of stuff in my apartment, including some things that were irreplaceable. He loved daycare, but he got depressed and distressed if I took him too many days in a row. I worked with a trainer a bunch, and at the first lesson she warned me: “Most single owners who have dogs this energetic eventually give them away. I often advise people to do this. It’s okay if you do.”

Eventually, I came to the conclusion that she was right and he would be better off with another owner. I returned him to the rescue, which found him another loving family. This is very hard to write now. I miss him. But it was unsustainable. One of my close friends was very unsupportive of me during this time, which damaged our friendship. I regret this.

My girlfriend and I went on some trips together: a long weekend in Montreal, a week in Vancouver, and 2 weeks in France. I hadn’t traveled internationally in a few years, and it was fun to go on an adventure together. It was also the first time in a long time that I managed to fully disconnect from work. The stories we collected were wonderful: the meals we had, the places we stayed, the time we were defeated by the rocky shores of Nice, the time I doused myself in diesel in Avignon, the delicious fruit varieties we’d never seen before. It’s also nice to share all of that with someone.

Suited up for whale watching in Vancouver, BC

I started trying to “learn about business” at the beginning of 2019. I was wondering if I could make and sell bar trivia. But a landing page that I set up on a Squarespace domain didn’t convert into any purchases, even though it got a bunch of clicks. Around this time, I was in the middle of having major problems with my dog, and I gave up working on business stuff to take him to the dog park every morning for an hour. After that, I never got back to it.

I tried to learn French. I picked up a fair bit of vocabulary, but I made very little progress with listening comprehension, despite taking French classes, practicing for over an hour every day, and listening to 100+ hours of French learning podcasts. My girlfriend is a French translator who speaks it fluently. I thought this would help me, and it was part of the reason I chose French. But I was so remedial that we never found a middle ground that wasn’t frustrating for me. Actually going to France was fun: I had some basic transactional conversations with shopkeepers, read menus and ordered, and comprehended most of the posted advertisements around the country. But the trip underscored that I would likely never use French in a serious way. If I had picked Spanish, I’d at least have the benefit of seeing it around New York. And I don’t find French culture or history particularly interesting. It was a long and frustrating process for something that would never have an upside.

Eventually, I stopped spending my mornings studying French and started studying machine learning. Machine learning has interested me a few times over the years, most notably around the time of the Netflix Prize. Lately, I’ve spent 1.5 hours every morning learning about it. This has been fun so far. I’ve written a backpropagation algorithm in NumPy, and I’ve been working on the Titanic Kaggle competition. My first submission was yesterday, and it performed worse than the naive approach of predicting that all the women survived and all the men died, haha. But there is a whole forum filled with people who have helpful suggestions on how to perform feature selection, and a whole Internet of helpful materials. I’d like to learn more about real-world data processing, different neural network architectures, and autoencoders.

I’ve been writing more in 2019. This is perpetually a goal of mine, so I’m glad that I’m finally making room for it. I started a “Simple software engineering” series where I examine the tradeoffs that I make when I write software. I also found it helpful to keep a “knowledge base” in WordPress, where I record things that I learn as I learn them. There’s only a loose organization so far, but I’ve only been keeping the knowledge base for about 3 months.

Conclusion

I had a good 2019 despite some personal challenges. My year went really well professionally, and I took some much-needed breaks that I hadn’t really granted myself. I’d like to focus a little more on friendships in 2020, but it’s not clear if that means old friendships or new friendships.

Ramping Up – Week 1 of learning about business

I’ve always wanted to run my own business. Now that I’m 33, I’ve realized that I need to start actually trying if I want to achieve this goal.

What if this were a TV show? I would quit my job to create a startup. I would also be 22 and a lot hotter. Maybe I dropped out of college. The details aren’t important. “TV Jake” would drive himself to the edge of ruin. At the last second, everything would turn around and my startup would be the next big thing. But “Actual Jake” has a mortgage. At this point, “Going big or going home” isn’t as fun since it means “Go big or lose your home.”

The good news is that it doesn’t have to be that way. I’ve been introduced to online communities that view business differently. Podcasts like “Startups for the Rest of Us” and “Under the Radar” are hosted by independent developers who run small companies. They build smaller products at a more sustainable pace. The term “lifestyle business” also gets kicked around for these, since you’re exchanging some of the salary and comfort of a big company for the lifestyle you want.

I like the idea of starting out building small projects. This is similar to how I learned to program. I started tons of projects. I made mistakes. I failed repeatedly. I’d try anything that sounded interesting. I wrote command line games and programs that solved my math homework, and modified WinAmp plugins to see what would happen. Eventually it started to stick.

I like the idea of failing on small ideas and building up. It maximizes what I can learn with a limited time budget. This approach has been informed by a lot of third parties. For instance, Rob Walling calls this the “stairstep approach”. David Smith of “Under the Radar” often talks about how he has a portfolio of products rather than going big on one.

So I’d like to learn business, and I feel like my first goal is very achievable:

Make $100 of profit, not counting the value of my time, on a business idea by the end of March 2019

My first business goal

I’d like to do a few things to try to achieve this goal.

  • Incentivize myself. Ultimately I’d like to pay off my mortgage. BAM! Incentivized.
  • Hold myself accountable. I’m going to write a blog post once a week about what I have been doing in order to achieve my goal. I’ve heard this go both ways: “telling people about your goals feels like an accomplishment, which makes you less likely to actually accomplish them” versus “telling people about your goals adds a social pressure to actually complete them.” I’m choosing the method that involves filling out this domain with more content.
  • Work at least 5 hours a week on it. I believe that I will be working more than this on average. But setting a floor will mean that I will continue to make forward progress while giving myself the option to take some time off if I start to burn out.

So, let’s get started!

What did I do this week?

This week was all about ramping up! I split my time between introductory reading from people who run small businesses and gathering data to look for the first opportunity I want to pursue.

Side note: my first goal, “$100 of profit without factoring in the value of my time,” is low enough that it enables a lot of options. If the weather warms up, “selling umbrellas in Manhattan when it rains” could even be a way to do that. But I’d like to practice working on businesses with scalable economics. I’m not going to look for these kinds of opportunities unless I start to run up against my March 31 deadline.

I partitioned my research so that I wasn’t reading too much up-front. There’s no reason for me to read an article about improving my conversion rate if I don’t have conversions. So I divided up Todoist into a few really coarse categories, “Research”, “Setup”, “Validate”, and “Build”, and divided interesting articles into those buckets. I didn’t look at anything that ended up in a bucket past “Research.” Then I skimmed the articles that ended up in the Research bucket for ones that seemed particularly good. I took notes as I went, organized by category. This makes it easy to find the relevant article when I start something new like designing an onboarding flow.

The best article I read this week was a set of notes on the talk “Blind spots of the Developer/Entrepreneur” by Ben Orenstein. I thought it had a lot of really pragmatic advice for trying to make money on info products. This inspired a few of the ideas that I had this week.

I also started brainstorming and investigating niches that I could start using to make small products. I had the following three ideas:

  • Trivia questions. There are tons of existing companies that do things like run trivia nights or sell packs for you to run your own. I could pick a really narrow niche of trivia and sell questions for it, and slowly expand into being a trivia generalist. I’ve been going to a trivia night every week for the past 5 years, so I feel like this can inform the decisions I make. Plus, it means I know a few people who I can talk to about it – the trivia jockey, the bartender, and my friends.
  • Info product for Google Docs. I was a Googler who worked on Google Docs for over 4 years, and I also answered our internal feedback list. I’m one of the best positioned people in the world to write an info product on how to get the most out of Google Docs. This would be done as a Squarespace site designed to sell the info product. It would also give me an avenue to expand into other products and services, and would provide passive income.
  • Info product for how to get ramped up on PhpStorm, an editor that requires a license to run past the trial period. Since it has professional users, the audience is more likely to be pre-qualified as willing to spend money.

I vetted each of the ideas with the AdWords keyword tool. It may be a mistake, but I’m basically starting with a channel that I’d like to succeed with and comparing based on what has the highest demand.

The results were surprising to me. An unbelievable number of people look for trivia questions, and the search results for a lot of popular queries don’t really seem to serve the domain that well. In comparison, not many people were searching for Google Docs at all except for very high-level questions like “what is Google Docs?” Any approach here would have to be built around long-tail queries, which I think would be difficult to validate without any experience. And almost nobody was searching for anything PhpStorm-related. It was hard to justify doing either of the info product businesses even with some generous assumptions about conversion rates.

What am I doing this week?

This week I’m going to look closer at the trivia idea. I want to identify a segment within the search data that I can target. My current thinking is that I can try to validate whether holiday-based trivia is a good idea by targeting some upcoming holidays. MLK Day is too close; I’d prefer to run a test for the 2 weeks before a holiday. But next month has both Valentine’s Day and President’s Day, so I could try to prepare trivia questions for each as a validation step – will people buy it at all? These are a month in the future though, so I’d also like to identify a second segment that I can start to target either this week or next week.

Interestingly, the government shutdown has also been on my mind. I wanted to start up an LLC with Stripe Atlas since they also automatically set up a bank account for you. However, I know that there’s a chance I won’t be able to get an EIN from the IRS while the government is shut down. So I’ve been holding off on actually forming the LLC as long as possible.

That’s everything. See you next week!

Next week on “Learning about business” – I still haven’t learned about business.

A short guide to structuring code to write better tests

Why write this?

Well-written tests often have a positive return on investment. This makes sense; bugs become more expensive to fix the later in the development process they are discovered. This is backed by research, and it also matches my experience at Etsy, my current employer. Detecting a bug in our development environment is cheaper than detecting it in staging, which is cheaper than detecting it in production, which is cheaper than trying to divine what a forum post means when it says “THEY BROKE SEARCH AGAIN WHY CAN’T THEY JUST FIX SEARCH??,” which is cheaper than debugging a vague alert about async jobs failing.

Over my career I’ve rediscovered what many know: there are good tests and bad tests. Good tests are mostly invisible except when they catch regressions. Bad tests fail frequently, and their failures usually aren’t real regressions; more often, the test logic has made assumptions about the implementation logic and the two have drifted. These tests need endless tweaking to keep the implementation and test logic in sync.

So here’s a guide to help you write better tests by improving how your code is structured. It’s presented as a set of guidelines. They were developed over a few years when I was at Google. My team noticed that we had good tests and bad tests, and we invested time in digging up characteristics of each. I feel like they are applicable outside the original domain, since I have successfully used these techniques since then.

Some may point out that this post isn’t a “short guide” by many definitions. But I think it’s better than saying “Read this 350 page book on testing. Now that I have pointed you to a resource I will not comment further on the issue.”

Please ask me questions!

Get HYPE for a testing discussion!

“Testing” is a broad topic, so I want to explain the domain I have in mind. I’m targeting a database-driven website or API. I’m not thinking about countless other environments like microcontrollers or hard realtime robotics or batch data processing pipelines or anything else. The techniques in this post can be applied broadly, and can be applicable outside of the web domain. But not all of them work for all situations. You’re in the best position to decide what works for you.

For discussion, I will introduce an imaginary PHP testing framework for evil scientists looking to make city-wide assertions: “Citizens of New York”, or cony[0]. It will be invoked as follows:

$x = 3;
cony\BEHOLD::that($x)->equals(3);
cony\BEHOLD::that($x)->isNotNull();

Terminology

Everyone has their own testing terminology. That means this blog post is hopeless. People are going to skip this section and disagree with something that I didn’t say. This happened with my test readers even though the terminology section was already in place. But here goes!

Here are some definitions from Martin Fowler – Mocks Aren’t Stubs:

Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).

Mocks are […] objects pre-programmed with expectations which form a specification of the calls they are expected to receive.

Martin Fowler’s test object definitions

Here are a few more definitions that I will use:

Unit test: A test that verifies the return values, state transitions, and side effects of a single function or class. Assumed to be deterministic.

Integration test: A test that verifies the interaction between multiple components. May be fully deterministic or include non-deterministic elements. For instance, a test that executes a controller’s handler backed by a real database instance.

System test: A test that verifies a full system end-to-end without any knowledge of the code. Often contains nondeterministic elements like database connections and API requests. For instance, a Selenium test.

Real object: A function or class that you’d actually use in production.

Fragile test: A test whose assertion logic easily diverges from the implementation logic. Failures in fragile tests are often not due to regressions, but due to a logic divergence between the test and implementation.

A few more definitions I needed

This post mostly discusses using “real” vs “fake” vs “mocks.” When I say “fake” I will be interchanging a bunch of things that you can find defined in Martin Fowler’s article, like dummy, fake, stub, or a spy. This is because their implementations are often similar or identical despite being conceptually different. The differences matter in some contexts, but they don’t contribute much to this discussion.

Dependency injection is your best friend

Injecting a dependency means passing it in where it is needed rather than statically accessing or constructing it in place.

For instance:

// No dependency injection.
public static function isMobileRequest(): bool {
   $request = HttpRequest::getInstance();
   // OMITTED: calculate $is_mobile from $request's user agent
   return $is_mobile;
}

// With dependency injection.
public static function isMobileRequest(HttpRequest $request): bool {
   // OMITTED: calculate $is_mobile from $request's user agent
   return $is_mobile;
}

Dependency injection makes this easier to test for three reasons.

First, examine the static accessor for the HTTP request and imagine testing it. You’d need to create machinery in the singleton to set an instance for testing, or you’d need to mock out that call. The following test is much simpler:

public static function testIsMobileRequest(): void {
    $mobile_request = Testing_HttpRequest::newMobileRequest();
    $desktop_request = Testing_HttpRequest::newDesktopRequest();
    cony\BEHOLD::that(MyClass::isMobileRequest($mobile_request))->isTrue();
    cony\BEHOLD::that(MyClass::isMobileRequest($desktop_request))->isFalse();
}

Second, passing dependencies allows common utils to be written. There will be a one-time cost to implement newMobileRequest() and newDesktopRequest() if they don’t exist when you start writing your test, but other tests can use them once they exist. Writing utils pays off very quickly, sometimes after only one or two usages. A minimal sketch of such helpers is below.
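As a sketch (hypothetical, since it assumes HttpRequest can be constructed directly from a user agent string), the helpers might look like this:

class Testing_HttpRequest {
    // Hypothetical helpers; the HttpRequest constructor signature is assumed.
    public static function newMobileRequest(): HttpRequest {
        return new HttpRequest('Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X)');
    }

    public static function newDesktopRequest(): HttpRequest {
        return new HttpRequest('Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
    }
}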

Third, dependency injection will pay off for isMobileRequest() as the program grows. Imagine that it’s nested a few levels deep: used by a configuration object that’s used by a model util that’s called by a view. Since dependencies are injected, the HTTP request has to be threaded through each layer, so when you call your view renderer, you see that it takes an HTTP request. This has two benefits. It exposes that the behavior of the view is parameterized by the HTTP request. It also lets you say, “that’s insane! I need to restructure this” and figure out a cleaner structure. This is a tradeoff; you need to manage some parameter cruft to get these benefits. But in my long experience with this approach, managing these parameters isn’t a problem even when the list grows really long. And the benefits are worth it.

Inject the smallest thing needed by your code

We can make isMobileRequest even more maintainable. Look at testIsMobileRequest again. To write a proper test function, an entire HttpRequest needs to be created twice. Imagine that it gains extra dependencies over time. A MobileDetector and a DesktopDetector and a VirtualHeadsetDetector and a StreamProcessor. And because other tests inject their own, the constructors use dependency injection.

public static function testIsMobileRequest(): void {
    $mobile_detector = new MobileDetector();
    $desktop_detector = new DesktopDetector();
    $vh_detector = new VirtualHeadsetDetector();
    $stream_processor = new StreamProcessor();
    $mobile_request = Testing_HttpRequest::newMobileRequest(
        $mobile_detector, $desktop_detector, $vh_detector, $stream_processor
    );
    $desktop_request = Testing_HttpRequest::newDesktopRequest(
        $mobile_detector, $desktop_detector, $vh_detector, $stream_processor
    );
    cony\BEHOLD::that(MyClass::isMobileRequest($mobile_request))->isTrue();
    cony\BEHOLD::that(MyClass::isMobileRequest($desktop_request))->isFalse();
}

It’s more code than before. That’s fine. This is what tests tend to look like when you have lots of dependency injection. But this test can be simpler. The implementation only needs the user agent in order to properly classify a request.

public static function isMobileRequest(string $user_agent): bool {
    // OMITTED: calculate $is_mobile from $user_agent
    return $is_mobile;
}

public static function testIsMobileRequest(): void {
    $mobile_ua = Testing_HttpRequest::$mobile_useragent;
    $desktop_ua = Testing_HttpRequest::$desktop_useragent;
    cony\BEHOLD::that(MyClass::isMobileRequest($mobile_ua))->isTrue();
    cony\BEHOLD::that(MyClass::isMobileRequest($desktop_ua))->isFalse();
}

We’ve made the code simpler by only passing in the limited dependency. The test is also more maintainable. Now isMobileRequest and testIsMobileRequest won’t need to be changed whenever changes are made to HttpRequest.

You should be aggressive about this. To test an object, you need to instantiate the transitive closure of its dependencies. Keeping dependencies narrow makes objects easier to instantiate for tests, which makes testing easier overall.

Write tests for failure cases

In my experience, failure cases are often neglected in tests. There’s a major temptation to check in a test when it first succeeds. There are often more ways for code to fail than to succeed. Failures can be nearly impossible to replicate manually, so it’s important to automatically verify failure cases in tests.

Understanding the failure cases for your systems is a major step towards resilience. Failure tests execute logic that could be the difference between partial degradation and a full outage: what happens when things go wrong? What happens when the connection to the database is down? What happens when you can’t read a file from disk? The tests will verify that your system behaves as expected when there is a partial outage, or that your users get the proper error messages, or whatever behaviors you need to ensure that the single failure doesn’t turn into a full-scale outage.
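As a minimal sketch (the function, path, and default value here are all hypothetical), a failure test can be as simple as pointing the code at a resource that is guaranteed to be missing:

// Hypothetical: a loader that degrades gracefully when a config file
// can't be read from disk.
public static function loadConfig(string $path): array {
    $raw = @file_get_contents($path);
    if ($raw === false) {
        // Fall back to safe defaults instead of fataling.
        return ['theme' => 'default'];
    }
    return json_decode($raw, true);
}

public static function testLoadConfigWhenFileIsMissing(): void {
    $config = MyClass::loadConfig('/nonexistent/config.json');
    cony\BEHOLD::that($config['theme'])->equals('default');
}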

This isn’t a magic wand. There will always be failures that you don’t think to test, and they will inevitably bring down your site. But you can minimize this risk by adding failure tests as you code.

Use real objects whenever possible

You often have several options for injecting dependencies into the implementation being tested. You could construct a real instance of the dependency. You could create an interface for the dependency and create a fake implementation. And you could mock out the dependency.

When possible, prefer to use a real instance of the object rather than fakes or mocks. This should be done when the following circumstances are true:

  • Constructing the real object is not a burden. This becomes more likely when dependency injecting the smallest thing needed by the code
  • The resulting test is still deterministic
  • State transitions in the real object can be detected completely via the object’s API or the return value of the function

The real object is preferable to the fake because the test verifies the real interaction that your code and the dependency will have in production. You can verify the correct thing happened in a few different ways. Maybe you’re testing whether the return values change in response to the injected object. Or you can check that the function actually modifies the state of the dependency, like seeing that an in-memory key value store has been modified.
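For example (a hypothetical sketch; VisitTracker, InMemoryKeyValueStore, and the key scheme are made up), verifying a state transition through the dependency’s own API looks like this:

public static function testRecordVisitWritesToStore(): void {
    // A real, working in-memory store; no mocking involved.
    $store = new InMemoryKeyValueStore();
    VisitTracker::recordVisit($store, 'user_42');
    cony\BEHOLD::that($store->has('last_visit:user_42'))->isTrue();
}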

The real object is preferable to the mock because it doesn’t make assumptions about how the two objects interact. The exact API details of the interaction are less important than what the interaction actually does to the dependency. Mocks often create fragile tests since they record everything that should be happening: which methods should be invoked, which parameters should be passed, and so on.

Even worse, the test author dictates what the return value from the object is. It may not be a sane return value for the parameters even when the test is written, and it may not remain true over time. It bakes extra assumptions into the test file that don’t need to be there. And imagine that you go through the trouble of mocking a single method 85 times, and then you make a major change to the real method’s behavior that may invalidate the mocked returns. Now you need to examine each of the 85 cases and decide how each of them will change, and additionally how each of the test cases will need to adapt. Or alternatively, you fix the two that fail and hope that the other 83 are still accurate just because they’re still passing. For my money, I’d rather just use the real object.
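To make the fragility concrete, here’s a hand-rolled mock (the Logger interface and Checkout are hypothetical) whose assertion encodes the exact interaction rather than the outcome:

interface Logger {
    public function log(string $message);
}

class Mock_Logger implements Logger {
    public $messages = [];

    public function log(string $message) {
        $this->messages[] = $message;
    }
}

public static function testCheckoutLogsCompletion(): void {
    $logger = new Mock_Logger();
    Checkout::process($logger, ['item_1']);
    // Fragile: this fails if process() rewords the message or adds a second
    // log line, even though checkout still behaves correctly for users.
    cony\BEHOLD::that($logger->messages)->equals(['checkout complete']);
}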

The key observation is that “how did something get changed?” matters way less than “what changed?” Your users don’t care which API puts a word into spellcheck. They just care that it persists between page reloads. A corollary is that if “how” matters quite a lot, then you should be using a mock or a spy or something similar.

Combining this with the structuring rules above creates a relatively simple rule: Reduce necessary dependencies whenever possible, and prefer the real objects to mocks when you need complex dependencies.

A careful reader will note that using real objects turns unit tests into deterministic integration tests. That’s fine. Reducing the maintenance burden is more desirable than maintaining ideological purity. Plus, you will be testing how your code actually runs in production. Note that this isn’t an argument against unit tests – all of the structuring techniques in this doc are designed to make it easier to write unit tests. This is just a tactical case where the best unit test turns out to be a deterministic integration test.

Another complaint I’ve heard about this approach is “but a single error in a common dependency could cause dozens of errors across all tests.” That’s actually good! You made dozens of integration errors and the test suite caught all of them. What a time to be alive. These failures are also easy to debug: you can choose from dozens of stack traces to help investigate what went wrong. In my experience, the fix is usually in the dependency’s file rather than needing to be made across tons of files.

Prefer fakes to mocks

A real object should not be used if you can’t verify what you need from its interface, or it’s frustrating to construct, or it is nondeterministic. At that point the techniques at your disposal are fake implementations and mock implementations. Prefer fake implementations over mock implementations when all else is equal. This reuses much of the same reasoning as the previous section.

Fake viking ship implementation

Despite the name, a fake implementation is a trivial but real implementation of an interface. When your code interacts with the fake object, side effects and return values should follow the same contract as the real implementation. This is good. You are verifying that your code behaves correctly with a correct implementation of the interface. You can also add convenience setters or getters to your fake implementation that you might not ordinarily put on the interface.

Fakes also minimize the number of assumptions that a test makes about the implementation. You’re not specifying the exact calls that are going to be made, or the order that the same function returns different values, or the exact values of parameters. Instead you will be either checking that the return value of your function changes based on data in the fake, or you will be verifying that the state of the fake matches your expectations after test function execution.

Here’s an example implementation:

interface KeyValueStore {
    public function has(string $key): bool;
    public function get(string $key): string;
    public function set(string $key, string $value);
}

// Only used in production. Connects to a real Redis implementation.
// Includes error logging, StatsD, everything!
class RedisKeyValueStore implements KeyValueStore {
    // OMITTED: real implementations of has(), get(), and set()
}

class Testing_FakeKeyValueStore implements KeyValueStore {
    private $data;

    public function __construct() {
        $this->data = [];
    }

    public function has(string $key): bool {
        return array_key_exists($key, $this->data);
    }

    public function get(string $key): string {
        if (!$this->has($key)) {
            throw new Exception("No key $key");
        }
        return $this->data[$key];
    }

    public function set(string $key, string $value) {
        $this->data[$key] = $value;
    }
}
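For example, imagine a needsToBeCached() that consults the store (a hypothetical sketch; the real version would presumably have more logic):

public static function needsToBeCached(KeyValueStore $store, string $key): bool {
    return !$store->has($key);
}

public static function testNeedsToBeCached(): void {
    $store = new Testing_FakeKeyValueStore();
    cony\BEHOLD::that(MyClass::needsToBeCached($store, 'listing_9'))->isTrue();
    $store->set('listing_9', 'cached value');
    cony\BEHOLD::that(MyClass::needsToBeCached($store, 'listing_9'))->isFalse();
}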

Another benefit is that you now have a reusable test implementation of KeyValueStore that you can easily use anywhere. As you tweak the implementation of needsToBeCached() over time, you will only need to change the tests when the side effects or return value change. You will not need to update tests to keep mocks in sync with the exact logic used in the implementation.

There are many cases where this is a bad fit, and anything that sounds like a bad idea is probably a bad idea. Don’t fake a SQL database. If your code has an I/O boundary like network requests, you will basically have no choice but to mock that. You can always abstract it behind other layers, but at some point you will need to write a test for that final layer.
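One way to keep that final mocked layer as small as possible (a hypothetical sketch; the gateway interface and all names are made up) is to push the I/O behind a narrow interface:

// Hypothetical: push network I/O behind a narrow interface so only the
// outermost layer ever needs a mock.
interface WeatherGateway {
    public function currentTempCelsius(string $city): float;
}

// Production implementation wraps cURL; this thin layer is the one thing
// you mock or cover with a sparing system test.
class CurlWeatherGateway implements WeatherGateway {
    public function currentTempCelsius(string $city): float {
        // OMITTED: curl_exec() call against the weather service
        return $temp_from_response;
    }
}

// Everything above the boundary can be tested with a trivial fake.
class Testing_FixedWeatherGateway implements WeatherGateway {
    private $temp;

    public function __construct(float $temp) {
        $this->temp = $temp;
    }

    public function currentTempCelsius(string $city): float {
        return $this->temp;
    }
}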

Prefer writing a simple test with mocks to faking a ton of things or writing a massive integration test

I spend lots of time encouraging test authors to avoid mocks as a default testing strategy. But I acknowledge that mocks exist for a reason. To borrow the XML adage, an automatic mocking framework is like violence: if it doesn’t solve your problem, you’re not using enough of it. A determined tester can mock as many things as possible to isolate an effect in any code. My ideal testing strategy is more tactical and requires discipline. Imagine that you’re adding the first test for an ancient monolithic controller. You have roughly three options: prep a database to run against a fake request you construct, spend a ton of time refactoring dependencies, or mock a couple of methods. You should probably do the last one out of pragmatism. Just writing a test at all will make the file more testable, since now the infrastructure exists.

You can slowly make improvements and reorganize the code as you continue to make edits. This will start to enable the techniques above that lead to less fragile tests.

Always weigh the cost and benefit of the approaches you take. I’ve outlined several techniques above that I think lead to better tests. Unfortunately they may not be immediately usable on your project yet. It takes time to reshape a codebase. As you use them you will discover what works best for your own projects, and you should slowly improve them as you go.

System tests pay for themselves, but it’s hard to predict which ones are worth writing

At Google, my team had a long stretch where we wrote a system test for every regression. We were optimistic that they would become easier to write over time. Eventually the burden could not be ignored: they were flaky and we never ended up in our dream tooling state. So we phased out this strategy. But one day I was discussing an “incremental find” system test with a few teammates. We figured out that this single test saved us from regressing production an average of 4 times per person. Our bugs surfaced on our dev machines instead of later in our deployment process. This saved each of us lots of expensive debugging from user reports or monitoring graphs.

We couldn’t think of another system test that was nearly that valuable. It followed a Pareto distribution: most bugs were caught by a few tests. Many tests caught only a bug or two. Many other tests had similar characteristics (user-visible, simple functionality backed by lots of complex code, easy to make false assumptions about the spec), but only this one saved full eng-months.

So system tests aren’t magic, and all of my experience suggests that we should only use them tactically. The critical paths of the customer flow are a good first-order target for deciding which system tests to write. Consider adding new system tests as the definition of your critical path changes.

What’s next?

Write tests for your code! Tests are the best forcing function for properly structuring your code. Properly structuring your implementation code will make testing easier for everyone. As you come up with good generic techniques, share them with people on your team. When you see utilities that others will find useful, share those as well.

Even though this guide is well north of 3000 words, it still only scratches the surface of the subject of structuring code and tests. Check out “Refactoring” by Martin Fowler if you’d like to read more on the subject of how to write code to be more testable.

I don’t recommend following me on Twitter unless you want to read a software engineer complain about how cold it is outside.

Thanks to everyone at Etsy who provided feedback on drafts of this, whether you agreed with everything or not!

Footnotes

[0] I’ve seen this joke before but I can’t figure out where. Please send me pointers to the source material!