2019 year in review

Here is my 2019 year in review! I lost weight, struggled with sleep, grew professionally, struggled with some personal challenges, took some much-needed vacations, and worked hard on side projects.

Health

I lost over 20 pounds in 2019. I started the year at 223.2 lbs and ended at 201.0 lbs, dropping as low as 189 along the way. But I achieved this with a strict diet. I ate mostly fruits, vegetables, cheese, nuts, and lean meats. These are delicious, but I missed the satisfaction of eating a bagel or a slice of pizza every now and then. When I started enjoying food again, I slowly gained 10 pounds back.

I’m looking for a sustainable solution. At the moment, I’m trying 16/8 intermittent fasting with calorie counting. I have failed to do each of these individually. Unrestricted intermittent fasting didn’t limit my calories enough. I also feel hungry when I count calories across a full day. But I’m optimistic about the combination. I can eat a huge breakfast and still have enough calorie budget for a healthy and filling dinner. Alternatively, I can stick to salads, nuts, and cheeses in the afternoon so that I can enjoy myself later.

"Memoriae Majorum" at the entrance to the ossuary of the Paris Catacombs
The entrance to the ossuary part of the Paris Catacombs

Sleep was rough. I had a prolonged bout of insomnia that stretched from February into July. I would fall asleep easily at midnight, then wake up between 4:30 and 5:30. Insomnia sucks. It affects everything. I produced less at work, I was meaner, I was less happy, and my chores fell by the wayside. I saw a sleep therapist, and she provided a list of recommendations: wear orange-colored goggles at night to limit blue light exposure, limit caffeine, get as much sunlight as possible before 8:30am, keep a consistent sleep schedule, and don’t eat, exercise, or shower within 3 hours of going to sleep.

I applied these techniques and shifted my wake time from 8 to 7, and this helped. I still wake in the middle of the night. But I can go back to sleep now. I can be less militant about sleep hygiene by picking my battles. I limit caffeine, keep a strict schedule, and go for a walk before 8:00 every morning. The other changes didn’t matter as much.

This required sacrifices. My friends prefer hanging out late, so for years I kept my sleep schedule late to match theirs, even though I had considered moving it earlier. I can’t “just make it up” on the weekends; the opposite is true. Staying out once or twice a week can ruin a whole week of regular sleep for me. The severity of this insomnia bout made me reconsider, so now I leave our plans early on a weekly basis, which is awkward. I’m not sure how this will play out in the long run.

Career

I had a productive year.

My time was split between three things: API stabilization, API support, and GraphQL.

I spent the year working on the API platform team. We worked with a more senior engineer who had helped write broad swaths of our product stack. He successfully pitched the existence of the API platform team, and spent time ramping us up. This was mostly positive. He used his experience to select high-impact projects that helped me learn. We spent a bunch of time cleaning up the user experience and the code, and we ended up with a stabler, cleaner, and faster API. His feedback was always invaluable, and he also made a major contribution to the internals during this time. On the other hand, the downsides to working under him were unusual. He wasn’t officially on the team, but he wasn’t NOT on the team. This created lots of coordination problems. I did discuss this with my manager, but in retrospect I could have handled the situation better by getting everyone to agree to fill out a RACI matrix (or similar). He left Etsy halfway through the year. By that point we knew enough to stand on our own legs, and the team had a fairly successful year.

Supporting the internal API took lots of my team’s time. Etsy is a PHP shop, and PHP is single-threaded, so an easy way to parallelize computation is to fan I/O out across concurrent cURL requests. This system is called the “internal API” at Etsy. Hundreds of engineers use it, which means that lots of people need help. This version of the API has lasted 5+ years and has advanced users throughout the company, so the problems that we hear about are often complex multi-system problems. It’s amazing how many different ways the same metaproblem emerges: “This code was written on the assumption that it would only run on one machine, and now it’s running on two.”

I spent the year planning and prototyping GraphQL at Etsy. This was more solitary than I would like, but it was driven by necessity. The surface area of our team is broad: the whole company runs on APIs, and there were just 3 of us for much of the year. The only way we could move forward was to divide and conquer. A second engineer, Kaley Sullivan, joined the work once I started prototyping. She kicked ass – she is a powerhouse engineer and gave me some great feedback on designs. The prototype that we wrote was a success, and we’re moving forward with introducing it at the beginning of next year. My next challenge will be growing and evolving GraphQL at Etsy so that it doesn’t rely on me anymore.

The ferry building on the water in San Francisco.
Jet lag advantage: 6:15 AM run through the touristy part of SF. Peaceful and quiet.

I was promoted to Staff Engineer at the beginning of the year. This was an important milestone for me to hit. I didn’t care about the title, but I wanted to prove to myself that I was growing as an engineer. A major reason I joined Etsy was because I love the mission – lots of sellers from all around the world making money. It’s far more palatable than padding a tech billionaire’s pockets by making $cog a little more efficient. Reaching Staff Engineer meant that I had been growing my engineering effectiveness, which means I had been getting more effective at helping our sellers. I also now have access to opportunities that I didn’t have as a senior engineer: I emceed our engineering all-hands, I’m leading a working group, and I’m in a regular meeting with directors and the CTO. It almost feels like a virtuous cycle: once I reached a certain threshold, it became easier to keep getting high-impact or high-visibility work. In 2020, I’d like to use my position to effect positive change – something I did in 2019, but not enough.

I attended a software engineering conference for the first time: GraphQL Summit in San Francisco. I was pleasantly surprised by how useful it was. The big presentations were helpful – I had already done a deep dive on industry writing about GraphQL – but some of the most valuable lessons came from the smaller talks. One fact amazed me: literally half of the attendees didn’t even use GraphQL yet. Their companies sent engineers to the conference to learn more. It stands to reason: of course companies would speculatively spend thousands of dollars to avoid mistakes that would cost them millions. But it hadn’t occurred to me that I could seek out these opportunities myself, and I don’t have many more-senior mentors in my life who can point me in these directions.

Personal life

My year had some ups and downs.

My dog Rupert running through a field with his mouth open and smiling.
Derp on, you crazy diamond

I had a dog at the beginning of the year, Rupert. I don’t have him anymore. The rescue told me he was high-energy when they gave him to me, but I didn’t fully understand what that meant. He could run for an hour in the morning and be back to 100% by the end of the day. I also didn’t understand that active dogs have active minds. He’d get bored and destructive when I left him alone for a normal work day. He destroyed thousands of dollars worth of stuff in my apartment, including some things that were irreplaceable. He loved daycare, but he got depressed and distressed if I took him too many days in a row. I worked with a trainer a bunch, and at the first lesson she warned me, “Most single owners who have dogs this energetic eventually give them away. I often advise people to do this. It’s okay if you do.” Eventually, I came to the conclusion that she was right and he would be better off with another owner. I returned him to the rescue, which found him another loving family. This is very hard to write even now. I miss him. But it was unsustainable. One of my close friends was very unsupportive of me during this time, which damaged our friendship. I regret how that played out.

My girlfriend and I went on some trips together: a long weekend in Montreal, a week in Vancouver, and 2 weeks in France. I hadn’t traveled internationally in a few years, and it was fun to go on an adventure together. It was also the first time in a long time that I managed to fully disconnect from work. The stories that we collected were wonderful: the meals we had, the places we stayed, the time we were defeated by the rocky shores of Nice, the time I doused myself in diesel fuel in Avignon, the delicious fruit varieties we’d never seen before. It’s also nice to share that with someone.

My girlfriend and I suited up to go whale watching in Vancouver, BC
Suited up for whale watching in Vancouver, BC

I started trying to “learn about business” in the beginning of 2019. I was wondering if I could make and sell bar trivia. But a landing page that I set up with a Squarespace domain didn’t convert into any purchases, even though I got a bunch of clicks. Around this time, I was in the middle of having major problems with my dog, and I gave up working on business stuff to take him to the dog park every morning for an hour. After this, I never got back to it.

I tried to learn French. I picked up a fair bit of vocabulary, but I made very little progress with listening comprehension, despite taking French classes, practicing for over an hour every day, and listening to 100+ hours of French learning podcasts. My girlfriend is a French translator who speaks it fluently. I thought this would help me, and it was part of the reason I chose French. But I was so remedial that we never found a middle ground that wasn’t frustrating for me. Actually going to France was fun – I had some basic transactional conversations with shopkeepers, I could read menus and order, and I could comprehend most of the posted advertisements around the country. But the trip underscored that I would likely never use French in a serious way – if I had picked Spanish, I’d at least have the benefit of seeing it around New York. And I don’t find French culture or history particularly interesting. It was a long and frustrating process for something that would never have an upside.

Eventually, I stopped spending my mornings studying French and started studying machine learning. Machine learning has interested me a few times over the years, most notably around the time of the Netflix Prize. Lately, I’ve spent 1.5 hours every morning learning about it, and it has been fun so far. I’ve written a backpropagation algorithm in NumPy, and I’ve been working on the Titanic Kaggle competition. My first submission was yesterday, and it performed worse than the naive approach of predicting that all the women survived and all the men died, haha. But there is a whole forum filled with people who have helpful suggestions for feature selection, and there’s a whole Internet of helpful materials. I’d like to learn more about real-world data processing, different neural network architectures, and autoencoders.

I’ve been writing more in 2019. This is perpetually a goal of mine, so I’m glad that I’m finally making room for it. I started a “Simple software engineering” series where I examine the tradeoffs that I make when I write software. I also found it helpful to keep a “knowledge base” in WordPress, where I record things that I learn as I learn them. There’s only a loose organization so far, but I’ve only been keeping the knowledge base for about 3 months.

Conclusion

I had a good 2019 despite some personal challenges. My year went really well professionally, and I took some much-needed breaks that I hadn’t really granted myself. I’d like to focus a little more on friendships in 2020, but it’s not clear if that means old friendships or new friendships.

Simple software engineering: Mocks are a last resort

Most tests that rely on automatic mocking frameworks should be rewritten to use either the real dependencies or manually-written fakes.

Wait, let’s back up. Tests have a few moving parts. First, there is some code being tested. This is commonly called the unit. The unit might have dependencies. The dependencies are not under test, but they can help determine whether the unit behaves correctly. Ideally, they would be passed into the unit. But dependencies can be many things: static data, global data, files on the filesystem, etc.

Dependencies interact with tests in a few ways. The unit can introduce side effects on the dependencies and vice versa. Automatic mocking frameworks are designed to aid this process. Mocks can validate that expected method calls happened with the correct parameters, override return values, and even execute different logic. Mocks have almost absolute power to override the behavior of dependencies (within the confines of what the language allows).

But mocks aren’t the only way to write tests that involve dependencies. Real objects can be used directly. This isn’t always possible: the real object might be nondeterministic. It might provide random numbers, make a call on the network, etc. Nondeterminism is difficult to test, since there’s not necessarily an expected output. Nondeterministic failures decrease confidence in tests, since it’s difficult to know whether a failure is real. Accordingly, nondeterminism should be avoided in tests.

Statue of Leif Erickson in Reykjavik, Iceland.
Leif Erikson discovered automatic mocking frameworks in the year 998

“Test fakes” are an alternative: hand-written fake implementations of a real object, such as a trivial implementation of an interface that the real object implements. Here’s an example from a side project of mine: it allows a clock to be simulated and advanced for testing. Test fakes have a maintenance cost. The tradeoff is that the fake can be reused everywhere.
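As a rough sketch of the idea (illustrative names only, not the actual side-project code), a fake clock can be as small as this:

// Illustrative sketch of a test fake, not the actual side-project code.
// The production implementation of Clock would return the system time.
interface Clock {
    public function now(): int; // Unix timestamp, in seconds
}

class FakeClock implements Clock {
    private $timestamp;

    public function __construct(int $start = 0) {
        $this->timestamp = $start;
    }

    public function now(): int {
        return $this->timestamp;
    }

    // Deterministically simulate the passage of time in tests.
    public function advance(int $seconds): void {
        $this->timestamp += $seconds;
    }
}

A test constructs a FakeClock, passes it to the unit, and advances it to exercise time-dependent behavior deterministically.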

How should you pick which one to use?

How I select a dependency to use for testing

  1. Use the real object, if possible.
  2. Use a fake implementation, if possible.
  3. Use a mock.

I try to get as close to the production configuration as possible. Why? When a test fails with a real dependency, it’s likely a real problem. The more differences between a test object and a production object, the less likely the failure is real, and more likely that the failure involves the test configuration.

OK, so, where am I going with this? In the next section, I will explain common issues with automatic mocking. Then I will describe the tradeoffs that real objects and test fakes have. I will finish by explaining a few situations where mocking is preferable to the other alternatives.

Automatic mocks are very manual

Consider a unit that uses Redis as a key/value store. Talking to Redis involves I/O. So we mock the return values of Redis anywhere it’s used in tests.

The first mocked test isn’t so bad. It reads one value and writes one value. The second test reads a few values. The third test reads a bunch of objects, but it doesn’t modify the values at all. And so on.

Imagine this Redis class spreading through a codebase. Dozens of usages. After all, everyone loves Redis. Every call must be mocked in the test.

But this requires that every test author behaves like a human key/value store. Why hand-simulate return values for all of these tests? It is simpler to put the Redis key/value store behind an interface and use an in-memory implementation. This saves time per test and makes tests easier to write. The fake saves time the way any code does – by automating a task.

The tests become easier to write because it becomes trivial to assert both the effects and the side effects of the test. Did the unit return the correct value? Did the fake end up in the correct final state? Great!
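As a sketch (the interface and class names here are hypothetical, not Etsy’s actual code), the fake can be tiny:

// Hypothetical interface; the production implementation would talk to Redis.
interface KeyValueStore {
    public function get(string $key): ?string;
    public function set(string $key, string $value): void;
}

// In-memory fake, reusable by every test that needs a key/value store.
class InMemoryKeyValueStore implements KeyValueStore {
    private $data = [];

    public function get(string $key): ?string {
        return isset($this->data[$key]) ? $this->data[$key] : null;
    }

    public function set(string $key, string $value): void {
        $this->data[$key] = $value;
    }

    // Test-only helper for asserting the final state of the store.
    public function all(): array {
        return $this->data;
    }
}

A test passes the fake to the unit, runs the unit, and then asserts against both the unit’s return value and the fake’s final contents – no per-test scripting of return values required.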

I find that the break-even point for this approach is n=1. As in, implementing the fake often takes roughly the same amount of time as implementing the first mock. And then the fake can be reused, but the mock can’t. There are exceptions to this that are discussed at the end of this post.

Automatic mocks don’t have to behave correctly

Mocks can behave absolutely incorrectly with no consequences. Can one plus one equal three? Sure, why not:

when(mockInstance.addTwoNumbers(1, 1)).thenReturn(3);

A human must simulate every mocked return value. This leads to situations where the code and the test both have errors that mask each other. In fact, mocks can be written by watching the test fail and seeing what value would have made the assertions pass. Then the engineer simply enters the expected values into the mock. People really do this. I’ve done it. I’ve watched other people do it. These errors get past code review.

Granted, this can happen with both real objects and fake objects in testing. But since real/fake objects are not customizable per-test, the error rate will be lower holistically with these approaches.

Automatic mocks can silently break during a refactoring

This is more insidious. Let’s say there is a widely-used dependency, and one of its methods provides the path of a URL. It needs to be changed to provide the full URL string as part of a project to support multiple domains. And it’s being renamed from providePath to provideFullURL or something.

So you rename the method. You change the behavior. The full URL is returned instead of the path. The tests pass. Hooray 🎉 But that method is called in 50ish places, and each of those call sites has tests that are written using mocks. Furthermore, some of those call sites are within code that is mocked in tests. Are you confident that nothing is wrong?

I’d be confident in the opposite: something broke somewhere. The mocks silently hid the problem because the return value was simulated. Imagine the developers of each of those call sites. If even one had a tight deadline and needed the full URL, they’re gonna prepend the server name they expect. They won’t think twice. It could even take days for these errors to appear – when the next nightly big data job runs, when the next weekly marketing email is sent, etc.

A real object would have a better chance of exposing these errors in tests. A fake object would be changed from providing a path to providing a URL, which would also allow the error to be caught across the codebase with a single change. The change would need to have the same level of scrutiny and QA testing. But with a reasonably complete test suite, it’s less likely that it would lead to real problems.

Tradeoffs

Using a real object has a philosophical tradeoff. Strictly speaking, the test stops being a unit test. It becomes an integration test of the unit and its dependencies. That’s fine. If a test can be written quicker and increase confidence, then it’s a reasonable tradeoff. If the simplest and most maintainable test is an integration test, then write an integration test. Life’s too short for ideological purity.

There are more tradeoffs. A breakage in a real object can cause dozens of failures throughout the codebase. This often makes it easier to debug the failure (since there are lots of examples to debug), but it can also obscure the failure. Conversely, a real object with many call sites can cause failures in just one or two tests. This is often difficult to diagnose. Is the test subtly wrong? Is there a subtle bug in the real object? Is there a subtle bug in how they interact?

Fakes add a maintenance cost. They need to be written and maintained along with the real object or interface. Plus, since they simulate the behavior of an object without being the full implementation, they can easily introduce incorrect behavior that is then reused everywhere. There is also an art to writing them that has to be learned.

A few situations where mocks are the best approach

There are definitely situations where mocks should be used. Here are some common “last resort” cases that I’ve discovered over the years.

Faking complex behavior, like SQL

At a certain point, a fake would be so complicated that an in-memory solution is totally infeasible. It’s implausible to expect an in-memory execution of a SQL server that matches all of the syntactic quirks and features of MySQL. In this situation, using the mock dramatically reduces the maintenance required for the test.

Preventing a method from being called multiple times

Sometimes, calling a method twice is REALLY BAD – maybe it causes a deadlock, maybe a buggy device driver would cause a kernel panic, etc. Code review and instrumentation aren’t enough, and it’s desirable to assert that it can never happen. Mocks excel at this type of assertion.
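With PHPUnit’s built-in mocking support, this assertion is a one-liner. Here’s a sketch – DeviceDriver, DeviceManager, and recoverFromError() are hypothetical names, not code from a real project:

use PHPUnit\Framework\TestCase;

class DeviceManagerTest extends TestCase {
    public function testResetIsCalledExactlyOnce(): void {
        // Hypothetical driver whose reset() must never run twice.
        $driver = $this->createMock(DeviceDriver::class);

        // The test fails if reset() is called zero times or more than once.
        $driver->expects($this->once())
               ->method('reset');

        $manager = new DeviceManager($driver);
        $manager->recoverFromError();
    }
}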

Legacy code is poorly structured and there ain’t time to fix it

Sometimes, you have to parachute into a codebase, make a fix, and then get extracted. Sometimes it’s just not reasonable to spend 3 weeks refactoring to make a 1 day change more testable.

Determining whether a delegate is being invoked

A delegate wraps a second object, and is responsible for calling methods on that second object. An automatic mocking library is an easy solution for ensuring that these calls happen as expected.
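Here’s a sketch of what that might look like with PHPUnit, reusing the hypothetical KeyValueStore interface from earlier. CachingStore is a made-up decorator that should forward cache misses to the store it wraps:

use PHPUnit\Framework\TestCase;

class CachingStoreTest extends TestCase {
    public function testCacheMissDelegatesToInnerStore(): void {
        $inner = $this->createMock(KeyValueStore::class);

        // Assert the wrapped store is called exactly once, with the same key.
        $inner->expects($this->once())
              ->method('get')
              ->with('user:42')
              ->willReturn('Ada');

        $cache = new CachingStore($inner);
        $this->assertSame('Ada', $cache->get('user:42'));
    }
}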

Thank you for attending my Jake Talk

Automatic mocking frameworks are a last resort. Mocks have uses. But real objects and fake objects should be preferred, in that order.

Simple Software Engineering: Use plugin architectures to improve encapsulation

Let’s say that you have to write an API service. The service handles different types of API calls – calls coming from third parties, your company’s apps, and your company’s internal servers.

There will inevitably be call sites that behave differently for different services. Imagine some examples: third-party services use a different authorization method, internal-only responses get debugging headers, error payloads are formatted differently for apps. They all use different loggers. The list goes on.

By the end of implementation, many call sites will have logic that is conditional on the service type.

// This method has two different call sites that depend on the service:
// logging and the authorization handler.
public function handleAuthorization(Service $service, Request $request) {
    $auth = ($service->type === Service::TYPE_THIRD_PARTY)
        ? new Service\Auth\ThirdParty()
        : new Service\Auth\SomethingElse();

    $logger = null;
    switch($service->type) {
    case Service::TYPE_THIRD_PARTY:
        $logger = new Service\Logger\ThirdParty();
        break;
    case Service::TYPE_APP:
        $logger = new Service\Logger\App();
        break;
    case Service::TYPE_INTERNAL:
        $logger = new Service\Logger\Debug();
        break;
    default:
        throw new Exception('Unrecognized logger');
    }

    try {
        $auth->check($request);
    } catch (Exception $e) {
        $logger->logError($e, [/* some data */]);
        throw $e;
    }

    $logger->logInfo('successful authorization', [/* some data */]);
}

When many call sites behave differently based on the same set of conditions, this can be considered a “two-dimensional problem.” I don’t think there’s an official definition of this; I just like to think about it this way. One dimension comprises each of the different conditions. The other dimension comprises all of the call sites depending on the service type. In the API service example, the service dimension has…

[third-party API calls, calls from company apps, calls from internal servers]

and the call site dimension has…

[Handling authorization, header response selection, response formatting, logging]

In this example, there are 12 combinations to be considered (three origins * four call sites). If a new call site is added, there are 15 total considerations. If a new origin is added after that, there are 20 considerations. This will likely grow geometrically over time.

It’s tempting to say, “This example should be dependency-injected anyways.” But this is just a demo of the problem. Dependency injection doesn’t solve the real problem, which is that the service definitions have no cohesion. The service is a first-class concept within the API. But the definition of each service is scattered throughout the codebase, which leads to some problems.

Switching on types is error-prone across several usages.

Writing cases manually is error-prone. When adding a new type, the author must vet every call site that handles types, because the new type might need to be specially handled in one of them. These call sites can be hard to enumerate: they could include all locations where any of the types are checked. Even worse, the list can include sites where the logic works because of secondary effects. For instance, code like “if this logger also implements this other interface, do this other logic” might really be an attempt to define logic for the single service whose logger implements that interface.

Let’s say that the whole API team gets hit by a bus. It’s sad, but we must increase shareholder value nonetheless. The old team began a new project: adding an API service handling our new web app! So the replacement team defines authorization and response logging and launches the service into production. But they missed a few cases. A few weeks after launch, the new web service is down for two hours and no pages were fired. After some investigation, it turns out that the wrong logger was used and the monitoring service ignored errors from unrecognized services. Later, the company pays out a security bounty because internal-only debug headers were leaked. These are plausible outcomes of dealing with a low-cohesion definition – because the entire definition can’t be considered at once, it’s easy to overlook things that cause silent failures.

Switching on types has low cohesion.

When logic depends on the same conditions throughout the codebase, the cohesion of that particular concept is low or nonexistent. This makes sense: the service’s definition is scattered throughout the codebase. It would be better if all of these definitions were grouped behind the same interface. This makes it easy to describe a service: a service is the collection of definitions inside an implementation of the interface.

Prefer plugin architectures

Me in front of the painted ladies in San Francisco
Marveling at the Painted Ladies on a recent trip to San Francisco. An obvious example of plugin architecture if I’ve ever seen one.

What does the code example look like within a plugin architecture?

// Provides a cohesive service definition.
interface ApiServicePlugin {
    public function getType(): int;
    public function getAuthService(): Service\Auth;
    public function getLogger(): Service\Logger;
    public function getResponseBuilder(): Api\ResponseBuilder;
}

// Allows per-service objects or functions to be retrieved.
class ApiServiceRegistry {
    private $plugins = [];

    public function registerPlugin(ApiServicePlugin $plugin): void {
        $this->plugins[$plugin->getType()] = $plugin;
    }

    public function getAuthService(int $service_type): Service\Auth {
        return $this->plugins[$service_type]->getAuthService();
    }

    // Not shown: other getters (logger, response builder)
}

public function handleAuthorization(Service $service, Request $request) {
    // Note: These would likely be dependency-injected.
    $auth = $this->registry->getAuthService($service->type);
    $logger = $this->registry->getLogger($service->type);

    try {
        $auth->check($request);
    } catch (Exception $e) {
        $logger->logError($e, [/* some data */]);
        throw $e;
    }

    $logger->logInfo('success', [/* some data */]);
}

The plugin interface improves cohesion.

The plugin provides a solid definition of an API service. It’s the combination of authorization, logging, and the response builder. Every implementation will correspond to a service, and every service will have an implementation.

Plugins enforce that all cases are handled for each new service.

It’s impossible to add a service without implementing the full plugin definition. Therefore, every single call site will be handled when a new service is added.

Adding a new call site means that every service will be considered.

When adding a new call site for the service, there are two options. Either it will use an existing method on the plugin interface, in which case all existing services will already work, or a new concept needs to be added to the plugin interface, in which case every plugin will need to be considered.

Plugin registries make testing much easier.

Plugin registries provide an easy dependency injection method. If a plugin is not under test, simply register a “no-op” version of the plugin that does nothing or provides objects that do nothing. If something shouldn’t be called, simply provide objects that throw exceptions when they are called. Because each of the call sites is no longer responsible for managing a fraction of the service, the tests can now focus on testing the logic around the call sites, instead of partially testing whether the correct service was used.
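Here’s a sketch of a no-op plugin, assuming do-nothing implementations of the auth service, logger, and response builder exist (AllowAll, NoOp, and Passthrough are made-up class names):

// Hypothetical no-op plugin for tests.
class NoOpApiServicePlugin implements ApiServicePlugin {
    public function getType(): int {
        return Service::TYPE_INTERNAL;
    }
    public function getAuthService(): Service\Auth {
        return new Service\Auth\AllowAll();           // accepts every request
    }
    public function getLogger(): Service\Logger {
        return new Service\Logger\NoOp();             // discards every message
    }
    public function getResponseBuilder(): Api\ResponseBuilder {
        return new Api\ResponseBuilder\Passthrough(); // returns payloads unchanged
    }
}

// In a test, the registry only knows about the no-op service.
$registry = new ApiServiceRegistry();
$registry->registerPlugin(new NoOpApiServicePlugin());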

Avoid plugin registries when one of the dimensions is size one

Plugin registries are great for reducing dimensionality. But what if there is just a single service, or just a single call site? Then it would be overkill to make the full class and interface hierarchy. If there is just one call site, then write the basic switch statement or if/else chain. If there is only one mapping that is shared across a few call sites, then refactor it into a map or a helper function, as sketched below. The full plugin architecture is only useful when managing the complexity of many services used at many call sites.
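Here’s a sketch of the map-plus-helper version, reusing the illustrative logger classes from the first code example:

// One mapping shared by a few call sites: a map and a helper are enough.
class ServiceLoggers {
    private const LOGGERS = [
        Service::TYPE_THIRD_PARTY => Service\Logger\ThirdParty::class,
        Service::TYPE_APP         => Service\Logger\App::class,
        Service::TYPE_INTERNAL    => Service\Logger\Debug::class,
    ];

    public static function forService(int $service_type): Service\Logger {
        if (!isset(self::LOGGERS[$service_type])) {
            throw new Exception('Unrecognized service type');
        }
        $logger_class = self::LOGGERS[$service_type];
        return new $logger_class();
    }
}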