How to give a decent tech demo

As you work in the software industry, you will inevitably need to demo your work. This can take different forms: live coding in front of an audience, pitching a SaaS app to a client you must land, showing your boss’ boss’ boss a prototype to pitch a new team, etc. The good news is that with preparation, you can get better results than winging it. Don’t get me wrong. I love winging it. But imagine you could spend a few hours to make your audience remember you. Wouldn’t you do that? I would.

A quick caveat! This isn’t about giving a presentation or a talk. This is much narrower: how do you demonstrate software or hardware? My goal is for people to watch your demos, walk away, and say “Alice/Bob/Carol/Dave’s demo was pretty good!” I’m not promising that you will give world-class demos. I don’t. You know those talks where Steve Jobs invited the whole world to an intimate storytime? And he’d give a short demo that showed the value that an iPhone added to your life? Jesus. I can’t help you there. But I can offer tips to give your audience a fighting chance of remembering what you showed them.

Two jellyfish floating through the water, deadly as ever

Unlike humans, jellyfish have evolved past the point of needing to give tech demos.

Where did I learn this? I worked at a computer vision research lab for a few years after I graduated college. Its business model involved sales overselling our abilities and resources to land robotics and computer vision grants. Engineering took this money and produced prototype systems. Government employees would then come and verify our progress just to make sure we weren’t pulling hijinks on ol’ Uncle Sam.

Our prototypes were “researchy” and barely worked by any objective measure. But we took this madness and used it to build reliable and successful demos. Our division would have failed if our demos sucked. But we reliably drummed up new and repeat business from this.

I participated in hundreds of demos when I worked there and watched many more. This was on top of thousands of practice runs. We developed a set of loose guidelines for giving demos that reliably produced good results. I have successfully applied these principles to demoing other things like web apps, so I have some confidence that they are generally applicable.

Consider these points à la carte. It’s hard to apply everything all the time. It’s easy to try a few of them. Pick ones that seem achievable! You’ll be surprised how much more your audience will understand with just a few simple tips.

Craft a demo for each audience

The whole reason you are giving a demo is because someone wants to watch it. Give them a good experience. I have a tendency as an engineer to show off the most complex part of a system because that’s the most interesting to me. But a potential user doesn’t want to see something I overcame. They want to see something that provides them value. Think about your audience and ask yourself a few questions about them. What do they care about? What do they want to learn? What do they need to see?

For instance: at my first job we had a prototype robot that could autonomously retrace its steps if you drove it somewhere. Grab an Xbox controller and drive it around for a while. Press a button and BAM! It turned around and drove back to where it started. Now imagine that we had two demos scheduled for the same week. One to a lawnmower manufacturer and the other to bomb disposal technicians. We could have given the same demo to each of them. They may have gotten the point. But we got better results by crafting demos for each group. Think about building an autonomous lawnmower. They want a robot that runs in neat straight lines and it definitely cannot randomly veer into the flower bed. It’s life and death. Now imagine the bomb disposal technician’s job. They may want to recall the robot from dangerous areas with limited WiFi because otherwise one of their coworkers may need to suit up to retrieve it. It’s life and death.

You should confirm this ahead of time when possible. Just send them an email and say, “Hey! I want to make sure your time is well spent. We imagine that tracking over a whole field is difficult. We were planning on demoing how our system handles long straight paths. Are you interested in seeing anything else?” You might find out that “No no! We have straight lines pretty much nailed. We’re having trouble with neat and repeatable turns.” This gives you extra time to build a scenario demoing how your system handles twisty paths. The result is a demo that is more interesting to your audience by design. This is better than showing a robot weaving around a car twice and the lawnmower company thinking “cool, but it’s not doing tight turns so I don’t care” and the bomb disposal technicians thinking “they’re keeping it within range of the WiFi antenna, I wonder how robust this is.”

It’s also useful to consider the breadth of experience of each audience. Let’s say I was demoing a new library at Etsy. A typical audience might have software engineers, engineering managers, and product managers. What do they all have in common? They all want to launch features faster! So I would focus my demonstration on how the library requires product developers to write less code while being easier to understand and test.

Explain to the audience what they are seeing and why they care about it

Audiences don’t give you their full attention when you present. I know this because I often do not give people my full attention when they present. Mea culpa. Also, if your audience has more than a handful of people, someone inevitably has a partial view of the screen and can’t see everything you’re doing. And frankly, sometimes your demo isn’t as clear as you think it is. This means you should overexplain what you are doing and explain why it is important.

“You see that the robot is driving over the curb. This means that the obstacle detection algorithm understands the difference between a curb and a car. It doesn’t get stuck on things it can drive over.”

Overexplaining sounds like a negative. Overeager. Overzealous. Overcooked. It’s a bad prefix. But a demo is not a conversation and doesn’t have the same social rules. You’re basically a live stream of audio and video that can’t be rewound or slowed down for clarity. You need to tailor your script to always explain what you are doing and why.

Imagine you were demoing a computer vision program that counts the number of people in a scene. “I’m seeing if the output exists by running ls. Great, it’s here. Now let’s cat the contents. We see that it has the number 3. This is good because the source image has 3 people. That means it detected how many people were there. Yay!” Someone who only listened to this fragment would understand a great deal about your demo.

  • The presenter was demoing a people detector.
  • It’s maybe not production-ready since it requires outputting data to the filesystem.
  • It was run live.
  • It worked.

This is roughly the level of detail that you want to target when you demo something to a general audience. Practicing will help you tune this because it helps you understand when you should add or subtract.

Your preparation should match the intensity of the situation

At the computer vision lab, we got the best results by continuing to practice long after we hit diminishing returns. These were high-stakes demos though. Our business depended on them. If I was just showing my team something, then I would just do two or three runthroughs. “Exceeding diminishing returns” is something I’d break out for high-stakes demos like “we’re hosed if we don’t land this contract” or “we’re doing multiple 12 hour days of demos to dozens of potential investors.” The goal is to use all the time you have available to learn everything about the failure modes of your system.

What does preparation for a really intense demo look like? At the robotics company we had big checkpoint demos several times a year. If the government employees weren’t happy, we probably would not get followups or new business. It would have been game over if we failed them. But we reliably drummed up new and repeat business even in the face of major things going wrong. This is because we practiced for weeks until even the breakages were boring for us.

We went through a few phases when doing this level of practicing.

The first phase was rough. We’d be compiling on laptops next to the robot and transferring stuff over on USB sticks. We’d start practicing the second stuff might work. And stuff always went wrong. Practicing in a lab is much different than running a system in the wild. We’d learn that the sun overwhelmed our LiDAR systems in daytime. Our C++ work threads would crash. Pan/tilt units would break because the moving robot would put too much stress on them. Nothing worked and nobody was happy. We’d spend hours or days fixing software and hardware bugs.

A picture of me in prototype demo gear

TFW you try to produce a realistic VR simulation in 2009 but accidentally make a corny video game instead

Eventually this would settle down. Things would gel. The software would stabilize. We understood the limits of our hardware. But our managers insisted that we keep practicing. They knew better. And they were right. New things would break. Weirder things. For instance, commercial cables aren’t usually rated for military use. So the cables would eventually experience brief disconnects and some of the experimental hardware would freak out. We’d uncover weird software bugs like “The robot drives forward until killed when it moves by a wall at a certain angle.” We’d learn how often we needed to recalibrate the cameras. We’d understand how to tear down and rebuild the hardware on-site. We learned how many spare batteries we’d need.

This level of preparation had a few major benefits. First, we learned tons of properties about our full system. We understood roughly what should be replaced, when it should be replaced, and what to look for. More importantly we felt that component failure was a nonevent. We practiced recovering from errors by practicing until we experienced errors. It didn’t matter if they happened during the live demo. We could just laugh it off and keep going. “Hah, wow! We were excited that you were coming so we practiced all morning. Looks like we forgot to swap a battery after lunch and the camera battery died. We’re changing it now and then we’ll pick up where we left off.”

All these examples have been robotics examples, but this is NOT limited to robotics. All software fails. Imagine that you’re going to present a website at a conference. Have you ever seen a demo fail because a WiFi access point is overwhelmed by the number of people connecting to it? I sure have. Make sure that your demo works when WiFi is down. Make sure it works when you have realllly sllloooowwww internet. Practice checking it out and running it from scratch. Make sure that it works if you need to run it from another machine at the last second.

This level of practice also helps with stage fright. I’ve had to demo in front of large crowds a few times. I get nervous every single time. I’ve been so nervous that I have no memory of what happened. But videos show a different story. I look calm and collected. I’m roughly following the script that I’ve practiced dozens of times. I mean, it’s not amazing. I’m still saying “um” and what am I doing with my hands? But I’m getting the job done and following a script that is meaningful for the audience instead of going through the demo by rote.

Embrace show-stopping errors and ignore the rest

Imagine that you’re demoing a new React component for your website. You open up developer tools to show the audience something. You see an error in the Console that is unexpected. Something about the Content Security Policy for a marketing library you’re not demoing.

Just ignore it and go on. This goes against your natural instincts. I’ve been there. I know. You want to tell everyone that you didn’t cause this failure. “Oh, weird. There’s a Content Security Policy error in here. I’ve never seen that before. That’s not related to the demo. OK, now let’s see what the React tab has to say.”

It’s not related to the demo and it’s not blocking? Just ignore it. Half the people aren’t paying attention, and the other half will see you ignore it and figure it’s not important. You already have your audience following you through some Real Nerdy Shit if you’re cracking open developer tools. Continue the story. Show them what they need to see. Explain to the audience why it’s important. Ignore the small errors. They don’t matter.

Now let’s say that something happens and you need to restart. This is majorly disruptive to the demo’s flow and is worth explaining. “Whoops! Let me read the error message. Ah, the login cookie expired. Bad luck! OK, I am going to log in real quick and try again.” The audience is going to be pretty chill about it. Just keep them posted on what you’re doing. Worst case, if the demo is unrecoverable (it happens), just describe to them what you were going to show them and what you hoped they were going to take away from it. The worst has already happened. It’s not going to get any worse because you tried to explain what you would have shown them.

Make the participant part of the demo when possible

Demos can be fun. Interactive demos are more fun. Does your demo need a name? Don’t enter foo. So help me god. Enter “Ishmael” and ask the audience to call you that. Showing off some hardware to somebody? Let them control it if time and safety allow. Engagement dramatically increases how interesting a demo is. “The robot followed a path” versus “I drove the robot and then pressed a button and then the robot replayed everything I did!”

This matches my own experiences watching demos. For instance, I don’t remember much about the museum I visited in Anchorage, Alaska. But I remember the exhibit that showed what I look like when filmed with an infrared camera. I even took a damn picture of it.

Summary: my coworkers are gonna be real mad when they watch me give a terrible demo in the future because of how long this post is

You’re a busy engineer. I get it. You could definitely get away with just showing your audience the 5 different versions of the view that you created. But it’s worth considering…

  • What does this specific audience want to see?
  • Are you prepared to explain what the audience is seeing and why they should care about it?
  • What does your audience care about?
  • Has your preparation matched the intensity of the situation?
  • Is there a way you can include the audience?

By considering these 5 things (damn, I should have added the number to the post title), you will have a leg up on people who just mechanically go through the motions because you will say things that are meaningful to the audience and you will have practiced saying it to them.

A short guide to structuring code to write better tests

Why write this?

Well-written tests often have a positive return on investment. This makes sense; bugs become more expensive to fix the later in the development process they are discovered. This is backed by research. This also matches my experience at Etsy, my current employer. Detecting a bug in our development environment is cheaper than detecting it in staging, which is cheaper than detecting it in production, which is cheaper than trying to divine what a forum post means when it says “THEY BROKE SEARCH AGAIN WHY CAN’T THEY JUST FIX SEARCH??,” which is cheaper than debugging a vague alert about async jobs failing.

Over my career I’ve rediscovered what many know: there are good tests and bad tests. Good tests are mostly invisible except when they catch regressions. Bad tests fail frequently and their failures aren’t real regressions. More often they’re because the test logic makes assumptions about implementation logic and the two have drifted. These tests need endless tweaking to sync the implementation and test logic.

So here’s a guide to help you write better tests by improving how your code is structured. It’s presented as a set of guidelines. They were developed over a few years when I was at Google. My team noticed that we had good tests and bad tests, and we invested time in digging up characteristics of each. I feel like they are applicable outside the original domain, since I have successfully used these techniques since then.

Some may point out that this post isn’t a “short guide” by many definitions. But I think it’s better than saying “Read this 350 page book on testing. Now that I have pointed you to a resource I will not comment further on the issue.”

Please ask me questions!

Get HYPE for a testing discussion!

“Testing” is a broad topic, so I want to explain the domain I have in mind. I’m targeting a database-driven website or API. I’m not thinking about countless other environments like microcontrollers or hard realtime robotics or batch data processing pipelines or anything else. The techniques in this post can be applied broadly, and many are useful outside of the web domain. But not all of them work for all situations. You’re in the best position to decide what works for you.

For discussion, I will introduce an imaginary PHP testing framework for evil scientists looking to make city-wide assertions: “Citizens of New York”, or cony[0]. It will be invoked as follows:

$x = 3;
cony::assertSame(3, $x);


Everyone has their own testing terminology. That means this blog post is hopeless. People are going to skip this section and disagree with something that I didn’t say. This happened with my test readers even though the terminology section was already in place. But here goes!

Here are some definitions from Martin Fowler – Mocks Aren’t Stubs:

Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).

Mocks are […] objects pre-programmed with expectations which form a specification of the calls they are expected to receive.

Martin Fowler’s test object definitions

Here are a few more definitions that I will use:

Unit test: A test that verifies the return values, state transitions, and side effects of a single function or class. Assumed to be deterministic.

Integration test: A test that verifies the interaction between multiple components. May be fully deterministic or include non-deterministic elements. For instance, a test that executes a controller’s handler backed by a real database instance.

System test: A test that verifies a full system end-to-end without any knowledge of the code. Often contains nondeterministic elements like database connections and API requests. For instance, a Selenium test.

Real object: A function or class that you’d actually use in production.

Fragile test: A test whose assertion logic easily diverges from the implementation logic. Failures in fragile tests are often not due to regressions, but due to a logic divergence between the test and implementation.

A few more definitions I needed

This post mostly discusses using “real” vs “fake” vs “mocks.” When I say “fake” I will be interchanging a bunch of things that you can find defined in Martin Fowler’s article, like dummy, fake, stub, or a spy. This is because their implementations are often similar or identical despite being conceptually different. The differences matter in some contexts, but they don’t contribute much to this discussion.

Dependency injection is your best friend

Injecting a dependency means passing it in where it is needed rather than statically accessing or constructing it in place.

For instance:

// No dependency injection.
public static function isMobileRequest(): bool {
    $request = HttpRequest::getInstance();
    // OMITTED: calculate $is_mobile from $request's user agent
    return $is_mobile;
}

// With dependency injection.
public static function isMobileRequest(HttpRequest $request): bool {
    // OMITTED: calculate $is_mobile from $request's user agent
    return $is_mobile;
}

Dependency injection makes this easier to test for three reasons.

First, examine the static accessor for the HTTP request. Imagine testing it. You’d need to create machinery in the singleton to set an instance for testing. Alternatively, you’d need to mock out that call. But the following test is much simpler:

public static function testIsMobileRequest() {
    $mobile_request = Testing_HttpRequest::newMobileRequest();
    $desktop_request = Testing_HttpRequest::newDesktopRequest();

    cony::assertTrue(isMobileRequest($mobile_request));
    cony::assertFalse(isMobileRequest($desktop_request));
}


Second, passing dependencies allows common utils to be written. There will be a one-time cost to implement newMobileRequest() and newDesktopRequest() if they don’t exist when you start writing your test. But other tests can use them once they exist. Writing utils pays off very quickly. Sometimes after only one or two usages.
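
To make this concrete, here is a sketch of what those reusable utils might look like. Everything here is illustrative, not a real API: HttpRequest is a bare-bones stand-in that just wraps a user-agent string, and the two factory methods return canned requests.

```php
<?php
// Hypothetical sketch of the reusable test utils described above.
// HttpRequest is a bare-bones stand-in; real user-agent strings are
// longer, but these are enough to exercise a classifier.
class HttpRequest {
    public $user_agent;

    public function __construct(string $user_agent) {
        $this->user_agent = $user_agent;
    }
}

class Testing_HttpRequest {
    public static function newMobileRequest(): HttpRequest {
        return new HttpRequest('Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)');
    }

    public static function newDesktopRequest(): HttpRequest {
        return new HttpRequest('Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
    }
}
```

Every test that needs a request now calls one of these two factories instead of hand-rolling its own fixture.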

Third, dependency injection will pay off for isMobileRequest() as the program grows. Imagine that it’s nested a few levels deep: used by a configuration object that’s used by a model util that’s called by a view. Now you’re calling your view renderer and you see that it takes an HTTP request. This has two benefits. It exposes that the behavior of the view is parameterized by the HTTP request. It also lets you say, “that’s insane! I need to restructure this” and figure out a cleaner structure. This is a tradeoff; you need to manage some parameter cruft to get these benefits. But in my long experience with this approach, managing these parameters isn’t a problem even when the list grows really long. And the benefits are worth it.

Inject the smallest thing needed by your code

We can make isMobileRequest even more maintainable. Look at testIsMobileRequest again. To write a proper test function, an entire HttpRequest needs to be created twice. Imagine that it gains extra dependencies over time. A MobileDetector and a DesktopDetector and a VirtualHeadsetDetector and a StreamProcessor. And because other tests inject their own, the constructors use dependency injection.

public static function testIsMobileRequest() {
    $mobile_detector = new MobileDetector();
    $desktop_detector = new DesktopDetector();
    $vh_detector = new VirtualHeadsetDetector();
    $stream_processor = new StreamProcessor();

    $mobile_request = Testing_HttpRequest::newMobileRequest(
        $mobile_detector, $desktop_detector, $vh_detector, $stream_processor
    );

    $desktop_request = Testing_HttpRequest::newDesktopRequest(
        $mobile_detector, $desktop_detector, $vh_detector, $stream_processor
    );
}


It’s more code than before. That’s fine. This is what tests tend to look like when you have lots of dependency injection. But this test can be simpler. The implementation only needs the user agent in order to properly classify a request.

public static function isMobileRequest(string $user_agent): bool {
    // OMITTED: calculate $is_mobile from $user_agent
    return $is_mobile;
}

public static function testIsMobileRequest() {
    $mobile_ua = Testing_HttpRequest::$mobile_useragent;
    $desktop_ua = Testing_HttpRequest::$desktop_useragent;

    cony::assertTrue(isMobileRequest($mobile_ua));
    cony::assertFalse(isMobileRequest($desktop_ua));
}


We’ve made the code simpler by only passing in the limited dependency. The test is also more maintainable. Now isMobileRequest and testIsMobileRequest won’t need to be changed whenever changes are made to HttpRequest.

You should be aggressive about this. You need to instantiate the transitive closure of all dependencies in order to test an object. Keeping the dependencies narrow makes it easier to instantiate objects for test. This makes testing easier overall.

Write tests for failure cases

In my experience, failure cases are often neglected in tests. There’s a major temptation to check in a test when it first succeeds. There are often more ways for code to fail than to succeed. Failures can be nearly impossible to replicate manually, so it’s important to automatically verify failure cases in tests.

Understanding the failure cases for your systems is a major step towards resilience. Failure tests execute logic that could be the difference between partial degradation and a full outage: what happens when things go wrong? What happens when the connection to the database is down? What happens when you can’t read a file from disk? The tests will verify that your system behaves as expected when there is a partial outage, or that your users get the proper error messages, or whatever behaviors you need to ensure that the single failure doesn’t turn into a full-scale outage.

This isn’t a magic wand. There will always be failures that you don’t think to test, and they will inevitably bring down your site. But you can minimize this risk by adding failure tests as you code.
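
As a sketch of what a failure-case test can look like, consider a loader that falls back to a default when its file is missing. The function, its name, and its fallback behavior are all invented for illustration; the point is that the failure path gets its own test, since you would rarely trigger it by hand.

```php
<?php
// Hypothetical example: a loader that degrades gracefully when its
// file is missing. The failure path is exactly what deserves a test.
function loadGreeting(string $path): string {
    $contents = @file_get_contents($path);
    if ($contents === false) {
        // Failure case: fall back to a default instead of fataling.
        return 'Hello!';
    }
    return trim($contents);
}

function testLoadGreeting_missingFile() {
    // The interesting case: the file is gone, the user still gets
    // a sensible default instead of an error page.
    assert(loadGreeting('/nonexistent/greeting.txt') === 'Hello!');
}

function testLoadGreeting_fileExists() {
    $tmp = tempnam(sys_get_temp_dir(), 'greet');
    file_put_contents($tmp, "Howdy!\n");
    assert(loadGreeting($tmp) === 'Howdy!');
    unlink($tmp);
}

testLoadGreeting_missingFile();
testLoadGreeting_fileExists();
```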

Use real objects whenever possible

You often have several options for injecting dependencies into the implementation being tested. You could construct a real instance of the dependency. You could create an interface for the dependency and create a fake implementation. And you could mock out the dependency.

When possible, prefer to use a real instance of the object rather than fakes or mocks. This should be done when the following circumstances are true:

  • Constructing the real object is not a burden. This becomes more likely when dependency injecting the smallest thing needed by the code
  • The resulting test is still deterministic
  • State transitions in the real object can be detected completely via the object’s API or the return value of the function

The real object is preferable to the fake because the test verifies the real interaction that your code and the dependency will have in production. You can verify that the correct thing happened in a few different ways. Maybe you’re testing whether the return values change in response to the injected object. Or you can check that the function actually modifies the state of the dependency, like seeing that an in-memory key-value store has been modified.

The real object is preferable to the mock because it doesn’t make assumptions about how the two objects interact. The exact API details of the interaction are not important compared to what it actually does to the dependency. Mocks often create fragile tests since they record everything that should be happening: what methods should be invoked, what parameters are being passed, etc.

Even worse, the test author indicates what the return value from the object is. It may not be a sane return value for the parameters when the test is written. It may not remain true over time. It bakes extra assumptions into the test file that don’t need to be there. And imagine that you go through the trouble of mocking a single method 85 times, and you implement a major change to the real method’s behavior that may invalidate the mock returns. Now you will need to go examine each of the 85 cases and decide how each of them will change and additionally how each of the test cases will need to adapt. Or alternatively you will fix the two that fail and hope that the other 83 are still accurate just because they’re still passing. For my money, I’d rather just use the real object.

The key observation is that “how did something get changed?” matters way less than “what changed?” Your users don’t care which API puts a word into spellcheck. They just care that it persists between page reloads. A corollary is that if “how” matters quite a lot, then you should be using a mock or a spy or something similar.
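
When the “how” really does matter, a hand-rolled spy is often all you need. Here is a sketch where SpyCalculator and cachedCalculate are hypothetical names, and the interesting property is that the expensive call happens exactly once:

```php
<?php
// Hypothetical spy: records how many times the expensive call happens.
class SpyCalculator {
    public $calls = 0;

    public function calculate(string $input): string {
        $this->calls++;
        return strrev($input); // stand-in for real expensive work
    }
}

// Code under test: memoizes results so repeat inputs skip the call.
function cachedCalculate(SpyCalculator $calc, array &$cache, string $input): string {
    if (!isset($cache[$input])) {
        $cache[$input] = $calc->calculate($input);
    }
    return $cache[$input];
}

$spy = new SpyCalculator();
$cache = [];
cachedCalculate($spy, $cache, 'abc');
cachedCalculate($spy, $cache, 'abc');

// The "how" assertion: two lookups, but only one real calculation.
assert($spy->calls === 1);
```

Note that the spy still obeys the real contract (it returns a legitimate value); it just also counts invocations, which is the one piece of “how” this test cares about.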

Combining this with the structuring rules above creates a relatively simple rule: Reduce necessary dependencies whenever possible, and prefer the real objects to mocks when you need complex dependencies.

A careful reader will note that using real objects turns unit tests into deterministic integration tests. That’s fine. Improving the maintenance burden is more desirable than maintaining ideological purity. Plus you will be testing how your code actually runs in production. Note that this isn’t an argument against unit tests – all of the structuring techniques in this doc are designed to make it easier to write unit tests. This is just a tactical case where the best unit test turns out to be a deterministic integration test.

Another complaint I’ve heard about this approach is “but a single error in a common dependency could cause dozens of errors across all tests.” That’s actually good! You made dozens of integration errors and the test suite caught all of them. What a time to be alive. These are also easy to debug. You can choose from dozens of stack traces to help investigate what went wrong. In my experience, the fix is usually in the dependency’s file rather than needing to be made across tons of files.

Prefer fakes to mocks

A real object should not be used if you can’t verify what you need from its interface, or it’s frustrating to construct, or it is nondeterministic. At that point the techniques at your disposal are fake implementations and mock implementations. Prefer fake implementations over mock implementations when all else is equal. This reuses much of the same reasoning as the previous section.

Fake implementation of a viking ship

Fake viking ship implementation

Despite the name, a fake implementation is a trivial but real implementation of an interface. When your code interacts with the fake object, side effects and return values should follow the same contract as the real implementation. This is good. You are verifying that your code behaves correctly with a correct implementation of the interface. You can also add convenience setters or getters to your fake implementation that you might not ordinarily put on the interface.

Fakes also minimize the number of assumptions that a test makes about the implementation. You’re not specifying the exact calls that are going to be made, or the order that the same function returns different values, or the exact values of parameters. Instead you will be either checking that the return value of your function changes based on data in the fake, or you will be verifying that the state of the fake matches your expectations after test function execution.

Here’s an example implementation:

interface KeyValueStore {
    public function has(string $key): bool;
    public function get(string $key): string;
    public function set(string $key, string $value);
}

// Only used in production. Connects to a real Redis implementation.
// Includes error logging, StatsD, everything!
class RedisKeyValueStore implements KeyValueStore {}

class Testing_FakeKeyValueStore implements KeyValueStore {
    private $data;

    public function __construct() { $this->data = []; }

    public function has(string $key): bool {
        return array_key_exists($key, $this->data);
    }

    public function get(string $key): string {
        if (!$this->has($key)) {
            throw new Exception("No key $key");
        }
        return $this->data[$key];
    }

    public function set(string $key, string $value) {
        $this->data[$key] = $value;
    }
}

And here is a sample test that uses it:

// Implementation.

public function needsToBeCached(string $data, KeyValueStore $store): string {
    if ($store->has($data)) {
        return $store->get($data);
    }

    // OMITTED: Something that calculates $result_of_operation.

    $store->set($data, $result_of_operation);
    return $result_of_operation;
}

// Tests.

public function testNeedsToBeCached_emptyCache() {
    $expected = 'testtesttest';
    $store = new Testing_FakeKeyValueStore();
    $actual = needsToBeCached('test', $store);

    // Test both the return value and the state
    // transition for the cache.
    cony::assertSame($expected, $actual);
    cony::assertSame($expected, $store->get('test'));
}

public function testNeedsToBeCached_warmCache() {
    $stored = '867-5309';
    $store = new Testing_FakeKeyValueStore();
    $store->set('test', $stored);
    $actual = needsToBeCached('test', $store);

    // Verifies that the cache is used by ensuring that
    // it returns the value from the cache, and not the
    // calculated value.
    cony::assertSame($stored, $actual);
}

Another benefit is that you now have a reusable test implementation of KeyValueStore that you can easily use anywhere. As you tweak the implementation of needsToBeCached() over time, you will only need to change the tests when the side effects or return value change. You will not need to update tests to keep mocks in sync with the exact logic used in the implementation.

There are many cases where this is a bad fit, and anything that sounds like a bad idea is probably a bad idea. Don’t fake a SQL database. If your code has an I/O boundary like network requests, you will basically have no choice but to mock that. You can always abstract it behind other layers, but at some point you will need to write a test for that final layer.

Prefer writing a simple test with mocks to faking a ton of things or writing a massive integration test

I spend lots of time encouraging test authors to avoid mocks as a default testing strategy. I acknowledge that mocks exist for a reason. To borrow the XML adage, an automatic mocking framework is like violence: if it doesn’t solve your problem, you’re not using enough of it. A determined tester can mock as many things as necessary to isolate an effect in any code. My ideal testing strategy is more tactical and requires discipline. Imagine that you’re adding the first test for an ancient monolithic controller. You have roughly three options to write the test: prepping a database to run against a fake request you construct, spending a ton of time refactoring dependencies, or mocking a couple of methods. You should probably do the last one out of pragmatism. Just writing a test at all will make the file more testable, since now the infrastructure exists.
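As a sketch of that pragmatic option (in Python, with a hypothetical controller and method names), a first test might mock just the two methods with hard dependencies and let the logic you actually care about run for real:

```python
import unittest
from unittest import mock

# A stand-in for an ancient monolithic controller (hypothetical names).
class CheckoutController:
    def handle(self, cart_id):
        total = self.load_cart_total(cart_id)   # hits the database
        self.record_metric("checkout.viewed")   # hits StatsD
        return "Total: $%.2f" % total

    def load_cart_total(self, cart_id):
        raise RuntimeError("needs a live database")

    def record_metric(self, name):
        raise RuntimeError("needs a StatsD connection")

class CheckoutControllerTest(unittest.TestCase):
    def test_handle_formats_total(self):
        controller = CheckoutController()
        # Mock only the two methods with hard dependencies;
        # the formatting logic still runs for real.
        with mock.patch.object(controller, "load_cart_total",
                               return_value=12.5), \
             mock.patch.object(controller, "record_metric"):
            self.assertEqual(controller.handle(42), "Total: $12.50")
```

Two targeted mocks get a first test into the file without a database or a big refactor; later edits can migrate toward fakes or better-factored dependencies.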

You can slowly make improvements as you continue to make edits, improving the code’s organization as you go. This will start to enable you to use techniques that lead to less fragile tests.

Always weigh the cost and benefit of the approaches you take. I’ve outlined several techniques above that I think lead to better tests. Unfortunately, they may not be immediately usable on your project. It takes time to reshape a codebase. As you apply these techniques you will discover what works best for your own projects, and you can refine them as you go.

System tests pay for themselves, but it’s hard to predict which ones are worth writing

At Google, my team had a long stretch where we wrote a system test for every regression. We were optimistic that they would become easier to write over time. Eventually the burden could not be ignored: they were flaky and we never ended up in our dream tooling state. So we phased out this strategy. But one day I was discussing an “incremental find” system test with a few teammates. We figured out that this single test saved us from regressing production an average of 4 times per person. Our bugs surfaced on our dev machines instead of later in our deployment process. This saved each of us lots of expensive debugging from user reports or monitoring graphs.

We couldn’t think of another system test that was nearly that valuable. It followed a Pareto distribution: most bugs were caught by a few tests. Many tests caught only a bug or two. Many other tests had similar characteristics (user-visible, simple functionality backed by lots of complex code, easy to make false assumptions about the spec), but only this one saved full eng-months.

So system tests aren’t magic, and all of my experience suggests that we should only use them tactically. The customer’s critical paths are a good first-order guide for which system tests to write. Consider adding new system tests as the definition of your critical path changes.

What’s next?

Write tests for your code! Tests are the best forcing function for properly structuring your code. Properly structuring your implementation code will make testing easier for everyone. As you come up with good generic techniques, share them with people on your team. When you write utilities that others will find useful, share those too.

Even though this guide is well north of 3,000 words, it still only scratches the surface of the subject of structuring code and tests. Check out “Refactoring” by Martin Fowler if you’d like to read more on the subject of how to write code to be more testable.

I don’t recommend following me on Twitter unless you want to read a software engineer complain about how cold it is outside.


Thanks to everyone at Etsy who provided feedback on drafts of this, whether you agreed with everything or not!



[0] I’ve seen this joke before but I can’t figure out where. Please send me pointers to the source material!

2017 year in review: layoffs, success and burnout in tech leadership, and trolls

I try to grow and learn every year. But I have done a bad job of capturing these lessons for others to consume. I want to break out of this cycle. Accordingly, I’m experimenting with a “year in review” format.

Most of my learnings before 2017 were focused on technology. Fresh out of college, my first side projects involved new programming languages and technologies. As my career progressed, I learned about working on huge software projects and breaking down enormous problems to be tractable. But 2017 was different. It was the first time where the primary lessons involved the interface between technology and people.

Why did this happen? A combination of professional circumstances and the particular side project I chose. This was the first year where I was regularly a tech lead. This came with successes and failures. It gave me new perspective on how to improve a team’s effectiveness. It also made me relearn how I interact with my coworkers.

Etsy itself had a crazy year. We had multiple rounds of layoffs. Even our CEO and CTO were replaced. This is not the first time this has happened at a place I worked, but these cuts were the most drastic I’ve ever experienced. Watching friends and colleagues leave the company was depressing. On the other hand, it’s fascinating to compare Etsy today with the way it was a year ago.

I also implemented a small Discord bot for my friends to use. It led to trolling, and it had real social consequences amongst my college friends. I learned about how simple choices in software can enable trolling, even when the social cost seems to outweigh the lulz.

Successes and failures as a tech lead

I have led projects in the past. But I’ve never been responsible for the technical output of teams for 9 straight months.

I’m thankful to report that I was mostly successful. I summarized some lessons in a blog post if you’d like to know the gritty details. A draft of the blog post got traction within Etsy. Unexpectedly, just writing about tech leadership made me a go-to person on the subject. People still ask me for 1:1s to pick my brain about it. This made me want to write more, even if I have to accept that most of the stuff I write will go unread.

I also rediscovered that making public assertions means that I end up in public disagreements. I think I handled this well in 2017. My work style is becoming more egoless as I get older; I’m interested in finding right answers instead of being right. By having conversations with these people, I learned a lot about the pressures that individuals and teams faced. I also discovered something interesting: their direct assertions often seemed unproductive on their face, but incorporating the author’s real situation into my mental model actually made my view more nuanced. There’s probably something deeper here about persuasion, but I haven’t pieced it together.

As a tech lead, I also discovered that I needed to change my attitude on a day-to-day basis. I’ve been told that my demeanor when I get really serious is intimidating. I also have a bad tendency, left over from Google, to make strong assertions that turn conversations into negotiations. This comes off badly at a place like Etsy, which is much more collaborative by default. I worry about it a lot, but honestly it hadn’t been a problem in my work as an individual contributor. I strive to be positive and friendly when interacting with coworkers. So I ended up splitting time between these parts of my personality: I got dour when doing individual work, and I shed that attitude when doing person-to-person work, focusing on being friendly and helpful.

I needed to make changes as a tech lead. Suddenly I was interruptible. Spending half of my day in “serious mode” doesn’t work when I can’t schedule my conversations, especially since the role of a tech lead involves moving the team forward. It falls apart if people are hesitant to approach me. I made real structural changes to my life. Now I do personal projects from a coffee shop for 1-2 hours before work. This gives me a chance to perk up. I’m also more sensitive to eliminating stressors. I think that making these changes has had positive consequences through the rest of my life.

I also experienced some failures. I view these as a sign that I’m pushing myself. I found a limit on how abstracted I can be from engineering. I like first-order engineering work like coding and designing projects. I also like the parts of tech leadership that enable other people to work: designing and parallelizing the work of projects, unblocking engineers, answering questions, working together on designs. I’ll call this second-order engineering work, as it’s one level removed from doing the work myself.

I learned that I couldn’t be more abstracted than this. This year I had an extended span where my primary job was helping reshape how a team was organized and how it interfaced with the rest of the company. This was on top of responsibility for the technical output of the team. I lost the ability to see the connection between my effort and the output. It started burning me out, and I stopped wanting to come to work. I brought it to my manager’s attention when I realized this, and she helped me find a new project. It’s good to see that I’ve applied lessons from previous burnouts and caught this one before it became a bigger problem.

I’ll have to switch over to management if I want to eventually accomplish my 20 year career goals. But I learned that this is not the time yet.


Layoffs at Etsy

Etsy had a crazy year. The board replaced our CEO and CTO. There were two rounds of layoffs. This affected almost a quarter of our workforce. After I found a list of people affected, I realized I was lucky to still have a job. It wasn’t like they lopped off the bottom quintile of performers. Good people were in the wrong place at the wrong time.

Some friends were booted from the company. Others voluntarily left afterwards because Etsy wasn’t what it used to be. It’s sad that it came to this. Honestly, there’s not much to learn from my experience with this. It would mostly be reflections on using gallows humor as a coping mechanism.

Our ex-CEO’s experience was much more instructive. Chad Dickerson was Etsy’s CEO until he announced his own firing and the first round of layoffs at an emergency all-hands. That could have been the end of the story. He was still technically employed for a month to ease the transition. But he could have just vanished from the spotlight. Nobody would have blamed him. Yet Chad faced his final curtain with a bow.

It’s an Etsy tradition for employees to give a “last lecture.” You never know what you’re going to get. Some people have bones to pick. Others treat it as a way to reflect on their entire lives. One was even presented as a fake roguelike. Chad’s last lecture was a story of his Etsy career. You saw his passion for the music he listened to, his values, and stories from his upbringing. It was the story of the people that he worked with and the struggles they faced. It was an overwhelmingly positive presentation. Even being ripped away from the CEO slot didn’t change the fact that Chad was pure class.

And then the new regime settled in. It was fascinating to see the difference in approaches. The old guard valued portfolio diversification. They tried to extend Etsy to be an organization that always set out on new journeys and made new bets. The new leadership has the complete opposite focus. They don’t just track core business metrics. They nerd out over them. We aggressively A/B test. We pare experiments and features to their essence to get real feedback quicker. This gives us confidence to throw away bad ideas. Despite the layoffs, there is a lot to like about working in the new model. We always know where we are going and we know what we will do when we get there. It will be interesting to see how Etsy handles a 2018 world where we need to mix short- and long-term goals. But things are looking up.

My friends trolled the shit out of each other with my side project

Etsy has an anachronistically-named Slack bot called “irccat”. It has dozens of built-in lookup functions: stock prices, server health, etc. In my estimation, its most common use is to store gif and meme URLs.

Earlier this year, my friends and I realized that we didn’t have the cognitive capacity to hold Google’s chat application matrix in our heads long enough to decide which was best for our group. Two of our friends were obsessed with Discord. We pulled the trigger and switched. Discord is a simple consumer version of Slack. It’s marketed to gamers, but it would work for most groups of people.

With legal’s blessing, I implemented a version of irccat for my friends to use on Discord. I wrote the first version in a few hours. It was painfully simple. All you could do was teach it commands. It also had the ability to build parameterized commands like LMGTFY and Urban Dictionary URLs. I talked to the server moderator about adding the bot. He agreed that we could give it a shot. So I invited it and taught it a few things to show people how to use it.

And use it they did! Since launch it has been taught over 1100 commands. People notice when it’s down, which is the best compliment a side project of mine has ever received.

Unfortunately, we also took advantage of the super simple functionality to troll the shit out of each other. Commands couldn’t even be deleted in my first version. This started a land grab to teach it insulting responses to each other’s names and gaming handles. Now I appreciate the enormous social differences between “I use this software at my employer” and “I use this software within my group of friends.” My college friends have an assumption that we’ll forgive each other. We’ve known each other for more than a decade. But at work one false move means that you are looking for a new job. This seems to change the math quite a bit. I’m afraid to imagine what would have happened if a group of strangers used it.

So I implemented command deletion. Suddenly, commands started being swapped out with troll responses via private messages with the bot. It took a few iterations to find a local minimum where trolling wasn’t a major problem.

It’s easy to say “You should have seen this coming!” But a takeaway is that it’s much easier to list the potential problems than it is to guess the exact social issues something will have in practice. Now I appreciate why rapid iteration is baked into a lot of industry contexts: if you can’t see these things coming, being reactive is your best option. A corollary is that I need to budget time for reactive engineering when implementing social software.

It’s not perfect. Right now, we’re having a conversation about whether downvoting is too mean when it is done publicly. Spamming is also a minor problem when features are a novelty. There was a 1-hour spamfest when Ryan implemented karma. This settled down when I announced that if I was provoked, the next feature I would implement would be “permaban from ever using the bot again.” Apparently A Group Is Its Own Worst Enemy still holds true today. I thought I would be resistant to needing strong moderation tools since the bot was designed for my friends. But it turns out that we’re still just a social group, and our interactions with software still follow the same patterns as communities of strangers.

Some of my friends also contributed features to the bot! This was tricky. I wanted to include their work while keeping the codebase at a high standard. One of the contributors has children. I was afraid that work could be abandoned if he went too far in the wrong direction. I also wanted to avoid any hard feelings. I tried to handle this by restructuring the project to be more understandable and usable. I filed tickets that outlined specs that I would like to implement. And I tried to be accommodating when they were testing their changes. No changes were abandoned, so I think this was a success overall.

Looking forward

I constantly re-evaluate goals. This means that I have a low probability of achieving the goals that I have for 2018. But that’s okay, because it’s likely that whatever else I’d like to do will be better.

I started studying Machine Learning as a side project. I’m not interested in developing novel algorithms. I’d like to become reasonably effective as a practitioner in the field, as measured by being happy about my performance in a Kaggle competition. This is daunting from where I am sitting: Coursera courses, multiple textbooks, new coding frameworks, etc. But it’s achievable now that I work on side projects in the morning. Dedicated study time raises my probability of success.

I’d like to run at least 2 marathons this year. I hurt my leg in October 2016 and it took 7 months before I recovered enough to run 2 miles. But now I’m starting to get back to my old distances. Finishing a March race is achievable at this point. I’d like to challenge my personal best in the fall. This is going to be tough; I’m 20-30 pounds over my normal racing weight range and I can’t drop weight as easily as I used to.

I’d also like to write more. I wrote 5 blog posts in 2017. I’d like to increase that to 12 in 2018. This post is one. I have a second queued up about refactoring code to make it more testable. That means that I need 10 more for the rest of the year. The only thing stopping me from doing it is myself.

Professionally, I’d like to get promoted to staff engineer. This may be harder in the new Etsy, since the high level engineers who would have vouched for me are gone. But that’s an incentive to continue to branch out within the company. Reducing technical debt is also a focus for the year; that feels like an opportunity that I haven’t quite solved yet.

I enjoyed being a mentor in 2017. I’d like to continue that trend and mentor at least 2 additional engineers in 2018.

The engineers that invested in me when I was a junior engineer

Lots of ink is being spilled over the following Tweet:

This doesn’t reflect the relationship that senior and junior engineers have. It also misunderstands the idea of investment.

A junior engineer is a learning position. In the beginning they need help finishing simple projects. Maybe they get starter bugs or simple tasks to build their confidence. They learn about how the system is designed and how to modify it. They learn when to ask questions and when to build consensus for a change. They make mistakes: they take down the site, and then their bugfix takes down more of the site. They learn about risk mitigation. They can isolate bugs much quicker. Soon, they are building prototypes and investigating new strategies. They learn about working on teams, and running small projects.

Over time they become a different engineer. They can finish any problem given enough time. They can break a project into chunks and finish it with others. In fact, they’ve come so far that they can usefully teach other engineers. This single leveled-up engineer has become a force of nature: one person who can not only complete any engineering project this company can throw at them, but can run teams and raise the level of everybody around them.

This doesn’t happen in a vacuum. Their more senior teammates helped them grow. Each of their tasks had to help the team. When the junior engineer started down the wrong path, the senior engineer saved them time by explaining alternatives. They made the junior engineer aware of their unknown unknowns. This takes a lot of time. My dad once told me a rule of thumb about rock quarry employees: their first year costs twice their salary because they need to be trained. I see no reason to believe that junior software engineers are cheaper. I’m more likely to believe they are more expensive.

It took a village to turn me from a junior coder into a senior software engineer. I want to highlight a few people who invested that time in me, even though I was “slowing them down.”


Charles

When I first graduated from college, I wasn’t sure whether I wanted to go to grad school or work in industry. So I split the difference and worked at a computer vision research lab.

My first task was “talk to Charles.” Charles was one-of-a-kind: already over 60, he had been an astrophysics professor until he grew bored and made a late-career switch to computer vision. He was also one of the most effective engineers that I have ever worked with. His niche was a mastery of The Unix Way. He built large reliable batch processing systems out of auto-generated Makefiles, a few networked computers, and tiny single-purpose C++ programs. And he could do this and deliver the final system to the customer in less time than it would take most other teams of 3 engineers to solve it with modern technologies. And he could do it within the time and budget that he predicted.

When I walked into Charles’ office, he told me that I would be working on a project that involved coordinates, so he was giving me an intro to coordinate systems. What he said over the next 45 minutes was dense, and I learned about an incredible number of topics:

  • There were coordinate systems that aren’t latitude+longitude
  • We would be using one called UTM, which splits the world into unique map projections
  • There are coordinate systems like MGRS that describe areas instead of single points
  • It’s possible to find points that could be described multiple ways in some coordinate systems
  • There are separate coordinate systems that better describe locations and areas at the poles
  • Many things we were doing used “reference geoids,” which are surveyed descriptions of the earth with a guaranteed amount of accuracy.
  • Gravity is non-uniform across the earth and doesn’t necessarily always point to the same center based on local geology

And a ton of stuff that I don’t remember. I was desperately trying to hold on to every fact as they whizzed by me.
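A small taste of why this material stuck with me: the UTM zone for a given longitude is a one-line computation. Here’s a sketch in Python (my own illustration, not from that lecture; it deliberately ignores the irregular zones around Norway and Svalbard, and UTM isn’t defined at the poles at all):

```python
import math

def utm_zone(longitude: float) -> int:
    """Return the UTM zone number (1-60) for a longitude in degrees.

    Ignores the special-case zones around Norway and Svalbard; UTM
    does not cover the poles (the separate UPS system handles those).
    """
    # Zones are 6 degrees wide, starting at 180°W; the modulo keeps
    # longitude 180 itself from producing zone 61.
    return int(math.floor((longitude + 180.0) / 6.0)) % 60 + 1

# New York City sits near longitude -74, which falls in zone 18.
```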

He was always willing to spend time to teach me. One of my first difficulties was understanding that doing more work could make a system more efficient. He sat me down and diagrammed how we weren’t making things worse, but rather improving the system holistically. He also gave me projects that encouraged me to grow. I understood applied Linear Algebra much better after working with Charles. Sometimes, his lessons were unusual. Once I told him I was having trouble modifying an ancient 1000 line Matlab script to process a new format of digital elevation maps. He disappeared for 45 minutes and came back with a 60 line AWK program. The message was clear: don’t spend time on trash when you can fix it.

He was the first person to take me seriously as a professional software engineer, and I often apply lessons about tradeoffs and scoping down work in my day-to-day. Plus all that awesome coordinate stuff! Gravity doesn’t always point to the exact same place?? How cool is that!


Oleg

I also worked on mobile robotics at the lab. Here I fell into Oleg’s domain. Where Charles had been gregarious, Oleg was reserved. His style was more judgmental than any manager tutorial would recommend, but he always backed up his judgements with patient explanations of how code and systems should be organized.

When I first designed changes to the system, Oleg would have me whiteboard out what I was planning. He would stare at it for a while and eventually mutter “this is not good.” Then he would show me my errors. My code would work but it would be fragile. Maybe my design for task threading would be hard to modify. He showed me how to better isolate concurrency work so that it would be more resilient by design. He showed me how to isolate concerns of an application. He showed me that you could split a single process into multiple processes that communicate via IPC, so that a crash in one component of your system didn’t take down the whole thing.

Ultimately I learned a lot about robustness. Just because your code works doesn’t mean it was easy to write, will be easy to modify, or will be fault-tolerant once it is running. Taking a step back and examining holistic architecture approaches gives you a major leg up. So does reworking fragile pieces.

Over time, his responses to my design would soften. I started to get “I think we can improve this” and sometimes I got all the way to “I think you should consider X. But this is good.” His lessons had been taking hold and helping me grow, and he was seeing the benefits of his time investment.

Olga and Luiz

When I joined Google, I mostly worked under Olga and Luiz. Olga was the über tech lead and manager of Docs and the Docs apps, and Luiz was starting to transition from being the tech lead for desktop Docs to having more of a cross-product role. And they were shockingly productive. They could code circles around me while dealing with all the challenges that come with leading a growing team. In some ways, working under each of them was frustrating because they had high standards. Their standards were far higher than the already pedantic ones across the rest of Google. They were also big fans of drive-by reviews, and had no qualms asking you to majorly restructure your code for seemingly subtle reasons. But working under both of them at the same time brought me further in my development than any other single thing.

Olga was a human code linter. Actually, she was more reliable than the actual Javascript linters that we had. Accordingly, she instituted Javascript standards for our project that were much more rigorous than the already-high bar set by the Javascript style guide. But she would also patiently explain her reasoning to people who asked. For instance, she had an exact style in which she wanted multiple boolean conditionals checked, and she was willing to sit down and explain why. That sounds like a massive waste of her time, given the scope of her work compared to mine. But she understood that knowledge transfer was important, and was willing to take that time even for trivial things. Under her, I was also put on a trajectory from working on small projects, to working on larger projects, to working on a critical project, to hosting an intern, to leading small teams, to working on a project that was cross-cutting and took eng-years of time.

I worked with Luiz for years, and it was a masterclass in learning how senior engineers mentor junior engineers. He had an egoless way of helping engineers design features. When someone asked him a question, he made sure he understood their spec. And then he dove into the code to see how everything was organized. He sketched out the current system and started soliciting ideas. He’d write down the first thing that came to mind. When people gave suggestions, he’d write them down. He’d explain when things wouldn’t work. When there were no more ideas, he’d start talking through the consequences of the changes. By the end of the process they had a solution they had collaborated on. He took everybody seriously, and always listened to their thoughts and opinions. I learned an incredible amount by being on the business end of this process and getting a chance to watch him do it every day. Now that I’m at Etsy, I strive to help engineers who ask me questions in the same way that I watched Luiz do successfully for so many years.

Under the two of them, I started as an engineer that could code anything given enough time. I ended as somebody with a much stronger understanding of how to set standards for a team, how to work with junior engineers to level them up, and how to split up truly unbounded things into smaller projects that I could work on with other people.


Tons of engineers have shaped how I work. But I can point to a few engineers who invested real time in my development. You could say they made a horrible decision if you look at the localized time tradeoff: an hour of their time was worth much more than an hour of mine. But they invested in me, and now I am able to invest in other people because of the lessons that I have learned from them. Their investments not only paid off for the company because I became more effective, but I was also able to impart this same wisdom to other people.

What does a tech lead do?

This was written internally at Etsy. I was encouraged to post on my own personal blog so people could share. These are my opinions, and not “Etsy official” in any way.

Motivation for writing this

For the past 5 months, I have been the tech lead on the Search Experience team at Etsy. Our engineering manager had a good philosophy for splitting work between managers and tech leads. The engineering manager is responsible for getting a project to the point where an engineer starts working on it. The tech lead makes sure everything happens after that. Accordingly, this is intended to document the mindset that helps drive “everything after that.”

Having a tech lead has helped our team work smoothly. We’ve generated sizable Gross Merchandise Sales (GMS) wins. We release our projects on a predictable schedule, with little drama. I’ve seen this structure succeed in the past, both at Etsy and at previous companies.

You can learn how to be a tech lead. You can be good at it. Somebody should do it. It might as well be you. This advice sounds a little strange since:

  • It’s a role at many companies, but not always an official title
  • Not every team has them
  • The work is hard, and can be unrecognized
  • You don’t need to be considered a tech lead to do anything this document recommends

But teams run more efficiently and spread knowledge more quickly when there is a single person setting the technical direction of a team.

Who is this meant for?

An engineer who is leading a project of 2-7 people, either officially or unofficially. This isn’t meant for larger teams, or leading a team of teams. In my experience, 8-10 people is an inflection point where communication overhead explodes. At this point, more time needs to be spent on process and organization.
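That inflection point matches the classic Mythical Man-Month arithmetic: pairwise communication channels grow quadratically with team size. A quick sketch:

```python
def channels(n: int) -> int:
    """Number of pairwise communication channels among n people.

    Each of the n people can talk to each of the other n - 1,
    and each channel is shared by two people: n * (n - 1) / 2.
    """
    return n * (n - 1) // 2

# A 3-person team has 3 channels; at 8 people it's 28, at 10 it's 45.
```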

What’s the mindset of a tech lead?

This is a series of principles that led to good results for Search Experience, or are necessary to do the job. I’m documenting what works well in my experience.

More responsibility → Less time writing code

When I was fresh out of college, I worked at a computer vision research lab. I thought the most important thing was to write lots of code. This worked well. My boss was happy, and I was slowly given more responsibility. But then the recession hit military subcontractors, and the company went under. Life comes at you fast!

So I joined BigCo, and started at the bottom of the totem pole again. I focused on writing a lot of code, and learned to do it on large teams. This worked well. I slowly gained responsibility, and was finally given the task of running a small project. Until this point, I had been successful by focusing on writing lots of code. So I was going to write lots of code, right?

Wrong. After 2 weeks, my manager pulled me aside, and said, “Nobody on your team has anything to do because you haven’t organized the backlog of tasks in three days. Why were you coding all morning? You need to make sure your team is running smoothly before you do anything else.”

Okay, point taken.

So I made daily calendar reminders to focus on doing this extra prep work for the team. When I did this work, we moved faster as a three person unit. But I could see on my code stats where I started focusing more on the team. There was a noticeable dip. And I felt guilty, even when I expected this! Commits and lines of code are very easy ways to measure productivity, but when you’re a tech lead, your first priority is the team’s holistic productivity. And you just need to fight the guilt. You’ll still experience it. You just need to recognize the feeling and work through it.

Help others first

It sounds nice to say that you should unblock your team before moving yourself forward, but what does this mean in practice?

First, if you have work, but someone needs your help, then you should help them first. As a senior engineer, your time is leveraged: spending 30 minutes of your time may save days of someone else’s. Those numbers sound skewed, but this is the same principle behind the idea that bugs get dramatically more expensive to fix the later they are discovered. It’s cheaper to do things than to redo things. You get a chance to save your teammates from rediscovering things that are already known, or spare them from writing something that’s already written. Some exploration is good. But there’s always a threshold, and you should encourage teammates to set deadlines based on the task. Once they pass that deadline, asking for help is the best move. This can also catch bugs before they’re even written, bugs that would otherwise become push problems or production problems.

Same for code reviews. If you have technical work to do, but you have a code review waiting, you should do the code review first. Waiting on someone to review your code is brutal, especially if the reviewing round-trip is really long. If you sit on it, the engineer will context switch to a new task. It’s best to do reviews when their memory of the code is fresh. They’re going to have faster and better answers to your questions, and will be able to quickly tweak their pull request to be submission-ready.

It’s also important to encourage large changes to be split into multiple pull requests. When discussing projects up-front, make sure to recommend how to split it up. For instance, “The first one will add the API, the second one will send the data to the client, and the third one will use the data to render the new component.” This allows you to examine each change in detail, without needing to spend hours reviewing and re-reviewing an enormous pull request. If you believe a change is too risky to submit all at once because it’s so large that you can’t understand all of its consequences, it’s OK to request that it be split up. You should be confident that changes won’t take down the site.

Even with this attitude, you won’t review all pull requests quickly. It’s impossible. For instance, most of my team isn’t in my timezone. I get reviews outside of work hours, and I don’t hop on my computer to review them until I get into work at the crack of 10.

I personally view code reviews and questions as interruptible. If I have a code review from our team, I will stop what I am doing and review it. This is not for everybody, because it’s yet another interruption type, and honestly, it’s exhausting to be interrupted all day. Dealing with interruptions has gotten easier for me over time, but I’ve gotten feedback from several people that it hasn’t for them. You will never be immune to the exhaustion. I’m not. You will just become better at managing your time out of pure necessity.

Much of your time will be spent helping junior engineers

A prototypical senior engineer is self-directed. You can throw them an unbounded problem, and they will organize it. They have an instinct for when they need to build consensus. They break down technical work into chunks, and figure out what questions need to be answered. They will rarely surprise you in a negative way.

However, not everybody is a senior engineer. Your team will have a mix of junior and senior engineers. That’s good! Junior engineers are an investment, and every senior engineer is in that position because people invested in them. There’s no magical algorithm that dictates how to split time between engineers on your team. But I’ve noticed that the more junior a person is, the more time I spend with them.

There’s a corollary here. Make sure that new engineers are aware that they have this option. Make it clear that it is normal to contact you, and that there is no penalty for doing so. I remember being scared to ask senior engineers questions when I was a junior engineer, so I always try hard to be friendly when they ask their first few questions. Go over in-person if they are at the office, and make sure that their question has been fully answered. Check in on them if they disappear for a day or two. Draw a picture of what you’re talking about, and offer them the paper after you’re done talking.

The buck stops here

My manager once told me that leaders take responsibility for problems that don’t have a clear owner. In my experience, this means that you become responsible for lots of unsexy, and often thankless, work to move the team forward.

The question, “What are things that should be easy, but are hard?”, is a good heuristic for where to spend time. For instance, when Search Experience was a new team, rolling out new features was painful. We never tire-kicked features the same way, we didn’t know what groups we should test with, we’d (unpleasantly) surprise our data scientist, and sometimes we’d forget to enable stuff for employees when testing. So I wrote a document that explained, step-by-step, how we should guide features from conception to A/B testing to the decision to launch them or disable them. Then our data scientist added tons of information about when to involve her during this process. And now rolling out features is much easier, because we have a playbook for what to do.

This can be confusing with an engineering manager and/or product manager in the picture, since they should also be default-responsible for making sure things get done. But this isn’t as much of a problem as it sounds. Imagine a pop fly in baseball, where a ball falls between three people. It’s bad if everyone stands still and watches it hit the ground. It’s better if all of you run into each other trying to catch it (since the odds of catching it are better than nobody trying). It’s best if the three of you have a system for dealing with unexpected issues. Regular 1:1s and status updates are a great way to address this, especially in the beginning.

Being an ally

Read Toria Gibbs’ and Ian Malpass’ great post, “Being an Effective Ally to Women and Non-Binary People”, and take it to heart. You’re default-responsible for engineering on your team. And that means it’s up to you to make sure that all of your team members, including those from underrepresented groups, have an ally in you.

“What does being a tech lead have to do with being an ally?” is a fair question.

First, you are the point person within your team. You will be involved in most or all technical discussions, and you will be driving many of them. Make sure members of underrepresented groups have an opportunity to speak. If they haven’t gotten the chance yet, ask them questions like, “Are we missing any options?” or “You’ve done a lot of work on X, how do you think we should approach this?”. If you are reiterating someone’s point, always credit them: “I agree with Alice that X is the right way to go.”

You will also be the point person for external teams. Use that opportunity to amplify underrepresented groups by highlighting their work. If your time is taken up by tech leading, then other people are doing most of the coding on the team. When you give code pointers, mention who wrote it. If someone else has a stronger understanding of a part of the code, defer technical discussions to them, or include them in the conversation. Make sure the right names end up in visible places! For instance, Etsy’s A/B testing framework shows the name of the person who created the experiment. So I always encourage our engineers to make their own experiments, allowing the names to be visible to all of our resident A/B test snoopers (there are dozens of us). If someone contributes to a design, list them as co-authors. You never know how long a document will live.

Take advantage of the role for spreading knowledge

When a team has a tech lead, they end up acting as a central hub of activity. They’ll talk about designs and review code for each of the projects on the team.

If you read all the code your team sends out in pull requests, you will learn at an accelerated rate. You will quickly develop a deep understanding of your team’s codebase. You will see techniques that work. You can ask questions about things that are unclear. If you are also doing code reviews outside of your team, you will learn about new technologies, libraries, and techniques from other developers. This enables you to more effectively support your team with what you have learned from across the company.

In this small team, Alice is the tech lead, and Bob is working directly with Carol. All other projects are 1 person efforts. Alice is in a position where she can learn quickly from all engineers, and spread information through the team.

Since you are in this position, you are able to quickly define and spread best practices through the team. A good resource for code review suggestions is this presentation by former Etsy employee Amy Ciavolino. It presents a team-oriented style. Feel free to adapt parts to your own style. If you’ve worked with me, you’ll notice this sometimes differs from what I do. For instance, if I have “What do you think?” feedback, I prefer to have in-person/Slack/Vidyo conversations. This often ends in brainstorming, and creating a third approach that’s better than what either of us envisioned. But this presentation is a great start, and a strong guideline.

Day-to-day work

As I mentioned above, much of the work of a tech lead is interrupt-driven. This is good for the team, but it adds challenges to scheduling your own time. On a light day, I’ll spend maybe an hour doing tech lead work. But on a heavy day, I’ll get about an hour of time that’s not eaten up by interruptions.

Accordingly, it’s difficult to estimate what day you will finish something. I worked out a system with our engineering manager that worked well. I only took on projects that were either small, non-blocking, or didn’t have a deadline. This is going to work well with teams trying to have a minimal amount of process. This will be a major adjustment on teams that are hyper-organized with estimation.

You need to fight the guilt that comes with this. Your job isn’t to crank out the most code. Your job is to make the people on your team look good. If something important needs to be done, and you don’t have time to do it, you should delegate it. This will help the whole team move forward.

When I’m deciding what to do, I do things in roughly this priority:

Inner loop:

  1. Answer any Slack pings
  2. Help anybody who needs it in my team’s channel
  3. Do any pending code reviews
  4. Make sure everybody on the team has enough work for the rest of the week
  5. Do any process / organizational work
  6. Project work

Once a day:

  1. Check performance graphs. Investigate (or delegate) major regressions in anything it looks like we might have affected.
  2. Check all A/B experiments. For new experiments, look for bucketing errors, performance problems (or unexpected gains, which are more likely to be bugs), etc.

Once a week:

  1. Look through the bug backlog, make sure a major bug isn’t slipping through the cracks.

What this means for engineering managers

Many teams don’t have tech leads, but every team needs tech leadership in order to effectively function. This is a call-to-action for engineering managers to examine the dynamics of their teams. Who on your team is performing this work? Are they being rewarded for it? In particular, look for members of underrepresented groups, who may be penalized for writing less code due to unconscious bias.

Imagine a team of engineers. The duties listed above probably fall into one of these categories:

  1. A designated tech lead handles the work. If your team falls into this category, then great! Make sure that the engineer or engineers performing these duties are recognized.
  2. Someone’s taking responsibility for it, on top of their existing work. This can be a blessing or a curse for engineers, based on how the engineering manager perceives leadership work. It’s possible that their work is appreciated. But it’s also possible that people are only witnessing their coding output drop, without recognizing the work to move the team forward. If you’re on a team where #2 is mostly true (tech lead is not formalized, and some engineer is taking responsibility for moving the team forward, at the expense of their own IC work), ask yourself this: are they being judged just for the work they do? Or are they being rewarded for all the transitive work that they enable?
  3. A few people do them, but they often get neglected. Work still gets done in this category, but there are systematic blockers. If nobody owns code reviews, it will take a long time for code to be reviewed. If nobody owns code quality, your codebase will become a Swiss cheese of undeleted, broken flags.
  4. Nobody is taking responsibility for them. In this category, some things just won’t get done at all. For instance, if nobody is default-responsible for being an ally for underrepresented groups, then it’s likely that this will just be dropped on the floor. This kind of thing is fractal: if we drop the ball at the group level, we’ve dropped the ball at both the individual and company-wide levels.

In conclusion

There is value in having a designated tech lead for your team. They will create and promote best practices, be a point-person within your team, and remove engineering roadblocks. Also, this work is likely already being done by somebody, so it’s important to officially recognize people that are taking this responsibility.

There is also lots of value in officially taking on this role. It allows you to leverage your time to move the organization forward, and enables you to influence engineering throughout the entire team.

If you’re taking on this work, and you’re not officially a tech lead, you should talk with your manager about it. If you’d like to move towards becoming a tech lead, talk to your manager (or tech lead, if you have one!) about any responsibilities you can take on.

Thanks to Katie Sylor-Miller, Rachana Kumar, and Toria Gibbs for providing great feedback on drafts of this, and to everyone who proofread my writing.

My friends trolled each other with my Discord bot, and how we fixed it

My last post describes a Discord bot, named “crbot,” that I wrote for my college friends. Its name is short for “call-and-response bot.” It remembers simple commands that are taught by users. I patterned it after some basic functionality in a bot we have at Etsy. crbot has become a useful part of my friends’ channel. It’s been taught over 400 commands, and two of my friends have submitted patches.

But there were problems. We also used crbot to troll each other. Bug fixes were needed to curb bad behavior. The situation is better now, but I’ve had a nagging question: “should I have seen this coming, or could I have only fixed these problems by being reactive?” I couldn’t answer this without my friends’ perspective. So I asked them!

I was hoping for a small discussion. However, it blew up. With the bot as a staging ground, the subtext of the conversation was about our friendships. Many of us have known each other for a decade. Some, far longer. We’ve seen each other through major life events, like home purchases, children, and weddings. But I don’t remember any conversation where we’ve discussed, at length, how we perceive each other’s actions. And we only resorted to personal attacks once or twice. Go us!

To answer “Could I have seen this coming?,” we’re going to look at this in three parts:

  1. What happened? A story about how the bot was used and abused, and how it changed over time.
  2. What did my friends think? All of their insights from our discussion.
  3. Lessons learned. Could I have seen this coming?

I think it’s worth adding a disclaimer. These discussions have an implicit “within my group of friends” clause. The fixes work because my friends and I have real-life karma to burn when we mess with each other. Not because they’re some panacea.

What happened?

First bot version: ?learn, without ?unlearn

Channel #general

// Teach a bot a command.
Katie: ?learn brent
crbot: Learned about brent

// Later, it's used.
Chris: my meeting went well today
Jake: ?brent
Discord unfurler:
Brent giving a thumbs up

Sidebar: Where did the word “unfurler” come from? Is it named because it “un-f”s the url? Or because it’s actually sailing related? The people need to know.

At launch, my friends’ initial reaction was mixed. Some immediately got it, and taught it inside jokes. Others said, “I don’t understand what this does, or why we’d want this.” One cleverly pointed it to a URL of a precipitation map, only to discover that Discord’s unfurler wasn’t re-caching the image within a useful timespan. By now, everyone has latched onto the bot’s ability to quickly recall gifs. Good enough for me.

At launch, crbot could learn new commands. But it could not forget them. This introduced a land grab, where my friends claimed as many names as possible. Scarcity encouraged usage, which was good for the bot. However, my friends took this opportunity to ?learn ridiculous or insulting things for each other’s names.

It got a little nasty. One message insulted someone’s intelligence. Another referenced the fact that someone is adopted. I’m speaking generically because the phrasing was over-the-top. There wasn’t a legitimate defense against this. You could claim every command you might find insulting, if you were somehow able to foresee every way that you could be offended. For instance, people tried squatting on the ?learns of their actual names and online handles as placeholders. However, the bot is case-sensitive, so they’d need to protect all capitalized variations of their names.

If you’ve read “A Group Is Its Own Worst Enemy” by Clay Shirky, none of this is a surprise. Technology amplified our group’s negative interactions, and there was no moderation to fight back. The joy from trolling on crbot outweighed any reputation hit they took to their actual friendships, especially if they learned the commands in private. However, most of the things that I found really abusive were done in broad daylight.

global ?unlearn

To combat this, Ryan submitted the first user-contributed patch, to add ?unlearn. Now, users could make the bot forget responses.

Channel #general

// Show that the bot has not forgotten about dre
Jake: ?dre
crbot: I know who Dr. Dre is!

// Delete the ?dre command
Jake: ?unlearn dre
crbot: Forgot about dre

// The bot has forgotten about dre
Jake: ?dre
// No response

This helped a little. There was no longer an incentive to be the first to ?learn something. Now, you needed to be the last to learn it. This incentivizes high-quality ?learns. Your crummy commands are going to be replaced by better ones.
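In storage terms, the two commands are nearly mirror images: ?learn is an overwriting write, and ?unlearn is a delete, which is exactly why the last writer wins. A toy sketch of the semantics (a plain map stands in for crbot’s Redis store, and the names are illustrative):

```go
package main

import "fmt"

// responses maps command names to their replies. In crbot these
// live in Redis; a map stands in for the store here.
var responses = map[string]string{}

// learn overwrites any existing response, so the *last* ?learn wins.
func learn(name, response string) {
	responses[name] = response
}

// unlearn deletes the command entirely; a later ?learn can reclaim it.
func unlearn(name string) {
	delete(responses, name)
}

func main() {
	learn("popcorn", "crummy.gif")
	learn("popcorn", "better.gif") // replaces the crummy one
	fmt.Println(responses["popcorn"]) // → better.gif

	unlearn("popcorn")
	_, ok := responses["popcorn"]
	fmt.Println(ok) // → false
}
```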

There was an adjustment period, where we figured out acceptable use. For instance, I replaced ?popcorn with a better gif, which touched off an argument about ?unlearn etiquette. There was a long sidebar about the life choices of people who post shitty gifs, when better ones exist. We settled on some guidelines, like “the person to ?unlearn a command should be the person who ?learned it.” We don’t always follow these rules. But it’s a good start.

?unlearn introduced a second problem. ?unlearn could be executed in a private channel. This introduced an attack where popular commands could be replaced in a direct message with crbot.

Direct Message with crbot
Attacker: ?unlearn ping
crbot: Forgot about ping
Attacker: ?learn ping Fuck you, ping yourself.
crbot: Learned about ping

Later, in #general
Jake: hey! there's a new release of the bot
Jake: ?ping
crbot: Fuck you, ping yourself.
Jake: :(
Jake: ?unlearn ping
crbot: Forgot about ping
Jake: ?learn ping pong
crbot: Learned about ping
Jake: ?ping
crbot: pong

As a design decision for crbot, I don’t log anything. Basically, I don’t want to respond to “who did X” bookkeeping questions, and I don’t want to write a system for others to do this. I don’t know who ?learned what, and I don’t care. This anonymity created a problem where there is no accountability for private ?unlearns. To this day, I still don’t know who did these. Nobody ever stepped forward. I would claim that I didn’t take part, but that’s what everybody else says, too 🙂

Public-only ?unlearn

We had a group discussion about ?unlearn, where I proposed that ?learn and ?unlearn could only be executed in public channels. My idea was that a public record would force everybody to properly balance the social forces at work. Andrew and Bryce argued that only ?unlearn should be prevented from being executed in private. This would force our real-life karma to be tied to removing someone else’s work. But ?learn should be allowed in private channels, since Easter eggs are fun. Plus, the command list is so large that a new command will never be found without being used publicly by the person who created it.

So, Ryan tweaked his ?unlearn implementation so it could only be executed in public channels. Now that a month has gone by, it has elegantly balanced ?unlearn and ?learn within our group of friends. The social forces at work have prevented further abuses of the system.

Hobbit bomb

One of our friends is often called a hobbit. I don’t know the details. Something about his feet.

Anyways, this led to ?hobbitbomb, which pastes a URL of his picture. 48 times in one message. So, typing ?hobbitbomb once causes the Discord unfurler to inline the same image 48 times. The effect is that it takes up a massive amount of vertical screen real estate; it takes a long time to scroll past the bomb. It was used 7 times across a month (I used it a few of those times), and then effectively abandoned.

My friends’ reactions fell into 2 camps.

  1. So what?
  2. This makes Discord unusable. Also, this isn’t funny.

At one point, somebody decided that they’d had enough, and they ?unlearned ?hobbitbomb. The original poster, not to be deterred, created ?hobbitbombreturns, ?hobbitbombforever, and ?hobbitbombandrobin, all of which were duplicates of the original. A good meme has a healthy immune system.

Then, there was a lengthy detente, where the capability still existed to ?hobbitbomb, but nobody was using it. Finally, the command was brought up as a major source of frustration during our lengthy conversation on trolling in our channel. My friends settled on a social outcome: they limited the bomb size to 4 images (since 4 hobbits forms a fellowship). It still exists, it still requires scrolling, but it’s not extreme.

What did my friends think?

?unlearn abuse

Multiple patches were needed to balance ?learn and ?unlearn. Despite that, some people in the channel didn’t think that ?unlearn abuse was noteworthy. Their viewpoint was interesting to me. This required actual code fixes, so it must have been the biggest problem we faced. Right?

But when looking at individual instances, the problems caused by ?unlearn were minor. “I’m kind of annoyed right now,” or “I need to claim my username, so that nobody learns something weird for it.” This happened in low volume over time. For me, it added up to being the worst abuse of crbot. For other people, this was just something mildly annoying to deal with over time.

Hobbit bomb, and my own blind spots

Before the discussion, I hadn’t given ?hobbitbomb a second thought. “So what?” was my official position, and I wasn’t alone in having it. Scrolling seemed like a minor problem. But other people were seriously impacted. One friend felt it was the only abuse in the channel, and had to be reminded of all the ?unlearn abuse.

Before the bot, we had 2 tiers of users: moderators, and an owner. But when I added the bot to the channel, I created another implicit power position as the bot’s maintainer. I can change the bot to prevent behaviors from happening again. I can reject my friends’ patches if I don’t like them. I can still do private ?unlearn myself, since I have access to the Redis database where the commands are stored. And I can just shut off the bot someday, for any reason I want. This cuts both ways: I’m held in check, because the bot can be banned.

Anyways, the most interesting part of our discussion was finding out that I had a blind spot in how I handled this situation. I never thought that ?hobbitbomb was a problem, so I didn’t even file a bug ticket. I had been treating social problems like technical bugs, but this one hadn’t risen to the level of reporting yet. I needed to disconnect myself from my own feelings, and implement fixes based on my users’ complaints. As Chris put it, “the issue is its potential and how different people react to its use.”

Otherwise, you end up like Twitter, which had a long and storied harassment problem that has reportedly cost the company potential buyers. In my experience with crbot, users have great suggestions for fixing problems, and I’ve seen great user suggestions for Twitter. For instance, “I only want to be contacted by people who have accounts that are verified with a phone number. And when I block an account, I never want to see another account associated with that phone number.”

Technical solution, or compromise?

Since technical and social problems are related, I offered to fix ?hobbitbomb technically; I’d limit crbot’s output to a small number of URLs per response. ?hobbitbomb might be the only command that has multiple URLs per response, so it would have little impact. One of my friends pointed out that part of its utility is how annoying it is. So this would have the dual impact of reducing pain and reducing utility.

My friends rejected this offer, and decided to work towards a compromise. This was interesting to me; the core problem is still latent in the project. I may still implement my fix. If the bot were exposed to the public, I’d have to implement it, given how the Discord unfurler works. But on the other hand, I can think of a dozen ways to troll somebody with the bot, and I haven’t even finished my second cup of coffee. Plus, the premise of this whole chatroom is that we are friends. We have the option to create house rules, which might not be available in public forums.

During the discussion, we reduced the ?hobbitbomb payload from 48 to 4 images. This is enough to clear a screen, but doesn’t force people to scroll through multiple pages of hobbits. I don’t think that everybody was happy with this, since the ?hobbitbomb payload still exists. But both camps accepted it, and the great ?hobbitbomb war of 2017 was finally put to bed.

Social forces of friendship

Most of the problems with the bot were fixed with fairly light technical solutions, or house rules. For instance, public-only ?unlearn was the last time we saw ?unlearn abused in any capacity, even though there are still plenty of ways to cause mischief. And we have a few house rules; for instance, “don’t ?unlearn commands you didn’t ?learn.”

As Chris pointed out, this implies that everybody in the group assigns some weight to the combination of “we care about each other” and “we care about how we are perceived by each other.” This adds a hefty balancing force to our channel. It also means that all of my fixes for this channel are basically exclusive to this channel. There’s no way that this bot could be added to a public Discord channel. It would turn into a spammy white supremacist in 3 seconds.

Could I have seen this coming?

Or put a better way, “If I had to implement just one anti-trolling solution, before any of my friends ever used the bot, what would I implement?”

I imagined tons of problems that never arose. Nobody mass-unlearned all messages. Nobody mass-replaced all the messages. Nobody did a dictionary attack to ?learn everything. Nobody tried spamming the ?list functionality to get it blocked from using GitHub Gists. Nobody managed to break the parser, or find a way to get the bot to /leave the channel (not that they didn’t try). I didn’t need to add the ability to undo everything that a specific user had done to the bot. Once my friends saw the utility of crbot, there was little risk in the bot being ruined.

I did foresee spam as a problem. But I would have guessed that it’d be somebody repeating the same message, over and over again, to make the chat unusable. I never expected ?hobbitbomb, one message that was so large that some of my friends thought it broke scrolling. I’m not even sure this is fixable; even if I limited the number of images in a response to one, I imagine that one skinny + tall image can be equally annoying. I’m at the mercy of the unfurler here. Also, my traditional image of spam is something that comes in mass volume, not something that has a massive volume.

So, back to ?learn without ?unlearn. I should have seen that one coming. My idea was that this created scarcity, so people would be encouraged to use the bot. I didn’t imagine that people would use the opportunity to ?learn things that were abusive. Plus, the functionality for ?learn and ?unlearn is quite similar, so I could have quickly gotten it out the door, even if I still wanted to launch the bot with just ?learn. Launching without ?unlearn was too aggressive. Even with social pressures at work, we needed to have the ability to undo.

When reviewing the ?unlearn patch, I never guessed that private ?unlearn would be abused like it was. Honestly, a lot of this surprised me. This wasn’t even the general public; these were all problems that were surfaced by people I’ve known for a decade. If I can’t predict what they’re going to do, then it feels like there’s no hope to figure this out ahead of time, even if you have mental models like “private vs. public,” or “what is the capacity of people to tolerate spam?”

So my key takeaways from this project are pretty simple.

  • Discuss bug fixes with impacted users. They have great opinions on your fixes, and will suggest better ideas than you had. Especially if the people are technical.
  • Treat all user complaints like technical bug reports. Not just the ones you agree with. That doesn’t mean that all reports are important. But they deserve to have estimates for severity, scope of impact, and difficulty of the fix.
  • Plan on devoting post-launch time to fixing social problems with technical fixes. Because you will, whether you plan on it or not.
  • Every action needs to be undoable. The most basic of moderation tools. Not even limiting the bot to my own friends obviated this.
  • Balance public-only and public+private. Balance privacy and utility. When something involves your personal information, it should be default-private. When your actions interact with other users, it should be attributed to you.

Thanks to Andrew, Brad, Bryce, Chris, Drew, Eric, Katie, and Ryan for sharing their thoughts!

Writing a Discord bot, and techniques for writing effective small programs

Build with blocks, not by piling sand

My old college friends and I used a Google Hangout to keep in touch. Topics were a mix of “dear lazychat” software engineering questions, political discussion, and references to old jokes. Occasionally, out of disdain for Hangouts, we discussed switching chat programs. A few friends wanted to use “Discord,” and the rest of us ignored them. It was a good system.

But then one day, Google announced they were “sunsetting” (read: murdering) the old Hangouts application, in favor of two replacement applications. But Google’s messaging was odd. These replacement applications were targeted at enterprises? And why two? We didn’t take a lot of time to figure this out, but the writing on the wall was clear: at some point, we would need to move our Hangout.

After the news dropped, my Discord-advocating friends set up a new server and invited us. We jumped ship within the hour.

It turns out that they were right, and we should have switched months ago. Discord is fun. It’s basically Slack for consumers. I mean, there are differences. I can’t add a partyparrot emoji, and that’s almost a dealbreaker[0]. But if you squint, it’s basically Slack, but marketed to gamers.

As we settled in to our new digs, I found I missed some social aspects of Etsy’s Slack culture. Etsy has bots that add functionality to Slack. One of my favorites is irccat. It’s designed to “cat” external information into your IRC channel (or, these days, your Slack channel). It has an “everything but the kitchen sink” design; you can fetch server status, weather, stock feeds, or a readout of the food trucks sitting in a nearby vacant lot. A whole bunch of things.

But one of my favorite features is simple text responses. For instance, it has been taught to bearshrug:

Me: ?bearshrug
irccat: ʅʕ•ᴥ•ʔʃ

Or remember URLs, which Slack can unfurl into a preview:

Me: hey team!
Me: ?morning

Lots of little routines build up around it. When a push train is going out to prod, the driver will sometimes ?choochoo. When I leave for the day, I ?micdrop or ?later. It makes Etsy a little more fun.

A week or two ago, I awoke from a nap with the thought, “I want irccat for Discord. I wonder if they have an API.” Yes, Discord has an API. Plus, there is a decent Golang library, Discordgo, which I ended up using.

And away I go!

Side project organization

So, yeah, that age-old question, “How much effort should I put into my side project?”

The answer is always, “It’s your side project! You decide!” And that’s unhelpful. Most of my side projects are throwaway programs, and I write them to throw away. The Discord bot is different; if my friends liked it, I might be tweaking it for years. Or if they hated it, I might throw away the work. So I decided to “grow it”: write everything on a need-to-have basis.

I get good results when I grow programs, so I’m documenting my ideas around this, and how it sets me up for future success without spending a lot of time on it.

I want to be 100% clear that there’s nothing new here. Agile may call this “simple design.” Or maybe I’m practicing “Worse is Better” or YAGNI. I’ve read stuff written by language designers, Lisp programmers, and rocket scientists about growing their solutions. So here’s my continuation, after standing on all these shoulders.

Growing a newborn program

Most of my side-project programs don’t live for more than a day or two. Hell, some never leave a spreadsheet. Since I spend most of my time writing small programs, it makes sense to have rules in place for doing this effectively.

Writing code in blocks makes it easy to structure your programs

By this, I mean that my code looks roughly like this:

// A leading comment that describes what the block should do.
something, err := anotherObject.getSomething()
if err != nil {
    // Handle the error, or maybe return.
}
log.Printf("Acquired something: %d", something)

Start the block with a comment, and write the code for the comment. The comment is optional; feel free to omit it. There aren’t hard-and-fast rules here; many things are just obvious. But I often regret skipping comments, as measured by the number I add back when refactoring.

Blocks are useful because the comments give a nice pseudocode skeleton of what the program does. Then you can decide whether each block is correct, based on its comment. It’s an easy way to fractally reason about your program: Does the high level make sense? Do the details make sense? Yay, the program works!

For instance, if you took the hello-world version of my chatbot, and turned it into crappy skeletal pseudocode, it would look like this:

    ConnectToDiscord() or die
    PingDiscord() or die
    AddAHandler(handler) or die
    WaitForever() or wait for a signal to kill me

    ReadMessage() or log and return
    IsMessage("?Help") or return

There’s a lot of hand-waving in this pseudocode. But you could implement a chatbot in any language that supported callbacks and had a callback-based Discord library, using this structure.
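To make that concrete, here is a runnable Go sketch of the same shape, with a stand-in for the Discord library (the FakeSession type and its method names are mine, invented for illustration; the real bot uses Discordgo and needs a token and a network connection):

```go
package main

import (
	"fmt"
	"strings"
)

// FakeSession stands in for a chat library: anything that lets you
// register callbacks for incoming messages. The names here are mine,
// not Discordgo's.
type FakeSession struct {
	handlers []func(author, msg string) string
}

// AddHandler registers a callback, like the AddAHandler() step in the
// pseudocode above.
func (s *FakeSession) AddHandler(h func(author, msg string) string) {
	s.handlers = append(s.handlers, h)
}

// Deliver simulates a message arriving; each handler's non-empty
// return value becomes a reply.
func (s *FakeSession) Deliver(author, msg string) []string {
	var replies []string
	for _, h := range s.handlers {
		if r := h(author, msg); r != "" {
			replies = append(replies, r)
		}
	}
	return replies
}

// helpHandler follows the handler skeleton: read the message, return
// early unless it's the command we answer, then respond.
func helpHandler(author, msg string) string {
	if !strings.EqualFold(msg, "?help") {
		return "" // Not for us; ignore it.
	}
	return fmt.Sprintf("%s: try ?bearshrug", author)
}

func main() {
	s := &FakeSession{}
	s.AddHandler(helpHandler)
	fmt.Println(s.Deliver("alice", "?help"))
}
```

Swap FakeSession for a real session object and the handler shape survives untouched; that's the point of growing from the skeleton.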

Divide your code into phases

In my first job out of school, I worked at a computer vision research lab. This was surprisingly similar to school. We had short-term prototype contracts, so code was often thrown away forever. It wasn’t until I got a job at Google later that I started working on codebases that I had to maintain for multiple years in a row.

At the research lab, I learned what “researchy code” was – complicated, multithreaded computer code emulating papers that are dense enough to prevent a layperson from implementing them, but omit enough that a practicing expert can’t implement them either. No modularization. No separation of concerns. Threads updating mutable state everywhere. Not a good place to be.

So, my boss had the insight that we should divide these research components at the API level, and give them uniform interfaces. Not groundbreaking stuff, but this cleverly managed a few problems. Basically, the underlying code could be as “researchy” as the researcher wanted. However, they were bound by the API. So once you modularized it, you could actually build stable programs with unstable components. And once you have a bunch of DLLs with well-defined inputs and outputs, you can string them together into data-processing pipelines very easily. One single policy turned our spaghetti-code nightmare into the pasta aisle at the supermarket; the spaghetti’s all there, but it’s packaged up nicely.

I took this lesson forward. When writing small programs, I like to code the steps of the program into the skeleton of the application. For instance, my “real” handler looked like this, after stripping out all the crap:

command, err := parseCommand(...)
if err != nil {
    // Log the failure, and ignore the message.
    return
}

switch command.Type {
case Type_Help:
    // ...
case Type_Learn:
    // ...
case Type_Custom:
    // ...
case Type_List:
    // ...
}
Dividing my work into a “parse” phase and a “send” phase limits the damage; I can’t write send() functions that touch implementation details of parse(), so I’m setting myself up for a future where I can refactor these into interfaces that make sense, and make testing easier.
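A minimal, testable version of the parse phase might look like the Go below. The Type_* names come from the switch above, but the grammar (a leading "?", then a keyword) is my guess at the bot's behavior, not its actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// CommandType enumerates the kinds of messages the bot understands.
type CommandType int

const (
	Type_None CommandType = iota
	Type_Help
	Type_Learn
	Type_Custom
	Type_List
)

// Command is the parse phase's only output; the send phase never sees
// raw message text.
type Command struct {
	Type CommandType
	Args []string
}

// parseCommand turns raw message text into a Command. It knows nothing
// about Discord, sessions, or how replies get sent.
func parseCommand(text string) (*Command, error) {
	if !strings.HasPrefix(text, "?") {
		return nil, fmt.Errorf("not a command: %q", text)
	}
	fields := strings.Fields(text[1:])
	if len(fields) == 0 {
		return nil, fmt.Errorf("empty command")
	}
	switch strings.ToLower(fields[0]) {
	case "help":
		return &Command{Type: Type_Help}, nil
	case "learn":
		return &Command{Type: Type_Learn, Args: fields[1:]}, nil
	case "list":
		return &Command{Type: Type_List}, nil
	default:
		// Anything else is a custom call-and-response lookup.
		return &Command{Type: Type_Custom, Args: fields}, nil
	}
}

func main() {
	cmd, err := parseCommand("?learn bearshrug ʅʕ•ᴥ•ʔʃ")
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd.Type == Type_Learn, cmd.Args)
}
```

Because parseCommand is a pure function from string to Command, it's trivially hand-testable or unit-testable without ever opening a Discord session.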

Avoid optimizations

Fresh out of college, I over-optimized every program I wrote, and blindly followed whatever trends I had read about recently. I’d optimize for performance, or overuse design patterns, or abuse SOLID principles, or throw every feature in C++ at a problem. I’m guilty of all of these. Lock me up. Without much industry experience, I just didn’t understand how to tactically use languages, libraries, and design techniques.

So I’ve started making a big list of optimizations that I don’t pursue for throwaway personal programs.

  • Don’t make it a “good” program. It’s fine if it takes 9 minutes to run. It’s fine if it’s a 70 line bash script. Writing it in Chrome’s Javascript debugger is fine. Hell, you’d be shocked how much velocity you can have in Google Sheets.
  • Writing tests vs. tracking test cases. Once you’ve written enough tests in your life, you can crank out tests for new projects. But if your project is literally throwaway, there’s a break-even point between hand-testing and automated testing. Track your manual test cases in something like a Google Doc, and if you pass that break-even point, you’ll have a list of test cases ready.
  • Make it straightforward, not elegant. My code is never elegant on the first try. I’m fine with that. Writing elegant code requires extra refactoring and extra time. And each new feature could require extra changes to resimplify.
  • Don’t overthink. Just write obvious code. You don’t need to look something up if you can guess it. For instance, variable names: my variable name for redis.Client is redisClient. I’m never going to forget that, and it’s never going to collide with anything. Good abbreviations require project-wide consistency, and for a 1000-line project, it’s hard to get away with a lot of abbreviated names.
  • Don’t make it pretty. For instance, my line length constraints are “not too much.” So if I look at something and say, “that’s a lot!” I keep it. But I refactor if I say, “That’s too much!”


Once I tested the code, and got the bot running, I invited it into our new Discord channel. Everyone reacted differently: some still don’t understand the bot, and others immediately started customizing it. Naturally, my coder friends tried to break it. One tried having it infinitely give itself commands; another fed it malformed commands to see if it would break. Two of my friends have filed bugs against me, and one is planning on adding a feature. My friends have actually adopted it as a member of the channel. I love the feeling of having my software used, even just by a few people.

There have also been some unexpected usages. Somebody tried to link to snowfall images that are updated on the remote server. Unfortunately, Discord’s unfurler caches them, so this approach didn’t work like we wanted it to. Bummer. My program almost came full circle; my call-and-response bot would have been used to cat information into the channel, just like its predecessor, irccat.

So yeah, my chatbot is alive, and now comes the task of turning it from a small weekend project into More Serious code. Which has already started! Click here to follow me on Twitter to get these updates.


Github project:

Version of code in the post:

[0] For people who do not know me well: I am serious. I cannot be more serious.

Review of “A Sense of Urgency” by John Kotter

Fresh out of college, I was a systems software engineer at a computer vision research lab. We had a simple business model: win tons of cheap, low-margin research grants, and throw them at the wall. A few would stick, and we would sell those (at a high margin) to whoever would pay. Namely, the US government, or other military subcontractors.

Me, wearing augmented reality gear. Your tax dollars at work.

My group focused on speculative mobile robotics projects. Per the business model, this means I was repeatedly thrown at a wall. Most of our projects were funded by the government. This wasn’t mandated by anybody. We did this by choice. It was easier, because (a) our leadership had an extensive network of military contacts, and (b) DARPA kept publishing “Requests for Proposals” in areas where we had PhDs. We were in a rut, but it was a rut that was filled with money, so we weren’t complaining.

The rest of the company worked on different things, but had similar stories. And all ran on government money. Remember, our business model was based on having occasional breakout successes. Since all candidate projects ran on government money, that means that our breadwinners also ran on government money. All of our eggs were in one basket.

To be fair, this did worry my boss. Over time, he made small efforts to correct this. We did seedling projects with companies that lacked researchers, but could benefit from computer vision and automation. But there were always drawbacks. Civilian research projects pay worse, and expect a quick ROI. And the problems are often harder. If you buy a robotic lawn mower, it absolutely cannot run over all of the tulips that you planted. But it has to run next to the flower bed, because otherwise the grass would stick out. So the robot has to be near-perfect, every time. No excuses, like “it was driving into the sun, and the grass was wet, so the effectiveness of the sensor was compromised.” No, screw you, you ran over my tulips, give me my money back. In comparison, military projects have a simplicity to them. A robot that’s carrying your gear through a desert can get away with a lot, if it doesn’t run into you.

Anyways, 2009 rolled around, and DARPA’s budget was slashed. Projects were cut, delayed, and canceled across the board. The gloom was palpable at industry events, which used to have a pervasive fake political cheeriness I could never stomach. But 2009 was different. Featured speakers would shake their heads, and stare blankly into the crowd. “I’ve never seen an environment like this, where they just tell you that your project is canceled. It’s never happened to me, in 21 years!”

By mid-2010, our company was hemorrhaging money, and this gloom spread to our monthly financial all-hands. To avoid going belly-up, we started the process of merging into our parent company, which was coasting off a high-profile technology sale. I left for greener pastures. Many other people followed me out the door.

Oh right, I’m reviewing a book. This could have been straight out of “A Sense of Urgency” by John Kotter. Many of the motivating stories have a similar format:

  1. A company enters a “business as usual” mode. Employees stop focusing on the interaction between the business and the external world, and start focusing on the internal world of the business.
  2. The external world changes, and the business doesn’t notice.
  3. Disaster!
  4. “If only we had acted more urgently about the external world changing!”

Kotter asserts, early and often, that the missing spice from these businesses is “urgency.” Unfortunately, urgency isn’t defined anywhere, at least not cohesively. The book gives you a sense of its definition, but I doubt that everyone ends up in the same place. This makes discussing the book frustrating. So when people tell me, “I think our group is acting more urgently since discussing this book,” I don’t know what they mean.

But again, the book gives you an inkling: urgency is a strong focus on the interaction layer with the real world, i.e., the parts of your business that actually make you money. The parts of your business that would need to change if the core assumptions shift. Or maybe this isn’t the definition. But that’s my best guess, and I’m going to use it.

Good parts

My favorite part of the book is that it provides a mental model for noticing complacency and enacting positive organizational change. It’s not a great model; I tried to list out the different actors in the model, and the motivations and goals each of them has, and wasn’t able to do it clearly. But the book provides lots of inflection points, which are also useful.

  • “Are we too complacent?”
  • “OK, we’re complacent. Now what?”
  • “Are we reacting to external events?”
  • “Are we focusing enough on the customer?”
  • “Is anyone trying to stop the positive change?”
  • “We’re successful now. How do we avoid taking our eye off the prize?”

These questions provide a framework for recognizing that you are too internally focused, help you identify some of the organizational players (like naysayers), tell you how to build a coalition for implementing change, and list out a ton of red flags that your efforts to fight complacency are stalling. And that’s just what I can remember in a few minutes.

The book has a ton of examples, and they are well-curated. Humans are story-driven creatures, so the book is easier to remember than many. For instance, the “fix the company” effort that failed because they outsourced to consultants. Or the one whose meetings were continually rescheduled. Or the story of the woman who effected organizational change by forming a coalition of middle managers who were friends with upper management. That’s the type of political tact that I lack, so it’s interesting to see examples where enacting this type of change is approachable.

I also liked the strong emphasis on the customer. I work on Wholesale at Etsy, so I am surrounded by these stories on both the buyer and seller side. I often take for granted that businesses know their users, down to their desires and fears. Having access to these kinds of stories is extremely useful, because they inform many decisions you make, from product all the way down to engineering.

This book is clearly written for a VP/CEO level, and I am not a VP or a CEO. Really! However, organizations are fractal, so much of the advice is applicable to my daily work. The idea of urgency is helping my team frame discussions. We’re no longer focused on the question of “what is the 100% best way to build this?,” but rather, “what is the best way to balance short-term wins versus long-term investment?” And I think that subtle attitude shift is going to massively impact our success in 2017-2020.

Bad parts

“Build for the future” is a big theme in engineering. It’s the idea that provides order-of-magnitude improvements like Google File System+MapReduce, or Amazon Web Services. The idea that you can invest into your own business to provide these kinds of gains is nowhere to be found.

To be clear, the book never discourages it. “Our competitors are beating us using a new technology” is the type of external factor that Kotter wants you to notice. But the stories are focused on these short-term wins, when there’s really a whole universe of possible stories related to over- or under-investment. Here are some types of modern stories I wish were included:

  • “We focused on engineering and ignored our users.”
  • “We focused on short-term wins for 2 years, and now we can’t change our code fast enough to compete with our new competitor.”
  • “The performance of our site sucked. I convinced management to invest 18 eng-months building a new caching layer, and the speed improvements to our site massively improved retention and conversion rates.”

The examples in the book are very one-note, and maybe that’s because they are optimized for a business structure where investment produces proportional gains, instead of potentially having order-of-magnitude benefits.

There is also no discussion of risk or time. In my experience, there is often a balance between a small investment with a small win, and a large investment with a large win. Maybe that makes sense, given the premise of the book: your business is starting to fail, and you have not realized it yet, so you need to literally pick the one thing most important to the existence of your business, and optimize that.

Also, did I mention that Kotter doesn’t cohesively define a sense of urgency? It’s the damn title of the book, and it’s an exercise for the reader. C’mon.


The book touches on a common meta-problem for businesses: not focusing on the interaction between your business and the real world. Especially when you’ve already met with some success. For instance, at the computer vision research lab, there was a year and a half between DARPA’s budget being slashed, and our full realization that we were screwed. We spent a lot of time continuing to focus on government projects for our little group, instead of trying to solve the problem, company-wide, that the old avenues of high-tech research funding were drying up.

I guess this means I’m recommending this book because I watched a company fail, from the inside, due to a lack of urgency. Yeah, that sounds right. I’m not sure that this book would have saved the company, but I can tell you that it couldn’t have hurt.

Colorblindness doesn’t really affect Counterstrike: Global Offensive

Source code for this post is on Github

The scientific community has received press lately about the consequences of only publishing positive results. Here is one such article. In the current metagame, where academics are only rewarded for positive results, researchers throw negative results in the trash. This introduces bias into scientific reporting. For instance, researchers sometimes add observations to borderline data to try to make them significant, even though this is likely to be statistically invalid.

I recently invalidated a theory of mine, and here are my results.

Conclusion: Colorblindness doesn’t really affect CS:GO

Or more accurately, protanopia doesn’t really affect Counterstrike: Global Offensive.

Here is a video of what a protanope sees, compared to that of a normal person. This is the video that was the most different. Many others were almost identical to their originals. Note that it starts in black-and-white, and fades to color.


Protanope version

If you’re also a protanope, you won’t know why these look different. A friend reacted like this: “[de_train is] crazy washed out. Everything looks yellow all the time.” Most of the other maps weren’t far off, and de_mirage looks almost identical to the original version.

How I produced these videos

I occasionally play Counterstrike: Global Offensive (CS:GO) with some friends. CS:GO is a 5v5 shooter, where two teams engage in short rounds until a team is dead, or an objective is met. It famously has bad visibility; it’s so bad that professional players use a graphics card setting called “digital vibrance” to dramatically increase the color contrast past what the game allows. This turns a bleak military shooter into a game as colorful as a children’s cartoon. This exposes players who would otherwise blend in with the bleak levels. On my MacBook Pro, I’m stuck with turning my brightness to 100% and hoping for the best.

I’m a protanope (red-green colorblind), and I always wondered whether colorblindness makes Counterstrike harder. In one sense, it must be worse. Right? By some estimates, my eyes can differentiate 12% of the color spectrum, compared to normal eyes. On the other hand, it doesn’t seem to matter. I usually die because I’m bad at the game, not because somebody blended in with a crate.

As a first step, I decided to quantify this. I downloaded a bunch of my game demos, and for each round, I recorded why I died in a spreadsheet. Sure enough, I’m bad at Counterstrike.

link to source data

I defined these as follows:

  • Aim duel: I engage in a gunfight and lose
  • bad peek: I left a hiding place and died
  • did not see: Killed by someone who was visible, but I clearly did not see
  • failed clutch: Died when I was the last person alive. These are often low-percentage situations. A minor bright spot: I never died while saving, and I did save a fair amount
  • no cover: I chose to not take cover, and was killed as a result
  • outplayed: Killed from the side or behind. Often because of a communication breakdown.
  • rushing: The strategy was a “contact play,” where we run until we meet the enemy, and I caught a stray bullet
  • too aggressive: Similar to rushing, but instead I’m being stupid
  • trade killed: I sacrificed myself to gain map advantage. I counted these if I was avenged within 5 seconds, without the avenger dying.

So, visibility isn’t an issue for my gameplay, even without digital vibrance. This could have a few explanations:

  1. Maybe other players at my awful matchmaking rank can’t take advantage of visibility.
  2. Maybe I’ve adapted to the poor visibility of the game.
  3. Maybe this is confirmation bias: we remember the times where a hidden person killed us, specifically because everyone talks about the terrible visibility of these maps

Writing code to show this

Last year, I had a weekend project called @JakeWouldSee, where I wrote a colorblind Twitterbot. You tweet it an image, it tweets back with the colorblind version.

The idea is simple: port the bot’s algorithm to work for videos. This will allow my friends to see what Counterstrike looks like to me, and I’m all about friendship.

I used to work at a computer vision research lab, so I decided to write it in C++, on top of OpenCV, Bazel, GoogleTest, and Boost. I haven’t written C++ in 6 years, so I kept it simple.

Sidebar: Image libraries, WHY?

Just as I was leaving the computer vision world in 2010, OpenCV’s new C++ bindings were getting buzz. With this project, I finally had an excuse to try them. In pure computer vision tradition, OpenCV’s “modern” C++ API is good enough. But then you access a 3-channel pixel, and you’re thrown into every C code base you ever hated working with.

Your image is decoded into a matrix of [rows][cols][channels], and you can only access BGR 3-tuples. Seriously. It’s fine that it uses BGR. The memory layout is well-documented. But this is needlessly confusing, and is the type of thing that, in my experience, constantly causes off-by-1 errors in research code.
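To make the hazard concrete: with a flat [rows][cols][channels] buffer, every pixel access boils down to index arithmetic like the sketch below (the helper is mine, written in Go for illustration, not OpenCV's API), and transposing any two terms silently reads the wrong pixel or channel.

```go
package main

import "fmt"

// flatIndex computes the offset of (row, col, channel) in a buffer laid
// out as [rows][cols][channels]: rows are the outermost stride, and
// channels are interleaved per pixel.
func flatIndex(cols, channels, row, col, ch int) int {
	return (row*cols+col)*channels + ch
}

func main() {
	// In a 640-column, 3-channel BGR image, the green value (channel 1)
	// of the pixel at row 10, column 20 lives at this offset:
	fmt.Println(flatIndex(640, 3, 10, 20, 1))
}
```

A typed accessor that takes (row, col, channel) and hides this arithmetic is exactly the kind of thing that turns off-by-1 bugs into compile errors or panics.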

This shows the danger of using my out-of-date knowledge. OpenCV was clearly written with different design goals than I have. I care about safety, and they likely care about speed and backwards portability. If there’s a new hotness in C++ image encoding/decoding and manipulation, I’d love to hear about it! Especially if these types of shenanigans are caught at compile time.

The results

I wrote a small program, video_colorblind. It ingests any video file that OpenCV can read, and spits out a video file showing what a protanopic person (red-green colorblind) would see. You can see some results above.

You can look at it on Github.

In the top left corner, the program inserts the RMSD between the original and the protanope version, calculated from the per-pixel differences in the XYZ colorspace. Yes, this means the RMSD is slightly wrong, since writing the number onto the image changes the image. But I’m not coding software for the space shuttle; I don’t need perfection. Besides, it’s fun to watch, assuming you like numbers as much as I do.
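The metric itself is just root-mean-square deviation over per-pixel differences. A sketch in Go, with flattened float slices standing in for frames and the BGR-to-XYZ conversion omitted:

```go
package main

import (
	"fmt"
	"math"
)

// rmsd computes the root-mean-square deviation between two frames that
// have been flattened to per-channel values in [0, 1]. The real program
// compares pixels in the XYZ colorspace; the colorspace conversion is
// not shown here.
func rmsd(a, b []float64) float64 {
	if len(a) != len(b) {
		panic("frames must have the same dimensions")
	}
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum / float64(len(a)))
}

func main() {
	original := []float64{0.20, 0.55, 0.90}
	simulated := []float64{0.20, 0.55, 0.88} // a nearly identical frame
	fmt.Println(rmsd(original, simulated))
}
```

Identical frames score 0, maximally different frames score 1, which is why the CS:GO maps at .001 to .04 look so close to their originals while the sushi photo at .19 does not.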

It turns out that I see most Counterstrike maps correctly, to a small margin of error. I showed the videos to some friends, and they weren’t that impressed. On one hand, it’d be fun to be able to slam Valve for not being friendly to red-green colorblind individuals. I could have gotten some serious Reddit karma for the headline, “Valve hates colorblind people AND I HAVE PROOF.” On the other hand, it’s good to know that my experience is about as frustrating as everyone else’s.

For comparison: this image has an RMSD of about .19.

A picture of sushi that Tyler made for me

A picture of sushi, with normal color vision

Sushi as seen by a protanope

Sushi, as seen by a protanope. Apparently, it is disgusting. RMSD = .19

Most of the CS:GO maps had RMSDs of .001 to .04.

I posted it to Reddit anyways, and got 25-ish upvotes and a few hundred video views. This surprised me… I expected nothing, given the subreddit’s penchant for rants, memes, and clips of famous streamers. This post was technical, and it wasn’t an interesting result. However, it still got some love. More than it would have gotten if I trashed it.

So there you have it! Colorblindness doesn’t affect Counterstrike, and I am bad at the game. Enjoy your week!

What domain names do YCombinator companies use?

I recently started a small software business selling flashcards. I’m working out the particulars, but it’s the first time I’ve done something like this; my previous two engagements were a military subcontractor (~200 employees), and Google (more than I can count). So this is an exciting time for me!

The first thing I discovered is that I need a name. The government wants to know it, banks want to know it, your friends want to know it, and your family wants to know it. Also, how can you namespace your code without one? If you’ve struggled to name a variable, a pet, or a human, you can get a ballpark estimate of the difficulty I had naming a business/product.

After spending a couple of days brainstorming, both solo and with some friends, I remembered that Paul Graham had written something about naming.

I reread his advice, and there was lots to feel sad about:

  • It’s important, since owning $ is a strong signal
  • If you screw up, you probably need to try again
  • You likely need to spend money for a .com

And a ray of sunshine:

  • The “good enough” bar is pretty low

“But wait,” I thought, “what the fuck does Paul Graham know? The only statistic he mentions about new YC companies is that many own their own name, but owning ☃.com and being called Unicode Snowman could meet that standard.”

With that in mind, I dug up a list of YCombinator summer 2015 companies, to see if they actually follow his advice, and how they do it. I could have gone back in time further, but I’m even newer than they are, so I didn’t see what historical data would give me besides a more compelling blog post. There are apparently 86 whose websites were still operating when I made this list, so that’s pretty good!

First, do companies use .com addresses at all?

The number of websites used per-TLD

Yes. It’s not even close! .com is still the preferred domain. OK, so I can’t do a business with Vincent Adultman at

Next, I wanted to categorize how people got their domains from their company name. I came up with four categories:

  • Straight .com: The company has $
  • Word added: Websites like are word-added domains
  • Different TLD: is a different-TLD domain
  • Abbreviation: There was only one, but I couldn’t wedge it into another category. They just abbreviated their name to get their domain.

What do we see?

The source of a domain name

OK, most companies got the straight .com. As Paul Graham notes, it’s close to ⅔ of them.

Finally, I wanted to know where the uniqueness of names came from, if anywhere. I split them into the following categories:

  • English words: Combining enough words together until you get a cheap domain. From the YC S16 batch, fits the bill.
  • English word: Something like I separated this from “english words” due to the unlikeliness of finding one that was affordable.
  • Gibberish: Something you’re not going to find in any dictionary, like
  • Foreign word: a word you won’t find in an English dictionary, but would in another.
  • Misspelling: From the current batch, fit into this category.
  • Word+number: 37signals would fall into this category.
  • Religious name: There are many gods across all mythologies, and some of them didn’t rape OR pillage. Try these.
  • Name: Naming your product after a human name of some kind; bonus points if it’s an unusual name, or an unusual spelling of a common name
  • Abbreviation: For when your domain name doesn’t really matter.

And we’re left with this:

A graph showing how each domain name ended up unique

So what conclusions can we draw?

  • Paul Graham’s advice is current, with respect to YC companies
  • .com domains are overwhelmingly popular
  • The most common domains combine English words. The days of startups named things like are over, and thank God.

After doing this, I made a list and started reaching out to owners. Interesting findings:

  • For how scammy they are, domain squatters are reliable. The reps were very responsive, and nobody tried to scam me. They tried to overcharge me, but that’s just business. I’m not going to recommend any, because I hate the business model, but I was pleasantly surprised at the experience of being overcharged.
  • Owners of defunct domains (they don’t resolve, just throw server errors, or haven’t been modified in 5+ years) never respond. By my count, I contacted 15 people with defunct domains, and got 0 responses. The Internet is littered with the corpses of defunct domains.
  • The inspection period that everyone uses on is shorter than many DNS registrars’ “my site was hacked and DNS was changed, please revert!” window. Not sure who is preventing anarchy here, besides the reputation of and the domain registrar. Everything worked out fine, and a cursory search didn’t reveal horror stories. Maybe it’s fine?

Where am I today? I got the domain. I had to pay a domain squatter for it, which sucks. Thankfully, it wasn’t a lot of money, and a few rounds of negotiation didn’t take much time. I put up a landing page (and wrote this blog post) so I can start to get some authority from Google, but it’s not ready for visitors yet. The copy doesn’t even mention flashcards yet! What a n00b. I’m going to start following the 50% rule for growth traction (from Gabriel Weinberg’s book “Traction”). Accordingly, I will spend half of my time on marketing. The site will go through quite a few iterations on short notice.