A short guide to structuring code to write better tests

Why write this?

Well-written tests often have a positive return on investment. This makes sense; bugs become more expensive to fix the later in the development process they are discovered. This is backed by research. This also matches my experience at Etsy, my current employer. Detecting a bug in our development environment is cheaper than detecting it in staging, which is cheaper than detecting it in production, which is cheaper than trying to divine what a forum post means when it says “THEY BROKE SEARCH AGAIN WHY CAN’T THEY JUST FIX SEARCH??,” which is cheaper than debugging a vague alert about async jobs failing.

Over my career I’ve rediscovered what many know: there are good tests and bad tests. Good tests are mostly invisible except when they catch regressions. Bad tests fail frequently and their failures aren’t real regressions. More often they’re because the test logic makes assumptions about implementation logic and the two have drifted. These tests need endless tweaking to sync the implementation and test logic.

So here’s a guide to help you write better tests by improving how your code is structured. It’s presented as a set of guidelines. They were developed over a few years when I was at Google. My team noticed that we had good tests and bad tests, and we invested time in digging up the characteristics of each. I feel like they are applicable outside the original domain, because I have successfully used these techniques since then.

Some may point out that this post isn’t a “short guide” by many definitions. But I think it’s better than saying “Read this 350 page book on testing. Now that I have pointed you to a resource I will not comment further on the issue.”

Please ask me questions!

Get HYPE for a testing discussion!

“Testing” is a broad topic, so I want to explain the domain I have in mind. I’m targeting a database-driven website or API. I’m not thinking about countless other environments like microcontrollers or hard realtime robotics or batch data processing pipelines or anything else. The techniques in this post can be applied broadly, even outside of the web domain. But not all of them work for all situations. You’re in the best position to decide what works for you.

For discussion, I will introduce an imaginary PHP testing framework for evil scientists looking to make city-wide assertions: “Citizens of New York”, or cony[0]. It will be invoked as follows:

$x = 3;
cony\BEHOLD::that($x)->equals(3);
cony\BEHOLD::that($x)->isNotNull();

Terminology

Everyone has their own testing terminology. That means this blog post is hopeless. People are going to skip this section and disagree with something that I didn’t say. This happened with my test readers even though the terminology section was already in place. But here goes!

Here are some definitions from Martin Fowler – Mocks Aren’t Stubs:

Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).

Mocks are […] objects pre-programmed with expectations which form a specification of the calls they are expected to receive.

Martin Fowler’s test object definitions

Here are a few more definitions that I will use:

Unit test: A test that verifies the return values, state transitions, and side effects of a single function or class. Assumed to be deterministic.

Integration test: A test that verifies the interaction between multiple components. May be fully deterministic or include non-deterministic elements. For instance, a test that executes a controller’s handler backed by a real database instance.

System test: A test that verifies a full system end-to-end without any knowledge of the code. Often contains nondeterministic elements like database connections and API requests. For instance, a Selenium test.

Real object: A function or class that you’d actually use in production.

Fragile test: A test whose assertion logic easily diverges from the implementation logic. Failures in fragile tests are often not due to regressions, but due to a logic divergence between the test and implementation.

A few more definitions I needed

This post mostly discusses using “real” vs “fake” vs “mocks.” When I say “fake” I will be interchanging a bunch of things that you can find defined in Martin Fowler’s article, like a dummy, fake, stub, or spy. This is because their implementations are often similar or identical despite being conceptually different. The differences matter in some contexts, but they don’t contribute much to this discussion.

Dependency injection is your best friend

Injecting a dependency means passing it in where it is needed rather than statically accessing or constructing it in place.

For instance:

// No dependency injection.
public static function isMobileRequest(): bool {
   $request = HttpRequest::getInstance();
   // OMITTED: calculate $is_mobile from $request's user agent
   return $is_mobile;
}

// With dependency injection.
public static function isMobileRequest(HttpRequest $request): bool {
   // OMITTED: calculate $is_mobile from $request's user agent
   return $is_mobile;
}

Dependency injection makes this easier to test for three reasons.

First, examine the static accessor for the HTTP request. Imagine testing it. You’d need to create machinery in the singleton to set an instance for testing, or you’d need to mock out that call. But the following test is much simpler:

public static function testIsMobileRequest(): void {
    $mobile_request = Testing_HttpRequest::newMobileRequest();
    $desktop_request = Testing_HttpRequest::newDesktopRequest();

    cony\BEHOLD::that(MyClass::isMobileRequest($mobile_request))->isTrue();
    cony\BEHOLD::that(MyClass::isMobileRequest($desktop_request))->isFalse();
}

Second, passing dependencies allows common utils to be written. There will be a one-time cost to implement newMobileRequest() and newDesktopRequest() if they don’t exist when you start writing your test. But other tests can use them once they exist. Writing utils pays off very quickly. Sometimes after only one or two usages.
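
To make this concrete, here is a rough sketch of what such a util could look like. The class shape, the constructor call, and the user agent strings below are illustrative assumptions rather than a real implementation:

class Testing_HttpRequest {
    public static $mobile_useragent = 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X)';
    public static $desktop_useragent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)';

    // Builds a real HttpRequest whose only interesting property is its user
    // agent. Assumes HttpRequest can be constructed from an array of headers.
    public static function newMobileRequest(): HttpRequest {
        return new HttpRequest(['User-Agent' => self::$mobile_useragent]);
    }

    public static function newDesktopRequest(): HttpRequest {
        return new HttpRequest(['User-Agent' => self::$desktop_useragent]);
    }
}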

Third, dependency injection will pay off for isMobileRequest() as the program grows. Imagine that it’s nested a few levels deep: used by a configuration object that’s used by a model util that’s called by a view. Now you’re calling your view renderer and you see that it takes an HTTP request. This has two benefits. It exposes that the behavior of the view is parameterized by the HTTP request. It also lets you say, “that’s insane! I need to restructure this” and figure out a cleaner structure. This is a tradeoff; you need to manage some parameter cruft to get these benefits. But in my long experience with this approach, managing these parameters isn’t a problem even when the list grows really long. And the benefits are worth it.
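
As a sketch of how that surfaces in practice (all of the names below are made up for illustration), the request becomes an explicit parameter of the top-level call instead of a hidden global:

// The renderer's signature now reveals that the page depends on the HTTP
// request, because the configuration object it builds needs the user agent.
public static function renderSearchPage(HttpRequest $request, array $results): string {
    $config = PageConfig::forRequest($request); // calls isMobileRequest() internally
    return SearchView::render($config, $results);
}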

Inject the smallest thing needed by your code

We can make isMobileRequest even more maintainable. Look at testIsMobileRequest again. To write a proper test function, an entire HttpRequest needs to be created twice. Imagine that it gains extra dependencies over time. A MobileDetector and a DesktopDetector and a VirtualHeadsetDetector and a StreamProcessor. And because other tests inject their own, the constructors use dependency injection.

public static function testIsMobileRequest(): void {
    $mobile_detector = new MobileDetector();
    $desktop_detector = new DesktopDetector();
    $vh_detector = new VirtualHeadsetDetector();
    $stream_processor = new StreamProcessor();

    $mobile_request = Testing_HttpRequest::newMobileRequest(
        $mobile_detector, $desktop_detector, $vh_detector, $stream_processor
    );

    $desktop_request = Testing_HttpRequest::newDesktopRequest(
        $mobile_detector, $desktop_detector, $vh_detector, $stream_processor
    );

    cony\BEHOLD::that(MyClass::isMobileRequest($mobile_request))->isTrue();
    cony\BEHOLD::that(MyClass::isMobileRequest($desktop_request))->isFalse();
}

It’s more code than before. That’s fine. This is what tests tend to look like when you have lots of dependency injection. But this test can be simpler. The implementation only needs the user agent in order to properly classify a request.

public static function isMobileRequest(string $user_agent): bool {
    // OMITTED: calculate $is_mobile from $user_agent
    return $is_mobile;
}

public static function testIsMobileRequest(): void {
    $mobile_ua = Testing_HttpRequest::$mobile_useragent;
    $desktop_ua = Testing_HttpRequest::$desktop_useragent;

    cony\BEHOLD::that(MyClass::isMobileRequest($mobile_ua))->isTrue();
    cony\BEHOLD::that(MyClass::isMobileRequest($desktop_ua))->isFalse();
}

We’ve made the code simpler by only passing in the limited dependency. The test is also more maintainable. Now isMobileRequest and testIsMobileRequest won’t need to be changed whenever changes are made to HttpRequest.

You should be aggressive about this. You need to instantiate the transitive closure of all dependencies in order to test an object. Keeping the dependencies narrow makes it easier to instantiate objects for test. This makes testing easier overall.

Write tests for failure cases

In my experience, failure cases are often neglected in tests. There’s a major temptation to check in a test as soon as the success case passes. There are often more ways for code to fail than to succeed. Failures can be nearly impossible to replicate manually, so it’s important to automatically verify failure cases in tests.

Understanding the failure cases for your systems is a major step towards resilience. Failure tests execute logic that could be the difference between partial degradation and a full outage: what happens when things go wrong? What happens when the connection to the database is down? What happens when you can’t read a file from disk? The tests will verify that your system behaves as expected when there is a partial outage, or that your users get the proper error messages, or whatever behaviors you need to ensure that the single failure doesn’t turn into a full-scale outage.

This isn’t a magic wand. There will always be failures that you don’t think to test, and they will bring down your site inevitably. But you can minimize this risk by starting to add failure tests as you code.
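
As a sketch of what a failure-case test can look like, here is a function that falls back to a default when its dependency throws. The NameService interface, the Testing_ThrowingNameService test double, and the method names are assumptions for illustration:

// Implementation: fall back to a default when the lookup fails.
public function displayName(NameService $service, int $user_id): string {
    try {
        return $service->lookup($user_id);
    } catch (Exception $e) {
        // OMITTED: log the failure so we can alert on it.
        return 'Anonymous';
    }
}

// Failure-case test: inject a NameService whose lookup() always throws,
// simulating the backing store being down.
public function testDisplayName_serviceDown(): void {
    $service = new Testing_ThrowingNameService();

    cony\BEHOLD::that($this->displayName($service, 42))->equals('Anonymous');
}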

Use real objects whenever possible

You often have several options for injecting dependencies into the implementation being tested. You could construct a real instance of the dependency. You could create an interface for the dependency and create a fake implementation. And you could mock out the dependency.

When possible, prefer to use a real instance of the object rather than fakes or mocks. This should be done when the following circumstances are true:

  • Constructing the real object is not a burden. This becomes more likely when dependency injecting the smallest thing needed by the code
  • The resulting test is still deterministic
  • State transitions in the real object can be detected completely via the object’s API or the return value of the function

The real object is preferable to the fake because the test verifies the real interaction that your code and the dependency will have in production. You can verify the correct thing happened in a few different ways. Maybe you’re testing whether the return values change in response to the injected object. Or you can check that the function actually modifies the state of the dependency, like seeing that an in-memory key value store has been modified.

The real object is preferable to the mock because it doesn’t make assumptions about how the two objects interact. The exact API details of the interaction are not important compared to what it actually does to the dependency. Mocks often create fragile tests since they record everything that should be happening: what methods should be invoked, any parameters that are being passed, etc.

Even worse, the test author dictates what the return value from the object is. It may not be a sane return value for the parameters when the test is written. It may not remain true over time. It bakes extra assumptions into the test file that don’t need to be there. And imagine that you go through the trouble of mocking a single method 85 times, and then you implement a major change to the real method’s behavior that may invalidate the mock returns. Now you will need to examine each of the 85 cases and decide how each of them will change and how each of the test cases will need to adapt. Or you will fix the two that fail and hope that the other 83 are still accurate just because they’re still passing. For my money, I’d rather just use the real object.

The key observation is that “how did something get changed?” matters way less than “what changed?” Your users don’t care which API puts a word into spellcheck. They just care that it persists between page reloads. A corollary is that if “how” matters quite a lot, then you should be using a mock or a spy or something similar.
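
To make the contrast concrete with the spellcheck example, here is a sketch of the two styles. The SpellcheckDictionary class, the saveCustomWord() helper, and the PHPUnit-style mock syntax are all assumptions for illustration:

// Mock style: asserts *how* the word was added. This test breaks if the
// implementation switches to a batched addWords() call, even though the
// user-visible behavior is identical.
$dictionary = $this->createMock(SpellcheckDictionary::class);
$dictionary->expects($this->once())->method('addWord')->with('cony');
saveCustomWord('cony', $dictionary);

// Real-object style: asserts *what* changed, no matter which API was used.
$dictionary = new SpellcheckDictionary();
saveCustomWord('cony', $dictionary);
cony\BEHOLD::that($dictionary->contains('cony'))->isTrue();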

Combining this with the structuring rules above creates a relatively simple rule: Reduce necessary dependencies whenever possible, and prefer the real objects to mocks when you need complex dependencies.

A careful reader will note that using real objects turns unit tests into deterministic integration tests. That’s fine. Reducing the maintenance burden is more desirable than maintaining ideological purity. Plus you will be testing how your code actually runs in production. Note that this isn’t an argument against unit tests – all of the structuring techniques in this doc are designed to make it easier to write unit tests. This is just a tactical case where the best unit test turns out to be a deterministic integration test.

Another complaint I’ve heard about this approach is “but a single error in a common dependency could cause dozens of errors across all tests.” That’s actually good! You made dozens of integration errors and the test suite caught all of them. What a time to be alive. These are also easy to debug. You can choose from dozens of stack traces to help investigate what went wrong. In my experience, the fix is usually in the dependency’s file rather than needing to be fixed across tons of files.

Prefer fakes to mocks

A real object should not be used if you can’t verify what you need from its interface, or it’s frustrating to construct, or it is nondeterministic. At that point the techniques at your disposal are fake implementations and mock implementations. Prefer fake implementations over mock implementations when all else is equal. This reuses much of the same reasoning as the previous section.

Fake viking ship implementation

Despite the name, a fake implementation is a trivial but real implementation of an interface. When your code interacts with the fake object, side effects and return values should follow the same contract as the real implementation. This is good. You are verifying that your code behaves correctly with a correct implementation of the interface. You can also add convenience setters or getters to your fake implementation that you might not ordinarily put on the interface.

Fakes also minimize the number of assumptions that a test makes about the implementation. You’re not specifying the exact calls that are going to be made, or the order that the same function returns different values, or the exact values of parameters. Instead you will be either checking that the return value of your function changes based on data in the fake, or you will be verifying that the state of the fake matches your expectations after test function execution.

Here’s an example implementation:

interface KeyValueStore {
    public function has(string $key): bool;
    public function get(string $key): string;
    public function set(string $key, string $value);
}

// Only used in production. Connects to a real Redis implementation.
// Includes error logging, StatsD, everything!
class RedisKeyValueStore implements KeyValueStore { /* OMITTED */ }

class Testing_FakeKeyValueStore implements KeyValueStore {
    private $data = [];

    public function has(string $key): bool {
        return array_key_exists($key, $this->data);
    }

    public function get(string $key): string {
        if (!$this->has($key)) {
            throw new Exception("No key $key");
        }
        return $this->data[$key];
    }

    public function set(string $key, string $value) {
        $this->data[$key] = $value;
    }
}

And here is a sample test that uses it:

// Implementation.

public function needsToBeCached(string $data, KeyValueStore $store): string {
    if ($store->has($data)) {
        return $store->get($data);
    }

    // OMITTED: Something that calculates $result_of_expensive_operation.

    $store->set($data, $result_of_expensive_operation);
    return $result_of_expensive_operation;
}

// Tests.

public function testNeedsToBeCached_emptyCache(): void {
    $expected = 'testtesttest';
    $store = new Testing_FakeKeyValueStore();
    $actual = needsToBeCached('test', $store);

    // Test both the return value and the state
    // transition for the cache.
    cony\BEHOLD::that($actual)->equals($expected);
    cony\BEHOLD::that($store->get('test'))->equals($expected);
}

public function testNeedsToBeCached_warmCache(): void {
    $stored = '867-5309';
    $store = new Testing_FakeKeyValueStore();
    $store->set('test', $stored);
    $actual = needsToBeCached('test', $store);

    // Verifies that the cache is used by ensuring that
    // it returns the value from the cache, and not the
    // calculated value.
    cony\BEHOLD::that($actual)->equals($stored);
    cony\BEHOLD::that($store->get('test'))->equals($stored);
}

Another benefit is that you now have a reusable test implementation of KeyValueStore that you can easily use anywhere. As you tweak the implementation of needsToBeCached() over time you will only need to change the tests when the side effects and return value change. You will not need to update tests to keep the mocks up-to-date with the exact logic that is used in the implementation.

There are many cases where this is a bad fit, and anything that sounds like a bad idea is probably a bad idea. Don’t fake a SQL database. If your code has an I/O boundary like network requests, you will basically have no choice but to mock that. You can always abstract it behind other layers, but at some point you will need to write a test for that final layer.

Prefer writing a simple test with mocks to faking a ton of things or writing a massive integration test

I spend lots of time encouraging test authors to avoid mocks as a default testing strategy. I acknowledge that mocks exist for a reason. To borrow the XML adage, an automatic mocking framework is like violence: if it doesn’t solve your problem you’re not using enough of it. A determined tester can mock as many things as possible to isolate an effect in any code. My ideal testing strategy is more tactical and requires discipline. Imagine that you’re adding the first test for an ancient monolithic controller. You have roughly three options to write the test: prep a database to run against a fake request you construct, spend a ton of time refactoring dependencies, or mock a couple of methods. You should probably do the last one out of pragmatism. Just writing a test at all will make the file more testable, since now the infrastructure exists.
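
Here is a rough sketch of that pragmatic option using PHPUnit-style partial mocks; the controller and its method names are invented for illustration:

public function testAncientController_rendersEmptyResults(): void {
    // Mock only the two methods that touch I/O; every other method on the
    // controller runs its real implementation.
    $controller = $this->getMockBuilder(AncientSearchController::class)
        ->onlyMethods(['fetchResultsFromDatabase', 'logRequest'])
        ->getMock();
    $controller->method('fetchResultsFromDatabase')->willReturn([]);

    $html = $controller->handle(Testing_HttpRequest::newDesktopRequest());

    cony\BEHOLD::that($html)->isNotNull();
}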

You can slowly make improvements as you continue to make edits, improving the code’s organization as you go. This will start to enable you to use techniques that lead to less fragile tests.

Always weigh the cost and benefit of the approaches you take. I’ve outlined several techniques above that I think lead to better tests. Unfortunately they may not be immediately usable on your project yet. It takes time to reshape a codebase. As you use them you will discover what works best for your own projects, and you should slowly improve them as you go.

System tests pay for themselves, but it’s hard to predict which ones are worth writing

At Google, my team had a long stretch where we wrote a system test for every regression. We were optimistic that they would become easier to write over time. Eventually the burden could not be ignored: they were flaky and we never ended up in our dream tooling state. So we phased out this strategy. But one day I was discussing an “incremental find” system test with a few teammates. We figured out that this single test saved us from regressing production an average of 4 times per person. Our bugs surfaced on our dev machines instead of later in our deployment process. This saved each of us lots of expensive debugging from user reports or monitoring graphs.

We couldn’t think of another system test that was nearly that valuable. It followed a Pareto distribution: most bugs were caught by a few tests. Many tests caught only a bug or two. Many other tests had similar characteristics (user-visible, simple functionality backed by lots of complex code, easy to make false assumptions about the spec), but only this one saved full eng-months.

So system tests aren’t magic, and all of my experience suggests that we should only use them tactically. The critical paths of the customer flow are a good first-order heuristic for choosing which system tests to write. Consider adding new system tests as the definition of your critical path changes.

What’s next?

Write tests for your code! Tests are the best forcing function for properly structuring your code. Properly structuring your implementation code will make testing easier for everyone. As you come up with good generic techniques, share them with people on your team, and do the same when you see utilities that others will find useful.

Even though this guide is well north of 3000 words, it still only scratches the surface of the subject of structuring code and tests. Check out “Refactoring” by Martin Fowler if you’d like to read more on the subject of how to write code to be more testable.

I don’t recommend following me on Twitter unless you want to read a software engineer complain about how cold it is outside.

 

Thanks to everyone at Etsy who provided feedback on drafts of this, whether you agreed with everything or not!

 

Footnotes

[0] I’ve seen this joke before but I can’t figure out where. Please send me pointers to the source material!

2017 year in review: layoffs, success and burnout in tech leadership, and trolls

I try to grow and learn every year. But I have done a bad job of capturing these lessons for others to consume. I want to break out of this cycle. Accordingly, I’m experimenting with a “year in review” format.

Most of my learnings before 2017 were focused on technology. Fresh out of college, my first side projects involved new programming languages and technologies. As my career progressed, I learned about working on huge software projects and breaking down enormous problems to be tractable. But 2017 was different. It was the first time where the primary lessons involved the interface between technology and people.

Why did this happen? A combination of professional circumstances and the particular side project I chose. This was the first year where I was regularly a tech lead. This came with successes and failures. It gave me new perspective on how to improve a team’s effectiveness. It also made me relearn how I interact with my coworkers.

Etsy itself had a crazy year. We had multiple rounds of layoffs. Even our CEO and CTO were replaced. This is not the first time this has happened at a place I worked, but these cuts were the most drastic I’ve ever experienced. Watching friends and colleagues leave the company was depressing. On the other hand, it’s fascinating to compare Etsy today with the way it was a year ago.

I also implemented a small Discord bot for my friends to use. It led to trolling, and it had real social consequences amongst my college friends. I learned about how simple choices in software can enable trolling, even when the social cost seems to outweigh the lulz.

Successes and failures as a tech lead

I have led projects in the past. But I’ve never been responsible for the technical output of teams for 9 straight months.

I’m thankful to report that I was mostly successful. I summarized some lessons in a blog post if you’d like to know the gritty details. A draft of the blog post got traction within Etsy. Unexpectedly, this made me a go-to person on the subject just by writing about it. People still ask me for 1:1s to pick my brain about tech leadership. This made me want to write more, even if I have to accept that most of what I write will go mostly unread.

I also rediscovered that making public assertions means that I end up in public disagreements. I think I handled this well in 2017. My work style is becoming more egoless as I get older. I’m interested in finding right answers instead of being right. By having conversations with the people involved, I learned a lot about the pressures that individuals and teams faced. I also discovered something interesting: their direct assertions are often unproductive on their face. But incorporating the author’s real situation into my mental model actually makes my view more nuanced. There’s probably something deeper here about persuasion, but I haven’t pieced it together.

As a tech lead, I also discovered that I needed to change my attitude on a day-to-day basis. I’ve been told that my demeanor when I get really serious is intimidating. I also have a bad tendency left over from Google to make strong assertions that turn conversations into negotiations. This comes off badly at a place like Etsy, which is much more collaborative by default. I worry about it a lot. But honestly it hasn’t been a problem in my work as an individual contributor. I strive to be positive and friendly when interacting with coworkers. I ended up splitting time between these parts of my personality as an individual contributor: I got dour when doing individual work, and I shed that attitude when doing person-to-person work and focused on being friendly and helpful.

I needed to make changes as a tech lead. Suddenly I was interruptible. Spending half of my day in “serious mode” doesn’t work when I can’t schedule my conversations. Especially since the role of a tech lead involves moving the team forward. It falls apart if people are hesitant to approach me. I made real structural changes to my life. Now I do personal projects from a coffee shop for 1-2 hours before work. This gives me a chance to perk up. I’m also more sensitive to eliminating stressors. I think that making these changes has had positive consequences through the rest of my life.

I also experienced some failures. I view these as a sign that I’m pushing myself. I found a limit on how abstracted I can be from engineering. I like first-order engineering work like coding and designing projects. I also like the parts of tech leadership that enable other people to work: designing and parallelizing the work of projects, unblocking engineers, answering questions, working together on designs. I’ll call this second-order engineering work, as it’s one level removed from doing the work myself.

I learned that I couldn’t be more abstracted than this. This year I had an extended span where my primary job was helping reshape how a team was organized, and how it interfaced with the rest of the company. This was on top of responsibilities for the technical output of the team. I lost the ability to see the connection between my effort and the output. It started burning me out and I stopped wanting to come to work. I brought it to my manager’s attention when I realized this, and she helped me find a new project. It’s good to see that I’ve applied lessons from previous burnouts so that they don’t turn into bigger problems when they resurface.

I’ll have to switch over to management if I want to eventually accomplish my 20 year career goals. But I learned that this is not the time yet.

Layoffs

Etsy had a crazy year. The board replaced our CEO and CTO. There were two rounds of layoffs. This affected almost a quarter of our workforce. After I found a list of people affected, I realized I was lucky to still have a job. It wasn’t like they lopped off the bottom quintile of performers. Good people were in the wrong place at the wrong time.

Some friends were booted from the company. Others voluntarily left afterwards because Etsy wasn’t what it used to be. It’s sad that it came to this. Honestly, there’s not much to learn from my experience with this. It would mostly be reflections on using gallows humor as a coping mechanism.

Our ex-CEO’s experience was much more instructive. Chad Dickerson was Etsy’s CEO until he announced his own firing and the first round of layoffs at an emergency all-hands. That could have been the end of the story. He was still technically employed for a month to ease the transition. But he could have just vanished from the spotlight. Nobody would have blamed him. Yet Chad faced his final curtain with a bow.

It’s an Etsy tradition for employees to give a “last lecture.” You never know what you’re going to get. Some people have bones to pick. Others treat it as a way to reflect on their entire lives. One was even presented as a fake roguelike. Chad’s last lecture was a story of his Etsy career. You saw his passion for the music he listened to, his values, and stories from his upbringing. It was the story of the people that he worked with and the struggles they faced. It was an overwhelmingly positive presentation. Even being ripped away from the CEO slot didn’t change the fact that Chad was pure class.

And then the new regime settled in. It was fascinating to see the difference in approaches. The old guard valued portfolio diversification. They tried to extend Etsy to be an organization that always set out on new journeys and made new bets. The new leadership has the complete opposite focus. They don’t just track core business metrics. They nerd out over them. We aggressively A/B test. We pare experiments and features to their essence to get real feedback quicker. This gives us confidence to throw away bad ideas. Despite the layoffs, there is a lot to like about working in the new model. We always know where we are going and we know what we will do when we get there. It will be interesting to see how Etsy handles a 2018 world where we need to mix short- and long-term goals. But things are looking up.

My friends trolled the shit out of each other with my side project

Etsy has an anachronistically-named Slack bot called “irccat”. It has dozens of built-in lookup functions: stock prices, server health, etc. In my estimation, its most common use is to store gif and meme URLs.

Earlier this year, my friends and I realized that we didn’t have the cognitive capacity to hold Google’s chat application matrix in our heads long enough to decide which was best for our group. Two of our friends were obsessed with Discord. We pulled the trigger and switched. Discord is a simple consumer version of Slack. It’s marketed to gamers, but it would work for most groups of people.

With legal’s blessing, I implemented a version of irccat for my friends to use on Discord. I wrote the first version in a few hours. It was painfully simple. All you could do was teach it commands. It also had the ability to build parameterized commands like LMGTFY and Urban Dictionary URLs. I talked to the server moderator about adding the bot. He agreed that we could give it a shot. So I invited it and taught it a few things to show people how to use it.

And use it they did! Since launch it has been taught over 1100 commands. People notice when it’s down, which is the best compliment a side project of mine has ever received.

Unfortunately, we also took advantage of the super simple functionality to troll the shit out of each other. Commands couldn’t even be deleted in my first version. This started a land grab to teach the bot insulting responses to each other’s names and gaming handles. Now I appreciate the enormous social differences between “I use this software at my employer” and “I use this software within my group of friends.” My college friends have an assumption that we’ll forgive each other. We’ve known each other for more than a decade. But at work one false move means that you are looking for a new job. This seems to change the math quite a bit. I’m afraid to imagine what would have happened if a group of strangers used it.

So I implemented command deletion. Suddenly, commands started being swapped out with troll responses in private messages with the bot. It took a few iterations to find a local implementation minimum where trolling wasn’t a major problem.

It’s easy to say “You should have seen this coming!”. But a takeaway is that it’s much easier to list the potential problems than it is to guess the exact social issues something will have in practice. Now I appreciate why rapid iteration is baked into a lot of industry contexts: if you can’t see these things coming, being reactive is your best option. A corollary is that I need to budget time for reactive engineering when implementing social software.

It’s not perfect. Right now, we’re having a conversation about whether downvoting is too mean when it is done publicly. Spamming is also a minor problem when features are a novelty. There was a one-hour spamfest when Ryan implemented karma. This settled down when I announced that if I was provoked, the next feature I would implement would be “permaban from ever using the bot again.” Apparently A Group Is Its Own Worst Enemy still holds true today. I thought I would be resistant to needing strong moderation tools since the bot was designed for my friends. But it turns out that we’re still just a social group, and our interactions with software still follow the same patterns as those communities of strangers.

Some of my friends also contributed features to the bot! This was tricky. I wanted to include their work while keeping the codebase at a high standard. One of the contributors has children, so his time for the project was limited. I was afraid that work could be abandoned if he went too far in the wrong direction. I also wanted to avoid any hard feelings. I tried to handle this by restructuring the project to be more understandable and usable. I filed tickets that outlined specs that I would like to implement. And I tried to be accommodating when they were testing their changes. No changes were abandoned, so I think this was a success overall.

Looking forward

I constantly re-evaluate goals. This means that I have a low probability of achieving the goals that I have for 2018. But that’s okay, because it’s likely that whatever else I’d like to do will be better.

I started studying Machine Learning as a side project. I’m not interested in developing novel algorithms. I’d like to become reasonably effective as a practitioner in the field, as measured by being happy about my performance in a Kaggle competition. This is daunting from where I am sitting: Coursera courses, multiple textbooks, new coding frameworks, etc. But it’s achievable now that I work on side projects in the morning. Dedicated study time raises my probability of success.

I’d like to run at least 2 marathons this year. I hurt my leg in October 2016 and it took 7 months before I recovered enough to run 2 miles. But now I’m starting to get back to my old distances. Finishing a March race is achievable at this point. I’d like to challenge my personal best in the fall. This is going to be tough; I’m 20-30 pounds over my normal racing weight range and I can’t drop weight as easily as I used to.

I’d also like to write more. I wrote 5 blog posts in 2017. I’d like to increase that to 12 in 2018. This post is one. I have a second queued up about refactoring code to make it more testable. That means that I need 10 more for the rest of the year. The only thing stopping me from doing it is myself.

Professionally, I’d like to get promoted to staff engineer. This may be harder in the new Etsy, since the high level engineers who would have vouched for me are gone. But that’s an incentive to continue to branch out within the company. Reducing technical debt is also a focus for the year; that feels like an opportunity that I haven’t quite solved yet.

I enjoyed being a mentor in 2017. I’d like to continue that trend and mentor at least 2 additional engineers in 2018.

The engineers that invested in me when I was a junior engineer

Lots of ink is being spilled over the following Tweet:

This doesn’t reflect the relationship that senior and junior engineers have. It also misunderstands the idea of investment.

A junior engineer is a learning position. In the beginning they need help finishing simple projects. Maybe they get starter bugs or simple tasks to build their confidence. They learn about how the system is designed and how to modify it. They learn when to ask questions and when to build consensus for a change. They make mistakes: they take down the site, and then their bugfix takes down more of the site. They learn about risk mitigation. They can isolate bugs much quicker. Soon, they are building prototypes and investigating new strategies. They learn about working on teams, and running small projects.

Over time they become a different engineer. They can finish any problem given enough time. They can break a project into chunks and finish it with others. In fact, they’ve come so far that they can usefully teach other engineers. This single leveled-up engineer has become a force of nature: one person who can not only complete any engineering project this company can throw at them, but can run teams and raise the level of everybody around them.

This doesn’t happen in a vacuum. Their more senior teammates helped them grow. Each of their tasks had to help the team. When the junior engineer started down the wrong path, the senior engineer saved them time by explaining alternatives. They made the junior engineer aware of their unknown unknowns. This takes a lot of time. My dad once told me a rule of thumb about rock quarry employees: their first year costs twice their salary because they need to be trained. I see no reason to believe that junior software engineers are cheaper. I’m more likely to believe they are more expensive.

It took a village to turn me from a junior coder into a senior software engineer. I want to highlight a few people who invested that time in me, even though I was “slowing them down.”

Charles

When I first graduated from college, I wasn’t sure whether I wanted to go to grad school or work in industry. So I split the difference and worked at a computer vision research lab.

My first task was “talk to Charles.” Charles was one-of-a-kind: already over 60, he had been an astrophysics professor until he grew bored and made a late-career switch to computer vision. He was also one of the most effective engineers that I have ever worked with. His niche was a mastery of The Unix Way. He built large reliable batch processing systems out of auto-generated Makefiles, a few networked computers, and tiny single-purpose C++ programs. And he could do this and deliver the final system to the customer in less time than it would take most other teams of 3 engineers to solve it with modern technologies. And he could do it within the time and budget that he predicted.

When I walked into Charles’ office, he told me that I would be working on a project that involved coordinates, so he was giving me an intro to coordinate systems. What he said over the next 45 minutes was dense, and I learned about an incredible number of topics:

  • There were coordinate systems that aren’t latitude+longitude
  • We would be using one called UTM, which splits the world into unique map projections
  • There are coordinate systems like MGRS that describe areas instead of single points
  • It’s possible to find points that could be described multiple ways in some coordinate systems
  • There are separate coordinate systems that better describe locations and areas at the poles
  • Many things we were doing used “reference geoids,” which are surveyed descriptions of the earth with a guaranteed amount of accuracy.
  • Gravity is non-uniform across the earth and doesn’t necessarily always point to the same center based on local geology

And a ton of stuff that I don’t remember. I was desperately trying to hold on to every fact as they whizzed by me.

He was always willing to spend time to teach me. One of my first difficulties was understanding that doing more work could make a system more efficient. He sat me down and diagrammed how we weren’t making things worse, but rather improving the system holistically. He also gave me projects that encouraged me to grow. I understood applied Linear Algebra much better after working with Charles. Sometimes, his lessons were unusual. Once I told him I was having trouble modifying an ancient 1000-line Matlab script to process a new format of digital elevation maps. He disappeared for 45 minutes and came back with a 60-line AWK program. The message was clear: don’t spend time fixing up trash when you can replace it.

He was the first person to take me seriously as a professional software engineer, and I often apply lessons about tradeoffs and scoping down work in my day-to-day. Plus all that awesome coordinate stuff! Gravity doesn’t always point to the exact same place?? How cool is that!

Oleg

I also worked on mobile robotics at the lab. Here I fell into Oleg’s domain. Where Charles had been gregarious, Oleg was reserved. His style was more judgmental than any manager tutorial would recommend, but he always backed up his judgements with patient explanations of how code and systems should be organized.

When I first designed changes to the system, Oleg would have me whiteboard out what I was planning. He would stare at it for a while and eventually mutter “this is not good.” Then he would show me my errors. My code would work but it would be fragile. Maybe my design for task threading would be hard to modify. He showed me how to better isolate concurrency work so that it would be more resilient by design. He showed me how to isolate concerns of an application. He showed me that you could split a single process into multiple processes that communicate via IPC, so that a crash in one component of your system didn’t take down the whole thing.

Ultimately I learned a lot about robustness. Just because your code works doesn’t mean it will be easy to write, easy to modify, or be fault-tolerant once it is running. Taking a step back and examining holistic architecture approaches gives you a major leg up. So does reworking fragile pieces.

Over time, his responses to my design would soften. I started to get “I think we can improve this” and sometimes I got all the way to “I think you should consider X. But this is good.” His lessons had been taking hold and helping me grow, and he was seeing the benefits of his time investment.

Olga and Luiz

When I joined Google, I mostly worked under Olga and Luiz. Olga was the über tech lead and manager of Docs and the Docs apps, and Luiz was starting to transition from being the tech lead for desktop Docs to having more of a cross-product role. And they were shockingly productive. They could code circles around me while dealing with all the challenges that come with leading a growing team. In some ways, working under each of them was frustrating because they had high standards. Their standards were far higher than the already pedantic ones across the rest of Google. They were also big fans of drive-by reviews, and had no qualms asking you to majorly restructure your code for seemingly subtle reasons. But working under both of them at the same time brought me further in my development than any other single thing.

Olga was a human code linter. In fact, she was more reliable than the actual Javascript linters that we had. Accordingly, she instituted Javascript standards for our project that were much more rigorous than the already-high bar set by the Javascript style guide. But she would also patiently explain her reasoning to people who asked. For instance, she had an exact style in which she wanted multiple boolean conditionals checked, and she was willing to sit down and explain why. And that sounds like a massive waste of her time, given the scope of her work compared to mine. But she understood that knowledge transfer was important, and was willing to take that time even for trivial things. Under her, I was also put on a trajectory from working on small projects, to working on larger projects, to working on a critical project, to hosting an intern, to leading small teams, to working on a project that was cross-cutting and took eng-years of time.

I worked with Luiz for years, and it was a masterclass in learning how senior engineers mentor junior engineers. He had an egoless way of helping engineers design features. When someone asked him a question, he made sure he understood their spec. And then he dove into the code to see how everything was organized. He’d sketch out the current system and start soliciting ideas. He’d write down the first thing that came to mind. When people gave suggestions, he’d write them down. He’d explain when things wouldn’t work. When there were no more ideas, he’d start talking through consequences of the changes. By the end of the process they had a solution they had collaborated on. He took everybody seriously, and always listened to their thoughts and opinions. I learned an incredible amount by being on the business end of this process and getting a chance to watch him do this every day. Now that I’m at Etsy, I strive to help engineers that ask me questions in the same way that I watched Luiz do successfully for so many years.

Under the two of them, I started as an engineer that could code anything given enough time. I ended as somebody with a much stronger understanding of how to set standards for a team, how to work with junior engineers to level them up, and how to split up truly unbounded things into smaller projects that I could work on with other people.

Conclusion

Tons of engineers have shaped how I work. But I can point to a few engineers that invested real time in my development. You could say they made a horrible decision if you look at the localized time tradeoff. An hour of their time was worth much more than an hour of mine. But they invested in me, and now I am able to invest in other people because of the lessons that I have learned from them. Their investments not only paid off for the company because I became more effective, but I was also able to impart this same wisdom into other people.

What does a tech lead do?

This was written internally at Etsy. I was encouraged to post on my own personal blog so people could share. These are my opinions, and not “Etsy official” in any way.

Motivation for writing this

For the past 5 months, I have been the tech lead on the Search Experience team at Etsy. Our engineering manager had a good philosophy for splitting work between managers and tech leads. The engineering manager is responsible for getting a project to the point where an engineer starts working on it. The tech lead makes sure everything happens after that. Accordingly, this is intended to document the mindset that helps drive “everything after that.”

Having a tech lead has helped our team work smoothly. We’ve generated sizable Gross Merchandise Sales (GMS) wins. We release our projects on a predictable schedule, with little drama. I’ve seen this structure succeed in the past, both at Etsy and at previous companies.

You can learn how to be a tech lead. You can be good at it. Somebody should do it. It might as well be you. This advice sounds a little strange since:

  • It’s a role at many companies, but not always an official title
  • Not every team has them
  • The work is hard, and can be unrecognized
  • You don’t need to be considered a tech lead to do anything this document recommends

But teams run more efficiently and spread knowledge more quickly when there is a single person setting the technical direction of a team.

Who is this meant for?

An engineer who is leading a project of 2-7 people, either officially or unofficially. This isn’t meant for larger teams, or leading a team of teams. In my experience, 8-10 people is an inflection point where communication overhead explodes. At this point, more time needs to be spent on process and organization.

What’s the mindset of a tech lead?

This is a series of principles that led to good results for Search Experience, or are necessary to do the job. I’m documenting what works well in my experience.

More responsibility → Less time writing code

When I was fresh out of college, I worked at a computer vision research lab. I thought the most important thing was to write lots of code. This worked well. My boss was happy, and I was slowly given more responsibility. But then the recession hit military subcontractors, and the company went under. Life comes at you fast!

So I joined BigCo, and started at the bottom of the totem pole again. I focused on writing a lot of code, and learned to do it on large teams. This worked well. I slowly gained responsibility, and was finally given the task of running a small project. Until this point, I had been successful by focusing on writing lots of code. So I was going to write lots of code, right?

Wrong. After 2 weeks, my manager pulled me aside, and said, “Nobody on your team has anything to do because you haven’t organized the backlog of tasks in three days. Why were you coding all morning? You need to make sure your team is running smoothly before you do anything else.”

Okay, point taken.

So I made daily calendar reminders to focus on doing this extra prep work for the team. When I did this work, we moved faster as a three person unit. But I could see on my code stats where I started focusing more on the team. There was a noticeable dip. And I felt guilty, even when I expected this! Commits and lines of code are very easy ways to measure productivity, but when you’re a tech lead, your first priority is the team’s holistic productivity. And you just need to fight the guilt. You’ll still experience it. You just need to recognize the feeling and work through it.

Help others first

It sounds nice to say that you should unblock your team before moving yourself forward, but what does this mean in practice?

First, if you have work, but someone needs your help, then you should help them first. As a senior engineer, your time is leveraged–spending 30 minutes of your time may save days of someone else’s. Those numbers sound skewed, but this is the same principle behind the idea that bugs get dramatically more expensive to fix the later they are discovered. It’s cheaper to do things than redo things. You get a chance to save your teammates from having to rediscover things that are already known, or spare them from writing something that’s already written. Some exploration is good. But there’s always a threshold, and you should encourage teammates to set deadlines based on the task. When they pass it, asking for help is the best move. This could also help with catching bugs that will become push problems or production problems before they are even written.

Same for code reviews. If you have technical work to do, but you have a code review waiting, you should do the code review first. Waiting on someone to review your code is brutal, especially if the reviewing round-trip is really long. If you sit on it, the engineer will context switch to a new task. It’s best to do reviews when their memory of the code is fresh. They’re going to have faster and better answers to your questions, and will be able to quickly tweak their pull request to be submission-ready.

It’s also important to encourage large changes to be split into multiple pull requests. When discussing projects up-front, make sure to recommend how to split it up. For instance, “The first one will add the API, the second one will send the data to the client, and the third one will use the data to render the new component.” This allows you to examine each change in detail, without needing to spend hours reviewing and re-reviewing an enormous pull request. If you believe a change is too risky to submit all at once because it’s so large that you can’t understand all of its consequences, it’s OK to request that it be split up. You should be confident that changes won’t take down the site.

Even with this attitude, you won’t review all pull requests quickly. It’s impossible. For instance, most of my team isn’t in my timezone. I get reviews outside of work hours, and I don’t hop on my computer to review them until I get into work at the crack of 10.

I personally view code reviews and questions as interruptible. If I have a code review from our team, I will stop what I am doing and review it. This is not for everybody, because it’s yet another interruption type, and honestly, it’s exhausting to be interrupted all day. Dealing with interruptions has gotten easier for me over time, but I’ve gotten feedback from several people that it hasn’t for them. You will never be good at it. I’m not. It’s impossible. You will just become better at managing your time out of pure necessity.

Much of your time will be spent helping junior engineers

A prototypical senior engineer is self-directed. You can throw them an unbounded problem, and they will organize. They have an instinct for when they need to build consensus. They break down technical work into chunks, and figure out what questions need to be answered. They will rarely surprise you in a negative way.

However, not everybody is a senior engineer. Your team will have a mix of junior and senior engineers. That’s good! Junior engineers are an investment, and every senior engineer is in that position because people invested in them. There’s no magical algorithm that dictates how to split time between engineers on your team. But I’ve noticed that the more junior a person is, the more time I spend with them.

There’s a corollary here. Make sure that new engineers are aware that they have this option. Make it clear that it is normal to contact you, and that there is no penalty for doing so. I remember being scared to ask senior engineers questions when I was a junior engineer, so I always try hard to be friendly when they ask their first few questions. Go over in-person if they are at the office, and make sure that their question has been fully answered. Check in on them if they disappear for a day or two. Draw a picture of what you’re talking about, and offer them the paper after you’re done talking.

The buck stops here

My manager once told me that leaders take responsibility for problems that don’t have a clear owner. In my experience, this means that you become responsible for lots of unsexy, and often thankless, work to move the team forward.

The question, “What are things that should be easy, but are hard?”, is a good heuristic for where to spend time. For instance, when Search Experience was a new team, rolling out new features was painful. We never tire-kicked features the same way, we didn’t know what groups we should test with, we’d (unpleasantly) surprise our data scientist, and sometimes we’d forget to enable stuff for employees when testing. So I wrote a document that explained, step-by-step, how we should guide features from conception to A/B testing to the decision to launch them or disable them. Then our data scientist added tons of information about when to involve her during this process. And now rolling out features is much easier, because we have a playbook for what to do.

This can be confusing with an engineering manager and/or product manager in the picture, since they should also be default-responsible for making sure things get done. But this isn’t as much of a problem as it sounds. Imagine a pop fly in baseball, where a ball falls between three people. It’s bad if everyone stands still and watches it hit the ground. It’s better if all of you run into each other trying to catch it (since the odds of catching it are better than nobody trying). It’s best if the three of you have a system for dealing with unexpected issues. Regular 1:1s and status updates are a great way to address this, especially in the beginning.

Being an ally

Read Toria Gibbs’ and Ian Malpass’ great post, “Being an Effective Ally to Women and Non-Binary People”, and take it to heart. You’re default-responsible for engineering on your team. And that means it’s up to you to make sure that all of your team members, including those from underrepresented groups, have an ally in you.

“What does being a tech lead have to do with being an ally?” is a fair question.

First, you are the point person within your team. You will be involved in most or all technical discussions, and you will be driving many of them. Make sure members of underrepresented groups have an opportunity to speak. If they haven’t gotten the chance yet, ask them questions like, “Are we missing any options?” or “You’ve done a lot of work on X, how do you think we should approach this?”. If you are reiterating someone’s point, always credit them: “I agree with Alice that X is the right way to go.”

You will also be the point person for external teams. Use that opportunity to amplify underrepresented groups by highlighting their work. If your time is taken up by tech leading, then other people are doing most of the coding on the team. When you give code pointers, mention who wrote it. If someone else has a stronger understanding of a part of the code, defer technical discussions to them, or include them in the conversation. Make sure the right names end up in visible places! For instance, Etsy’s A/B testing framework shows the name of the person who created the experiment. So I always encourage our engineers to make their own experiments, allowing the names to be visible to all of our resident A/B test snoopers (there are dozens of us). If someone contributes to a design, list them as co-authors. You never know how long a document will live.

Take advantage of the role for spreading knowledge

When a team has a tech lead, they end up acting as a central hub of activity. They’ll talk about designs and review code for each of the projects on the team.

If you read all the code your team sends out in pull requests, you will learn at an accelerated rate. You will quickly develop a deep understanding of your team’s codebase. You will see techniques that work. You can ask questions about things that are unclear. If you are also doing code reviews outside of your team, you will learn about new technologies, libraries, and techniques from other developers. This enables you to more effectively support your team with what you have learned from across the company.

Imagine a small team where Alice is the tech lead, and Bob is working directly with Carol; all other projects are one-person efforts. Alice is in a position where she can learn quickly from all engineers, and spread information through the team.

Since you are in this position, you are able to quickly define and spread best practices through the team. A good resource that offers some suggestions for code reviews is this presentation by former Etsy employee Amy Ciavolino. It describes a good, team-oriented style; feel free to adapt parts of it to your own. If you’ve worked with me, you’ll notice this sometimes differs from what I do. For instance, if I have “What do you think?” feedback, I prefer to have in-person/Slack/Vidyo conversations. This often ends in brainstorming, and creating a third approach that’s better than what either of us envisioned. But this presentation is a great start, and a strong guideline.

Day-to-day work

As I mentioned above, much of the work of a tech lead is interrupt-driven. This is good for the team, but it adds challenges to scheduling your own time. On a light day, I’ll spend maybe an hour doing tech lead work. But on a heavy day, I’ll get about an hour of time that’s not eaten up by interruptions.

Accordingly, it’s difficult to estimate what day you will finish something. I worked out a system with our engineering manager that worked well. I only took on projects that were either small, non-blocking, or didn’t have a deadline. This is going to work well with teams trying to have a minimal amount of process. This will be a major adjustment on teams that are hyper-organized with estimation.

You need to fight the guilt that comes with this. Your job isn’t to crank out the most code. Your job is to make the people on your team look good. If something important needs to be done, and you don’t have time to do it, you should delegate it. This will help the whole team move forward.

When I’m deciding what to do, I do things in roughly this priority:

Inner loop:

  1. Answer any Slack pings
  2. Help anybody who needs it in my team’s channel
  3. Do any pending code reviews
  4. Make sure everybody on the team has enough work for the rest of the week
  5. Do any process / organizational work
  6. Project work

Once a day:

  1. Check performance graphs. Investigate (or delegate) any major regressions in areas we might have affected.
  2. Check all A/B experiments. For new experiments, look for bucketing errors, performance problems (or unexpected gains, which are more likely to be bugs), etc.

Once a week:

  1. Look through the bug backlog, make sure a major bug isn’t slipping through the cracks.

What this means for engineering managers

Many teams don’t have tech leads, but every team needs tech leadership in order to effectively function. This is a call-to-action for engineering managers to examine the dynamics of their teams. Who on your team is performing this work? Are they being rewarded for it? In particular, look for members of underrepresented groups, who may be penalized for writing less code due to unconscious bias.

Imagine a team of engineers. The duties listed above are probably in one of these categories:

  1. A designated tech lead handles the work. If your team falls into this category, then great! Make sure that the engineer or engineers performing these duties are recognized.
  2. Someone’s taking responsibility for it, on top of their existing work. This can be a blessing or a curse, based on how the engineering manager perceives leadership work. It’s possible that their work is appreciated. But it’s also possible that people are only witnessing their coding output drop, without recognizing the work to move the team forward. If you’re on a team where #2 is mostly true (tech lead is not formalized, and some engineer is taking responsibility for moving the team forward, at the expense of their own IC work), ask yourself this: are they being judged only on their individual output? Or are they being rewarded for all the transitive work that they enable?
  3. A few people handle the duties, but they often get neglected. Work still gets done in this category, but there are systematic blockers. If nobody owns code reviews, it will take a long time for code to be reviewed. If nobody owns code quality, your codebase will become a Swiss cheese of undeleted, broken flags.
  4. Nobody is taking responsibility for them. In this category, some things just won’t get done at all. For instance, if nobody is default-responsible for being an ally for underrepresented groups, then it’s likely that this will just be dropped on the floor. This kind of thing is fractal: if we drop the ball at the group level, we’ve dropped the ball at both the individual and company-wide levels.

In conclusion

There is value in having a designated tech lead for your team. They will create and promote best practices, be a point-person within your team, and remove engineering roadblocks. Also, this work is likely already being done by somebody, so it’s important to officially recognize people that are taking this responsibility.

There is also lots of value in officially taking on this role. It allows you to leverage your time to move the organization forward, and enables you to influence engineering throughout the entire team.

If you’re taking on this work, and you’re not officially a tech lead, you should talk with your manager about it. If you’d like to move towards becoming a tech lead, talk to your manager (or tech lead, if you have one!) about any responsibilities you can take on.

Thanks to Katie Sylor-Miller, Rachana Kumar, and Toria Gibbs for providing great feedback on drafts of this, and to everyone who proofread my writing.

My friends trolled each other with my Discord bot, and how we fixed it

My last post describes a Discord bot, named “crbot,” that I wrote for my college friends. Its name is short for “call-and-response bot.” It remembers simple commands that are taught by users. I patterned it after some basic functionality in a bot we have at Etsy. crbot has become a useful part of my friends’ channel. It’s been taught over 400 commands, and two of my friends have submitted patches.

But there were problems. We also used crbot to troll each other. Bug fixes were needed to curb bad behavior. The situation is better now, but I’ve had a nagging question: “should I have seen this coming, or could I have only fixed these problems by being reactive?” I couldn’t answer this without my friends’ perspective. So I asked them!

I was hoping for a small discussion. However, it blew up. With the bot as a staging ground, the subtext of the conversation was about our friendships. Many of us have known each other for a decade. Some, far longer. We’ve seen each other through major life events, like home purchases, children, and weddings. But I don’t remember any conversation where we’ve discussed, at length, how we perceive each other’s actions. And we only resorted to personal attacks once or twice. Go us!

To answer “Could I have seen this coming?,” we’re going to look at this in three parts:

  1. What happened? A story about how the bot was used and abused, and how it changed over time.
  2. What did my friends think? All of their insights from our discussion.
  3. Lessons learned. Could I have seen this coming?

I think it’s worth adding a disclaimer. These discussions have an implicit “within my group of friends” clause. The fixes work because my friends and I have real-life karma to burn when we mess with each other. Not because they’re some panacea.

What happened?

First bot version: ?learn, without ?unlearn

Channel #general

// Teach a bot a command.
Katie: ?learn brent https://www.bitlog.com/wp-content/uploads/2017/05/brent.gif
crbot: Learned about brent

// Later, it's used.
Chris: my meeting went well today
Jake: ?brent
crbot: https://www.bitlog.com/wp-content/uploads/2017/05/brent.gif
Discord unfurler:
Brent giving a thumbs up

Sidebar: Where did the word “unfurler” come from? Is it named because it “un-f”s the url? Or because it’s actually sailing related? The people need to know.

At launch, my friends’ initial reaction was mixed. Some immediately got it, and taught it inside jokes. Others said, “I don’t understand what this does, or why we’d want this.” One cleverly pointed it to a URL of a precipitation map, only to discover that Discord’s unfurler wasn’t re-caching the image within a useful timespan. By now, everyone has latched onto the bot’s ability to quickly recall gifs. Good enough for me.

At launch, crbot could learn new commands. But it could not forget them. This introduced a land grab, where my friends claimed as many names as possible. Scarcity encouraged usage, which was good for the bot. However, my friends took this opportunity to ?learn ridiculous or insulting things for each other’s names.

It got a little nasty. One message insulted someone’s intelligence. Another referenced the fact that someone is adopted. I’m speaking generically because the phrasing was over-the-top. There wasn’t a legitimate defense against this. You could claim all the commands that you might find insulting, if you were somehow able to foresee every way that you could be offended. For instance, people tried squatting on the ?learns of their actual names and online handles by claiming them with placeholders. However, the bot is case-sensitive, so they’d need to protect all capitalized variations of their names.

If you’ve read “A Group Is Its Own Worst Enemy” by Clay Shirky, none of this is a surprise. Technology amplified our group’s negative interactions, and there was no moderation to fight back. The joy my friends got from trolling with crbot outweighed any reputation hit to their actual friendships, especially if they ?learned the commands in private. However, most of the things that I found really abusive were done in broad daylight.

Global ?unlearn

To combat this, Ryan submitted the first user-contributed patch, to add ?unlearn. Now, users could make the bot forget responses.

Channel #general

// Show that the bot has not forgotten about dre
Jake: ?dre
crbot: I know who Dr. Dre is!

// Delete the ?dre command
Jake: ?unlearn dre
crbot: Forgot about dre

// The bot has forgotten about dre
Jake: ?dre
// No response

This helped a little. There was no longer an incentive to be the first to ?learn something. Now, you needed to be the last to learn it. This incentivizes high-quality ?learns. Your crummy commands are going to be replaced by better ones.

There was an adjustment period, where we figured out acceptable use. For instance, I replaced ?popcorn with a better gif, which touched off an argument about ?unlearn etiquette. There was a long sidebar about the life choices of people who post shitty gifs, when better ones exist. We settled on some guidelines, like “the person to ?unlearn a command should be the person who ?learned it.” We don’t always follow these rules. But it’s a good start.

?unlearn introduced a second problem: it could be executed in a private channel. This opened up an attack where popular commands could be quietly replaced in a direct message with crbot.

Direct Message with crbot
Attacker: ?unlearn ping
crbot: Forgot about ping
Attacker: ?learn ping Fuck you, ping yourself.
crbot: Learned about ping

Later, in #general
Jake: hey! there's a new release of the bot
Jake: ?ping
crbot: Fuck you, ping yourself.
Jake: :(
Jake: ?unlearn ping
crbot: Forgot about ping
Jake: ?learn ping pong
crbot: Learned about ping
Jake: ?ping
crbot: pong

As a design decision for crbot, I don’t log anything. Basically, I don’t want to respond to “who did X” bookkeeping questions, and I don’t want to write a system for others to do this. I don’t know who ?learned what, and I don’t care. This anonymity created a problem where there is no accountability for private ?unlearns. To this day, I still don’t know who did these. Nobody ever stepped forward. I would claim that I didn’t take part, but that’s what everybody else says, too 🙂

Public-only ?unlearn

We had a group discussion about ?unlearn, where I proposed that ?learn and ?unlearn could only be executed in public channels. My idea was that a public record would force everybody to properly balance the social forces at work. Andrew and Bryce argued that only ?unlearn should be prevented from being executed in private. This would force our real-life karma to be tied to removing someone else’s work. But ?learn should be allowed in private channels, since Easter eggs are fun. Plus, the command list is so large that a new command will never be found unless its creator uses it publicly.

So, Ryan tweaked his ?unlearn implementation so it could only be executed in public channels. Now that a month has gone by, it has elegantly balanced ?unlearn and ?learn within our group of friends. The social forces at work have prevented further abuses of the system.
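For a sense of what that check looks like, here’s a rough sketch in Go against discordgo. This is not crbot’s actual code: the helper and the reply text are mine, and older versions of discordgo expose channel types a little differently.

import "github.com/bwmarrin/discordgo"

// isPrivateChannel reports whether a channel ID refers to a DM.
func isPrivateChannel(s *discordgo.Session, channelID string) bool {
    channel, err := s.Channel(channelID)
    if err != nil {
        return false // Fail open or fail closed? A real bot has to pick.
    }
    return channel.Type == discordgo.ChannelTypeDM
}

// handleUnlearn refuses to forget a command unless it was asked for in
// public, so the request is visible to the whole group.
func handleUnlearn(s *discordgo.Session, m *discordgo.MessageCreate, name string) {
    if isPrivateChannel(s, m.ChannelID) {
        s.ChannelMessageSend(m.ChannelID, "?unlearn only works in public channels")
        return
    }
    // ... actually forget the command here ...
}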

Hobbit bomb

One of our friends is often called a hobbit. I don’t know the details. Something about his feet.

Anyways, this led to ?hobbitbomb, which pastes a url of his picture. 48 times in one message. So, typing ?hobbitbomb once causes the Discord unfurler to inline the same image 48 times. The effect is that it takes up a massive amount of vertical screen real estate; it takes a long time to scroll past the bomb. It was used 7 times across a month (I used it a few of those times), and then effectively abandoned.

My friends’ reactions fell into 2 camps.

  1. So what?
  2. This makes Discord unusable. Also, this isn’t funny.

At one point, somebody decided that they’d had enough, and they ?unlearned ?hobbitbomb. The original poster, not to be deterred, created ?hobbitbombreturns, ?hobbitbombforever, and ?hobbitbombandrobin, all of which were duplicates of the original. A good meme has a healthy immune system.

Then, there was a lengthy detente, where the capability still existed to ?hobbitbomb, but nobody was using it. Finally, the command was brought up as a major source of frustration during our lengthy conversation on trolling in our channel. My friends settled on a social outcome: they limited the bomb size to 4 images (since 4 hobbits form a fellowship). It still exists, it still requires scrolling, but it’s not extreme.

What did my friends think?

?unlearn abuse

Multiple patches were needed to balance ?learn and ?unlearn. Despite that, some people in the channel didn’t think that ?learn abuse was noteworthy. Their viewpoint was interesting to me. This required actual code fixes, so it must have been the biggest problem we faced. Right?

But when looking at individual instances, the problems caused by ?unlearn were minor. “I’m kind of annoyed right now,” or “I need to claim my username, so that nobody learns something weird for it.” This happened in low volume over time. For me, it added up to being the worst abuse of crbot. For other people, this was just something mildly annoying to deal with over time.

Hobbit bomb, and my own blind spots

Before the discussion, I hadn’t given ?hobbitbomb a second thought. “So what?” was my official position, and I wasn’t alone in having it. Scrolling seemed like a minor problem. But other people were seriously impacted. One friend felt it was the only abuse in the channel, and had to be reminded of all the ?unlearn abuse.

Before the bot, we had 2 tiers of users: moderators, and an owner. But when I added the bot to the channel, I created another implicit power position as the bot’s maintainer. I can change the bot to prevent behaviors from happening again. I can reject my friends’ patches if I don’t like them. I can still do private ?unlearn myself, since I have access to the Redis database where the commands are stored. And I can just shut off the bot someday, for any reason I want. This cuts both ways – I’m held in check, because the bot can be banned.

Anyways, the most interesting part of our discussion was finding out that I had a blind spot in how I handled this situation. I never thought that ?hobbitbomb was a problem, so I didn’t even file a bug ticket. I had been treating social problems like technical bugs, but this one hadn’t risen to the level of reporting yet. I needed to disconnect myself from my own feelings, and implement fixes based on my users’ complaints. As Chris put it, “the issue is its potential and how different people react to its use.”

Otherwise, you end up like Twitter, which had a long and storied harassment problem that has reportedly cost the company potential buyers. In my experience with crbot, users have great suggestions for fixing problems, and I’ve seen great user suggestions for Twitter. For instance, “I only want to be contacted by people who have accounts that are verified with a phone number. And when I block an account, I never want to see another account associated with that phone number.”

Technical solution, or compromise?

Since technical and social problems are related, I offered to fix ?hobbitbomb technically; I’d limit crbot’s output to a small number of URLs per response. ?hobbitbomb might be the only command that has multiple URLs per response, so it would have little impact. One of my friends pointed out that part of its utility is how annoying it is. So this would have the dual impact of reducing pain, and reducing utility.
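Mechanically, the cap would be tiny. Here’s a hypothetical sketch in Go; capURLs and the URL prefixes are my own invention, not something that exists in crbot.

import "strings"

// capURLs keeps at most maxURLs URL-looking tokens in a response and
// drops the rest, so the unfurler can only inline so many images.
func capURLs(response string, maxURLs int) string {
    kept := []string{}
    urls := 0
    for _, token := range strings.Fields(response) {
        if strings.HasPrefix(token, "http://") || strings.HasPrefix(token, "https://") {
            urls++
            if urls > maxURLs {
                continue // Drop URLs past the cap.
            }
        }
        kept = append(kept, token)
    }
    return strings.Join(kept, " ")
}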

My friends rejected this offer, and decided to work towards a compromise. This was interesting to me; the core problem is still latent in the project. I may still implement my fix. If the bot were exposed to the public, I’d have to implement it, given how the Discord unfurler works. But on the other hand, I can think of a dozen ways to troll somebody with the bot, and I haven’t even finished my second cup of coffee. Plus, the premise of this whole chatroom is that we are friends. We have the option to create house rules, which might not be available in public forums.

During the discussion, we reduced the ?hobbitbomb payload from 48 to 4 images. This is enough to clear a screen, but doesn’t force people to scroll through multiple pages of hobbits. I don’t think that everybody was happy with this, since the ?hobbitbomb payload still exists. But both camps accepted it, and the great ?hobbitbomb war of 2017 was finally put to bed.

Social forces of friendship

Most of the problems with the bot were fixed with fairly light technical solutions, or house rules. For instance, public-only ?unlearn was the last time we saw ?unlearn abused in any capacity, even though there are still plenty of ways to cause mischief. And we have a few house rules; for instance, “don’t ?unlearn commands you didn’t ?learn.”

As Chris pointed out, this implies that everybody in the group assigns some weight to the combination of “we care about each other” and “we care about how we are perceived by each other.” This adds a hefty balancing force to our channel. It also means that all of my fixes for this channel are basically exclusive to this channel. There’s no way that this bot could be added to a public Discord channel. It would turn into a spammy white supremacist in 3 seconds.

Could I have seen this coming?

Or, to put it a better way: “If I had to implement just one anti-trolling solution, before any of my friends ever used the bot, what would I implement?”

I imagined tons of problems that never arose. Nobody mass-unlearned all messages. Nobody mass-replaced all the messages. Nobody did a dictionary attack to ?learn everything. Nobody tried spamming the ?list functionality to get it blocked from using Github Gists. Nobody managed to break the parser, or found a way to get the bot to /leave the channel (not that they didn’t try). I didn’t need to add the ability to undo everything that a specific user had done to the bot. Once my friends saw the utility of crbot, there was little risk in the bot being ruined.

I did foresee spam as a problem. But I would have guessed that it’d be somebody repeating the same message, over and over again, to make the chat unusable. I never expected ?hobbitbomb, one message that was so large that some of my friends thought it broke scrolling. I’m not even sure this is fixable; even if I limited the number of images in a response to one, I imagine that one skinny + tall image can be equally annoying. I’m at the mercy of the unfurler here. Also, my traditional image of spam is something that comes in mass volume, not something that has a massive volume.

So, back to ?learn without ?unlearn. I should have seen that one coming. My idea was that this created scarcity, so people would be encouraged to use the bot. I didn’t imagine that people would use the opportunity to ?learn things that were abusive. Plus, the functionality for ?learn and ?unlearn is quite similar, so I could have quickly gotten it out the door, even if I still wanted to launch the bot with just ?learn. Launching without ?unlearn was too aggressive. Even with social pressures at work, we needed to have the ability to undo.

When reviewing the ?unlearn patch, I never guessed that private ?unlearn would be abused like it was. Honestly, a lot of this surprised me. This wasn’t even the general public; these were all problems that were surfaced by people I’ve known for a decade. If I can’t predict what they’re going to do, then it feels like there’s no hope to figure this out ahead of time, even if you have mental models like “private vs. public,” or “what is the capacity of people to tolerate spam?”

So my key takeaways from this project are pretty simple.

  • Discuss bug fixes with impacted users. They have great opinions on your fixes, and will suggest better ideas than you had. Especially if the people are technical.
  • Treat all user complaints like technical bug reports. Not just the ones you agree with. That doesn’t mean that all reports are important. But they deserve to have estimates for severity, scope of impact, and difficulty of the fix.
  • Plan on devoting post-launch time to fixing social problems with technical fixes. Because you will, whether you plan on it or not.
  • Every action needs to be undoable. The most basic of moderation tools. Not even limiting the bot to my own friends obviated this.
  • Balance public-only and public+private. Balance privacy and utility. When something involves your personal information, it should be default-private. When your actions interact with other users, they should be attributed to you.

Thanks to Andrew, Brad, Bryce, Chris, Drew, Eric, Katie, and Ryan for sharing their thoughts!

Writing a Discord bot, and techniques for writing effective small programs

Build with blocks, not by piling sand

My old college friends and I used a Google Hangout to keep in touch. Topics were a mix of “dear lazychat” software engineering questions, political discussion, and references to old jokes. Occasionally, out of disdain for Hangouts, we discussed switching chat programs. A few friends wanted to use “Discord,” and the rest of us ignored them. It was a good system.

But then one day, Google announced they were “sunsetting” (read: murdering) the old Hangouts application, in favor of two replacement applications. But Google’s messaging was odd. These Hangouts applications were targeted to Enterprises? And why two? We didn’t take a lot of time to figure this out, but the writing on the wall was clear: at some point, we would need to move our Hangout.

After the news dropped, my Discord-advocating friends set up a new server and invited us. We jumped ship within the hour.

It turns out that they were right, and we should have switched months ago. Discord is fun. It’s basically Slack for consumers. I mean, there are differences. I can’t add a partyparrot emoji, and that’s almost a dealbreaker[0]. But if you squint, it’s basically Slack, but marketed to gamers.

As we settled into our new digs, I found I missed some social aspects of Etsy’s Slack culture. Etsy has bots that add functionality to Slack. One of my favorites is irccat. It’s designed to “cat” external information into your IRC (and now Slack) channel. It has an “everything but the kitchen sink” design; you can fetch server status, weather, stock feeds, a readout of the food trucks that are sitting in a nearby vacant lot. A whole bunch of things.

But one of my favorite features is simple text responses. For instance, it has been taught to bearshrug:

Me: ?bearshrug
irccat: ʅʕ•ᴥ•ʔʃ

Or remember URLs, which Slack can unfurl into a preview:

Me: hey team!
Me: ?morning
irccat: https://www.bitlog.com/wp-content/uploads/2017/03/IMG_0457.jpg

Lots of little routines build up around it. When a push train is going out to prod, the driver will sometimes ?choochoo. When I leave for the day, I ?micdrop or ?later. It makes Etsy a little more fun.

A week or two ago, I awoke from a nap with the thought, “I want irccat for Discord. I wonder if they have an API.” Yes, Discord has an API. Plus, there is a decent Golang library, Discordgo, which I ended up using.

And away I go!

Side project organization

So, yeah, that age old question, “How much effort should I put into my side project?”

The answer is always, “It’s your side project! You decide!”. And that’s unhelpful. Most of my side projects are throwaway programs, and I write them to throw away. The Discord bot is different; if my friends liked it, I might be tweaking it for years. Or if they hated it, I might throw away the work. So I decided to “grow it.” Write everything on a need-to-have basis.

I get good results when I grow programs, so I’m documenting my ideas around this, and how it sets me up for future success without spending a lot of time on it.

I want to be 100% clear that there’s nothing new here. Agile may call this “simple design.” Or maybe I’m practicing “Worse is Better” or YAGNI. I’ve read stuff written by language designers, Lisp programmers, and rocket scientists about growing their solutions. So here’s my continuation, after standing on all these shoulders.

Growing a newborn program

Most of my side-project programs don’t live for more than a day or two. Hell, some never leave a spreadsheet. Since I spend most of my time writing small programs, it makes sense to have rules in place for doing this effectively.

Writing code in blocks makes it easy to structure your programs

By this, I mean that my code looks roughly like this:

// A leading comment, that describes what a block should do.
something, err := anotherObject.getSomething()
if err != nil {
    // Handle error, or maybe return.
}
log.Printf("Acquired something: %d", something.id)
something.doAnotherThing()

Start the block with a comment, and write the code for the comment. The comment is optional; feel free to omit it. There aren’t hard-and-fast rules here; many things are just obvious. But I often regret skipping them, as measured by how many comments I end up adding back when refactoring.

Blocks are useful, because the comments give a nice pseudocode skeleton of what the program does. Then, decide whether each block is correct based on the comment. It’s an easy way to fractally reason about your program: Does the high level make sense? Do the details make sense?  Yay, the program works!

For instance, if you took the hello-world version of my chatbot, and turned it into crappy skeletal pseudocode, it would look like this:

main:
    ConnectToDiscord() or die
    PingDiscord() or die
    AddAHandler(handler) or die
    WaitForever() or wait for a signal to kill me

handler:
    ReadMessage() or log and return
    IsMessage("?Help") or return
    ReplyWithHelpMessage()

There’s a lot of hand-waving in this pseudocode. But you could implement a chatbot in any language that supported callbacks and had a callback-based Discord library, using this structure.
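To make that concrete, here’s a minimal Go sketch of the same skeleton against discordgo. This isn’t crbot’s actual code: the environment variable, the messages, and the omission of the ping step are my own simplifications.

package main

import (
    "fmt"
    "log"
    "os"
    "os/signal"

    "github.com/bwmarrin/discordgo"
)

func main() {
    // ConnectToDiscord() or die.
    session, err := discordgo.New("Bot " + os.Getenv("BOT_TOKEN"))
    if err != nil {
        log.Fatal(err)
    }

    // AddAHandler(handler), then open the connection, or die.
    session.AddHandler(handler)
    if err := session.Open(); err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    // WaitForever(), or wait for a signal to kill me.
    fmt.Println("Bot is running. Ctrl-C to exit.")
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, os.Interrupt)
    <-stop
}

// handler replies to "?help" and ignores everything else.
func handler(s *discordgo.Session, m *discordgo.MessageCreate) {
    // Ignore the bot's own messages.
    if m.Author.ID == s.State.User.ID {
        return
    }
    // IsMessage("?help") or return.
    if m.Content != "?help" {
        return
    }
    // ReplyWithHelpMessage().
    if _, err := s.ChannelMessageSend(m.ChannelID, "Type ?learn to teach me something new"); err != nil {
        log.Print(err)
    }
}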

Divide your code into phases

In my first job out of school, I worked at a computer vision research lab. This was surprisingly similar to school. We had short-term prototype contracts, so code was often thrown away forever. It wasn’t until I got a job at Google later that I started working on codebases that I had to maintain for multiple years in a row.

At the research lab, I learned what “researchy code” was – complicated, multithreaded code implementing papers that are dense enough to prevent a layperson from implementing them, but that omit enough that a practicing expert can’t implement them either. No modularization. No separation of concerns. Threads updating mutable state everywhere. Not a good place to be.

So, my boss had the insight that we should divide these things at the API level, and have uniform ways to access this information. Not groundbreaking stuff, but this cleverly managed a few problems. Basically, the underlying code could be as “researchy” as the researcher wanted. However, they were bound by the API. So once you modularized it, you could actually build stable programs with unstable components. And once you have a bunch of DLLs with well-defined inputs and outputs, you can string them together into data-processing pipelines very easily. One single policy turned our spaghetti code nightmare into the pasta aisle at the supermarket; the spaghetti’s all there, but it’s packaged up nicely.

I took this lesson forward. When writing small programs, I like to code the steps of the program into the skeleton of the application. For instance, my “real” handler looked like this, after stripping out all the crap:

// Parse phase: turn the raw message into a typed command.
command, err := parseCommand(...)
if err != nil {
    info(err)
    return
}

// Send phase: dispatch on the command type.
switch command.Type {
case Type_Help:
    sendHelp(...)
case Type_Learn:
    sendLearn(...)
case Type_Custom:
    sendCustom(...)
case Type_List:
    sendList(...)
}

Dividing my work into a “parse” and  “send” phase limits the damage; I can’t write send() functions that touch implementation details of parse(), so I’m setting myself up for a future where I can refactor these into interfaces that make sense, and make testing easier.
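In practice, that boundary is just a plain data type. Here’s a hypothetical sketch; the real crbot types differ, but the shape is the same.

// Command is the only thing that crosses the parse/send boundary.
type Command struct {
    Type CommandType
    Args []string
}

type CommandType int

const (
    Type_Help CommandType = iota
    Type_Learn
    Type_Custom
    Type_List
)

// parseCommand is the only code that knows about message syntax; the
// send* functions only ever see a Command.
func parseCommand(raw string) (Command, error) {
    // ... tokenizing and validation elided ...
    return Command{}, nil
}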

Avoid optimizations

Fresh out of college, I over-optimized every program I wrote, and blindly followed whatever trends I had read about recently. I’d optimize for performance, or overuse design patterns, or abuse SOLID principles, or throw every feature in C++ at a problem. I’m guilty of all of these. Lock me up. Without much industry experience, I just didn’t understand how to tactically use languages, libraries, and design techniques.

So I’ve started making a big list of optimizations that I don’t pursue for throwaway personal programs.

  • Don’t make it a “good” program. It’s fine if it takes 9 minutes to run. It’s fine if it’s a 70 line bash script. Writing it in Chrome’s Javascript debugger is fine. Hell, you’d be shocked how much velocity you can have in Google Sheets.
  • Writing tests vs tracking test cases. Once you’ve written enough tests in your life, you can crank out tests for new projects. But if your project is literally throwaway, there’s a break-even point for hand-testing vs automated testing. Track your manual test cases in something like a Google Doc, and if you’re passing that break-even point, you’ll have a list of test cases ready.
  • Make it straightforward, not elegant. My code is never elegant on the first try. I’m fine with that. Writing elegant code requires extra refactoring and extra time. And each new feature could require extra changes to resimplify.
  • Don’t overthink. Just write obvious code. You don’t need to look something up if you can guess it. For instance, variable names: my variable name for redis.Client is redisClient. I’m never going to forget that, and it’s never going to collide with anything. Good abbreviations require project-wide consistency, and for a 1000 line project, it’s hard to get away with a lot of abbreviated names.
  • Don’t make it pretty. For instance, my line length constraints are “not too much.” So if I look at something and say, “that’s a lot!” I keep it. But I refactor if I say, “That’s too much!”

Release

Once I tested the code, and got the bot running, I invited it into our new Discord channel. Everyone reacted differently: some still don’t understand the bot, and others immediately started customizing it. Naturally, my coder friends tried to break it. One tried having it infinitely give itself commands; another fed it malformed commands to see if it would break. Two of my friends have filed bugs against me, and one is planning on adding a feature. My friends have actually adopted it as a member of the channel. I love the feeling of having my software used, even just by a few people.

There have also been some unexpected usages. Somebody tried to link to snowfall images that are updated on the remote server. Unfortunately, Discord’s unfurler caches them, so this approach didn’t work like we wanted it to. Bummer. My program almost came full circle; my call-and-response bot would have been used to cat information into the channel, just like its predecessor, irccat.

So yeah, my chatbot is alive, and now comes the task of turning it from a small weekend project into More Serious code. Which has already started! Follow me on Twitter to get these updates.

Links

Github project: https://github.com/jakevoytko/crbot

Version of code in the post: https://github.com/jakevoytko/crbot/commit/8ceaeaf1ec34a45e91eff49907db1585d5d22f53

[0] For people who do not know me well: I am serious. I cannot be more serious.

Review of “A Sense of Urgency” by John Kotter

Fresh out of college, I was a systems software engineer at a computer vision research lab. We had a simple business model: win tons of cheap, low-margin research grants, and throw them at the wall. A few would stick, and we would sell those (at a high margin) to whoever would pay. Namely, the US government, or other military subcontractors.

[Photo: me, wearing augmented reality gear. Your tax dollars at work.]

My group focused on speculative mobile robotics projects. Per the business model, this means I was repeatedly thrown at a wall. Most of our projects were funded by the government. This wasn’t mandated by anybody. We did this by choice. It was easier, because (a) our leadership had an extensive network of military contracts, and (b) DARPA kept publishing “Request for Proposals” in areas where we had PhDs. We were in a rut, but it was a rut that was filled with money, so we weren’t complaining.

The rest of the company worked on different things, but had similar stories. And all ran on government money. Remember, our business model was based on having occasional breakout successes. Since all candidate projects ran on government money, that means that our breadwinners also ran on government money. All of our eggs were in one basket.

To be fair, this did worry my boss. Over time, he made small efforts to correct this. We did seedling projects with companies that lacked researchers, but could benefit from computer vision and automation. But there were always drawbacks. Civilian research projects pay worse, and expect a quick ROI. And the problems are often harder. If you buy a robotic lawn mower, it absolutely cannot run over all of the tulips that you planted. But it has to run next to the flower bed, because otherwise the grass would stick out. So the robot has to be near-perfect, every time. No excuses, like “it was driving into the sun, and the grass was wet, so the effectiveness of the sensor was compromised.” No, screw you, you ran over my tulips, give me my money back. In comparison, military projects have a simplicity to them. A robot that’s carrying your gear through a desert can get away with a lot, if it doesn’t run into you.

Anyways, 2009 rolled around, and DARPA’s budget was slashed. Projects were cut, delayed, and canceled across the board. The gloom was palpable at industry events, which used to have a pervasive fake political cheeriness I could never stomach. But 2009 was different. Featured speakers would shake their heads, and stare blankly into the crowd. “I’ve never seen an environment like this, where they just tell you that your project is canceled. It’s never happened to me, in 21 years!”

By mid-2010, our company was hemorrhaging money, and this gloom spread to our monthly financial all-hands. To avoid going belly-up, we started the process of merging into our parent company, which was coasting off a high-profile technology sale. I left for greener pastures. Many other people followed me out the door.

Oh right, I’m reviewing a book. This could have been straight out of “A Sense of Urgency” by John Kotter. Many of the motivating stories have a similar format:

  1. A company enters a “business as usual” mode. Employees stop focusing on the interaction between the business and the external world, and start focusing on the internal world of the business.
  2. The external world changes, and the business doesn’t notice.
  3. Disaster!
  4. “If only we had acted more urgently about the external world changing!”

Kotter asserts, early and often, that the missing spice from these businesses is “urgency.” Unfortunately, urgency isn’t defined anywhere, at least not cohesively. The book gives you a sense of its definition, but I doubt that everyone ends up in the same place. This makes discussing the book frustrating. So when people tell me, “I think our group is acting more urgently since discussing this book,” I don’t know what they mean.

But again, the book gives you an inkling; urgency is a strong focus on the interaction layer with the real world. That is, the parts of your business that actually make you money. The parts of your business that would need to change if the core assumptions shift. Or maybe this isn’t the definition. But that’s my best guess, and I’m going to use it.

Good parts

My favorite part about the book is that it provides a mental model for noticing complacency, and enacting positive organizational change. It’s not a great model; I tried to list out the different actors in the model, and the motivations and goals that each of them have, and wasn’t able to clearly do it. But the book provides lots of inflection points, which are also useful.

  • “Are we too complacent?”
  • “OK, we’re complacent. Now what?”
  • “Are we reacting to external events?”
  • “Are we focusing enough on the customer?”
  • “Is anyone trying to stop the positive change?”
  • “We’re successful now. How do we avoid taking our eye off the prize?”

These provide a framework for recognizing that you are too internally focused, help you identify some of the organizational players (like naysayers), tell you how to build a coalition for implementing change, and list out a ton of red flags for recognizing that your efforts to fight complacency are stalling. And that’s just what I can remember in a few minutes.

The book has a ton of examples, and they are well-curated. Humans are story-driven creatures, so the book is easier to remember than many. For instance, the “fix the company” effort that failed because they outsourced to consultants. Or the one whose meetings were continually rescheduled. Or the story of the woman who effected organizational change by forming a coalition of middle managers who were friends with upper management. That’s the type of political tact that I lack, so it’s interesting to see examples where enacting this type of change is approachable.

I also liked the strong emphasis on the customer. I work on Wholesale at Etsy, so I am surrounded by these stories on both the buyer and seller side. I often take for granted that businesses know their users, to the level of knowing their desires and fears. For someone with access to these kinds of stories, they are extremely useful, because they inform many decisions you make, from product all the way down to engineering.

This book is clearly written for a VP/CEO level, and I am not a VP or a CEO. Really! However, organizations are fractal, so much of the advice is applicable to my daily work. The idea of urgency is helping my team frame discussions. We’re no longer focused on the question of “what is the 100% best way to build this?,” but rather, “what is the best way to balance short-term wins versus long-term investment?” And I think that subtle attitude shift is going to massively impact our success in 2017-2020.

Bad parts

“Build for the future” is a big theme in engineering. It’s the idea that provides order-of-magnitude improvements like Google File System+MapReduce, or Amazon Web Services. The idea that you can invest in your own business to provide these kinds of gains is nowhere to be found.

To be clear, the book never discourages it. “Our competitors are beating us using a new technology” is the type of external factor that Kotter wants you to notice. But the stories are focused on these short-term wins, when there’s really a whole universe of possible stories related to over- or under-investment. Here are some types of modern stories I wish were included:

  • “We focused on engineering and ignored our users.”
  • “We focused on short term wins for 2 years, and now we can’t change our code fast enough to compete with our new competitor”
  • “The performance of our site sucked. I convinced management to invest 18 eng months of time building a new caching layer, and the speed improvements to our site massively improved retention and conversion rates”

The examples in the book are very one-note, and maybe that’s because they are optimized for a business structure where investment produces proportional gains, instead of potentially having order-of-magnitude benefits.

There is also no real discussion of risk or time. In my experience, there is often a balance between a small investment with a small win, and a large investment with a large win. Maybe that makes sense, given the premise of the book: your business is starting to fail, and you have not realized it yet, so you need to literally pick the one most important thing related to the existence of your business, and optimize that.

Also, did I mention that Kotter doesn’t cohesively define a sense of urgency? It’s the damn title of the book, and it’s an exercise for the reader. C’mon.

Conclusion

The book touches on a common meta-problem for businesses: not focusing on the interaction between your business and the real world. Especially when you’ve already met with some success. For instance, at the computer vision research lab, there was a year and a half between DARPA’s budget being slashed, and our full realization that we were screwed. We spent a lot of time continuing to focus on government projects for our little group, instead of trying to solve the problem, company-wide, that the old avenues of high-tech research funding were drying up.

I guess this means I’m recommending this book because I watched a company fail, from the inside, due to a lack of urgency. Yeah, that sounds right. I’m not sure that this book would have saved the company, but I can tell you that it couldn’t have hurt.

Colorblindness doesn’t really affect Counterstrike: Global Offensive

Source code for this post is on Github

The scientific community has received press lately about the consequences of only publishing positive results. Here is one such article. In the current metagame for academics, where they are only rewarded for positive results, researchers throw negative results in the trash. This introduces biases into scientific reporting. For instance, researchers sometimes add observations to borderline data to try to make them significant, even though this is likely to be statistically invalid.

I recently invalidated a theory of mine, and here are my results.

Conclusion: Colorblindness doesn’t really affect CS:GO

Or more accurately, protanopia doesn’t really affect Counterstrike: Global Offensive.

Here is a video of what a protanope sees, compared to what a person with normal color vision sees. This is the video that was the most different; many others were almost identical to their originals. Note that it starts in black-and-white, and fades to color.

[Video: the original footage]

[Video: the protanope version]

If you’re also a protanope, you won’t know why these look different. A friend reacted like this: “[de_train is] crazy washed out. Everything looks yellow all the time.” Most of the other maps weren’t far off, and de_mirage looks almost identical to the original version.

How I produced these videos

I occasionally play Counterstrike: Global Offensive (CS:GO) with some friends. CS:GO is a 5v5 shooter, where two teams engage in short rounds until a team is dead, or an objective is met. It famously has bad visibility; it’s so bad that professional players use a graphics card setting called “digital vibrance” to dramatically increase the color contrast past what the game allows. This turns a bleak military shooter into a game as colorful as a children’s cartoon. This exposes players who would otherwise blend in with the bleak levels. On my Macbook Pro, I’m stuck with turning my brightness to 100% and hoping for the best.

I’m a protanope (red-green colorblind), and I always wondered whether colorblindness makes Counterstrike harder. In one sense, it must be worse. Right? By some estimates, my eyes can differentiate 12% of the color spectrum, compared to normal eyes. On the other hand, it doesn’t seem to matter. I usually die because I’m bad at the game, not because somebody blended in with a crate.

As a first step, I decided to quantify this. I downloaded a bunch of my game demos, and for each round, I recorded why I died in a spreadsheet. Sure enough, I’m bad at Counterstrike.

[Link to source data]

I defined these as follows:

  • Aim duel: I engage in a gunfight and lose
  • bad peek: I left a hiding place and died
  • did not see: Killed by someone who was visible, but I clearly did not see
  • failed clutch: Died when I was the last person alive. These are often low-percentage situations. A minor bright spot: I never died while saving, and I did save a fair amount
  • no cover: I chose to not take cover, and was killed as a result
  • outplayed: Killed from the side or behind. Often because of a communication breakdown.
  • rushing: The strategy was a “contact play,” where we run until we meet the enemy, and I caught a stray bullet
  • too aggressive: Similar to rushing, but instead I’m being stupid
  • trade killed: I sacrificed myself to gain map advantage. I counted these if I was avenged within 5 seconds, without the avenger dying.

So, visibility isn’t an issue for my gameplay, even without digital vibrance. This could have a few explanations:

  1. Maybe other players at my awful matchmaking rank can’t take advantage of visibility.
  2. Maybe I’ve adapted to the poor visibility of the game.
  3. Maybe this is confirmation bias: we remember the times where a hidden person killed us, specifically because everyone talks about the terrible visibility of these maps

Writing code to show this

Last year, I had a weekend project called @JakeWouldSee, where I wrote a colorblind Twitterbot. You tweet it an image, it tweets back with the colorblind version.

The idea is simple: port the bot’s algorithm to work for videos. This will allow my friends to see what Counterstrike looks like to me, and I’m all about friendship.

I used to work at a computer vision research lab, so I decided to write it in C++, on top of OpenCV, Bazel, GoogleTest, and Boost. I haven’t written C++ in 6 years, so I kept it simple.

Sidebar: Image libraries, WHY?

Just as I was leaving the computer vision world in 2010, OpenCV’s new C++ bindings were getting buzz. With this project, I finally had an excuse to try them. In pure computer vision tradition, OpenCV’s “modern” C++ API is good enough. But then you access a 3-channel pixel, and you’re thrown into every C code base you ever hated working with.

Your image is decoded into a matrix of [rows][cols][channels], and you can only access BGR 3-tuples. Seriously. It’s fine that it uses BGR. The memory layout is well-documented. But this is needlessly confusing, and is the type of thing that, in my experience, constantly causes off-by-1 errors in research code.

This shows the danger of using my out-of-date knowledge. OpenCV was clearly written with different design goals than I have. I care about safety, and they likely care about speed and backwards portability. If there’s a new hotness in C++ image encoding/decoding and manipulation, I’d love to hear about it! Especially if these types of shenanigans are caught at compile time.

The results

I wrote a small program, video_colorblind. It ingests any video file that OpenCV can read, and spits out a video file showing what a protanopic person (red-green colorblind) would see. You can see some results above.

You can look at it on Github.

In the top left corner, the program inserts the RMSD between the original and the protanope version, calculated from the per-pixel differences in the XYZ colorspace. Yes, this means the RMSD is slightly wrong, since the image is changed by writing the result onto it. But I’m not coding software for the space shuttle; I don’t need perfection. Besides, it’s fun to watch, assuming you like numbers as much as I do.
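The metric itself is simple. Here’s a sketch of the RMSD between two frames in Go, using the standard image package. The real program is C++ on OpenCV and does the subtraction in XYZ rather than raw RGB, and the file names here are made up, but the shape of the computation is the same.

package main

import (
    "fmt"
    "image"
    _ "image/jpeg"
    _ "image/png"
    "math"
    "os"
)

// loadImage decodes a single image from disk.
func loadImage(path string) (image.Image, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    img, _, err := image.Decode(f)
    return img, err
}

// rmsd computes the root-mean-square difference between two same-sized
// images, with each channel normalized to [0, 1].
func rmsd(a, b image.Image) float64 {
    bounds := a.Bounds()
    var sum, count float64
    for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
        for x := bounds.Min.X; x < bounds.Max.X; x++ {
            r1, g1, b1, _ := a.At(x, y).RGBA() // 16-bit channels
            r2, g2, b2, _ := b.At(x, y).RGBA()
            for _, d := range []float64{
                float64(r1) - float64(r2),
                float64(g1) - float64(g2),
                float64(b1) - float64(b2),
            } {
                d /= 65535.0
                sum += d * d
                count++
            }
        }
    }
    return math.Sqrt(sum / count)
}

func main() {
    original, err := loadImage("frame_original.png")
    if err != nil {
        panic(err)
    }
    protanope, err := loadImage("frame_protanope.png")
    if err != nil {
        panic(err)
    }
    fmt.Printf("RMSD: %.3f\n", rmsd(original, protanope))
}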

It turns out that I see most Counterstrike maps correctly, to a small margin of error. I showed the videos to some friends, and they weren’t that impressed. On one hand, it’d be fun to be able to slam Valve for not being friendly to red-green colorblind individuals. I could have gotten some serious Reddit karma for the headline, “Valve hates colorblind people AND I HAVE PROOF.” On the other hand, it’s good to know that my experience is about as frustrating as everyone else’s.

For comparison: this image has an RMSD of about .19.

[Image: a picture of sushi that Tyler made for me, with normal color vision]

[Image: the same sushi, as seen by a protanope. Apparently, it is disgusting. RMSD = .19]

Most of the CS:GO maps had RMSDs of .001 to .04.

I posted it to Reddit anyways, and got 25-ish upvotes and a few hundred video views. This surprised me… I expected nothing, given the subreddit’s penchant for rants, memes, and clips of famous streamers. This post was technical, and it wasn’t an interesting result. However, it still got some love. More than it would have gotten if I trashed it.

So there you have it! Colorblindness doesn’t affect Counterstrike, and I am bad at the game. Enjoy your week!

What domain names do YCombinator companies use?

I recently started a small software business selling flashcards. I’m working out the particulars, but it’s the first time I’ve done something like this; my previous two engagements were a military subcontractor (~200 employees), and Google (more than I can count). So this is an exciting time for me!

The first thing I discovered is that I need a name. The government wants to know it, banks want to know it, your friends want to know it, and your family wants to know it. Also, how can you namespace your code without one? If you’ve struggled to name a variable, a pet, or a human, you can get a ballpark estimate of the difficulty I had naming a business/product.

After spending a couple of days brainstorming, both solo and with some friends, I remembered that Paul Graham had written something about naming.

I reread his advice, and there was lots to feel sad about:

  • It’s important, since owning $companyname.com is a strong signal
  • If you screw up, you probably need to try again
  • You likely need to spend money for a .com

And a ray of sunshine:

  • The “good enough” bar is pretty low

“But wait,” I thought, “what the fuck does Paul Graham know? The only statistic he mentions about new YC companies is that many own their own name, but owning ☃.com and being called Unicode Snowman could meet that standard.”

With that in mind, I dug up a list of YCombinator summer 2015 companies, to see if they actually follow his advice, and how they do it. I could have gone back in time further, but I’m even newer than they are, so I didn’t see what historical data would give me besides a more compelling blog post. There are apparently 86 whose websites were still operating when I made this list, so that’s pretty good!

First, do companies use .com addresses at all?

[Chart: the number of websites per TLD]

Yes. It’s not even close! .com is still the preferred domain. OK, so I can’t do a business with Vincent Adultman at thestockmarket.biz.

Next, I wanted to categorize how people got their domains from their company name. I came up with four categories:

  • Straight .com: The company has $name.com
  • Word added: Websites like getdropbox.com are word-added domains
  • Different TLD: twitch.tv is a different-TLD domain
  • Abbreviation: There was only one, but I couldn’t wedge it into another category. They just abbreviated their name to get their domain.

What do we see?

[Chart: the source of each domain name]

OK, most companies got the straight .com. As Paul Graham notes, it’s close to ⅔ of them.

Finally, I wanted to know where the uniqueness of names came from, if anywhere. I split them into the following categories:

  • English words: Combining enough words together until you get a cheap domain. From the YC S16 batch, transcendlighting.com fits the bill.
  • English word: Something like parse.com. I separated this from “English words” due to the unlikeliness of finding one that was affordable.
  • Gibberish: Something you’re not going to find in any dictionary, like oovoo.com
  • Foreign word: A word you won’t find in an English dictionary, but would in another.
  • Misspelling: From the current batch, lugg.com fit into this category.
  • Word+number: 37signals would fall into this category.
  • Religious name: There are many gods across all mythologies, and some of them didn’t rape OR pillage. Try these.
  • Name: Naming your product after a human name of some kind; bonus points if it’s an unusual name, or an unusual spelling of a common name
  • Abbreviation: For when your domain name doesn’t really matter.

And we’re left with this:

[Chart: how each domain name ended up unique]

So what conclusions can we draw?

  • Paul Graham’s advice is current, with respect to YC companies
  • .com domains are overwhelmingly popular
  • The most common domains combine English words. The days of startups named things like strtp.ly are over, and thank God.

After doing this, I made a list and started reaching out to owners. Interesting findings:

  • For how scammy they are, domain squatters are reliable. The reps were very responsive, and nobody tried to scam me. They tried to overcharge me, but that’s just business. I’m not going to recommend any, because I hate the business model, but I was pleasantly surprised at the experience of being overcharged.
  • The owners of defunct domains (ones that don’t resolve, only throw server errors, or haven’t been modified in 5+ years) never respond. By my count, I contacted 15 people with defunct domains and got 0 responses. The Internet is littered with the corpses of defunct domains.
  • The inspection period that everyone uses on escrow.com is shorter than many DNS registrars’ “my site was hacked and DNS was changed, please revert!” window. Not sure who is preventing anarchy here, besides the reputation of escrow.com and the domain registrar. Everything worked out fine, and a cursory search didn’t reveal horror stories. Maybe it’s fine?

Where am I today? I got the domain reviewninja.com. I had to pay a domain squatter for it, which sucks. Thankfully, it wasn’t a lot of money, and a few rounds of negotiation didn’t take much time. I put up a landing page (and wrote this blog post) so I can start to get some authority from Google, but it’s not ready for visitors yet. The copy doesn’t even mention flash cards yet! What a n00b. I’m going to start following the 50% rule for growth traction (from Gabriel Weinberg’s book “Traction”). Accordingly, I will spend half of my time on marketing. reviewninja.com will go through quite a few iterations in short order.

Transcript of “I am colorblind, and you can too!”

Author’s note: This is a transcript of a talk I gave at Queens JS. I wrote a colorblind Twitterbot, and I presented my findings in this talk. I’m recording it here, for posterity. To save you from my verbal tics, it’s not an exact transcription. I cleaned up the sentences to make them more readable. I didn’t change the meaning of anything.

Link to source code

Link to slides

Welcome to my talk, “I am colorblind, and you can too!”

I am going to take you through the process of how I built a colorblind Twitterbot. Who am I? I am Jake Voytko. It’s very easy to find me online, I’m jakevoytko@gmail.com, @jakevoytko, I’m jakevoytko@any-service-you’ve-ever-heard-of. And I’m red-green colorblind. I actually have a severe version of it. Most people who are colorblind have a mild version, and it turns out that I don’t see much color at all. We’ll get into that later: just how drastic the difference is. For a while, I worked on Google Docs. But now, I’m funemployed and trying to learn new things, and work through some side projects. And this talk is some of the output of this time I’ve taken. I hope you enjoy it!

Before we get into it… I’m sure the colors on the projector are reproduced perfectly, but just in case they’re not, and there’s any confusion with the pictures, I have the Twitterbot running right now. It has all of the pictures of the talk. If you want to go to @JakeWouldSee, or tweet at it with an image in the tweet, it’ll send back the colorblind version of your image. But the problem is that I haven’t load tested it! It is probably going to fall over if everyone does it. But, we’ll see how that goes. It’ll be fun!

A rainbow kite

Normal vision

Rainbow kite, as seen by a protanope

Protanopic vision

So, the talk is divided into three sections. First, I want to talk about what people see when they see color. And once we know that, it’ll be easy to talk about what I see, and how it differs from what a normal person sees. Next, because my colorblindness is so severe, it’s easy to model. And I’ll draw some graphs and show you what that looks like. And finally, we’ll go over what the results are.

Part 1: how do people perceive color?

Red+green doors

Normal vision

Red/green doors with protanope vision

Protanopic vision

Do these two pictures look different to you? audience murmurs “yes” They do? They look almost identical to me. So this is funny. These are two doors that are at Google New York, where I used to work. And all of a sudden, one day they were painted and I didn’t realize it. Apparently, the door on the left is red, and the door on the right is green. In the images, the normal version is on the left, and the colorblind version is on the right. And you’re only supposed to walk through the green door. That was an interesting “Today I learned” for me. audience laughs at my life

So, let’s get into it! How do eyes normally work? All of you who can see color normally have three types of cone cells in your eyes. They each detect different parts of the color spectrum. There are the long-wavelength ones that detect the colors that I list here: red, orange, yellow, and green. You have medium-wavelength ones that pick up, very strongly, yellows and greens, and you have these other short-wavelength ones that pick up blue and cyan. Basically, your brain takes the responses from these three cell types and combines them into a color.

To give an example, if the long cell is going off the charts, “I’m getting a really strong reaction!” and the medium and short ones are getting a weak reaction, your brain will take that information and say, “the color that you are looking at is red.” And that part of your vision will be interpreted as red.

When we’re doing work with colors, we’re not working with the full spectrum; we’re working with the computer’s representation of colors. So it’s useful to know how computers represent colors. It’s actually very simple. You may remember from school that there are three primary colors. And it’s not a coincidence that there are three primary colors and three types of cone cells in your eyes. The idea behind the primary colors is that each one of them targets one of the cone cells in your eye. If you manipulate the amount of that particular one, it’ll cause that cone cell to respond strongly or not.

Using that concept, you can reproduce most colors that humans are capable of seeing. You can’t reproduce all of them, which I found interesting. That was something that I learned when I did this project. So, most of you work with JavaScript, and I assume all of you have worked with frontend stuff. You may know that RGB (Jake: sRGB specifically) is a very common way to represent colors. Red, green, and blue are the primary colors. And if people in the back heard people in the front laugh, it’s because I show pictures of the primary colors on the bottom. And they are red, green, and the landmark Miles Davis album “Kind of Blue.”
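Author’s note: as a tiny illustration (mine, not from the talk), an sRGB pixel is just three channel intensities, one aimed at each cone type:

// Each channel roughly targets one cone type: 255 means "drive that cone as
// hard as the monitor can," 0 means "leave it alone."
const red  = { r: 255, g: 0, b: 0 };   // mostly excites the long-wavelength cones
const blue = { r: 0,   g: 0, b: 255 }; // mostly excites the short-wavelength cones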

So… what’s different about me? You guys have three types of cones in your eyes. I only have two types of cones in my eyes. I was talking to my friend Sam (who’s in the back right now) about this, and he was telling me I’m actually missing the gene that codes for these long cells. The ones that respond very strongly to red, and also pick up some greens, yellows, and oranges, are just completely missing for me. When my brain is processing colors, I can’t really differentiate a lot of these colors. The consequence of this is that I don’t really see reds much at all, and I get weaker responses for many of these other colors.

So, to quantify how different my vision is: if you ignore brightness (bright green equals green equals dark green) and just look at wavelengths of light, normal people can differentiate about 150 colors. In your head, guess how many colors I can see with only 2 cones. someone doesn’t follow instructions and yells “100!” I see 17. audience gasps at how dull my life must be. But that’s just pure wavelengths of light.

To talk about how different I am from a normal colorblind person: have you seen those videos where people put on these glasses, and they cry at sunsets because they’re suddenly so beautiful? Well, those people have a partial response to one of the cones. They have two really strong cones, and one really weak one. The other two kind of dominate the third one. Those Enchroma glasses attenuate the signal response that the other two cones have so that you get a more balanced response between the three colors. You can get a much more balanced color perception, and you can differentiate more wavelengths of light than you could before.

Peanut butter

Normal vision

Peanut butter with protanope vision

Protanopic vision

Another interesting fact that I found out… so, I took this colorblind test for the Enchroma glasses, and it told me that I’m “too colorblind for these glasses to work. But as a consolation prize, here’s a bunch of information about your colorblindness.” I’m reading through it, and it’s mostly stuff that I’ve read before. But at one point, there was this one sentence that said something like, “protanopes (this is the type of colorblindness I have) will even perceive color in the wrong part of the spectrum. For instance, they will perceive peanut butter as green.” And that completely blew my mind, because for my entire life I have seen peanut butter as green. On the projector is the colorblind-processed version of the picture. I’m not sure what it looks like to you; I haven’t shown this picture to anyone. But this is about what I see when I see peanut butter. Apparently peanut butter is brown. Who knew?

Part 2: Modeling colorblindness

So this slide is dense, and we’re going to spend some time on it. It goes through how colorblindness is modeled. Instead of using RGB, there’s this other color space that is useful for working with color. That means you can change any RGB pixel into something that lands somewhere in this colorspace. The cool thing about this colorspace is that it separates luminance (the brightness) from chrominance (the color). To show you how colors land on it, you see this upside-down U thing, where the rainbow follows it.

xyY colorspace

xyY colorspace

If you look inside, you’ll see that I’ve drawn this triangle. The corners are R, G, and B, which you can see through my childlike scrawl of handwriting. I have very meticulously and very scientifically reproduced this chart by hand. These are roughly where the primaries that your computer monitor uses land. Anything that your computer monitor is capable of reproducing is inside this triangle. You see there’s a bunch of stuff that your monitor can’t reproduce. But this is a little misleading; green and blue are next to each other in the color spectrum, but you see there’s quite a bit of distance on the curve between them. So there’s not much that can’t be reproduced by RGB.

xyY with confusion lines

xyY with confusion lines

How is this useful? On the bottom is how my program works. It, again, looks like a lot of childlike scrawl. But it’s not hard when you know what it represents. Normal people can see everything that’s inside of this U. And all of my color vision lands on this one curve. These are the only things I am capable of seeing. (Jake: this is oversimplified.) You see all of these rays that meet at this one point; each ray represents all of the colors that I confuse. Any two colors on one of these lines will look the same to me. My program calculates the line, and intersects it with the curve to produce an estimate of the color I see.
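Author’s note: for readers who want that projection step spelled out, here’s a rough JavaScript sketch of the idea. It is not the bot’s actual code (that’s linked above); the confusion point is a commonly cited value I’m treating as an assumption, and the two anchor chromaticities standing in for the curve are rough placeholders.

// Linear RGB -> CIE xyY (standard sRGB/D65 matrix).
function rgbToXyY(r, g, b) {
  const X = 0.4124 * r + 0.3576 * g + 0.1805 * b;
  const Y = 0.2126 * r + 0.7152 * g + 0.0722 * b;
  const Z = 0.0193 * r + 0.1192 * g + 0.9505 * b;
  const sum = X + Y + Z || 1e-9; // avoid dividing by zero for black
  return { x: X / sum, y: Y / sum, Y: Y };
}

// Commonly cited protanope confusion point in xy (treat as an assumption).
const CONFUSION = { x: 0.7465, y: 0.2535 };

// Two placeholder chromaticities approximating the curve of colors I can see,
// modeled here as a straight segment to keep the sketch short.
const LOCUS_A = { x: 0.11, y: 0.13 }; // bluish end (rough placeholder)
const LOCUS_B = { x: 0.47, y: 0.52 }; // yellowish end (rough placeholder)

// Intersect the line through p and q with the line through a and b.
function intersect(p, q, a, b) {
  const d1 = { x: q.x - p.x, y: q.y - p.y };
  const d2 = { x: b.x - a.x, y: b.y - a.y };
  const det = d1.x * d2.y - d1.y * d2.x;
  const t = ((a.x - p.x) * d2.y - (a.y - p.y) * d2.x) / det;
  return { x: p.x + t * d1.x, y: p.y + t * d1.y };
}

// Everything on the line through CONFUSION and the pixel's chromaticity looks
// the same to me, so snap the pixel to where that line crosses the locus.
function simulateChromaticity(x, y) {
  return intersect(CONFUSION, { x: x, y: y }, LOCUS_A, LOCUS_B);
}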

So, I made the first version of the algorithm, and the results were weird. Like, a lot of colors were way too bright. But I couldn’t find the bug; all of my code was correct. I checked it a thousand times. I couldn’t figure out what was going on! And finally I found this paper from 1944 that shed some light on it. audience laughs at how out-of-date my research is I know, right? But eyes weren’t different back then; it’s still good.

My entire life, I’ve always perceived red as being so much darker than green. And people when I was younger would always ask, “WHAT COLOR IS THIS? WHAT COLOR IS THIS?” And I was always able to tell apart red and green because reds were very dark. And people were like, “you’re not really colorblind.” Yes, I promise you. I am. And people would kind of nod when I said they were darker, but we were never talking about the same thing. Apparently, a lot of reds I see at 1/10 their actual brightness, just because I’m completely missing that whole red-wavelength cone receptor. At the bottom of 24 pages, the paper had this tiny little equation where you can model luminance. It uses another color space, XYZ, and it’s really easy to get that from RGB. And then you can produce the brightness that I see of the color.

At that point, you have enough to go back to RGB. I said that one color space (Jake: xyY) had all of the colors (Jake: waving my arms on a plane), and then all of the brightnesses (Jake: pointing out an axis orthogonal to that plane). And using those two bits of information, you can convert back to RGB, and get an actual estimate of what I see. For all of the “after” images, that’s what I’m doing.
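Author’s note: here’s a sketch of that last step (again mine, not the talk’s code). Given a projected chromaticity and a luminance Y that has already been adjusted, the standard matrices take you from xyY through XYZ back to linear RGB. The protanope luminance formula from the 1944 paper is not reproduced here.

// xyY -> XYZ -> linear RGB (standard sRGB/D65 inverse matrix).
function xyYToRgb(x, y, Y) {
  if (y === 0) return { r: 0, g: 0, b: 0 }; // black has no chromaticity

  const X = (x * Y) / y;
  const Z = ((1 - x - y) * Y) / y;

  const r = 3.2406 * X - 1.5372 * Y - 0.4986 * Z;
  const g = -0.9689 * X + 1.8758 * Y + 0.0415 * Z;
  const b = 0.0557 * X - 0.2040 * Y + 1.0570 * Z;

  // Clamp out-of-gamut values so the result is a displayable pixel.
  const clamp = (v) => Math.min(1, Math.max(0, v));
  return { r: clamp(r), g: clamp(g), b: clamp(b) };
}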

Now that we have all that information, it’s pretty easy to write a Twitterbot. Are people sending it tweets? Is it still up? people in the audience nod “yes” OK cool! Good for it. I wrote it in node.js, which is why I’m here at all. A lot of the stuff here was really interesting to me. You know, I worked at Google for a while, and they take care of things like authentication for you. A lot of stuff like OAuth was stuff that I’ve never worked with.

So for anyone who’s worked with something like OAuth, you know that to run a bot on something like a Twitter account, you need to authenticate as the user. When the user grants the application permission, you get this nice little token and secret pair. Any request that you send to the Twitter API, you send the token with it. You sign it with the secret and produce this hash. And that’s enough for Twitter to say, “yes… the bot is allowed to do this.” At any point, the user could revoke the token and all of the requests would start failing. So if someone figured out the password for my Twitterbot’s account and revoked the token, that would take it down.
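Author’s note: here’s a rough sketch of what that signing step looks like with OAuth 1.0a and node’s crypto module. This isn’t the bot’s code, and real bots usually let a library handle it; it’s just to show where the token and secret come in.

const crypto = require('crypto');

// Sign one API request. `params` should include the OAuth parameters
// (oauth_token, oauth_nonce, oauth_timestamp, ...) plus any request parameters.
// Note: strict OAuth percent-encoding differs slightly from encodeURIComponent;
// this is close enough for a sketch.
function signRequest(method, url, params, consumerSecret, tokenSecret) {
  // Sort and encode every parameter into one string.
  const paramString = Object.keys(params)
    .sort()
    .map((k) => encodeURIComponent(k) + '=' + encodeURIComponent(params[k]))
    .join('&');

  // The signature base string: METHOD&URL&PARAMS, each percent-encoded.
  const baseString = [
    method.toUpperCase(),
    encodeURIComponent(url),
    encodeURIComponent(paramString),
  ].join('&');

  // The signing key combines the app's secret with the user's token secret.
  const signingKey = encodeURIComponent(consumerSecret) + '&' + encodeURIComponent(tokenSecret);

  // HMAC-SHA1, base64-encoded, goes out as the oauth_signature parameter.
  return crypto.createHmac('sha1', signingKey).update(baseString).digest('base64');
}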

Twitter also offers these nice persistent streams. They have this user stream endpoint, where you send a hanging GET to it, and it tells you when you have a tweet. And you don’t need to poll. My bot sits there and waits for Twitter to tell it that it has a message. And then it gets a message, and it can say, “are there image URLs?” It can download the images and do the conversion.

So my Twitterbot ends up as this nice little pipeline. It’s mostly around manipulating the Twitter API itself. And because Twitter does everything for you, I was surprised how little I needed to do here.
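Author’s note: the pipeline is roughly the shape below. This sketch uses the twit npm package for illustration; the bot’s actual source is linked at the top of this post, and convertAndReply is a hypothetical stand-in for the download-filter-reply step.

const Twit = require('twit');

const T = new Twit({
  consumer_key: '...',
  consumer_secret: '...',
  access_token: '...',        // the token Twitter hands back when the user grants access
  access_token_secret: '...', // the secret used to sign each request
});

// The user stream is the "hanging GET": Twitter pushes events as they happen,
// so there is no polling loop.
const stream = T.stream('user');

stream.on('tweet', (tweet) => {
  // Pull any attached image URLs out of the tweet's media entities.
  const media = (tweet.entities && tweet.entities.media) || [];
  media.forEach((m) => {
    // Hypothetical helper: download the image, run the protanope filter,
    // and tweet the result back at the sender.
    convertAndReply(m.media_url, tweet);
  });
});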

Part 3: the results

So next, let’s get to how I did. Basically, it’s hard for you to know if it did a good job or not. It’s subjective, right? It’s correct when the before and after look the same to me. I’m going to walk through a few images and tell you what worked, and what didn’t.

Flamingo painting

Normal vision

Flamingo with protanopic vision

Protanopic vision

So, do these images look different to you? lots of people say “yes” They look almost identical to me. I ran my program, via a script I wrote, over every image I’ve ever taken. Which was a couple thousand. And I calculated something called the “root mean square deviation,” which is basically a long way of saying that it penalizes errors very harshly. Any time it finds a difference, it squares it instead of just using it. You sum all those squared differences together. That finds images that have regions that are drastically different. This was the one that came up as the most different. It’s ironic that it’s this image for several reasons. Primarily, I painted it myself. audience laughs. It was a BYOB painting class, and I got very painfully detailed instructions on how to mix the paints and produce the flamingo. That just goes to show that with enough instruction, I can accomplish just about anything.
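Author’s note: the scoring I just described is simple enough to sketch. This isn’t the script itself; it assumes both images are same-sized arrays of {r, g, b} pixels.

// Root mean square deviation: square each per-channel difference so large,
// localized changes dominate the score, then take the root of the mean.
function rmsd(before, after) {
  let sumSquares = 0;
  let count = 0;
  for (let i = 0; i < before.length; i++) {
    for (const channel of ['r', 'g', 'b']) {
      const diff = before[i][channel] - after[i][channel];
      sumSquares += diff * diff;
      count++;
    }
  }
  return Math.sqrt(sumSquares / count);
}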

Conversation with my friend Lindsay, where she tells me my life is sad

Conversation, wherein I discover how happy my life is. E_NO_HAPPINESS

When I was testing it, all of these images looked the same, before-and-after, to me. So I needed to constantly ping all of my friends and ask them, “what’s different? what’s different? what’s different?” I was having a conversation with my friend Lindsay about this one, where she goes over all the differences. She says, “there’s a lot of oranges and pinks, and then in the second one everything is dull gray and yellow.” And I said, “Thanks! They look almost identical to me.” I was happy that they were different for her. And she sends me a frownie face. With the message, “your life is sad.” people in the audience laugh, secretly agreeing with Lindsay No it’s not!

Sushi with normal vision

Normal vision

Sushi with protanopic vision

Protanopic vision

Another interesting image people in the audience laugh and groan. Alright, this is a good one! I had Sara Gorecki look over my slide deck and make sure I always had the before and after images, and I didn’t realize that this one would get such a strong reaction. She pointed this one out in particular as being interesting to her. And apparently to everyone. She said that the sushi on the right looks rancid, and she said she would not eat it. audience laughs in agreement with Sara. But then she asked me an interesting question; she asked, “does sushi look appetizing to you?” And the salmon in the middle does. It looks tasty. But the other two do not look that good. I know it’s good; my friend Tyler made it for me. He’s not trying to poison me. I try new foods, so I eat a lot of sushi. So I know it’s probably good; it doesn’t smell. It made me realize that with a lot of the foods I eat, I know they’re good at a mental level more than at a physical or visceral level.

Ping pong table with normal vision

Normal vision

Ping pong table with protanopic vision

Protanopic vision

And there were some images where there were differences, where it didn’t produce an identical before-and-after for me. And these are very interesting for me. For this one, does the table on the right look brighter than the table on the left? people say yes. Again, I was talking to Sara about this, and she called it out as an image that doesn’t look very different before and after. She was saying I should maybe try to find another image. And that was interesting to me, because I view the one on the left as being maybe 50% darker than the one on the right. This is an image where it looks like it failed to me; where it doesn’t reproduce what I see. My theory of what’s going on is that there’s more red on the one on the left. And you know that I systematically perceive reds as being very dark. So it looks more different to me than it does to her. But I didn’t look up RGB values, so that’s a guess.

This blue error pops up in a few places. Here, the shirt I’m wearing on the right looks way brighter than on the left. And in this one, I can actually see a color shift. My dad has a shirt on, and I can see that the one on the right is greener than the one on the left. I’m not sure what the color difference really is; I can just see that it shifts.

Hannah with normal vision

Normal vision

Hannah with protanopic vision

Protanopic vision

The color shift appears much more strongly in this picture. This is the 5th most-changed image out of any that I’ve ever taken. But in this one, I can actually see that the table shifts from red to green. For this particular shade of red… actually, I should have asked someone first. Is it red? audience says “yes” Thank you! For this one, I can see a difference, so it didn’t do a good job of reproducing it. But then I was talking to my friend Hannah about this, and she said that everything about the image was different, even down to her skin color. So I think it does a mostly good job of capturing what I see, but it actually fails on a large region of the image.

So that takes us to the end of the talk. I want to briefly summarize what I went over, since I know there was a lot of information.

  • You see color with 3 cones; I see color with 2 cones
  • That means I see roughly 1/10 the number of wavelengths you see
  • Because my color vision is so bad, it’s easy to model
  • You can calculate these confusion lines in xyY, which tell you which colors I see the same, to find an estimate of the color I see
  • I wrote a Twitterbot in node.js that works on this. I’m going to keep it up, feel free to send it images
  • The images look pretty much the same, before and after. So, it did a pretty good job.

Q/A

Does anyone have any questions?

To the best of your knowledge, is there a degree of variation in the colors seen by people who don’t have any type of colorblindness?

The answer is yes. The paper from 1944 had a lot of information on this, actually. It said the color of your lens, the color of your aqueous humour (the fluid in your eye), and your eye pigmentation mean there will actually be quite a bit of individual variation. But it’s not as strong as the difference between the different types of colorblindness, or the difference between colorblindness and normal vision.

Do you have your phone on you right now? Can you take a picture of the crowd and send it to the bot? I’d love to see what we look like to you.

QueensJS with normal vision

Normal vision

QueensJS with protanopic vision

Protanopic vision

What did you use in node.js to change the colors of the images?

I wrote my own code to do this. I did not, for the life of me, want to do image encoding and decoding, because that’s horrible. So I used a library called LWIP that some gentle citizen of the Internet put up. By the way, this is all on Github. If you’re curious about how the code is written, I’ll send this slide deck out somehow, and there’s a link at the beginning where you can look at it.

Do you ever have issues with advertisements, where they think they’ve done well contrasting colors, but they get merged together?

I have a lot of problems with advertisements, but color usually isn’t one of them. Every now and then, something will look weird to me. For instance, the new “Late Night with Stephen Colbert” sign actually looks a little strange to me, because it’s red-on-blue. But it’s a dark blue, and the red looks dark red to me, so the sign looks muddled in general.

As a frontend developer, what can I do? What kind of palettes do I have?

The answer is that I have trouble answering that, because I’m colorblind.

Has your algorithm identified any palettes that are better than others? Should we fall back to the earth tones of the 70s?

Having been on the trains in NYC, you shouldn’t do that.

Does ARIA define any color palette that should help?

I’m not sure. But I have spent a lot of time with the Adobe color picker, and that generally does a very good job of bringing up differentiable colors. So if that gives you something like your color scheme, you’re probably fine.

Do you have a favorite color?

I wore a lot of black in high school.

What color was the dress?

Ohh, I should have added that. I could never see white and gold; I only ever saw blue and black. some people in the audience cheer Epilogue! I color-processed that picture and showed it to my friends. And friends who could see one-or-the-other could still see white and gold in the picture. So the problem was on my end.

The Dress with normal vision

Normal vision

The Dress with protanopic vision

Protanopic vision

So, there’s this guy with an antenna, and he brings an object up to the antenna, and it announces the color in front of it. Have you considered something like this?

So, have I found useful tools for knowing or feeling what color something is? I’ve tried some apps on my phone to do this. When I’m clothes shopping, if I’m not sure if something’s blue or purple, I’ll put my phone up to it. But it’ll always give me these unhelpful labels. It’ll tell me “this color is schooner, or earl tea”. It never gives me the actual color name, so I never know what I’m buying. (Jake: one time, it told me a shirt was schooner, shady lady, tea, concord, and abbey. I uninstalled the app and asked the saleswoman)

Have you ever thought about making a browser extension to change colors to something you can see?

I have considered that. It’s a harder problem. You kind of need to know how to spread the colors out. But I know a lot about this now, so I could probably figure this out. (Jake: when I was answering this question: I thought of basically dividing the xyY plane into regions, and using dynamic programming to figure out how pivoting regions around the confusion point might give me better color differentiation). But I haven’t yet.