Blog posts

We haven't adapted teams to the magic wand yet

This does not reflect the opinion of my employer

TL;DR: we are DDoSing each other with code reviews, but it doesn't have to be this way.

Software engineering was solved until about last year. More or less.

Most projects got finished. We know which roles to hire and what levels they should be. We even have a default process, Scrum. If you can't lead an engineering team by yourself, just rub some Scrum on it. It won't be the fastest team in the world, but by golly they will finish the project at some point.

Projects still have lots of problems of course. But we're not bemoaning our ability to ship software like we did back in the '90s and '00s. We know we can do it. We can ship in spite of the problems around us.

Tech leads and project leads are a big reason for this success. They guarantee engineering outcomes. They collaborate with design and product. They lead the technical design. They are the primary reviewer for the project. And what happens when you make someone responsible for the technical execution of a project? They go into the important code reviews, even the ones they're not assigned. Even if they don't say a word, you know they're reading over the code looking for unhandled error cases and race conditions, making sure there aren't fatal flaws under the surface.

This "lead" is a strawman of sorts; it doesn't have to be one person. Maybe you had a cabal of 3 engineers out of 10 that guided the engineering, or you had a small stacked team and they could all truly retain context and hot swap for each other. But for the remainder of the post I will talk about "the lead" and we will all know what I mean.

The lead is the hub in a "hub-and-spoke" team model, since code reviews flow into this central point. Sometimes you'll send individual reviews to other people. But again, if someone is responsible for the technical delivery of a project, you know they're looking at what everyone's doing. You can't avoid this fanout, and it has been our secret sauce for a while. But our new magic wand is turning this fanout into an antipattern.

The magic wand

OK, well, something changed. We have a new magic wand. This magic wand vomits code at an impossible rate. And the worst part: the magic wand is pretty good! It's in a dangerous sweet spot. It can generate an entire "working" website, soup to nuts, from a single large prompt. And hopefully, you took the time to make sure that the API keys aren't on the client... and that the endpoints check auth... and it's doing something with CORS... and 2 dozen other fiddly bits necessary to launching in production... and it's not leaking debug errors with important information in 5xx responses... and before long you realize that the generated project wasn't even 50% of the way to a production system.

This magic wand is pressuring our tech and project leads. Don't believe me? Go ask one who works with a lot of agentic coders. "How has code review been feeling lately?" And you'll get a sigh and they'll tell you, "these tools are great but it's a lot to keep up with." I don't know where the performance ceiling is for these tools. But it's obvious that they will produce code faster and faster over the immediate future. This pressure will only increase.

This is creating an interesting problem. The hub will get DDoSed in all of these hub-and-spoke team models. This means that your most senior engineers will be spending a disproportionate amount of time reading and reviewing code.

This will create an even more interesting problem: a paradox! Our most senior engineers will practice less with these new tools, because they're spending their day sweating over line 351 and asking themselves "is it REALLY okay for this module to take on a dependency on the database?" because these are the kinds of questions that lead to decisions that avoid serious problems down the line. But the more junior members of the team are spending their time getting better with agentic programming. They may even start to drive how it's used at the company, while the more senior engineers begin to lack the experience to make these judgement calls themselves.

Software engineering isn't solved anymore. But we're still following the old rules and ignoring the magic wand and its impact.

What can we do about it?

Here's the disappointing part of the post: I don't know!

But you should have seen that answer coming. I told you that software engineering isn't solved anymore! How can I tell you a solution if I don't believe it's solved?

As a consolation prize, I want to highlight some tools and experiments I think are promising in the short term. The situation is evolving rapidly enough that I can't assert an expiration date on these.

Pair programming / the buddy system

When I joined Google in 2010, Google had a regimented code review system. I'm sure it still does, but I haven't worked there for a decade and I can't be bothered to ask anyone there now. Every changelist needed approval by another engineer. Between you and the reviewer, someone needed to be in OWNERS for that directory and someone needed to have "readability," a.k.a. clearance to write code in that language. And even if you had both permissions, someone still needed to explicitly approve your CL.

But there was a neat workaround. If you pair programmed a CL with a second person, you didn't need to get it reviewed by an external party, assuming that you and your pair had OWNERS and readability. This might not have been written down anywhere, but it was a logical application of the rules. One person sent out a changelist and another person approved it in the system. You also happened to be coauthors, but that wasn't forbidden at the time.

And that was a big deal at Google. Code reviews could get really bogged down. Some people just didn't review code that often, and some people just loved bogging down reviews in nitpicks that couldn't be found in any style guide. I knew a platform team that only reviewed external changes once a week, and if they left comments you needed to wait for the next week to hope to God they hit approve. A shortcut was a big deal.
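For the programmers who think better in code, the review rule and the pairing loophole can be sketched roughly like this. This is only a model of the rule as described above, not Google's actual tooling; `Engineer` and `CanSubmit` are made-up names for illustration.

```go
package main

import "fmt"

// Engineer is a hypothetical stand-in for a person involved in a changelist.
type Engineer struct {
	Name        string
	Owner       bool // listed in OWNERS for the directory
	Readability bool // has readability for the language
}

// CanSubmit models the rule described above: the approver must be a
// different person from the author, and between the two of them someone
// must be an owner and someone must have readability.
func CanSubmit(author, approver Engineer) bool {
	if author.Name == approver.Name {
		return false // nobody approves their own changelist
	}
	hasOwner := author.Owner || approver.Owner
	hasReadability := author.Readability || approver.Readability
	return hasOwner && hasReadability
}

func main() {
	alice := Engineer{Name: "alice", Owner: true}
	bob := Engineer{Name: "bob", Readability: true}

	// A pair that wrote the CL together still satisfies the rule:
	// one sends it out, the other approves it.
	fmt.Println(CanSubmit(alice, bob))
	// Having every permission yourself is still not enough.
	fmt.Println(CanSubmit(alice, alice))
}
```

Notice that nothing in the check asks whether the approver also helped write the change. That gap is the whole loophole.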

But nobody pair programmed. I sure didn't. I hate pairing unless we're bug hunting or someone's getting training. It feels like a waste to burn 2x the engineering time when everything is going well.

But it's a potential solution to the hub-and-spoke problem with the magic wand. Here's what I'm imagining: a team consists of staff/junior, staff/senior, and senior/senior pairs. These aren't permanent pairings; they're just today's arrangement. Each pair prompts together and looks at the output together. The pairing has enough combined seniority that it can own technical decisions. They have the authority to decide that their code can be shipped.

This has an important caveat. These pairings must understand when they need outside input. They need to gossip to the other pairings if they need to highlight an architectural decision or a bad assumption. Or if the pair cannot come to an agreement on a decision they need to find tiebreakers. But ideally these are exceptions; they would be prompting together and reviewing together. By the end, both engineers agree on the technical outcome and own the decisions.

In fact, this would become part of the definition of junior, senior, or staff engineer; how much you're trusted to ask for input when you need it.

They don't have to literally sit with each other for the whole day. They just need to both be responsible for the prompting and agree with the direction of the final code that ships. They don't need to sit together when they're updating documentation or having meetings or shitposting on Slack. But at a certain point you're having a conversation about it and making sure the architecture is reasonable and the verification is correct, and ensuring that you don't need to raise any problems with the team.

How would this look on the previous example of a four IC and one project lead team? Maybe you have two senior:senior pairings, one staff:junior pairing, and then the final floating engineer is situational. Maybe they're performing individual IC work and it will be reviewed later with one of the pairings. Maybe the project lead is kinda doing two pairing assignments at once (instead of effectively the four they had previously). I don't really care; it's your team. You figure it out. But the important thing is that the project lead's workload doesn't scale with team size; the number of pairings does.
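A back-of-the-envelope sketch of that scaling claim, with loud caveats: the `changesPerEngineer` figure is invented for illustration, and the model assumes pairs rarely escalate to the lead.

```go
package main

import "fmt"

// hubLoad is the number of reviews landing on the lead each week when
// every engineer's changes flow to the lead (hub-and-spoke).
func hubLoad(engineers, changesPerEngineer int) int {
	return engineers * changesPerEngineer
}

// pairLoad is the number of changes a single pair looks over together
// each week. It depends only on the pair's own output, not on team size.
func pairLoad(changesPerEngineer int) int {
	return 2 * changesPerEngineer
}

func main() {
	const changesPerEngineer = 10 // made-up number, purely illustrative
	for _, n := range []int{4, 8, 16} {
		fmt.Printf("team of %d: hub lead sees %d changes/week, each pair sees %d\n",
			n, hubLoad(n, changesPerEngineer), pairLoad(changesPerEngineer))
	}
}
```

The hub number grows with every hire. The per-pair number stays flat, and you just end up with more pairs.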

I haven't literally pair programmed with someone else yet in this manner. But I've worked on some two-engineer projects recently and it felt pretty good. Each of you has a default reviewer, and nobody is getting overwhelmed by N magic wands.

This has some benefits. First, it provides a concrete path to hire and train junior engineers for your organization. Even if you believe that the software engineering occupation will be decimated over and over by advances in the technology until finally one of Sundar, Sam, or Dario is holding the head of the last engineer, admit that you still need a way to teach new people how to do it. Second, it provides a role for staff engineers as a level, which obviously I appreciate as a staff engineer.

Product engineers

For a few months, I've been saying that I need to become a product manager before a product manager becomes an engineer. It turns out that that already existed, but I arrived at it independently. With the advent of increases in coding velocity, it becomes possible to start projects closer to the final implementation than ever before.

When I finally caught up on my unread backlog of The Pragmatic Engineer newsletters recently, I found an issue with the subject line "The product-minded engineer", which was an interview with the author of the book with the same name. This was a book about the need to grow your empathy with the user, and ways that technical skills and product skills can mesh together.

Why is this important? Look at the areas in LLMs that are seeing rapid development and rapid adoption. They're all dev tool related! Developers can be insanely productive nowadays, assuming they don't need to figure out what someone else needs them to build. But as soon as the topic is not "development" the process grinds to a halt.

But most companies aren't like that. If I've learned anything from working for B2B and B2C companies, it's that you can't possibly guess what people need without an obsession over qualitative and quantitative feedback. Thus, I believe that engineering is going to be more and more vital in the discovery phase of projects, where you're not even sure what to build. The ultimate software engineer will be one that can perform the product discovery work themselves. It'll be the ones that get better at producing up-front prototypes and iterating on those prototypes.

Have you ever seen a designer in a user research session, just tweaking upcoming mocks as a participant speaks to tailor it to them? Or chatting with a PM and calling an audible to tweak a major part of the mocks before the next session? Engineers who can do this kind of work will become more valuable because they will go beyond just putting up hypotheticals for reaction. They will be able to produce working systems for reaction. And sure, maybe they are only 50% prototypes and there is still a bunch of productionization work. But it's clear how adding more firepower to the earliest product iterations will only improve discovery.

I'm sure someone's gonna be like "oh no, the LLM will just be the product manager and the designer and the researcher." Really? You're going to do research for a dating app by putting Codex in front of someone and having them explore a user interview with questions like "So, puny human, is your situation more about copulation or procreation?" I don't see it.

So yeah, I think there will be a period of time where the lines between discovery and execution will blur. I've never worked for a proper startup, so it's possible I'm just making an assertion like "more and more companies will need to act like a startup" or something. But I'll let the startup people assert that for me.

I think this will help address the hub-and-spoke problem because at the start of the project, you start with a system that is already halfway there. You just need to refactor and add tests and productionize. This will reduce the scope of projects (or more accurately, move a lot of scope to the discovery phase) and reduce the during-the-project review workload.

AI code review

This one is exasperating. You have a magic wand that generates a pull request description, commit message, and code. And now you want to check if that magic wand did good work. So you wave the same magic wand -- but held differently! -- and now it's going to see why this code was such a bad idea? It sounds stupid when you say it out loud.

But at the moment, they're actually pretty good; I'd wager that they find more nitty-gritty problems than I do. They do all of the rote callsite checking that you might overlook. They catch swapped parameters of the same type by noticing name mismatches. They will notice when you try to set a dangerous or weird config value.

I mean, not RELIABLY. Half of the comments are horrible.

"Oh no, you changed this!", said the bot.

"Buddy, that's the whole point", said Jake.

But it's a good first pass. I'd be comfortable if my company adopted this rule: "You can't ask for human review until you do a pass with the bot and satisfy its comments." It removes silly errors so that the code reviewer can spend time focusing on the big picture.

I don't think this is some panacea. If an agent produced a major architecture flaw, I don't expect its corresponding reviewer to notice the flaw either. But it adds more value than noise at this point.
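As a minimal sketch of that "bot first, human second" rule: the types and names below are hypothetical, and it assumes that "satisfying" a comment means either fixing it or dismissing it with a reason, since plenty of bot comments deserve a dismissal.

```go
package main

import "fmt"

// BotComment is a hypothetical record of one comment left by the review bot.
type BotComment struct {
	Body      string
	Addressed bool // fixed in code, or dismissed with a reason
}

// ReadyForHumanReview reports whether a change may be sent to a person:
// the bot must have run, and none of its comments may be left hanging.
func ReadyForHumanReview(botRan bool, comments []BotComment) bool {
	if !botRan {
		return false
	}
	for _, c := range comments {
		if !c.Addressed {
			return false
		}
	}
	return true
}

func main() {
	comments := []BotComment{
		{Body: "arguments look swapped at this callsite", Addressed: true},
		{Body: "you changed this", Addressed: false},
	}
	fmt.Println(ReadyForHumanReview(true, comments)) // one comment is still open

	comments[1].Addressed = true // dismissed: changing it was the whole point
	fmt.Println(ReadyForHumanReview(true, comments))
}
```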

To summarize

  • We used to love hub-and-spoke team structures, where reviews would fan-in to a lead engineer responsible for technical execution.
  • LLMs have increased execution velocity, putting lead engineers under additional strain.
  • We need to rethink how to scale teams without scaling the lead's workload.
  • This isn't a solved problem, but there are a few options.
    • Working in pairs / the buddy system, where the pair has enough authority and responsibility to make decisions and ship.
    • Getting engineers more involved in the discovery process.
    • Having the bots help out with code review, to remove obvious problems before a human looks at it.

Claude Code is a great Dad side project environment

I finally did it.

I moved my blog off of Wordpress. It's running on a Go server on a small Digital Ocean droplet.

Why now? Because side projects are fun again. I'm excited about software engineering for the first time in about 15 years. Agentic coding is so new and unsolved! And even better, now I get to make political statements just by saying which agent I use. What a time to be alive.

Oh sure, I've had a lot of fun coding in that time. I've gotten excited about a lot of problems. But it was never software engineering itself. And side projects eventually burned me out. Especially once my toddler was born. I don't want to pick up a project, fight exhaustion, hit a weird error, and yak shave for an hour while hoping that I have fun tomorrow at least.

But wow, Claude[0] really fixed that. I'm not new to Claude or Agents; I've used some version of Copilot/Cursor/Claude at work since Copilot first came out, and I've been using agents for about a year at work. But work's different than side projects. At work, I can't vomit out 30,000 lines of code and hold it up and ask, "Is this anything?" But I now regularly do this at home as part of exploring how far we can push the tools.

And you know what? It's amazing for dad side project time. It can meet me wherever I am.

Is my wife walking to the store with our daughter? That's 20 minutes, I can write a prompt and let it churn until they come back.

Am I exhausted after both my daughter and my pager wake me up? I can just click through my project and whine about the parts I don't like, and Claude will dutifully fix it all. Or maybe I can just vibecode a huge project with the goal of learning something.

Do I have a few hours? Great, let's really bend this codebase to my will. I'm going to micromanage this to within an inch of its life.

And one of the famous slogans from "The Mythical Man-Month" was "Build one to throw away." I.e., you should invest time to prototype before building the final system. Claude really changes the cost dynamics; you can build a prototype, prototype a second approach, prototype a third approach, refine the third prototype, and then the production system is within a stone's throw.

The actual Wordpress port

I've wanted to move onto a VPS ever since the Wordpress drama happened years ago. But the juice never seemed worth the squeeze. I mean, I had fewer than 30 blog posts on this blog and just a couple of pages. Why bother, right?

But I first signed up for the Claude Pro account, and I tried thinking about projects that I might be able to one-shot within its narrow token budget. The blog port was a natural fit.

So I wondered if I should just convert the posts to Markdown and host them on Github pages or similar. But I liked the idea of being able to have dynamic server-based content[1].

Overall, I tried to one-shot the port at least 15 times.

In the beginning, I gave it really simple prompts. Basically, "Port www.bitlog.com to a Golang server with Markdown files storing content." These failed horribly! They'd just make a basic Go server and a few fake posts.

Next, I prodded it to download the content. It would try for a while but I would eventually run out of context. I tried asking it to make a tool to scrape each page, but it tapped out and asked me to export the XML instead.

So I downloaded the XML dump and started telling it, "The XML dump of a Wordpress install is in this directory." And my prompt grew and grew with each telling. So many things needed to be fixed. It linked to images on my remote server instead of hosting them. Pages included Wordpress styling. Opus 4.6's first attempt rendered completely blank pages.
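For a sense of what the core of the port involves, here is a rough sketch of reading a WordPress XML export and turning published posts into Markdown files. It is not the code Claude produced. The sample document, the front-matter layout, and the names `export`, `item`, and `toMarkdown` are assumptions for illustration, and the post body is kept as raw HTML (which Markdown allows) rather than converted.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// export mirrors only the handful of fields this sketch reads from a
// WordPress XML export. A real export contains many more elements.
type export struct {
	Items []item `xml:"channel>item"`
}

type item struct {
	Title  string `xml:"title"`
	Slug   string `xml:"post_name"` // <wp:post_name>
	Type   string `xml:"post_type"` // <wp:post_type>
	Status string `xml:"status"`    // <wp:status>
	// The namespace is spelled out so that <excerpt:encoded> can't match.
	Content string `xml:"http://purl.org/rss/1.0/modules/content/ encoded"`
}

// sample is a tiny hand-written stand-in for a real export file.
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
  <item>
    <title>Hello world</title>
    <wp:post_name>hello-world</wp:post_name>
    <wp:post_type>post</wp:post_type>
    <wp:status>publish</wp:status>
    <content:encoded><![CDATA[<p>First post.</p>]]></content:encoded>
  </item>
  <item>
    <title>Draft</title>
    <wp:post_name>draft</wp:post_name>
    <wp:post_type>post</wp:post_type>
    <wp:status>draft</wp:status>
    <content:encoded><![CDATA[<p>Not ready.</p>]]></content:encoded>
  </item>
</channel>
</rss>`

// toMarkdown returns filename -> file contents for every published post.
func toMarkdown(data []byte) (map[string]string, error) {
	var ex export
	if err := xml.Unmarshal(data, &ex); err != nil {
		return nil, err
	}
	files := make(map[string]string)
	for _, it := range ex.Items {
		if it.Type != "post" || it.Status != "publish" {
			continue // skip drafts, pages, attachments, and so on
		}
		var b strings.Builder
		fmt.Fprintf(&b, "---\ntitle: %q\nslug: %q\n---\n\n", it.Title, it.Slug)
		b.WriteString(strings.TrimSpace(it.Content))
		b.WriteString("\n")
		files[it.Slug+".md"] = b.String()
	}
	return files, nil
}

func main() {
	files, err := toMarkdown([]byte(sample))
	if err != nil {
		panic(err)
	}
	for name, body := range files {
		fmt.Printf("== %s ==\n%s", name, body)
	}
}
```

This covers only the easy part. The problems listed above, like remote image links and leftover WordPress styling, live inside the post bodies and need their own cleanup pass.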

At some point, I started experimenting with subagents and immediately started running out of tokens. This was the point where I upgraded to a Max subscription. That's how they get you and it worked. Well played Anthropic.

I then was looking for a Beads alternative and found beans. I liked the idea of beads. I just wanted an implementation that... evolved a bit slower. Beans was another increase in power. My current experiments involve subagent teams, which are producing mixed results.

But eventually, I wrote this prompt, and I looked over the output. I realized, "This version has a lot of problems, but this is close enough. I can productionize this."

Refining the output

I started comparing the local markup with my Wordpress server. It skipped a bunch of meta tags like OG tags, Twitter markup tags. I made a lot of changes to the visual design (graphic design is my passion), information architecture, etc. This kind of work was great when I was exhausted; I could just whine to it about not liking how the header was styled and it would go and fix it for me.

Then I started asking it to e.g. find accessibility issues. And it came up with some good ones, and suggested good things like having a "skip to content" element. And I noticed something funny! Whenever I commanded it to generate a list of issues that included severity, it would generate a list with 1-3 severe issues, 3ish medium issues, and 3ish low-severity issues. I found that I had to specify what I mean by "severe" for it to generate an honest list; like, "Judge all issues relative to a 'severe' flaw that would render the site completely inoperable, like a focus trap."

Deploying

I created a Digital Ocean droplet, pointed DNS at it, and set up SSH keys so that SSH commands would work without needing in-band authentication. And then I told Claude that I wanted to set up Ansible and a reverse proxy, harden the server, etc. It churned for about 15 minutes, and at the end of it my blog was deployed and all the configs were right.

And then I had to talk Claude off the ledge. Something about its environment was preventing it from seeing the page on HTTPS. I could access it just fine over both HTTP and HTTPS. And then we were live!

Conclusion

I converted a blog from Wordpress to my own Golang server, even though I don't have much time. I am excited about software engineering itself for the first time in 15 years. I have my own theories about how the profession will evolve over the next 5 years[2], which will be the subject of my next blog post.

First, obviously this took longer than doing it myself, given the number of iterations it took. However, Claude could work when I couldn't. It's indefatigable! On nights that I was too tired to code, but didn't want to play a video game, I could just whine to Claude and it would fix the problems I saw.

It was also a playground for a while. Almost a "code kata," except I wasn't trying to execute a perfect form. I was just walking down a well-worn path, seeing what happened each time I changed a variable or three.

But I did it. I deployed it. I'm happy with the results. And now I'm curious how far I can take this. Can I host my own email server?

Footnotes

[0] You can substitute your favorite agent here.

[1] To be clear, I never will have dynamic server-based content. But man, I love the idea.

[2] TL;DR: We need to become product managers before product managers become coders.

2025 year in review

Professional year in review

I got a lot done professionally in 2025.

For most of the year, I was a backend engineer on the recommendations team at Hinge. We started the year by rolling out my big 2024 project, which was adding Elasticsearch to power new candidate generators in Hinge's recommender. The initial launch was successful: its p99 was 80% lower than our previous Postgres-powered version, and was much more maintainable. For the remainder of the year we added more and more features using this stack, and now it's load bearing for a bunch of product wins. Along the way we learned a lot about scaling Elasticsearch clusters, so I'll try to write a blog post or a conference talk explaining our approach!

By the end of the year, I became the backend tech lead of a recommendations-adjacent team called "Matching." So now I'm spending my time doing the usual TL dance: refining new ideas with product managers, guiding our tech implementations, and then project leading and IC work when everything is humming smoothly. I'm really excited by our roadmap, so I'm hoping 2026 will be a good year as well.

Multiple people have told me that I bring "fun dad energy" to work every day. I'll take it.

Fun dad energy

My daughter is 2.5. I love hanging out with her. She tells jokes that catch me off guard and make me laugh. She loves stories and playing with words. She's trying to guess the first letter of words based on how they sound. She even gets it right sometimes. She's starting to develop real friendships with the kids at school. For the first time, she sometimes wants to just play by herself for 10 or 15 minutes. 2024 was all about her becoming an individual we could talk to. 2025 was all about her growing in complexity. I'm so excited to see what else changes this year.

We tried to develop her athletic side by enrolling her in soccer. Every Sunday, we'd bring her to a field near our apartment. She'd ignore all instructions from the coach. We needed to intensely micromanage her to even kick the ball. Halfway through every class she would jailbreak by sprinting across the field away from the lesson. Her coach was great with kids, which is probably the only reason my daughter was excited to go every week. But a switch flipped after a few months of attendance. She started listening to her coach. She gleefully did everything the coach wanted. And she actually stayed in class instead of trying to flee. We're thinking of enrolling her again for the spring. I think she'll actually want to go.

Much of the rest of the year was pretty mundane. Her sleep schedule stabilized. She's sleeping more than she used to, and so are we. Potty training was hard but achievable. Giving her a balanced diet was hard but achievable. That's a good summary for the year in parenting: hard but achievable.

New apartment

In 2024, our building installed heat pumps. It was a massive project that lasted most of the year, finally ending in November. The 1 bedroom was getting too small for us, so we moved into a rental in the middle of the year. We wanted to sell our apartment but had to wait until the construction finished. And then the unexpected happened: my next-door neighbor put her apartment on the market. Against all odds, she accepted our bid. We moved in at the end of 2025 and I'm living in two apartments next to each other. I still can't believe it. For the first time in my adult life, I have enough space.

Our plan is to combine the two units into one next year. This will be a nightmare. However, it's 2027's nightmare.

We have one advantage: the units might have been combined in the past. Most of the wall between the apartments is a thick load-bearing masonry wall. But I knocked on the walls between the units. There are 2 doorway-sized areas that sound hollow, and are covered in sheetrock instead of plaster. I dug in with a long screwdriver and it sounds like there's a different brick wall between the two (instead of the 12-ish inch thick load-bearing wall). I'm hoping that these are doorways that were bricked over, and that we can just take the brick out and live happily ever after. But I won't know for sure until we take the sheetrock down. Again, 2027's problem!

Newsletter

My daughter had a more predictable sleep schedule this year. So I was better rested and had more free time. I decided to write a newsletter at clientserver.dev. My initial writing prompt was "Money Stuff for tech." I committed to writing two posts per week. So I focused on current events in tech. Since my wife went to bed at 10:30 and I went to bed between 12:00 and 12:30, I had 3-4 hours to write each post.

I wrote two posts per week in the beginning. Over 6 months I got to 268 subscribers. Holy cow, people wanted to read it! This made writing easy: every single post had a guaranteed audience. However, the newsletter took up ALL of my free time. I worked around the clock. I pushed my bedtime later and later to get issues out. I was exhausted. I became more stressed. I became emotional when small changes in my schedule took away time from the newsletter. My wife started begging me to take time off. Once she started trying to intervene, I was like "this is a really bad sign." I stopped working on it in July. Next year, I'm going to start publishing it again without a regular schedule. I think writing is too valuable and I don't want to throw the audience away. bitlog.com will continue to be my own personal writing, and clientserver.dev will continue to be my hot takes on the news.

Having a short stint as a tech writer made me realize a few things:

  1. Sourcing stories on a regular basis is extremely hard.
  2. The good publications are incredible. Shoutout to outlets like The Register that produce high-quality tech journalism. They do an incredible amount of research and writing on a timeframe I can't fathom.
  3. Most publications are absolutely horrific.

For example, journalists apparently don't have time to read anything. They just get a prompt and write and publish. I'd occasionally find things that were (a) mass reported, and (b) trivially provable to be wrong. For example, many outlets reported that Salesforce would not hire software engineers in 2025. But anyone could go to Salesforce's job page and see dozens or hundreds of software engineer job postings. And then you'd read Marc Benioff's statements and interviews on the subject and you'd realize, "oh, they actually said they are keeping their engineering headcount stable. All of these people are reporting the wrong thing."

Gaming

As I said in the newsletter section, I've had more free time this year. After my newsletter stint, I've picked up more games in my free time.

My #1 game of the year was Hollow Knight: Silksong. What else is there to say? Great soundtrack. Great bosses. A fun moveset on Hornet. If I had any criticisms (besides Bilewater), it'd just be the absence of a challenge boss like Radiance. It feels like a missed opportunity given Hornet's versatility and mobility.

I also sank almost 100 hours into Blue Prince. Which is a bit weird given that I did not like it! The game is like 40 hours of gameplay spread into 300 because of RNG, and my life is too busy to allow some dude to waste entire days of my life like that. On the plus side, some of the puzzles were genuinely enjoyable. Color was also starting to be a factor in puzzles. I am colorblind with protanopia, which is a severe color deficiency. This game was so subtle and tricky that I never knew whether I was missing some obvious clue, like "this realm's color is only exposed in one place, where you have to notice that a postcard is tinted a specific color." I also understand that color continues to be more and more important as the game progresses.

Spoiler paragraph for anyone curious about how far I got: I was playing with 0 hints or outside help. I love word puzzles and had a lot of fun decoding the Baron's Bafflers, and needed a few visits to the gallery (and a hint from the classroom) to solve all of the painting puzzles. I was clearing out the tunnel, I had gotten most of the way through the 8 doors of the realm puzzle (probably close enough to brute force the remainder). I uncovered the CASTLE puzzle and (correctly) had a few clues that I thought were part of the puzzle. I also unlocked the throne room and figured out how to steal the crown, but wasn't sure if they were connected in any way. I was collecting trophies so that I could unlock the blue tent to see if it did anything or just wasted money. I finished the classroom quiz and looked at the giant pile of hints I hadn't processed yet in my Notion doc, and the list of things that I was waiting for perfect RNG to do, and just couldn't bring myself to try the next thing. So I called it quits and looked up the remainder of the story and major spoilers (Thanks, FuryForged!). I'm not sure if I could have 100%'d the game without help, but I didn't have the time necessary to find out. I also didn't like the story or the lore, although I did appreciate that once you understand the early through late game, it's a story of how a spoiled brat truly earns the right to call the house his own.

I've also rediscovered the joy of just messing around in party games with my friends! Recently we've been playing "Golf With Your Friends" and "RV There Yet" and having a good time.

Looking ahead

My goals for 2026 are pretty simple:

  • Get in shape
  • Take more time off
  • Find more time to write

See you next year!

This entry was posted in uncategorized on January 4, 2026 by jake.