As both developers and stewards of significant open source projects, we’re watching AI coding tools create a new problem for open source maintainers.

AI assistants like GitHub Copilot, Cursor, Codex, and Claude can now generate hundreds of lines of code in minutes. This is genuinely useful, but it has an unintended consequence: reviewing machine-generated code is very costly.

The core issue: AI tools have made code generation cheap, but they haven’t made code review cheap. Every incomplete PR consumes maintainer attention that could go toward ready-to-merge contributions.

At Discourse, we’re already seeing this accelerate across our contributor community. Within the next year, every engineer maintaining an open source project will face the same challenge.

We need a clearer framework for AI-assisted contributions that acknowledges the reality of limited maintainer time.

A binary system works extremely well here. On one side there are prototypes that simply demonstrate an idea. On the other side there are ready-to-review PRs that meet a project’s contribution guidelines and are ready for human review.

The lack of proper labeling and rules is destructive to the software ecosystem

The new tooling is making it trivial to create a change set and lob it over the fence. It can introduce a lopsided, perverse system where project maintainers spend disproportionate effort reviewing AI-generated code that took contributors seconds to create and will take many hours to review.

This can be frustrating, time-consuming and demotivating. On one side there is a contributor who spent a few minutes fiddling with AI prompts; on the other, an engineer who needs to spend many hours or even days deciphering alien intelligence.

This is not sustainable and is extremely destructive.

The prototype

AI coding agents such as Claude Code, Codex, Cursor CLI and more have unlocked the ability to ship a “new kind” of change set: the prototype.

The prototype is a live demo. It does not meet a project’s coding standards. It is not code you vouch for or guarantee is good. It lacks tests, may contain security issues, and would most likely introduce an enormous amount of technical debt if merged as is.

That said, it is a living demo that can help make an idea feel more real. It is also enormously fun.

Think of it as a delightful movie set.

Prototypes, especially on projects such as Discourse where enabling tooling exists, are incredibly easy to explore using tools like dv.

% dv new my-experiment
% dv branch my-amazing-prototype
% dv ls
total 1
* my-amazing-prototype Running 1 minute ago http://localhost:4200

# finally, visit http://localhost:4200 to see it in action

Prototypes are great vehicles for exploring ideas. In fact, you can ship multiple prototypes that demonstrate completely different solutions to a single problem, which helps decide on the best approach.

Prototypes, video demos and simple visual mockups are great companions. The prototype has the advantage that you can play with it and properly explore the behavior of a change. The video is faster to consume. Sometimes you may want them all.

If you are vibe coding and prototyping, there are some clear rules you should follow:

  1. Don’t send pull requests (not even drafts); instead, lean on branches to share your machine-generated code (see the sketch after this list).
  2. Share a short video AND/OR links to a branch AND/OR quotes of particularly interesting code from the prototype in issues or forum posts.
  3. Show all your cards: explain that you were exploring an idea using AI tooling, so people know the nature of the change you are sharing.
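
For rule 1, plain git is all you need: push the branch to your own fork and link to it, with no pull request involved. A minimal sketch (the fork URL is illustrative):

% git checkout -b my-amazing-prototype
% git push origin my-amazing-prototype
# then link the branch in an issue or forum post, e.g.
# https://github.com/your-user/discourse/tree/my-amazing-prototype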

Maybe you will be lucky and your idea will get buy-in; maybe someone else will want to invest the time to drive the prototype into a production PR.

When should you prototype?

Prototyping is fun and incredibly accessible. Anyone can do it using local coding agents, or even cloud coding agents such as Jules, Codex Cloud, Cursor Cloud, Lovable, v0 and many more.

This dramatically lowers the bar for prototyping. Product managers can prototype, CEOs can prototype, designers can prototype, etc.

However, this new fun opens a series of questions you should explore with your team:

  • When is a prototype appropriate?
  • How do designers feel about them?
  • Are they distracting? (Are links to the source code too tempting?)
  • Do they take away from human creativity?
  • How should we label and share prototypes?
  • Is a prototype forcing an idea to jump the queue?

When you introduce prototyping into your company you need to negotiate these questions carefully and form internal consensus; otherwise you risk creating large internal attitude divides and resentment.

The value of the prototype

Prototypes, what are they good for? Absolutely something.

I find prototypes incredibly helpful in my general development practices.

  • Grep on steroids. I love that prototypes often act as a way of searching through our large code base, isolating all the little areas that may need changing to achieve a feature.
  • I love communicating in paragraphs, but I am also a visual communicator. I love how easily a well-constructed prototype can communicate a design idea I have, despite me not being that good with Figma.
  • I love that there is something to play with. It often surfaces many concerns that a spec could have missed. The best prototype is one you test; during the test you discover many tiny things that are just impossible to guess upfront.
  • The crazy code LLMs generate is often interesting to me; it can sometimes challenge some of my thinking.

The prototype - a maintainer’s survival guide

Sadly, as the year progresses, I expect many open source projects to receive many prototype-level PRs. Not everyone will have read this blog post, or even agree with it.

As a maintainer dealing with external contributions:

  • Protect yourself and your time. Timebox initial reviews of large change sets; focus on determining whether a PR was “vibe coded” rather than leaving 100 comments on machine-generated code that took minutes to generate.
  • Develop an etiquette for dealing with prototypes pretending to be PRs. Point people at the contribution guidelines and give them a different outlet: “I am closing this, but it is interesting; head over to our forum/issues to discuss.” (A sketch of such a response follows this list.)
  • Don’t feel bad about closing a vibe-coded, unreviewed, prototype PR!
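
To make that etiquette cheap to apply, keep a canned response handy (GitHub’s saved replies work well for this). A sketch you could adapt; the wording is illustrative, not any project’s official text:

Thanks for the interest! This change set looks like an unreviewed,
AI-generated prototype, so I am closing it per our contribution
guidelines. The idea itself is interesting: please share a branch link
and a short demo on our forum so we can discuss whether it is worth
driving to a production-ready PR.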

The ready-to-review PR

A ready-to-review PR is the traditional PR we have always submitted.

We reviewed all the machine-generated code and vouch for all of it. We ran the tests and we like them; we like the code structure; we read every single line of code carefully; and we made sure the PR meets the project’s guidelines.
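
In practice that pre-flight looks something like this (the exact commands depend on the project; this is a Ruby-flavoured sketch):

% bundle exec rspec     # run the tests and actually read the output
% bundle exec rubocop   # meet the project's style rules
% git diff main         # re-read every line before requesting review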

All the crazy code the agents generated along the way has been fixed; we are happy to stamp our very own personal brand on the code.

Projects tend to have a large set of rules around code quality, code organisation, testing and more.

We may have used AI assistance to generate a ready-to-review PR. Fundamentally, though, this does not matter: we vouch for the code and stand behind it meeting both our brand and the project’s guidelines.

The distance from a prototype to a ready-to-review PR can be deceptively vast. There may be days of engineering in taking a complex prototype and making it production ready.

Andrej Karpathy communicated this large distance as well on the Dwarkesh Podcast:

For some kinds of tasks and jobs and so on, there’s a very large demo-to-product gap where the demo is very easy, but the product is very hard.

For example, in software engineering, I do think that property does exist. For a lot of vibe coding, it doesn’t. But if you’re writing actual production-grade code, that property should exist, because any kind of mistake leads to a security vulnerability or something like that.

A Veracode survey found that only 55% of generation tasks resulted in secure code (source).

Our models are getting better by the day, and everything depends on an enormous number of parameters, but the core message stands: LLMs can and do generate insecure code.

On alien intelligence

The root cause of the distance between a prototype and a project’s guidelines is AI’s alien intelligence.

Many engineers I know fall into one of two camps. One camp finds the new class of LLMs intelligent, groundbreaking and shockingly good. The other camp thinks of all LLM-generated content as “the emperor’s new clothes”: the code they generate is “naked”, fundamentally flawed, poison.

I like to think of the new systems as neither. I like to think of this new class of intelligence as “Alien Intelligence”. It is both shockingly good and shockingly terrible at the exact same time.

Framing LLMs as “Super competent interns” or some other type of human analogy is incorrect. These systems are aliens, and the sooner we accept this, the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process creates.

Playing to alien intelligence’s strengths: the prototype

Over the past few months I have been playing a lot with AI agents. One project I am particularly proud of is dv, a container orchestrator that makes it easy to use various AI agents with Discourse.

I will often run multiple complete, different, throwaway Discourse environments on my machines to explore various features. This type of tooling excels at vibe-engineering prototypes.

Interestingly, dv was mostly built using AI agents with very little human intervention, and some of the code is a bit off-brand. That said, unlike Discourse or many of the other open source gems I maintain, it is a toy project.

Back on topic, dv has been a great factory for prototypes on Discourse. This has been wonderful for me. I have been able to explore many ideas while catching up on my emails and discussions on various Discourse sites.

On banning AI contributions, prototypes and similar

Firstly, you must be respectful of the rules of any project you contribute to; seek them out and read them prior to contributing. For example, Cloud Hypervisor says no to AI-generated code to avoid licensing risks.

That said, there is a trend among many developers of banning AI. Some go so far as to say “AI not welcome here, find another project.”

This feels extremely counterproductive and fundamentally unenforceable to me. Much of the code AI generates is indistinguishable from human code anyway. You can usually tell when a prototype is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.

The new LLM tooling can be used in a tremendous number of ways, from simple code reviews and simple renamings within a file to architecting complete change sets.

Given the enormous mess and diversity here, I think the healthiest approach is to set clear expectations. If I am submitting a PR, it should match my brand and be code I vouch for.

As engineers, it is our role to properly label our changes. Is our change ready for human review, or is it simply a fun exploration of the problem space?
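
Something as simple as a status block at the top of a branch or PR description does the job. A sketch (the fields and wording are illustrative):

Status: prototype, not ready for review
Tooling: generated with an AI coding agent, largely unreviewed
Intent: demonstrates one possible approach, please discuss before investing review time

versus:

Status: ready for review
Tooling: AI-assisted; every line reviewed, tests written and run
Guidelines: meets the project's contribution guidelines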

Why is this important?

Human code review is increasingly becoming a primary bottleneck in software engineering. We need to be respectful of people’s time and protect our own engineering brands.

Prototypes are fun, and they can teach us a lot about a problem space. But when it comes to sending contributions to a project, treat all code as code you wrote: put your stamp of ownership and approval on whatever you build, and only then send a PR you vouch for.

Comments

18 days ago

Framing LLMs as “Super competent interns” or some other type of human analogy is incorrect. These systems are aliens, and the sooner we accept this, the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process creates.

Interesting justification. I personally do not view them as aliens, but rather as lost ships that require precise steering and direction. A ship still needs mastery to steer. Vibe-coders are usually coming from a place of no experience, so the ship won’t steer in the right direction well. Dependency and complacency among senior professionals leads to the same thing.

Sam Saffron 18 days ago

I agree, it is a bit of both.

The stark alien aspect for me is the incredible competence mixed in with incredible incompetence.

The systems know every coding language and almost every trick in the book, but they often apply the tricks in very weird and alien ways.

Part of it is the “eagerness to please” … e.g. you asked me to do it, so I did it.

But part of it is just over-reliance on hacks that should not be deployed, and a lack of “whole system” thinking.

Completely agree though, you need to know how to steer this tooling to get great results.

Also, something a lot of people do not realize: you need to know when to “give up” and start from scratch. Back to your lost ship analogy, the ship often gets so lost that trying to steer it in the right direction is both pointless and impossible; sometimes you need a reboot. In fact, I would say reboots are needed a lot more frequently than I tend to use them, because I like playing steer-the-ship, but it can be counterproductive.

Sam Saffron 13 days ago

This blog post did land on Hacker News at:

Thank you for the thoughtful discussion.

On proof of work, microtransactions and other ways of passing costs to contributors

A few ideas were floated around introducing some cost to contributors on open source projects to protect maintainers’ time.

I think it is interesting as a thought experiment, but completely unworkable. The core of what makes open source “open” is that we are open to 3rd party contributions. A for-pay open source ecosystem would look completely different to what we have today.

I think we must focus on clear rule setting and low tolerance for rule breaking as the only practical means of addressing the denial-of-service attack that is forming.

On the LLM hype bubble

The way I see it, as engineers we have yet another tool in our toolbelt.

This time it is a sort of laser cannon, and it comes with no manual. We are writing the manual as we go.

There are LOTS of ways of misusing this new tool, but also tons of ways of using this tool incredibly effectively.

For example, as “grep but on steroids” nothing comes close. If you need to navigate a large code base and isolate where changes belong, this can save hours of hard and complicated searching.
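
As a concrete sketch, most coding agents can be run one-shot from the shell for exactly this kind of search (claude -p is Claude Code’s non-interactive print mode; the prompt is illustrative):

% cd ~/src/discourse
% claude -p "List every file and method involved in delivering user notifications, and note which ones a new notification type would touch."

Unlike grep, the agent follows call graphs and naming conventions rather than just matching strings, which is where the hours are saved.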

There are tons of real-world, highly effective use cases for agents, and tons of ways to abuse them.

And… no manual.

On having a better way of managing reputation

This certainly resonates with me: when I submit a PR to a project, I am slowly building on my personal brand.

When someone reviews a new PR by me, seeing my “reputation” (what hit rate do I have? what am I an expert in?) can certainly help them judge it. I am not sure how we can swing this, but there may be an interesting feature for GitHub to think about here.

Jerry’s apartment is not a movie set

This is true. I really should have picked a better picture.

Click bait title is not doing you any favors

Hacker News chose the title:

We need a clearer framework for AI-assisted contributions to open source

This is compelling and non-polarizing.

My polarizing title tries to cover a bit of extra nuance, which sadly can alienate people who simply do not read past polarizing titles.

Vibe coded slop

As emotional as this is, I think it is important for us to think about these words.

This is talking about code you generated but did not bother to check for accuracy and review carefully.

Vibe coded slop CAN become “production grade software” once it is carefully reviewed. The problem, though, is that you do not know upfront whether machine-generated code is slop or not, so as a general rule you need to review it carefully. It often contains aspects of slop.

I guess this is why there is a trend of selling “vibe engineering” as better than “vibe coding”.

On the word slop

“AI slop” as a term has gained tons of popularity over the past few years. It is something anyone interacting with LLMs experiences daily. Sometimes it is a bit of slop, sometimes it is a lot.

The word slop was born out of an Old English word, “sloppe”, which meant cow poop. It is not surprising it is upsetting to many.

:hugs:
