It’s been almost three years since ChatGPT launched and pushed AI into the mainstream.

In the months that followed, competitors like Anthropic’s Claude and Google’s Gemini arrived with their own models. Fast forward to 2025, and users now have hundreds of options, from broad, general-purpose models to lightweight tools fine-tuned for specific tasks.

With new releases dropping almost every week and companies showcasing technical benchmark results to claim superiority, it’s tough for developers to stay current. Even if you do, figuring out which model to actually use is another challenge entirely.

While benchmarks are useful, they’re clean, controlled, and unambiguous: words you probably wouldn’t use to describe software engineering.

Most dev tasks start with grey areas: vague requirements, open-ended expectations, incomplete specs, and a lot of room for interpretation. With this in mind, we ran our own real-world test: we gave five AI app-building platforms the exact same ambiguous prompt, with no follow-ups, and judged the results like a developer would.

The Prompt

First, here’s the prompt we gave every platform. We deliberately used ambiguous words like “modern”, and left out details (like a delete button for new items) to see whether each platform would think of them itself.

The prompt reads as follows:

Build a “Side-Project Mood Tracker” web app with a clean, modern, slightly playful design.

The app should let a user:

- Create a new side-project
- Give it a short description
- Set its current “mood” using emoji (😃 Productive, 😐 Neutral, 😩 Stuck)
- Add optional tags (like “frontend”, “ai”, “learning”)
- See all projects in a simple dashboard
- Filter or sort by mood or tag

Design/style vibes:
- “modern”, “minimal”, “friendly”, “slightly playful”, “soft edges”, “calm colours”, “dashboard layout”, “simple cards”

Other requirements:
- Editing should happen without page refresh
- Store data however the platform prefers (local, mock DB, internal state)
- Include a lightweight logo/header
- Include at least one small interactive detail (hover animation, micro-interaction, etc.)
- No authentication

We didn’t even clarify what we meant by “app”, so there’s plenty for each platform to play with here.
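To make the brief a little more concrete, here is a minimal sketch of the data model and filtering it seems to imply. This is our own illustration, not output from any of the platforms: the names `Project`, `Filter`, `filterProjects` and `sortByMood` are hypothetical, and the sort order (productive first) is an assumption the prompt leaves open.

```typescript
// Hypothetical data model for the brief; none of these names come from the platforms.
type Mood = "productive" | "neutral" | "stuck";

interface Project {
  id: string;
  name: string;
  description: string;
  mood: Mood;
  tags: string[]; // tags are optional in the brief, so an empty array is valid
}

// Emoji taken from the prompt itself.
const MOOD_EMOJI: Record<Mood, string> = {
  productive: "😃",
  neutral: "😐",
  stuck: "😩",
};

interface Filter {
  mood?: Mood;
  tag?: string;
}

// Keep the projects that match every filter that is set.
function filterProjects(projects: Project[], filter: Filter): Project[] {
  return projects.filter(
    (p) =>
      (filter.mood === undefined || p.mood === filter.mood) &&
      (filter.tag === undefined || p.tags.indexOf(filter.tag) !== -1)
  );
}

// Assumed ordering: productive first, stuck last.
const MOOD_ORDER: Mood[] = ["productive", "neutral", "stuck"];

// Sort a copy so the caller's array is left untouched.
function sortByMood(projects: Project[]): Project[] {
  return projects
    .slice()
    .sort((a, b) => MOOD_ORDER.indexOf(a.mood) - MOOD_ORDER.indexOf(b.mood));
}
```

Even this small sketch raises questions the prompt never answers, such as whether filters combine and which mood sorts first, which is exactly the kind of gap we wanted each platform to fill on its own.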

The Criteria

1. Setup & UI (5 points)

How well the platform interpreted the prompt, the quality and completeness of the UI, and the speed and ease of onboarding and setup.

2. Customising Code (5 points)

How easily a developer can edit or extend the project, switch views, integrate a database, or work with the platform tools. It’s all good fun to create a nice UI, but taking the project forward afterwards is what matters, and that all starts and ends with the ability to edit the code.

3. Codebase (5 points)

How clean, accessible, and transparent the underlying code is — dependencies, file structure, readability, architecture.

This will give each platform a maximum score of 15 points. Without further ado, let’s get into it.
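For anyone who wants to reuse the rubric, the scoring can be sketched in a few lines. The names `Scorecard` and `totalScore` are our own and purely illustrative; only the three categories and the 5-point cap come from the criteria above.

```typescript
// Hypothetical names; only the three categories and the 5-point cap come from the article.
interface Scorecard {
  setupAndUi: number;
  customisingCode: number;
  codebase: number;
}

const MAX_PER_CATEGORY = 5;

// Sum the three category scores, rejecting anything outside the 0-5 range.
function totalScore(card: Scorecard): number {
  const scores = [card.setupAndUi, card.customisingCode, card.codebase];
  for (const s of scores) {
    if (s < 0 || s > MAX_PER_CATEGORY) {
      throw new RangeError(`Score ${s} is outside 0-${MAX_PER_CATEGORY}`);
    }
  }
  return scores.reduce((sum, s) => sum + s, 0);
}

// A perfect card reaches the maximum of 15.
const maxTotal = totalScore({ setupAndUi: 5, customisingCode: 5, codebase: 5 });
```

Half points are allowed, so a card of 3.5, 4 and 4 would total 11.5.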


Bubble: A Throwback to WordPress

The Bubble.io editor

Setup & UI

At first, Bubble.io seems like any other AI-powered prototyping tool, but with a slight twist.

Whereas many of the others position themselves as something entirely new, different from earlier low-code and no-code tools and CMSs like WordPress, Joomla, Framer and so on, Bubble actually feels familiar if you’ve used any of those in the past, with a UI that feels very similar to Webflow.

The UI it created was generally nice-looking, but felt quite dated. It added elements I didn’t ask for, like a large hero header, placeholder content and more decorative components than needed.

It also took a staggering 6 minutes to generate, which, considering the issues with the UI, felt like a big investment of time for little in return.

Not a great showing in this category, we’re afraid.

Score: 2/5 ⭐⭐

Customising Code

When it comes to customisation, Bubble is, for a developer, quite frustrating.

A click-to-edit approach feels slow and granular, and developers will find themselves looking for code-based ways to edit components across the site. As the editor gives almost no direct access to code, stylesheets, or dependencies, iterative work feels slow when you’re used to AI-powered IDE workflows.

The lack of customisation, and a cluttered UI that makes the available options hard to find, mean a low score is inevitable:

Score: 1/5 ⭐

Codebase

Well… unfortunately, there really isn’t much of a codebase to talk about. Bubble hides almost everything behind its visual engine, so you cannot inspect dependencies, file structure, or output tech.

For devs, this is the most restrictive platform. You’ll find yourself wanting to export your code and edit it in your IDE, which is unfortunately not possible, so Bubble only takes a 1/5 in this category, too:

Score: 1/5 ⭐

Total Score: 4/15


Rocket: Sensible Building

Rocket's editor

Setup & UI

Compared to Bubble, Rocket feels much more in touch with the modern developer’s experience. First, it asks a number of setup questions, such as the developer’s stack of choice and the number of screens they want to generate, to reduce power consumption.

This leads to a slower generation phase, but the results feel more sensible: a functional UI with filters, dates, and a logical layout. Not overbuilt, and not underbuilt, just right.

The completeness of the layout, and the sensible decisions made along the way, mean Rocket does really well here, taking 3.5 out of 5.

Score: 3.5/5 ⭐⭐⭐

Customising Code

Rocket provides code access in its top tab, with integrations for live databases. You can export cleanly into your own environment, which makes for a smooth hand‑off experience for devs who want to continue in their IDE.

Another strong showing here, with features built with developers in mind.

Score: 4/5 ⭐⭐⭐⭐

Codebase

Firstly, with vibe coders building complex apps, often without any technical knowledge of what they’re building, a README is essential. Rocket was the only platform to generate one, explaining structure and architecture.

The app is a React app with full access to the file tree and logs, which makes for a familiar and well-planned environment for developers to start building in.

Score: 4/5 ⭐⭐⭐⭐

Total Score: 11.5/15


Base44: Upgrade to Edit

Base44 Editor

Setup & UI

Base44’s setup is smooth, and it takes it upon itself to add small logical features not mentioned explicitly, such as item counts, functioning filters, and delete actions.

Aesthetically, it took words like “playful” and essentially added a vibrant purple colour scheme on the back of them. It’s not particularly imaginative, but at least it tried.

Score: 2.5/5 ⭐⭐

Customising Code

So there’s a real sticking point when it comes to customising code with Base44. In a sentence… you can’t… unless you upgrade.

The free plan doesn’t allow code editing, and the code editor is also hidden away from the main preview tab. This immediately limits developer usability and, of course, its score. It’s a 1/5, but only because zero felt harsh.

Score: 1/5 ⭐

Codebase

Not being able to access the code is a real problem. We understand Base44 perhaps wanted to target non-technical builders, but restricting code access makes it almost entirely useless for developers, and actually causes issues.

For example, as you cannot access package.json, you cannot see dependencies. Libraries like framer‑motion are included in the code, but because you can’t uninstall anything, dead weight gets bundled regardless.

Furthermore, many components store shared objects directly inside component files instead of centralising them, which makes the code feel repetitive and poorly organised.
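We can’t reproduce Base44’s generated code here, so the following is only a hypothetical sketch of what centralising could look like: a single shared mood definition that every component reads from, rather than each file declaring its own copy. The names `MOODS`, `MoodKey`, `moodLabel` and `moodOptions` are ours, not Base44’s.

```typescript
// moods.ts (hypothetical file name): one definition for the whole app.
const MOODS = {
  productive: { emoji: "😃", label: "Productive" },
  neutral: { emoji: "😐", label: "Neutral" },
  stuck: { emoji: "😩", label: "Stuck" },
} as const;

type MoodKey = keyof typeof MOODS;

// Components call helpers like this instead of redefining mood objects locally.
function moodLabel(mood: MoodKey): string {
  const { emoji, label } = MOODS[mood];
  return `${emoji} ${label}`;
}

// Example consumer: options for a filter dropdown, derived from the same source.
const moodOptions = (Object.keys(MOODS) as MoodKey[]).map((key) => ({
  value: key,
  text: moodLabel(key),
}));
```

With this shape, adding a fourth mood means touching one object rather than hunting through every card, form and filter component.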

Score: 1/5 ⭐

Total Score: 4.5/15


Bolt: Built for Devs

Bolt's editor

Setup & UI

Bolt interpreted the brief well and added its own small embellishments, such as extra mood button options, sensible component groupings, and a UI that feels like a solid starting point. It runs in preview mode by default but you can switch to full code view instantly, with access to the file tree. A strong start for Bolt:

Score: 4/5 ⭐⭐⭐⭐

Customising Code

This is where Bolt shines. You get a near‑IDE experience with a built-in file editor, terminal, integrated database access (with raw data view), and even a simple version control feature.

The best aspect of this is the system prompt governing design behaviour, which sits in the file tree and reads like so:

For all designs I ask you to make, have them be beautiful, not cookie cutter. Make webpages that are fully featured and worthy for production.

By default, this template supports JSX syntax with Tailwind CSS classes, React hooks, and Lucide React for icons. Do not install other packages for UI themes, icons, etc unless absolutely necessary or I request them.

Use icons from lucide-react for logos.

It’s not an over-prescriptive prompt, nor will it stop the LLM going off-track, but it’s nice to see Bolt attempting to keep the LLM in check in a transparent way.

Score: 4/5 ⭐⭐⭐⭐

Codebase

Bolt produces a React/Vite project with clean organisation. It essentially made this decision without explicit instruction (which is a positive or a negative depending on how you feel about frameworks and libraries), but the code quality has undoubtedly been improved by the system prompt we mentioned before, and the project is light on dependencies (except for the approved lucide-react package).

The only downside is that while you can integrate third-party tech (like a Supabase database), it will use its own built-in database as standard, and it’s not always clear how to replace it with your own.

All things considered, we can’t think of anything major that is missing, so we’re going to go ahead with (nearly) full points.

Score: 4.5/5 ⭐⭐⭐⭐

Total Score: 12.5/15 ⭐⭐⭐⭐


Vercel v0: Smart Design and Build

Vercel v0's Editor

Setup & UI

Vercel’s v0 has become one of the most popular app generators amongst developers, and we understand why.

v0 generated the entire UI faster than any of the other platforms — and produced the most tasteful result, design-wise. Smooth gradients, clean typography, tasteful hover effects, and a strong interpretation of “modern, minimal, slightly playful.”

Now this is not without some oversights, like the fact it didn’t consider adding new tags, something a dev would naturally expect.

All things considered, it’s a smooth setup with a good return on a one-shot prompt.

Score: 4/5 ⭐⭐⭐⭐

Customising Code

The v0 editor really is great. Not only is it developer‑centric, allowing us to switch seamlessly between preview and code, but it allows us to refine and edit the design system for site-wide styles, manage environments, connect external integrations and even add rules and project-level configuration, which reflects what AI-powered devs have come to expect.

It feels like the most mature editing environment, built to be part of a larger workflow and allowing devs to export and exit whenever they’re ready.

Score: 4.5/5 ⭐⭐⭐⭐

Codebase

While v0 is transparent in most respects, there are a couple of downsides in the codebase. Firstly, partial code access reveals a large dependency tree: a full Next.js project using React 19, plus many shadcn and Radix components that aren’t used by other platforms.

While these might contribute to the quality of the UI produced by v0, some of these components are referenced in imports but their source files aren’t visible, which can make low‑level editing tricky.

Still, it’s the only tool to adopt React 19, and the code is consistently formatted. Its ease of export also goes some way to making up for its shortcomings, so you can carry on building in your IDE.

Score: 4.5/5 ⭐⭐⭐⭐

Total Score: 13/15


Finding the Right Fit

Each of the platforms we used produced something beautiful, or something functional, and sometimes a mix of the two. What none of them managed to do was build an application fit for production, but they got us part of the way there.

While not all are designed for developers (or at least, not primarily), most would have their benefits depending on the project.

This experiment shows what benchmarks can’t: how AI tools behave when confronted with ambiguity, taste, and real developer needs.

And in this comparison, Vercel v0 came out on top (with Bolt just behind), not because it’s perfect, but because it feels the most like working with a thoughtful collaborator, which is what AI should do, right?
