Six months ago, the answer to “could you build a real, deployed application end-to-end from a terminal, with an AI coding assistant on the other side of every command?” was a yes-shaped guess. This is the experiment that turned the guess into a product with paying users.
A vertical Short version is up at https://www.youtube.com/watch?v=-mhCWPxuq30 for anyone who’d rather watch in thirty seconds.
What got built, and what it looks like now
Calendar Sync started in early December 2025 as a co-development project between a developer and an AI coding assistant. The product brief was simple: sync busy times between multiple Google calendars without sending the data anywhere we didn’t control. The development brief was the interesting half: see how far Claude Code could take a real production app. Not a prototype. A real, deployed application with paying customers, Docker containers, OAuth, webhooks, recurring-event handling, a backlog, a CI pipeline, a staging environment, and a dev journal. The first version synced one event. The honest plan was to stop there.
Six months on, the product is in production on IONOS with real users, a Docker-based deployment, an engineering team, and as of this week a Stripe-backed subscription. Fourteen-day free trial for new users. Monthly or annual paid tier. UK VAT handled correctly via Stripe Tax. Customer self-serve billing portal. Exempt accounts for the team and partners. The plumbing finally matches the product.
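The post doesn’t show the billing code, so what follows is only a sketch of how that plumbing could map onto Stripe Checkout. It assumes the Python `stripe` SDK, where the resulting dict would be passed as `stripe.checkout.Session.create(**params)`. `Account`, `billing_exempt`, `PRICE_IDS` and `checkout_session_params` are hypothetical names. The Stripe parameters themselves (`mode`, `line_items`, `subscription_data.trial_period_days`, `automatic_tax`, `customer_update`) are real Checkout Session fields.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical price IDs; real ones come from the Stripe Dashboard.
PRICE_IDS = {
    "monthly": "price_MONTHLY_PLACEHOLDER",
    "annual": "price_ANNUAL_PLACEHOLDER",
}
TRIAL_DAYS = 14


@dataclass
class Account:
    # Hypothetical account model; billing_exempt marks team and partner accounts.
    email: str
    billing_exempt: bool = False
    stripe_customer_id: Optional[str] = None


def checkout_session_params(
    account: Account,
    interval: Literal["monthly", "annual"],
    success_url: str,
    cancel_url: str,
) -> Optional[dict]:
    """Build keyword arguments for a subscription Checkout Session.

    Returns None for exempt accounts, which never go through checkout.
    """
    if account.billing_exempt:
        return None
    params = {
        "mode": "subscription",
        "line_items": [{"price": PRICE_IDS[interval], "quantity": 1}],
        # Fourteen-day free trial on the new subscription.
        "subscription_data": {"trial_period_days": TRIAL_DAYS},
        # Stripe Tax works out UK VAT from the customer's address.
        "automatic_tax": {"enabled": True},
        "success_url": success_url,
        "cancel_url": cancel_url,
    }
    if account.stripe_customer_id:
        params["customer"] = account.stripe_customer_id
        # Let Checkout save the address it collects, which Stripe Tax needs.
        params["customer_update"] = {"address": "auto"}
    else:
        params["customer_email"] = account.email
    return params
```

The self-serve portal would be a separate call, along the lines of `stripe.billing_portal.Session.create(customer=..., return_url=...)`, for accounts that already have a Stripe customer ID.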
The actual finding — the workflow scales
The thing that has surprised me most isn’t that AI tooling could build the application; by February that was obvious. The surprise has been how cleanly the workflow scaled to a team. Engineers picked up the work, and daily use surfaced the edge cases. Each person works from their own terminal, with Claude Code on the other side of it, and everyone converges on the same codebase. The AI-supported command line stopped being a personal productivity trick and became how the team actually ships.
For context: 169 commits since March 1, 25 of them related to billing, Stripe, or FEAT-010. 105 tests across the suite, all passing on every push. Thirteen days of uninterrupted production uptime as of yesterday. None of those numbers is exceptional in absolute terms. What’s noteworthy is that the development pattern is fully terminal-driven, fully reproducible from the journal entries, and made up entirely of small, reviewable commits with named tickets.
What didn’t ship clean
Worth being honest about. The promo-code system got pulled at the last minute before launch. Pre-flight testing caught that codes were being validated against our local database but never actually applied at Stripe checkout. A user would have seen “discount applied” on the order summary and then been charged the full price anyway. The team found the gap and removed the user-facing promo input cleanly. The proper fix, storing each code’s Stripe promotion-code ID alongside the local row and passing it at checkout, is now the next billing-side ticket. Until then, discounts are applied manually via the Stripe Dashboard.
This is the second experiment in a row where the description names something that broke. That feels like the right pattern for the series. The point of an open ledger of experiments is that the failures are as informative as the wins, and burying them under marketing copy would defeat the purpose of doing the experiments publicly at all.
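The promo-code fix gets one sentence above. Here is a minimal sketch of its core idea, assuming Stripe Checkout and a local promo table. `PromoCode`, `stripe_promotion_code_id`, `PromoNotApplicable` and `discounts_for` are hypothetical names. The returned list has the shape of the real `discounts` parameter on a Checkout Session, which takes a `promotion_code` ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromoCode:
    # Hypothetical local row; stripe_promotion_code_id is the column the fix adds.
    code: str
    active: bool
    stripe_promotion_code_id: Optional[str] = None


class PromoNotApplicable(Exception):
    """Raised instead of quietly charging full price after showing a discount."""


def discounts_for(code: str, promos: dict[str, PromoCode]) -> list[dict]:
    """Return the `discounts` list for a Checkout Session.

    Local validation and the Stripe-side discount succeed or fail together:
    a code that passes locally but has no Stripe promotion-code ID is
    rejected, so the order summary never promises what checkout won't apply.
    """
    promo = promos.get(code)
    if promo is None or not promo.active:
        raise PromoNotApplicable(f"unknown or inactive code: {code!r}")
    if not promo.stripe_promotion_code_id:
        raise PromoNotApplicable(f"code {code!r} has no Stripe promotion code")
    return [{"promotion_code": promo.stripe_promotion_code_id}]
```

Stripe accepts either `discounts` or `allow_promotion_codes` on a session, not both. A design that validates codes locally would therefore pass `discounts` and leave Stripe’s own code box switched off.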
What’s next — the hard one
The next major feature is bidirectional sync. Today, calendars sync one direction — A to B. Real customers want events flowing both ways. The hard part is that every event synced from A becomes a real event in B, which without protection means the sync engine would happily push it back to A on the next cycle. You get infinite loops. Duplicates. A mess.
Six months ago we blocked this pattern at the rule-creation layer for safety: you can’t create an A → B rule and a B → A rule between the same pair of calendars. Unblocking it properly will need four things: loop detection at the engine layer (so a synced event in B doesn’t trigger a sync back to A), a redesigned deterministic-ID scheme so the same source event maps to the same target event regardless of direction, an explicit conflict-resolution rule for the simultaneous-edit case, and an architecture decision document before any code touches main. That’s experiment seven’s work. Hard problem. Real demand. Fun to figure out.
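The architecture decision hasn’t been written yet, so this is not the planned design. It is a sketch of one common way to cover the first two requirements, assuming the Google Calendar API. Every event the engine writes carries its origin in `extendedProperties.private`, a real Calendar field that holds arbitrary string pairs. The key names and the helpers `origin_of`, `target_event_id` and `plan_copy` are hypothetical. The ID is hashed from where an event was first created, not from which direction is running, and copying is refused when the target is the origin calendar.

```python
import hashlib
from typing import Optional

# Hypothetical marker keys stored on every event the sync engine writes.
ORIGIN_CALENDAR_KEY = "syncOriginCalendar"
ORIGIN_EVENT_KEY = "syncOriginEvent"


def origin_of(event: dict, calendar_id: str) -> tuple[str, str]:
    """(origin calendar, origin event ID) for any event the engine reads.

    A synced copy reports where it was first created; an event a person
    created is its own origin.
    """
    private = event.get("extendedProperties", {}).get("private", {})
    if ORIGIN_EVENT_KEY in private:
        return private[ORIGIN_CALENDAR_KEY], private[ORIGIN_EVENT_KEY]
    return calendar_id, event["id"]


def target_event_id(origin_calendar: str, origin_event: str, target_calendar: str) -> str:
    """Deterministic ID for the copy of an origin event in a target calendar.

    SHA-256 hex digits (0-9, a-f) sit inside the base32hex alphabet that
    Google Calendar accepts for client-supplied event IDs.
    """
    key = "\x1f".join([origin_calendar, origin_event, target_calendar])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def plan_copy(event: dict, source_calendar: str, target_calendar: str) -> Optional[dict]:
    """Event body to insert into target_calendar, or None to skip it."""
    origin_calendar, origin_event = origin_of(event, source_calendar)
    if origin_calendar == target_calendar:
        return None  # loop guard: never copy an event back to where it started
    return {
        "id": target_event_id(origin_calendar, origin_event, target_calendar),
        "summary": "Busy",
        "start": event["start"],
        "end": event["end"],
        "transparency": "opaque",
        # Carry the origin forward so later hops still know where it began.
        "extendedProperties": {"private": {
            ORIGIN_CALENDAR_KEY: origin_calendar,
            ORIGIN_EVENT_KEY: origin_event,
        }},
    }
```

Because every path to a given target produces the same ID, a repeated insert should be rejected as a duplicate rather than creating a second copy. The simultaneous-edit case is deliberately left out; that is what the conflict-resolution rule and the decision document are for.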
This video is also a marketing-video experiment
Worth naming directly. This series has two parallel purposes. The first is documenting how the videos get made — the open ledger of capabilities, the failures, the tools. The second, which has been quieter until now, is testing whether scripting plus AI tooling plus automation can produce real marketing videos for Rover Engineering products. Not behind-the-scenes content. Not documentary explainers. Actual marketing pieces, at the production cadence of a real product company, at a per-video cost under a few pounds.
Calendar Sync is a real product launching a real subscription this week. The video above is the marketing piece for that launch. The fair question is whether AI-driven production can hit a marketing quality bar that paying customers find acceptable. The honest answer right now is “close, with caveats.” That gap is the interesting work for experiments seven through ten.
How this video got made
This is the sixth instalment in the open series. The cumulative ledger lives in the repo and is reviewed before each new shoot; the rule is that every video must carry forward every previously-proven capability and add at least one new one. By experiment six, the inherited stack includes ElevenLabs cloned-voice narration, Whisper auto-captions, custom Pillow-rendered thumbnails, Final Cut Pro project export, fal.ai Kling AI B-roll, a Stable Audio music bed with sidechain ducking under the narration, vertical-in-landscape framing for iPhone clips, Google Gemini source-clip analysis, user voice mixed alongside synthetic narration, soft-attached SRT caption tracks on the landscape (burned-in on the Short), and a longer runtime that the content earns.
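The caption step is the most mechanical part of that stack. Below is a minimal sketch of turning timed transcript segments into an SRT file. It assumes segments arrive as `(start, end, text)` tuples in seconds; openai-whisper’s `transcribe()` result carries equivalent `start`, `end` and `text` fields per segment. `srt_timestamp` and `to_srt` are hypothetical helpers, not the pipeline’s actual code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

For a soft-attached track on MP4, one option is muxing the file in as a subtitle stream, for example `ffmpeg -i in.mp4 -i captions.srt -c copy -c:s mov_text out.mp4`. Burning in for the Short would need a re-encode instead.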
The new capability this round was live production app capture via Playwright. The recording opened a real Chrome browser pointed at the production deployment and captured native video while the dashboard, the Stripe-backed subscription flow, and the post-payment connected state played out. A post-processing step then ran a Gaussian blur filter over identifying information (account name, email address, payment-method details) before publishing. The dashboard you see in the cut is the actual production application, not a mock.
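The post doesn’t name the tool behind the blur step. Here is a sketch of one way to do it with FFmpeg’s real `split`, `crop`, `gblur` and `overlay` filters, applied to a Playwright recording (Playwright saves video as `.webm`). `Region` and `blur_filtergraph` are hypothetical names, and the pixel rectangles would have to be measured from the footage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Pixel rectangle to obscure, e.g. where the email address renders.
    x: int
    y: int
    w: int
    h: int


def blur_filtergraph(regions: list[Region], sigma: float = 20.0) -> str:
    """Build an ffmpeg -filter_complex string that Gaussian-blurs each region.

    The input is split into one untouched base copy plus one copy per region.
    Each region copy is cropped, blurred with gblur, and overlaid back at its
    original position. The final output pad is labelled [vout].
    """
    if not regions:
        raise ValueError("need at least one region to blur")
    n = len(regions)
    region_labels = "".join(f"[r{i}]" for i in range(n))
    parts = [f"[0:v]split={n + 1}[base]{region_labels}"]
    for i, r in enumerate(regions):
        parts.append(f"[r{i}]crop={r.w}:{r.h}:{r.x}:{r.y},gblur=sigma={sigma}[b{i}]")
    previous = "base"
    for i, r in enumerate(regions):
        out = "vout" if i == n - 1 else f"v{i}"
        parts.append(f"[{previous}][b{i}]overlay={r.x}:{r.y}[{out}]")
        previous = out
    return ";".join(parts)
```

The graph would run as `ffmpeg -i capture.webm -filter_complex "<graph>" -map "[vout]" out.mp4`. This assumes the sensitive fields stay in fixed positions. If they move between screens, each overlay would need a time-bounded `enable` expression or separate passes per segment.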
Worth naming a real limitation that surfaced: the Playwright auto-driver assumed selectors that don’t match production’s markup, so most of the seven-minute recording was dead time while Playwright waited for buttons it would never find. The usable footage was the handful of segments where the flow actually moved: sign-in, dashboard, Stripe checkout, post-payment connected state. The fix for next time is reading the actual production DOM before writing the selectors. Logged.
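One lightweight way to read the production DOM first is to save an HTML snapshot, for instance from Playwright’s `page.content()`, and list every clickable element along with the attributes a selector could target. This is a sketch using only the standard-library `html.parser`. `ClickableInventory` and `inventory` are hypothetical names, and it is not the series’ actual tooling.

```python
from html.parser import HTMLParser
from typing import Optional


class ClickableInventory(HTMLParser):
    """Collect buttons, links and role="button" elements from an HTML snapshot."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list[dict] = []
        self._open: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("button", "a") or attributes.get("role") == "button":
            # Record the attributes a stable selector would normally key on.
            self._open = {
                "tag": tag,
                "id": attributes.get("id"),
                "testid": attributes.get("data-testid"),
                "aria_label": attributes.get("aria-label"),
                "text": "",
            }
            self.items.append(self._open)

    def handle_endtag(self, tag):
        # Simplification: the first matching end tag closes the element, so a
        # <div role="button"> with nested <div>s stops collecting text early.
        if self._open is not None and tag == self._open["tag"]:
            self._open = None

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self._open["text"] = " ".join(filter(None, [self._open["text"], data.strip()]))


def inventory(html: str) -> list[dict]:
    """List clickable elements in document order."""
    parser = ClickableInventory()
    parser.feed(html)
    parser.close()
    return parser.items
```

Running this against a snapshot of each screen shows which buttons actually exist, and whether they carry a `data-testid` or an accessible label, before any driver script depends on them.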
Per-video API spend on this round was under £0.10 — the ElevenLabs narration was the only fresh paid call (the music bed and AI B-roll clips were reused from the library). Total time from the first iPhone clip to two videos live on YouTube: about four hours, most of which was the Playwright detour.
Where this leaves the experiment series
If you’re following along: experiment one set the baseline pipeline. Experiment two added captions, custom thumbnails, and the Final Cut handover. Experiment three brought in AI B-roll via Kling. Experiment four added user footage from Drive, Gemini analysis, music, and vertical-in-landscape framing. Experiment five added user voice mixed with narration, soft-attached landscape captions, and the longer runtime. This is experiment six, with the live production app captured directly. Each one builds on the last, and the failures are logged openly.
The pattern that’s emerging is interesting and not what I would have predicted at the start of the series. The pipeline is the easy part. The hard part is having something worth saying.