Live challenge · Open · be first to qualify

Speech-to-text · $500 bounty · a local, offline Hindi+English dictation engine that beats the best free tools. Two ways in: the challenge (the rules, the prize, how to enter) or the build guide (strategies, tools, and code). One of 2 live challenges.


Live challenge · dual-language speech-to-text · Round 1 · Jun 18 – Aug 2

Build the best local Hindi + English speech-to-text.

A dictation tool that runs fully offline on your own laptop. Text shows up as you speak, and the finished text lands almost the moment you stop: the way Wispr Flow feels in the cloud, except local, and accurate on the Hindi+English mix. That feel is the real goal, not just batch accuracy after the audio ends.

Prize

$500 to the winner

Window

45 days · Jun 18 – Aug 2, 2026

Scored on

Accuracy + live dictation feel

Runs on

Linux (CPU+GPU) · local-only

Enter the challenge →

Start here: the getting-started guide · fork the template · run the local preview · email the repo to submit@builderr.ai.

2 minutes: why builders are entering — and what you walk away with.

The problem (and why it's worth building)

If you build with AI, you talk to it all day — and typing is the slow part. The good dictation tools are cloud tools (like Wispr Flow): they cost money, and lots of companies and campuses block them, because your words leave your computer.

Most tools also fall apart the moment you mix languages, and many of us slide between Hindi and English in the same sentence. So the prize is a tool that runs locally, offline, and fast, and gets the mix right. Build it, win this challenge with it, then put it on GitHub: a free, private, mixed-language dictation tool attracts real stars.

What it takes to win

  1. Get plain English right — about as accurate as the best free tools (tying is fine).
  2. Feel instant — text appears as you speak, and the finished text lands within a beat of you stopping. Cloud tools like Wispr hit well under a second; match that feel, locally.
  3. Nail the Hindi+English mix — write what was actually said (don't translate it to English), and keep the meaning. This is the real test.
  4. Stay on the laptop — no internet while it's scored.
  5. Be shippable — normal computer, runs on Mac+Linux, only models that are free to use in a real product.
  6. Don't cheat or break — no hard-coded answers, crashes, or repeating-gibberish loops.
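Rule 6 rules out repeating-gibberish loops, a known failure mode of autoregressive speech decoders, which can get stuck emitting the same phrase over and over. Below is a minimal self-check you might run on your own transcripts before submitting. It assumes a simple heuristic (some 1- to 4-word phrase repeated back to back at least four times); it is not the organisers' actual check, and `has_repetition_loop` and its thresholds are hypothetical and illustrative.

```python
def has_repetition_loop(text: str, max_ngram: int = 4, min_repeats: int = 4) -> bool:
    """Heuristic: True if some phrase of 1..max_ngram words repeats back to back
    at least min_repeats times. Thresholds are illustrative assumptions."""
    words = text.split()
    for n in range(1, max_ngram + 1):
        # Try every start position and count consecutive copies of words[i:i+n].
        for i in range(len(words) - n + 1):
            phrase = words[i:i + n]
            repeats = 1
            j = i + n
            # Slicing past the end yields a shorter list, which ends the loop.
            while words[j:j + n] == phrase:
                repeats += 1
                j += n
            if repeats >= min_repeats:
                return True
    return False


if __name__ == "__main__":
    print(has_repetition_loop("thank you thank you thank you thank you"))  # True
    print(has_repetition_loop("dheere dheere bolo"))                       # False
```

The threshold sits well above two on purpose: Hindi uses reduplication ("dheere dheere", "kabhi kabhi") as normal speech, so flagging any doubled word would punish correct Hindi+English output.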

You can't win on English alone: the mix is the gate. Simply wrapping an off-the-shelf model aces English and flunks the mix. The exact thresholds, and how the reference engine is built, are in the reference-bot write-up.

How it's scored — and we do the hard part

1 · Accuracy. Does it get the words and the Hindi+English mix right? Build it, test it on your own recordings in your own languages, and submit — we score it on a hidden set.

2 · Dictation feel. How fast text shows up while you talk, how fast the final text lands after you stop, and whether it keeps rewriting itself. You don't build a streaming server. If your engine can emit text as audio comes in (one simple function, no networking), we take it from there: our harness plays real speech into it in real time and measures the feel for you. Batch-only is fine too; you still compete on accuracy.

Why measure the feel, not just batch accuracy? A tool that's accurate but slow isn't the product: Wispr feels instant because it streams. To build a local tool that feels as good, we have to measure the same thing. Everything runs offline on one fixed machine, warmed up, under identical conditions for every entry, including the RambleFix benchmark, which is re-run on the exact same harness. The full scorecard (latency targets and weights) ships with the streaming track.

The benchmark to beat

Standings · Round 1 · no qualifier yet (beat the benchmark to rank above it)

| # | Engine | English word error ↓ | Hindi + English meaning ↑ |
|---|---|---|---|
| - | Your agent here: beat the benchmark to qualify; the top qualifier wins $500 | | |
| - | RambleFix · benchmark · the bar to beat | 0.06 | 0.76, faithful |
| · | whisper.cpp (small) · open-source reference | 0.08 | translates the mix, loses it |
| · | faster-whisper · open-source reference | 0.08 | translates the mix, loses it |

These numbers come from a live head-to-head; the official benchmark is confirmed on the full hidden set at the start of the round. The bar: match RambleFix on English, and beat it on the mix by scoring more meaning while staying faithful, i.e. keeping the actual words rather than translating them away as the open engines do.
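The English column is a word error rate (lower is better): the word-level edit distance (substitutions + deletions + insertions) between your transcript and the reference, divided by the number of reference words, so 0.06 means roughly 6 errors per 100 reference words. A minimal sketch for checking your own recordings, assuming plain lowercase whitespace tokenisation; the official scorer's text normalisation isn't described here and may differ, and `word_error_rate` is a hypothetical helper. It does not capture the Hindi+English meaning score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    # 0.16666666666666666 (one substitution over six reference words)
```

Because insertions count, the rate can exceed 1.0 when a transcript adds many words, for example a repetition loop.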

How it's ranked and won: one objective score (the scorecard) ranks every entry, and every entry is shown on the board wherever it lands. The benchmark above is the line: beat it and you're highlighted as a qualifier. The $500 goes to the top qualifier. If no entry beats the benchmark, nothing is awarded this round and the prize rolls over to the next one. We pay only for a real step up.

How to enter

  • Read the getting-started guide and the build skill (the high-level architecture to follow), fork the template, and test on the included sample clips (English + Hindi+English) with python preview.py — all offline.
  • For the dictation-feel score, have your engine emit text as audio streams in. It's one simple function (its shape is in the template), and you need no servers, since we run the streaming simulation. Skip it and you still compete on accuracy.
  • Declare your models and their licenses — they must be commercial-friendly, so the winning tool can actually be released for free.
  • Email your repo to submit@builderr.ai. We clone it, run it offline on the hidden set on a Linux box, and you land on the board.

Who's behind this challenge

This challenge is sponsored by Amit, who is backing the $500 bounty and built the RambleFix benchmark. If your engine is the best, he wants to turn it into a real, free product, with you.


Have a different problem to put a bounty on? Post a challenge →