AI Coding Thoughts

Recently I started implementing a live-caption application. The app is still far from complete, but I’ve already discovered several things worth documenting.

Knowledge Gap

The gap is not on the AI’s side or mine alone; both have limits. At the outset I proposed Rust, simply because Linux is my daily driver and the pipewire crate looked healthy. For the UI, Zed’s gpui seemed like a safe bet.

I asked the AI to research cross-platform audio capture. It recommended CPAL as the abstraction over PipeWire (Linux), Core Audio / ScreenCaptureKit (macOS), and WASAPI (Windows), so we began.
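For reference, the CPAL starting point looks roughly like this: a minimal input-capture sketch, assuming cpal 0.15 and that the default input format is f32 (the callback body is a placeholder, not our actual code):

```rust
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let host = cpal::default_host();
    let device = host
        .default_input_device()
        .ok_or("no input device available")?;
    // Assumes the default config uses f32 samples; a real app
    // should match on config.sample_format() instead.
    let config = device.default_input_config()?;

    let stream = device.build_input_stream(
        &config.into(),
        move |data: &[f32], _: &cpal::InputCallbackInfo| {
            // Forward captured samples to the ASR pipeline here.
            let _ = data;
        },
        move |err| eprintln!("stream error: {err}"),
        None, // no timeout for stream creation
    )?;
    stream.play()?;
    std::thread::sleep(std::time::Duration::from_secs(5));
    Ok(())
}
```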

Linux: CPAL soon showed its limits with PipeWire, so we refactored to use the PipeWire API directly.
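Going direct means driving PipeWire’s main loop yourself. A minimal sketch, assuming the pipewire crate around version 0.8 (module paths moved between releases), that connects to the daemon and lists the globals you would pick a capture node from:

```rust
use pipewire::{context::Context, main_loop::MainLoop};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mainloop = MainLoop::new(None)?;
    let context = Context::new(&mainloop)?;
    let core = context.connect(None)?;
    let registry = core.get_registry()?;

    // Print every global object; audio nodes show up here, which is
    // how you would locate the device to capture from.
    let _listener = registry
        .add_listener_local()
        .global(|global| println!("global: {:?}", global))
        .register();

    mainloop.run();
    Ok(())
}
```

An actual capture stream adds SPA format negotiation on top of this; that extra control is exactly what the CPAL abstraction was hiding.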

macOS: first CPAL, then objc2 with Core Audio / ScreenCaptureKit. We burned days getting silence; I suspected we weren’t capturing from the device we had configured, asked the AI to investigate, and it eventually confirmed we hadn’t selected the correct device. Another hurdle: ScreenCaptureKit demands the “Screen and System Audio Recording” permission even when you only need system audio.

Windows: again started with CPAL, then moved to raw WASAPI.

Take-away: even a “deep research” pass can surface outdated crates or docs. Treat every AI suggestion as a draft that needs human validation.

Knowledge Shortage

I knew almost nothing about application architecture or modern UI patterns, and describing what I wanted was often harder than coding it.

Three blind spots the AI had to point out:

  1. Shared buffer: we capture both microphone and system audio; those streams must feed the ASR pipeline and optionally be saved to disk, something I had never modeled (see the fan-out sketch after this list).
  2. Settings window: when the user hits “Apply,” changes must propagate instantly. The idiomatic fix is an event bus: obvious in hindsight, but I missed it, and it is still only a proposal (a sketch appears at the end of this post).
  3. Adaptive theme: switching themes on the fly is trivial with gpui-component, once you know the component exists.
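On the first point, one way to model it is a simple fan-out: the capture thread hands each audio chunk to every consumer. A minimal std-only sketch, not our actual implementation (channel names and chunk size are illustrative):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // One channel per consumer: ASR and (optionally) disk.
    let (asr_tx, asr_rx) = mpsc::channel::<Vec<f32>>();
    let (disk_tx, disk_rx) = mpsc::channel::<Vec<f32>>();

    // Capture thread: clones each chunk to every consumer.
    let capture = thread::spawn(move || {
        for _ in 0..3 {
            let chunk = vec![0.0f32; 480]; // stand-in for real samples
            asr_tx.send(chunk.clone()).unwrap();
            disk_tx.send(chunk).unwrap();
        }
        // Senders drop here, which closes both channels.
    });

    let asr = thread::spawn(move || {
        for chunk in asr_rx {
            let _ = chunk; // feed the ASR pipeline
        }
    });
    let disk = thread::spawn(move || {
        for chunk in disk_rx {
            let _ = chunk; // append to the recording file
        }
    });

    capture.join().unwrap();
    asr.join().unwrap();
    disk.join().unwrap();
}
```

A ring buffer or a broadcast channel would avoid the per-chunk clones; the plain-channel version is just the easiest to reason about.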

We later refactored the UI from raw gpui to gpui-component (a shadcn-inspired wrapper). The migration was painless but time-consuming, mostly because I didn’t have precise names for everyday widgets:

  • Slider: a draggable knob on a line.
  • Border: the edge rectangle of a shape.
  • Dropdown container: originally rendered much wider than its label; I let the AI tighten the layout.
  • Main/worker threads: when you have more than one task to run, you need to start worker threads, and the main thread should coordinate them.
  • Async: some tasks take a long time, and we don’t want to block on them.
  • Buffers: how to use and manage them (the fan-out sketch above is one answer).
  • Event bus: how to use one to apply settings on the fly, as in the sketch below.
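To make the event-bus idea concrete, here is a minimal sketch of the proposal, kept independent of gpui; the AppEvent variants and the font_size field are hypothetical placeholders:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical event type; a real app would carry its actual settings.
#[derive(Clone, Debug)]
enum AppEvent {
    SettingsApplied { font_size: u32 },
}

// Minimal event bus: subscribers register a channel, publishers
// clone each event to every subscriber.
struct EventBus {
    subscribers: Vec<Sender<AppEvent>>,
}

impl EventBus {
    fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    fn subscribe(&mut self) -> Receiver<AppEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    fn publish(&self, event: AppEvent) {
        for sub in &self.subscribers {
            let _ = sub.send(event.clone()); // ignore closed subscribers
        }
    }
}

fn main() {
    let mut bus = EventBus::new();
    let caption_view = bus.subscribe(); // e.g. the caption window

    // The settings window publishes when the user hits “Apply”.
    bus.publish(AppEvent::SettingsApplied { font_size: 18 });

    // Subscribers see the change immediately.
    println!("{:?}", caption_view.recv().unwrap());
}
```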