8 posts tagged with "ai-engineering"

The Classic 'Works on My Machine' — Now With Neural Networks

May 21, 2026 · 7 min read · updated May 21, 2026

Software Engineer

There's a version of this story where we made a mistake and fixed it. That's true but incomplete. The fuller version is about what a structured evaluation process gets right, what it misses, and how the ground can shift under you even when you've done the work.

We Spent Five Weeks Making Docling Work. Then We Deleted It.

May 21, 2026 · 6 min read · updated May 21, 2026

Danish Javed

Software Engineer

This is a post-mortem on five weeks of infrastructure work that ended with git rm and 1,452 lines deleted from the lockfile alone.

The library in question is Docling. It's a capable open-source document parser from IBM Research — handles PDFs, tables, figures, DOCX, the lot. On paper it looked like exactly what we needed. In practice it turned out to be a small ML platform hiding inside a Python package, and we didn't fully appreciate that distinction until we were already three acts deep.

Your Codebase Has Rules. Does CI Know That?

May 21, 2026 · 3 min read

Danish Javed

Software Engineer

There's a particular kind of meeting that happens on mixed teams. Someone's opened a pull request, and two engineers are staring at the same diff with completely different facial expressions. One is confused. The other is quietly furious. Neither is wrong, exactly — they just have entirely different mental models of what the codebase is supposed to look like.

That's the drift I'm talking about. Not bugs. Not broken tests. Just two people who've been building in the same repository for months and have somehow ended up with incompatible ideas about what goes where.

You Can't Debug What Bedrock Swallowed

May 21, 2026 · 3 min read

Danish Javed

Software Engineer

There's a particular kind of hell reserved for debugging LLM-backed systems that nobody bothered to instrument. You've got a request that took twelve seconds and you don't know if the slow part was your retrieval pipeline, the prompt construction, the Bedrock call itself, or the post-processing that turned the model's output into something you'd actually show a user. You have logs. You have vibes. You have, essentially, nothing.

We hit this early on an LLM project and it focused the mind quickly.
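As a rough illustration of the gap described above, here is a minimal sketch of per-stage timing for a single request. It is not the instrumentation the post goes on to describe. The stage names follow the four candidates listed above, while `timed`, `handle_request`, and the four callables passed in are hypothetical stand-ins for whatever your pipeline actually calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, sink):
    """Record wall-clock seconds spent inside the block under sink[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate so a stage that runs more than once is still counted in full.
        sink[stage] = sink.get(stage, 0.0) + (time.perf_counter() - start)


def handle_request(question, retrieve, build_prompt, call_model, postprocess):
    """Run one request and return (answer, per-stage timings in seconds).

    All four callables are hypothetical placeholders supplied by the caller.
    """
    timings = {}
    with timed("retrieval", timings):
        docs = retrieve(question)
    with timed("prompt_construction", timings):
        prompt = build_prompt(question, docs)
    with timed("model_call", timings):
        raw = call_model(prompt)
    with timed("post_processing", timings):
        answer = postprocess(raw)
    return answer, timings
```

With even this much in place, a twelve-second request comes back with four numbers attached instead of one, which is usually enough to tell which stage deserves a closer look.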

TDD Was Solving the Agent Problem Before Agents Existed

May 21, 2026 · 4 min read

Danish Javed

Software Engineer

The first time I set an agent loose on a real codebase, it ran out of context before it had done anything useful. That's a clarifying experience.

The repository wasn't exotic — a Python monorepo with shared libraries and some infrastructure code. I drew a diagram to understand what was happening. A rectangle for the full context window; blocks for what was already consumed just from loading the codebase: directory tree, CLAUDE.md, relevant modules, config, dependencies. The bar was more than half full before the agent had read a single line of task context or seen a single error message.

The Blockers Don't Care That You're Using AI

May 21, 2026 · 3 min read

Danish Javed

Software Engineer

I wrote a post back in 2021 about walking skeletons — the idea that before you go deep on features, you ship something thin and deployable end-to-end. Not because it's useful to users, but because it flushes out the real blockers while the cost of finding them is still low. Permissions. Pipelines. Infrastructure assumptions that looked fine on a whiteboard.

That advice hasn't aged out. If anything, AI projects have made it more relevant.

The Metric Your Users Feel Before You Measure It

May 21, 2026 · 4 min read

Danish Javed

Software Engineer

Working on a streaming chat product taught me that the standard latency metrics don't describe what users experience. They're not waiting for a page to load or an API to return a JSON blob. They're watching tokens appear, and what they feel before anything appears is the thing most teams aren't measuring.

That thing is time-to-first-token. TTFT.
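To make the metric concrete, here is a minimal sketch of measuring TTFT around any iterable of streamed text chunks. It assumes the stream yields plain strings and that the wrapper is created just before the request is sent. `StreamTimer` is a hypothetical name, not part of any SDK, and a real client will need its own adapter to pull text out of each streamed event.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


class StreamTimer:
    """Wrap a stream of text chunks and record time-to-first-token.

    Hypothetical helper: the clock starts when the wrapper is constructed,
    so construct it immediately before issuing the request.
    """

    def __init__(self, chunks: Iterable[str],
                 clock: Callable[[], float] = time.perf_counter):
        self._chunks = chunks
        self._clock = clock
        self.started_at = clock()
        self.ttft: Optional[float] = None   # seconds until first non-empty chunk
        self.total: Optional[float] = None  # seconds until the stream is exhausted

    def __iter__(self) -> Iterator[str]:
        for chunk in self._chunks:
            # Empty chunks are ignored for TTFT: the user has seen nothing yet.
            if self.ttft is None and chunk:
                self.ttft = self._clock() - self.started_at
            yield chunk
        self.total = self._clock() - self.started_at
```

Iterate over the wrapper exactly as you would over the raw stream, then read `ttft` and `total` once it finishes. Keeping the two numbers separate matters because a response can have a fine total latency and still feel slow if nothing appears for the first few seconds.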

Journey To The Centre Of The Stack

November 30, 2020 · 5 min read

I first wrote this post in 2020 after spending several weeks containerising a legacy application I hadn't built and didn't fully understand. The experience was mostly archaeology — reading old config files, tracing hardcoded paths, figuring out what half a dozen processes actually did before touching anything. By the time I had a working Docker image, I'd earned it.

I'm updating it now because the journey has changed, and I think it's worth being honest about how.