<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://works-in-prod.github.io/</id>
    <title>Works in Prod Blog</title>
    <updated>2026-05-21T09:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://works-in-prod.github.io/"/>
    <subtitle>Works in Prod Blog</subtitle>
    <icon>https://works-in-prod.github.io/img/logo.gif</icon>
    <entry>
        <title type="html"><![CDATA[The Classic 'Works on My Machine' — Now With Neural Networks]]></title>
        <id>https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/</id>
        <link href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/"/>
        <updated>2026-05-21T09:00:00.000Z</updated>
        <summary type="html"><![CDATA[There's a version of this story where we made a mistake and fixed it. That's true but incomplete. The fuller version is about what a structured evaluation process gets right, what it misses, and how the ground can shift under you even when you've done the work.]]></summary>
        <content type="html"><![CDATA[<p>There's a version of this story where we made a mistake and fixed it. That's true but incomplete. The fuller version is about what a structured evaluation process gets right, what it misses, and how the ground can shift under you even when you've done the work.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-document-fidelity-matters">Why Document Fidelity Matters<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#why-document-fidelity-matters" class="hash-link" aria-label="Direct link to Why Document Fidelity Matters" title="Direct link to Why Document Fidelity Matters" translate="no">​</a></h2>
<p>Our virtual assistant answers questions by retrieving content from product documentation — policies, guides, structured forms. For retrieval to produce accurate, grounded answers, the source material has to survive parsing intact.</p>
<p>This sounds obvious until you see what "not intact" looks like in practice. A table rendered as a flat string of values with no row or column relationship. A heading that becomes a bold paragraph indistinguishable from body copy. A figure caption attached to nothing. These aren't cosmetic defects — they're retrieval failures. The model can't correctly reference what wasn't preserved.</p>
<p>Our requirements were non-negotiable:</p>
<ul>
<li class=""><strong>Scanned and rasterised PDFs</strong> — documents with no embedded text layer, relying entirely on OCR</li>
<li class=""><strong>Table structure</strong> — including merged cells, multi-row headers, nested content</li>
<li class=""><strong>Layout and heading hierarchy</strong> — sections, subsections, callouts, columns</li>
<li class=""><strong>Image semantics</strong> — figures and diagrams needed descriptions, not just placeholders</li>
</ul>
<p>Without these, the knowledge base becomes a lossy approximation of the actual documents. RAG answers become confidently wrong.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="starting-simple-hitting-walls">Starting Simple, Hitting Walls<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#starting-simple-hitting-walls" class="hash-link" aria-label="Direct link to Starting Simple, Hitting Walls" title="Direct link to Starting Simple, Hitting Walls" translate="no">​</a></h2>
<p>We started where most teams start: the obvious libraries. <code>python-docx</code> for Word documents, <code>pdfplumber</code>, <code>pypdf</code>, and <code>PyMuPDF</code> for PDFs. These are excellent tools — fast, lightweight, well-maintained, and more than capable for clean digital documents.</p>
<p>Our documents weren't clean. The simpler libraries handled native digital PDFs without trouble but broke down on scanned content and gave up entirely on complex table structures. Merged cells became misaligned rows. Headings lost their hierarchy. OCR was absent or rudimentary.</p>
<p>We needed to go deeper.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="building-the-evaluation-harness">Building the Evaluation Harness<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#building-the-evaluation-harness" class="hash-link" aria-label="Direct link to Building the Evaluation Harness" title="Direct link to Building the Evaluation Harness" translate="no">​</a></h2>
<p>Rather than try libraries one at a time, we built a structured comparison harness — <a href="https://github.com/ambersariya/pdf-parsing-comparison" target="_blank" rel="noopener noreferrer" class="">available here</a> if you want to run it yourself. Ten parsers, standardised document inputs representing the range of formats we'd encounter in production, and a consistent set of metrics:</p>
<ul>
<li class="">Wall-clock parse time</li>
<li class="">Peak memory consumption</li>
<li class="">Word count accuracy (proxy for text extraction fidelity)</li>
<li class="">Table detection and structure preservation</li>
<li class="">Heading and layout extraction quality</li>
</ul>
<p>The harness made evaluation repeatable and honest. We ran everything through the same documents, measured the same things, and scored against the same rubric.</p>
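<p>As a rough illustration of what one measurement in such a harness can look like, here is a minimal sketch. It is not the code from the linked repository: <code>ParseResult</code>, <code>measure</code>, and <code>word_count_accuracy</code> are hypothetical names, and the accuracy formula is one plausible way to turn a word count into a score. Note that <code>tracemalloc</code> only sees Python-level allocations, so memory used by native libraries would need a process-level measurement instead.</p>

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable


@dataclass
class ParseResult:
    parser: str
    seconds: float      # wall-clock parse time
    peak_bytes: int     # peak Python-level allocation during the parse
    word_count: int     # words in the extracted text


def measure(parser_name: str, parse: Callable[[bytes], str], document: bytes) -> ParseResult:
    """Run one parser on one document and record time, peak memory, and word count."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        text = parse(document)
        seconds = time.perf_counter() - started
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return ParseResult(parser_name, seconds, peak, len(text.split()))


def word_count_accuracy(extracted_words: int, reference_words: int) -> float:
    """Score in [0, 1]; 1.0 means the extracted count matches the reference exactly."""
    if reference_words == 0:
        return 1.0 if extracted_words == 0 else 0.0
    return max(0.0, 1.0 - abs(extracted_words - reference_words) / reference_words)
```

<p>Each parser is wrapped as a plain <code>bytes</code>-to-<code>str</code> callable, so every candidate runs through the same measurement path on the same inputs.</p>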
<p><strong>Docling won on every qualitative dimension that mattered.</strong> It produced structured, readable Markdown that actually reflected the document's intent — tables with correct cell relationships, headings with correct hierarchy, image descriptions, and full OCR support for scanned content. For our requirements, it wasn't close.</p>
<p>We shipped it.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-constraint-we-didnt-measure">The Constraint We Didn't Measure<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#the-constraint-we-didnt-measure" class="hash-link" aria-label="Direct link to The Constraint We Didn't Measure" title="Direct link to The Constraint We Didn't Measure" translate="no">​</a></h2>
<p>Here is where the story gets instructive.</p>
<p>Docling is not a lightweight parser. It's a neural network pipeline: layout detection models, table structure analysis, OCR, and PyTorch inference — all running locally, inside the pod, on every document processed.</p>
<p>We had evaluated it on an M4 Pro MacBook, where PyTorch's MPS backend runs inference on the Apple Silicon GPU. Parse times were two to three seconds per document.</p>
<p>Production was CPU-only Kubernetes nodes.</p>
<p>The performance gap was not a percentage difference. It was an order of magnitude. What took seconds on the MacBook took minutes on a CPU pod. Our AWS load balancer had a 60-second timeout. Docling on CPU regularly exceeded that.</p>
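<p>A check like the following would have flagged the problem before deployment, provided the timings came from production-equivalent hardware. It is a sketch under stated assumptions: the 60-second timeout is from the setup above, while the <code>headroom</code> factor, the nearest-rank percentile, and the function names are illustrative choices.</p>

```python
import math


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of timings."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def fits_timeout(parse_seconds: list[float], timeout_s: float = 60.0, headroom: float = 0.5) -> bool:
    """True when the p95 parse time stays within a fraction of the request timeout.

    headroom=0.5 is an assumed safety margin: it leaves half the timeout
    for queueing, network time, and retries.
    """
    return p95(parse_seconds) <= timeout_s * headroom
```

<p>Run against timings of a few seconds this passes comfortably; run against timings measured in minutes it fails, which is the signal the laptop benchmark could not give.</p>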
<p>The consequence: timeouts, 502 errors, retry storms, queue backlog, pod memory pressure. Five weeks of thread throttling, semaphore tuning, and concurrency experiments — the full story is in the <a class="" href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/">Docling post-mortem</a>. The constraint was never the code. It was the hardware, and we had benchmarked against hardware we didn't have in production.</p>
<p>There's a human element worth naming too. We were under pressure to find a solution that met those qualitative requirements, and Docling met them convincingly. When you're looking for something specific and you find it, the instinct is to commit — not to ask what happens when you move it to a different machine. That instinct is understandable. It's also exactly when the infrastructure question matters most.</p>
<p>The harness was rigorous. The question it was missing: <em>does this library assume hardware you don't have?</em></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="pragmatism-over-purity">Pragmatism Over Purity<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#pragmatism-over-purity" class="hash-link" aria-label="Direct link to Pragmatism Over Purity" title="Direct link to Pragmatism Over Purity" translate="no">​</a></h2>
<p>At some point the engineering question stopped being "how do we make Docling work on CPU" and became "why does it have to run locally at all?"</p>
<p>We could have chased GPU nodes. We could have built an async queue and worked around the timeout with a callback model. We could have kept tuning. Any of those would have been defensible.</p>
<p>Instead we stepped back. The requirement was structured, semantically meaningful content extracted from documents. That's a capability problem, not an infrastructure problem. The assumption that it needed to be solved with local inference was an artefact of the evaluation process, not a genuine constraint.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="now-its-all-prompt-driven">Now It's All Prompt-Driven<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#now-its-all-prompt-driven" class="hash-link" aria-label="Direct link to Now It's All Prompt-Driven" title="Direct link to Now It's All Prompt-Driven" translate="no">​</a></h2>
<p>Claude via Bedrock handles our complete requirements — scanned PDFs, merged-cell tables, layout hierarchy, image semantics — without a byte of local inference.</p>
<p>The implementation is straightforward. For documents under 4.5 MB we send the raw bytes to the Bedrock API as a single document block. For larger documents, we rasterise each page to PNG and send the pages as image blocks instead. Claude returns structured Markdown that preserves the document's intent.</p>
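<p>The size-based routing can be sketched as below. This is an illustration, not our production parser: the block shapes follow the Bedrock Converse API's document and image content blocks, but the prompt text, the document name, the <code>rasterise</code> callable, and the reading of 4.5 MB as binary megabytes are all assumptions. Bedrock also limits how many images one request may carry, so a real implementation would batch pages or send one request per page.</p>

```python
from typing import Callable

# Assumption: 4.5 MB interpreted as binary megabytes.
DOCUMENT_BLOCK_LIMIT = int(4.5 * 1024 * 1024)

# Placeholder prompt; the real wording is not shown in this post.
PROMPT = "Convert this document to structured Markdown, preserving tables and headings."


def build_content(
    pdf_bytes: bytes,
    rasterise: Callable[[bytes], list[bytes]],
    limit: int = DOCUMENT_BLOCK_LIMIT,
) -> list[dict]:
    """Return Converse-style content blocks for one PDF.

    Small files travel as a single document block; larger ones are
    rasterised (by the injected callable) into one PNG image block per page.
    """
    if len(pdf_bytes) < limit:
        blocks = [{"document": {"format": "pdf", "name": "source", "source": {"bytes": pdf_bytes}}}]
    else:
        blocks = [
            {"image": {"format": "png", "source": {"bytes": png}}}
            for png in rasterise(pdf_bytes)
        ]
    return blocks + [{"text": PROMPT}]
```

<p>The returned list would become the <code>content</code> of a user message passed to the Bedrock runtime client's <code>converse</code> call. Keeping the payload construction separate from the network call makes the routing rule testable without AWS credentials.</p>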
<p>Pod CPU stays flat during parsing. No timeouts. No GPU nodes. No async queue. No concurrency tuning. The "parsing pipeline" is a well-prompted API call.</p>
<p>The accuracy is comparable to Docling on our document set. The operational complexity is dramatically lower.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-the-evaluation-harness-gets-right--and-what-it-doesnt">What the Evaluation Harness Gets Right — And What It Doesn't<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#what-the-evaluation-harness-gets-right--and-what-it-doesnt" class="hash-link" aria-label="Direct link to What the Evaluation Harness Gets Right — And What It Doesn't" title="Direct link to What the Evaluation Harness Gets Right — And What It Doesn't" translate="no">​</a></h2>
<p>The structured evaluation process had real value. It forced rigour where gut feel would have been faster but less reliable. The harness surfaced Docling as the correct answer to the qualitative question we were asking.</p>
<p>The gap was in the question itself. We measured capability and performance, but performance on the wrong hardware. For any library that runs local inference — ML models, neural networks, GPU-accelerated workloads — production hardware parity is not an optional benchmark condition. It's the first one.</p>
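<p>One way to make parity an explicit benchmark condition is to record the hardware next to every result and refuse to trust numbers when it differs from production. The sketch below is hypothetical: <code>HardwareProfile</code> and <code>parity_problems</code> are invented names, and the accelerator label is declared by whoever runs the benchmark rather than detected.</p>

```python
import os
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareProfile:
    machine: str       # for example "arm64" or "x86_64"
    cpu_count: int
    accelerator: str   # "none", "mps", "cuda"; declared, not auto-detected


def current_profile(accelerator: str = "none") -> HardwareProfile:
    """Describe the machine the benchmark is running on."""
    return HardwareProfile(platform.machine(), os.cpu_count() or 1, accelerator)


def parity_problems(bench: HardwareProfile, prod: HardwareProfile) -> list[str]:
    """List mismatches that make benchmark timings untrustworthy for production."""
    problems = []
    if bench.accelerator != prod.accelerator:
        problems.append(f"accelerator: benchmark={bench.accelerator}, production={prod.accelerator}")
    if bench.cpu_count > prod.cpu_count:
        problems.append(f"cpu_count: benchmark={bench.cpu_count}, production={prod.cpu_count}")
    return problems
```

<p>A harness could fail the run, or at least stamp the report, whenever <code>parity_problems</code> returns a non-empty list.</p>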
<p>There's also a broader point about evaluation framing. The harness asked "which library does this best?" It didn't ask "should this be a library at all?" As LLM APIs have matured, the answer to a class of document understanding problems has shifted from "find the best local model" to "describe what you need and ask a capable model." The evaluation dimension that matters now isn't which OCR pipeline is most accurate — it's whether the capability can be prompt-driven and whether that changes your operational posture.</p>
<p>For us it did, substantially.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="lessons">Lessons<a href="https://works-in-prod.github.io/the-classic-works-on-my-machine-now-with-neural-networks/#lessons" class="hash-link" aria-label="Direct link to Lessons" title="Direct link to Lessons" translate="no">​</a></h2>
<p><strong>1. Hardware parity is line one of the evaluation checklist for ML-heavy libraries.</strong>
Benchmarking on an M4 Pro and deploying to CPU Kubernetes is not a benchmark. It's a misdirection. Add production-equivalent hardware to the evaluation environment before architectural commitment.</p>
<p><strong>2. Structured evaluation is worth building — but the harness only finds what you measure.</strong>
The comparison harness was the right approach. We just needed one more measurement axis: "does this work on the hardware we actually have?"</p>
<p><strong>3. Pragmatism beats purity.</strong>
We could have made Docling work in production. The question was whether the cost — GPU nodes, async queues, operational complexity — was proportionate to the benefit over an API-based alternative. It wasn't.</p>
<p><strong>4. The right abstraction level has shifted.</strong>
A year ago the answer to "parse this complex PDF" was "find the best parser library." Today it's often "send it to a VLM." The evaluation harness needs to include that option, and the question needs to be capability-first rather than implementation-first.</p>
<p><strong>5. Five weeks is feedback, not failure.</strong>
The operational pain of running Docling in production gave us the forcing function to reconsider the approach. Teams that avoid the pain by over-engineering around it (GPU nodes, larger instances, longer timeouts) often miss the signal.</p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="ai-engineering" term="ai-engineering"/>
        <category label="python" term="python"/>
        <category label="lessons-learned" term="lessons-learned"/>
        <category label="document-parsing" term="document-parsing"/>
        <category label="testing" term="testing"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[We Spent Five Weeks Making Docling Work. Then We Deleted It.]]></title>
        <id>https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/</id>
        <link href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/"/>
        <updated>2026-05-21T08:00:00.000Z</updated>
        <summary type="html"><![CDATA[This is a post-mortem on five weeks of infrastructure work that ended with git rm and 1,452 lines deleted from the lockfile alone.]]></summary>
        <content type="html"><![CDATA[<p>This is a post-mortem on five weeks of infrastructure work that ended with <code>git rm</code> and 1,452 lines deleted from the lockfile alone.</p>
<p>The library in question is <a href="https://github.com/DS4SD/docling" target="_blank" rel="noopener noreferrer" class="">Docling</a>. It's a capable open-source document parser from IBM Research — handles PDFs, tables, figures, DOCX, the lot. On paper it looked like exactly what we needed. In practice it turned out to be a small ML platform hiding inside a Python package, and we didn't fully appreciate that distinction until we were already three acts deep.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="act-i-the-optimistic-beginning">Act I: The Optimistic Beginning<a href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/#act-i-the-optimistic-beginning" class="hash-link" aria-label="Direct link to Act I: The Optimistic Beginning" title="Direct link to Act I: The Optimistic Beginning" translate="no">​</a></h2>
<p>The first pull request adding Docling was merged and reverted on the same day. A flag from the universe that was politely ignored.</p>
<p>A couple of days later it was back in with the proper integration. The complications started immediately:</p>
<ul>
<li class="">A Docker entrypoint script was needed to pre-download HuggingFace models at
container startup</li>
<li class=""><code>HOME</code> and <code>HF_HOME</code> env vars had to be manually set so the image could
write to its own cache directories</li>
<li class="">The <code>DOCLING_MODELS</code> list kept breaking as a shell argument — positional
args split incorrectly, then comma-separated, then space-separated — three
separate fixes for what should have been a config value</li>
<li class="">Tesseract OCR and OpenCV had to be added to the runtime Docker stage</li>
<li class="">EasyOCR kept sneaking back into the model list and had to be explicitly
excluded every time</li>
</ul>
<p>None of this is catastrophic. But it's the kind of friction that tells you something about what you're dealing with.</p>
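<p>The model list is a good example of something that is safer as a parsed config value than as shell arguments. A minimal sketch, assuming the list arrives in the <code>DOCLING_MODELS</code> environment variable and that an exclusion set is the right way to keep EasyOCR out; the model names in the usage are placeholders, not real Docling identifiers.</p>

```python
import os
import re

# Assumption: names to keep out of the list no matter what the config says.
EXCLUDED_MODELS = frozenset({"easyocr"})


def parse_model_list(raw: str, excluded: frozenset[str] = EXCLUDED_MODELS) -> list[str]:
    """Split on commas and/or whitespace; drop blanks, duplicates, and excluded names."""
    models: list[str] = []
    for name in re.split(r"[,\s]+", raw.strip()):
        if name and name.lower() not in excluded and name not in models:
            models.append(name)
    return models


def models_from_env(var: str = "DOCLING_MODELS") -> list[str]:
    """Read the model list from the environment; an unset variable means no models."""
    return parse_model_list(os.environ.get(var, ""))
```

<p>Because the parser accepts commas, spaces, or both, the same value works whether it comes from a Dockerfile <code>ENV</code> line, a Kubernetes manifest, or a shell export.</p>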
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="act-ii-the-model-infrastructure-tax">Act II: The Model Infrastructure Tax<a href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/#act-ii-the-model-infrastructure-tax" class="hash-link" aria-label="Direct link to Act II: The Model Infrastructure Tax" title="Direct link to Act II: The Model Infrastructure Tax" translate="no">​</a></h2>
<p>Downloading Docling's models at cold start was far too slow for production: EKS health checks started failing almost immediately. So a CI/CD model sync workflow was introduced to pre-bake the models into the Docker image. The image ballooned from a small fish to a blowfish, which led to caching the models in S3 and pulling them in with an init container instead. This became its own small project: a GitHub Actions workflow to sync models, config to point Docling at the S3 cache path, and downstream fixes when the sync workflow itself had bugs.</p>
<p>The application now had an out-of-band model synchronisation pipeline that had to be kept in step with the Docling version in <code>pyproject.toml</code>. Updating the OCR engine or parser models meant updating the Dockerfile, updating the sync workflow, and triggering the S3 pipeline before deploy — in that order.</p>
<p>This is the moment where "we added a parsing library" became "we are now operating a small model registry."</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="act-iii-the-performance-whack-a-mole">Act III: The Performance Whack-a-Mole<a href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/#act-iii-the-performance-whack-a-mole" class="hash-link" aria-label="Direct link to Act III: The Performance Whack-a-Mole" title="Direct link to Act III: The Performance Whack-a-Mole" translate="no">​</a></h2>
<ul>
<li class="">OMP thread count reduced from 4 to 2 because Docling was spawning more CPU
threads than the pod's limit. The pod was being throttled.</li>
<li class=""><code>images_scale</code> pinned to 1.0, accelerator switched to <code>AUTO</code></li>
<li class="">Thread bootstrap broke entirely and had to be fixed</li>
<li class="">The parse method was decomposed to make tuning easier</li>
<li class="">OCR engine switched from EasyOCR to RapidOCR — which then had to be added to
the model sync workflow and the Docker defaults</li>
<li class="">OCR skipped entirely for native digital PDFs — meaning Docling's headline
feature wasn't being used for the most common input type</li>
</ul>
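<p>The thread-count item is the one most amenable to automation. Instead of hand-tuning, the thread budget can be derived from the pod's CPU limit. The sketch below assumes cgroup v2, where the limit is exposed in <code>/sys/fs/cgroup/cpu.max</code>; the function names are hypothetical and this is not the tuning code we actually shipped.</p>

```python
import os
from pathlib import Path


def cpus_from_cpu_max(text: str, fallback: int) -> int:
    """Parse cgroup v2 cpu.max content ("QUOTA PERIOD" or "max PERIOD") into whole CPUs.

    Fractional limits round down, but never below one thread.
    """
    quota, _, period = text.strip().partition(" ")
    if quota == "max" or not period:
        return fallback
    return max(1, int(quota) // int(period))


def thread_budget(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Threads to allow: the container CPU limit, capped by the host CPU count."""
    host_cpus = os.cpu_count() or 1
    path = Path(cpu_max_path)
    if not path.exists():
        return host_cpus
    return min(host_cpus, cpus_from_cpu_max(path.read_text(), host_cpus))


def apply_thread_budget() -> int:
    """Set OMP_NUM_THREADS unless it is already set; call before importing numeric libraries."""
    budget = thread_budget()
    os.environ.setdefault("OMP_NUM_THREADS", str(budget))
    return budget
```

<p>This keeps the thread count aligned with the pod limit when the limit changes. It does not make CPU inference fast; it only stops the throttling that comes from oversubscription.</p>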
<p>On the same day as that last round of fixes, a VLM-based parser was added and validated in a few hours. It used Bedrock's document API directly. No local models. No thread budget. Just an API call.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="act-iv-the-quiet-betrayal">Act IV: The Quiet Betrayal<a href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/#act-iv-the-quiet-betrayal" class="hash-link" aria-label="Direct link to Act IV: The Quiet Betrayal" title="Direct link to Act IV: The Quiet Betrayal" translate="no">​</a></h2>
<p>A config change silently disabled Docling routing and enabled the VLM parsers by default. Docling was still in the codebase, still in the Docker image, still pulling in Torch and Tesseract and RapidOCR and a full HuggingFace model cache.</p>
<p>It was handling zero traffic.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="act-v-the-purge">Act V: The Purge<a href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/#act-v-the-purge" class="hash-link" aria-label="Direct link to Act V: The Purge" title="Direct link to Act V: The Purge" translate="no">​</a></h2>
<p>Gone. All of it.</p>
<ul>
<li class="">The Docling parser and all its tests</li>
<li class="">The S3 model sync CI workflow</li>
<li class="">All Docling-specific settings and constants</li>
<li class="">Torch, Tesseract, OpenCV from the build</li>
<li class="">The Docker entrypoint script</li>
<li class="">The thread-count tuning, the <code>images_scale</code> pin, all of it</li>
</ul>
<p>1,452 lines deleted from <code>uv.lock</code> alone.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-replaced-it">What replaced it<a href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/#what-replaced-it" class="hash-link" aria-label="Direct link to What replaced it" title="Direct link to What replaced it" translate="no">​</a></h2>
<p>Two parsers, one fallback chain, zero local ML models.</p>
<p>The primary path sends the raw document bytes directly to Bedrock as a document content block: one API call, no pre-processing. Claude handles layout, tables, and embedded images natively. The only constraint is the roughly 4.5 MB size limit on document blocks; files over that automatically fall through to the fallback.</p>
<p>The fallback rasterises each page to an image using <code>pypdfium2</code> and sends them to Bedrock vision in parallel. DOCX files go through LibreOffice headless first to become a PDF, then hit the same rasterise-and-describe path.</p>
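<p>The fan-out part of the fallback can be sketched as follows. This is an assumption-laden illustration rather than our parser class: <code>describe</code> stands in for the per-page Bedrock vision call, the worker count is arbitrary, and the rasterisation itself (done with <code>pypdfium2</code> in our case) is left out so the example has no third-party dependencies. The LibreOffice command uses the standard <code>soffice</code> headless conversion flags.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def describe_pages(
    page_images: list[bytes],
    describe: Callable[[bytes], str],
    max_workers: int = 4,
) -> str:
    """Describe pages concurrently and join the results in page order."""
    if not page_images:
        return ""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order even if later pages finish first.
        sections = list(pool.map(describe, page_images))
    return "\n\n".join(sections)


def docx_to_pdf_command(docx_path: str, out_dir: str) -> list[str]:
    """Argument list for converting a DOCX file to PDF with LibreOffice headless."""
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, docx_path]
```

<p>Threads suit this workload because each page is a network-bound API call. The command list would be handed to <code>subprocess.run</code>, after which the resulting PDF follows the same rasterise-and-describe path.</p>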
<p>The comparison:</p>
<table><thead><tr><th>Concern</th><th>Docling</th><th>Now</th></tr></thead><tbody><tr><td>PDF text extraction</td><td>Torch + Tesseract + layout models</td><td>Bedrock document block</td></tr><tr><td>Tables</td><td>Docling HTML table mode</td><td>Claude extracts to HTML natively</td></tr><tr><td>Images / figures</td><td>SmolVLM locally</td><td>Bedrock vision per page</td></tr><tr><td>DOCX</td><td>pydocx parser</td><td>LibreOffice → PDF → same VLM path</td></tr><tr><td>Dependencies</td><td>Torch, Tesseract, RapidOCR, OpenCV, HF models</td><td>pypdfium2, LibreOffice (system)</td></tr><tr><td>Infrastructure</td><td>S3 model sync pipeline, entrypoint bootstrap</td><td>Nothing — models live in Bedrock</td></tr></tbody></table>
<p>The entire local inference stack is gone. Parsing is now API calls with a lightweight rasterisation step as fallback for large files.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-lesson-if-there-is-one">The lesson, if there is one<a href="https://works-in-prod.github.io/we-spent-five-weeks-making-docling-work-then-we-deleted-it/#the-lesson-if-there-is-one" class="hash-link" aria-label="Direct link to The lesson, if there is one" title="Direct link to The lesson, if there is one" translate="no">​</a></h2>
<p>Docling required bundling a mini ML stack — Torch, Tesseract, RapidOCR, HuggingFace models — plus a dedicated S3 sync pipeline and several rounds of thread-count tuning, to do PDF parsing that a Bedrock API call does better with zero infrastructure overhead and a thirty-line parser class.</p>
<p>In hindsight the clue was in the very first day: merged and reverted before anyone had even run it in a container. The library wasn't broken — it did what it said. The mistake was not recognising early enough that we were running local inference on CPU-based nodes. Docling's layout models and OCR pipeline are designed for GPU workloads — on CPU they're slow by nature, not by misconfiguration. No amount of OMP thread tuning was going to fix that. We gave it a Kubernetes pod with a CPU limit and spent three weeks wondering why it was slow, when the answer was baked into the infrastructure choice from the start.</p>
<p>To be clear: Docling is a capable library. On GPU-backed infrastructure, with a proper model serving layer, it likely performs well — and there are self-hosted or air-gapped contexts where a managed API isn't an option and something like Docling is exactly the right tool. We may also have missed configuration options that would have helped. It's entirely possible that with different infrastructure or more time, we'd have got there.</p>
<p>But that's the point. The question isn't whether a tool works in the right environment — it's whether your environment is the right one for it. Running local inference on CPU nodes in an application service, when a managed API exists that does the job with less ops surface, is a mismatch. Not a failure of the tool. A failure of context.</p>
<p>We learnt it the expensive way.</p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="ai-engineering" term="ai-engineering"/>
        <category label="python" term="python"/>
        <category label="lessons-learned" term="lessons-learned"/>
        <category label="document-parsing" term="document-parsing"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Art of the Architecture Diagram Is Knowing What to Leave Out]]></title>
        <id>https://works-in-prod.github.io/the-art-of-the-architecture-diagram-is-knowing-what-to-leave-out/</id>
        <link href="https://works-in-prod.github.io/the-art-of-the-architecture-diagram-is-knowing-what-to-leave-out/"/>
        <updated>2026-05-21T07:00:00.000Z</updated>
        <summary type="html"><![CDATA[There's a particular type of diagram that engineers love producing and nobody can read. It has boxes for every service, arrows for every dependency, labels on each arrow explaining the protocol, and a legend in the corner that requires its own legend. It is, technically speaking, accurate. It is also completely useless.]]></summary>
        <content type="html"><![CDATA[<p>There's a particular type of diagram that engineers love producing and nobody can read. It has boxes for every service, arrows for every dependency, labels on each arrow explaining the protocol, and a legend in the corner that requires its own legend. It is, technically speaking, accurate. It is also completely useless.</p>
<p>I've been doing C4 modelling for a while now and I genuinely love it — not because it's fashionable, but because it gives you a proper mental model for what a diagram is actually <em>for</em>. System Context, Container, Component, Code. Four levels. Each one answers a different question for a different audience. The mistake most teams make is picking the wrong level, or worse, mixing levels in the same diagram because they couldn't decide and nobody wanted to have the argument.</p>
<p>The tooling has always been a headache. I tried a few diagram-as-code options and they all seemed to get in the way more than they helped. Recently switched to <a href="https://d2lang.com/" target="_blank" rel="noopener noreferrer" class="">D2</a> and it's been noticeably better. Minimal syntax, clean output, doesn't try to do seventeen things at once. It renders the diagram and gets out of your way.</p>
<p>The discipline isn't in drawing the boxes though — it's in deciding what <em>not</em> to draw. Engineers instinctively want to put everything in. Every dependency. Every call. Every integration they personally wired up at 11pm on a Thursday and feel a certain proprietary affection for. The result is a diagram that documents institutional knowledge and communicates nothing.</p>
<p>Here's the difference in practice. The noisy version — everything that's technically true:</p>
<img decoding="async" loading="lazy" src="https://works-in-prod.github.io/d2/blog/The%20Art%20of%20the%20Architecture%20Diagram%20Is%20Knowing%20What%20to%20Leave%20Out/0.svg" alt="d2 diagram" class="img_ev3q">
<p>The clean version — what the Container diagram should actually say:</p>
<img decoding="async" loading="lazy" src="https://works-in-prod.github.io/d2/blog/The%20Art%20of%20the%20Architecture%20Diagram%20Is%20Knowing%20What%20to%20Leave%20Out/1.svg" alt="d2 diagram" class="img_ev3q">
<p>Same system. The second one takes thirty seconds to understand. The first takes thirty minutes and a whiteboard session that everyone leaves more confused than when they arrived.</p>
<p>My preference is top-down layout — things flow downward, dependencies point in one direction, you can scan it. It's a small thing that makes a significant difference to whether a diagram reads as a story or as a pub quiz question.</p>
<p>AI tools make the arrows problem considerably worse. They're very close to the code — that's their whole thing — and if you ask one to generate an architecture diagram, it will dutifully render every import, every function call, every database relationship it can find. The result looks impressively comprehensive. It communicates approximately nothing.</p>
<p>Simplification requires a judgement call about what matters to the reader. That depends on who the reader is, what decision they're trying to make, and what level of detail actually serves them. Current AI tools don't have access to any of that context. A human has to make the call. That judgement is the actual skill the diagram is expressing.</p>
<p>The test I use: if someone new to the team can look at your diagram and explain back what the system does — without you hovering over their shoulder narrating — the diagram is doing its job. If they need you to explain the diagram, you've drawn a very expensive set of notes.</p>
<p>Pick your level. Remove the noise. Push back on the arrows.</p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="architecture" term="architecture"/>
        <category label="diagrams" term="diagrams"/>
        <category label="c4" term="c4"/>
        <category label="d2" term="d2"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Your Codebase Has Rules. Does CI Know That?]]></title>
        <id>https://works-in-prod.github.io/your-codebase-has-rules-does-ci-know-that/</id>
        <link href="https://works-in-prod.github.io/your-codebase-has-rules-does-ci-know-that/"/>
        <updated>2026-05-21T06:00:00.000Z</updated>
        <summary type="html"><![CDATA[There's a particular kind of meeting that happens on mixed teams. Someone's opened a pull request, and two engineers are staring at the same diff with completely different facial expressions. One is confused. The other is quietly furious. Neither is wrong, exactly — they just have entirely different mental models of what the codebase is supposed to look like.]]></summary>
        <content type="html"><![CDATA[<p>There's a particular kind of meeting that happens on mixed teams. Someone's opened a pull request, and two engineers are staring at the same diff with completely different facial expressions. One is confused. The other is quietly furious. Neither is wrong, exactly — they just have entirely different mental models of what the codebase is <em>supposed</em> to look like.</p>
<p>That's the drift I'm talking about. Not bugs. Not broken tests. Just two people who've been building in the same repository for months and have somehow ended up with incompatible ideas about what goes where.</p>
<p>The friction tends to come from different professional histories. Engineers who came up through software development often have layered architecture drilled in early. Engineers who came up through data science and ML often optimised for iteration speed over structure. Neither background is wrong — they just don't automatically agree on where things belong.</p>
<p>The answer, at least in part, is <a href="https://github.com/jwbargsten/pytest-archon" target="_blank" rel="noopener noreferrer" class="">pytest-archon</a>. It lets you write tests that assert structural rules about your codebase. These aren't tests of the form "does this function return the right value"; they're closer to "nothing in the API layer should reach directly into the database layer." Rules you'd normally write in a wiki nobody reads, enforced as a test that CI will actually fail on.</p>
<p>Here's what that looks like:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> pytest_archon </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> archrule</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_api_does_not_import_database</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        archrule</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"api-layer-isolation"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">.</span><span class="token keyword" style="color:#00009f">match</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"myapp.api.*"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">should_not_import</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"myapp.database.*"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">check</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"myapp"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>That's it. That test will fail if anyone — human or otherwise — writes an API handler that imports a database model directly. No relying on the comment thread getting resolved. No wiki page that's accurate until it isn't. The build fails. The feedback is immediate.</p>
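<p>For a feel of what such a rule checks, here is a minimal standard-library sketch. It is not how pytest-archon is implemented; <code>forbidden_imports</code> and the sample handler source are hypothetical, and it only inspects the import statements in a single source string.</p>

```python
import ast
from fnmatch import fnmatch


def forbidden_imports(source, forbidden_pattern):
    """Return imported module names in `source` that match `forbidden_pattern`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        found.extend(name for name in names if fnmatch(name, forbidden_pattern))
    return found


# Hypothetical API handler that reaches straight into the database layer.
handler_source = """
from myapp.database.models import User
import myapp.services.users
"""

violations = forbidden_imports(handler_source, "myapp.database.*")
```

<p>Here <code>violations</code> contains only <code>myapp.database.models</code>, which is the kind of import the rule above rejects.</p>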
<p>Which brings me to the agentic angle, because this isn't just about human engineers anymore. When you're vibe-coding with an agent generating chunks of your codebase, the agent doesn't inherently know your architectural rules. It knows how to write Python. It does not know that your team decided six months ago that services should never instantiate repositories directly. Architecture tests give it the same feedback signal they give a human: <em>that's not how we do it here, try again.</em></p>
<p>The rules become the documentation. Living documentation, with teeth. If the structure is correct, the tests pass. If it drifts — whether from a human in a hurry or an agent that didn't know better — they don't.</p>
<p>We caught no bugs this way. We caught something slower and harder to fix than bugs: a gradual divergence in how the team understood the system. An AI engineer pulling in a service directly from an API handler because that's how you'd do it in a notebook. A software engineer quietly losing the will to review it because the conversation about why it's wrong is long and the PR queue is longer.</p>
<p>Architecture tests remove the conversation. The boundary is in the codebase. The codebase enforces it. Everyone, human and agent alike, gets the same feedback.</p>
<p>Getting ahead of drift is worth more than most people give it credit for.</p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="python" term="python"/>
        <category label="testing" term="testing"/>
        <category label="architecture" term="architecture"/>
        <category label="ai-engineering" term="ai-engineering"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[You Can't Debug What Bedrock Swallowed]]></title>
        <id>https://works-in-prod.github.io/you-cant-debug-what-bedrock-swallowed/</id>
        <link href="https://works-in-prod.github.io/you-cant-debug-what-bedrock-swallowed/"/>
        <updated>2026-05-21T05:00:00.000Z</updated>
        <summary type="html"><![CDATA[There's a particular kind of hell reserved for debugging LLM-backed systems that nobody bothered to instrument. You've got a request that took twelve seconds and you don't know if the slow part was your retrieval pipeline, the prompt construction, the Bedrock call itself, or the post-processing that turned the model's output into something you'd actually show a user. You have logs. You have vibes. You have, essentially, nothing.]]></summary>
        <content type="html"><![CDATA[<p>There's a particular kind of hell reserved for debugging LLM-backed systems that nobody bothered to instrument. You've got a request that took twelve seconds and you don't know if the slow part was your retrieval pipeline, the prompt construction, the Bedrock call itself, or the post-processing that turned the model's output into something you'd actually show a user. You have logs. You have vibes. You have, essentially, nothing.</p>
<p>We hit this early on an LLM project and it focused the mind quickly.</p>
<p>AWS Bedrock is opaque by design. You send a prompt, you get tokens back, and what happens between those two events isn't your concern. That's fine — it's not your model to look inside.</p>
<p>The problem is when that opacity bleeds into the code <em>you</em> wrote. Your retrieval logic, your prompt templates, your retry handling, your fallback paths — none of that needs to be a mystery. But without deliberate instrumentation, it becomes one anyway. You end up with a black box you built yourself, which is a considerably more embarrassing situation than the one Bedrock put you in.</p>
<p>Rather than manually sprinkling trace calls everywhere and inevitably missing the interesting bits, I wrote a Python decorator that wraps functions and methods automatically. Every call gets emitted as a span — class name, method name, duration, outcome — and it all folds into a single trace you can read in sequence:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> functools</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> time</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> opentelemetry </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> trace</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tracer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_tracer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">__name__</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">traced</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">func</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token decorator annotation punctuation" style="color:#393A34">@functools</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">wraps</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">func</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">wrapper</span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">*</span><span class="token plain">args</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">**</span><span class="token plain">kwargs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># __qualname__ gives "Class.method" for methods and the bare name for plain functions</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        span_name </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> func</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">__qualname__</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> tracer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_as_current_span</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">span_name</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> span</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            start </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">perf_counter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> func</span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">*</span><span class="token plain">args</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">**</span><span 
class="token plain">kwargs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_status</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">StatusCode</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">OK</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> result</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> Exception </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> exc</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">record_exception</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">exc</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_status</span><span class="token 
punctuation" style="color:#393A34">(</span><span class="token plain">trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">StatusCode</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ERROR</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">exc</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">finally</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_attribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"duration_ms"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">perf_counter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" 
style="color:#393A34">-</span><span class="token plain"> start</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1000</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> wrapper</span><br></div></code></pre></div></div>
<p>Apply it to the functions you care about and suddenly your trace reads like a story. Vector search: 40ms. Prompt assembly: 2ms. That "fast" Bedrock call that's actually 3.8 seconds because you're using a large model with a 6,000-token context and no caching — that's visible now. The information was always there. You just couldn't see it.</p>
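<p>As a rough usage sketch, here is the same decorator shape applied to a hypothetical three-step pipeline. To keep it runnable without dependencies, an in-memory list stands in for the OpenTelemetry tracer; <code>ChatPipeline</code> and its methods are illustrative names, not taken from a real system.</p>

```python
import functools
import time

# In-memory stand-in for a tracer: each finished call appends (name, duration_ms).
# Assumption: this replaces OpenTelemetry purely so the example has no dependencies.
spans = []


def traced(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded on success and on failure, like the duration attribute above.
            spans.append((func.__qualname__, (time.perf_counter() - start) * 1000))
    return wrapper


class ChatPipeline:  # hypothetical pipeline; the step names are illustrative
    @traced
    def retrieve(self, question):
        return ["doc-1", "doc-2"]

    @traced
    def build_prompt(self, question, docs):
        return question + " | " + ", ".join(docs)

    @traced
    def answer(self, question):
        docs = self.retrieve(question)
        return self.build_prompt(question, docs)


result = ChatPipeline().answer("What is our refund policy?")
timeline = [name for name, _ in spans]
```

<p>Spans are appended as each call finishes, so the inner steps show up in <code>timeline</code> before the <code>answer</code> call that wraps them.</p>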
<p>The part I didn't anticipate: OpenTelemetry handles both technical traces <em>and</em> business metrics through the same pipeline. We used it to answer latency questions ("why did this request take four seconds?") and business questions at the same time ("how many users hit the fallback path today?", "what's our prompt cache hit rate this week?"). Same instrumentation layer, different dimensions. There's something satisfying about a monitoring setup that doesn't require you to maintain two separate systems with two separate mental models.</p>
<p>Here's the thing that surprised me most: a well-instrumented LLM pipeline can actually be easier to reason about than a lot of distributed systems. The order of operations is relatively clear, and when every step emits a span, you can read a trace like a timeline. The non-determinism of the model itself is a different problem — spans won't tell you why the model said what it said — but at least the plumbing stops being a mystery.</p>
<p>The opacity was never really about the LLM. It was about the code around the LLM that we hadn't bothered to make visible.</p>
<p>What I took from this: don't leave observability as something to add later when things go wrong. Wire it in from the start — it's the interface you build for yourself so that when Bedrock starts behaving oddly at 11pm, you have structured data to work with rather than a twelve-second request duration and a shrug.</p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="ai-engineering" term="ai-engineering"/>
        <category label="observability" term="observability"/>
        <category label="python" term="python"/>
        <category label="opentelemetry" term="opentelemetry"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[TDD Was Solving the Agent Problem Before Agents Existed]]></title>
        <id>https://works-in-prod.github.io/tdd-was-solving-the-agent-problem-before-agents-existed/</id>
        <link href="https://works-in-prod.github.io/tdd-was-solving-the-agent-problem-before-agents-existed/"/>
        <updated>2026-05-21T04:00:00.000Z</updated>
        <summary type="html"><![CDATA[The first time I set an agent loose on a real codebase, it ran out of context before it had done anything useful. That's a clarifying experience.]]></summary>
        <content type="html"><![CDATA[<p>The first time I set an agent loose on a real codebase, it ran out of context before it had done anything useful. That's a clarifying experience.</p>
<p>The repository wasn't exotic — a Python monorepo with shared libraries and some infrastructure code. I drew a diagram to understand what was happening. A rectangle for the full context window; blocks for what was already consumed just from loading the codebase: directory tree, CLAUDE.md, relevant modules, config, dependencies. The bar was more than half full before the agent had read a single line of task context or seen a single error message.</p>
<p>That image stuck with me. Half the agent's working memory gone on orientation alone. And the uncomfortable follow-up question: whose fault is that?</p>
<p>Mostly ours, it turns out.</p>
<p>A codebase with fuzzy boundaries, large unfocused modules, and implicit conventions forces the agent to do the same orientation work a new engineer would do — except a new engineer can ask questions, build intuition over weeks, and remember what they learned yesterday. With the tools I've been using, there's no persistent memory between sessions by default. Every session is effectively day one. The codebase has to compensate for what the agent can't retain.</p>
<p>There's a body of practice — going back about twenty-five years — that points in exactly this direction. We just didn't know we were solving this particular problem at the time.</p>
<p>TDD and the XP practices around it — simple design, ruthless refactoring, tests as documentation — produce exactly the properties that make a codebase agent-readable. Small focused units with explicit interfaces. Behaviour described in tests rather than buried in implementation. No accidental complexity quietly accumulating in corners. Clear boundaries that tell you where one thing ends and another begins.</p>
<p>None of this is new. But the agentic era has made the value of it more visible. The "too much to hold in your head at once" problem that TDD was designed to address is the same problem the context window makes concrete. A codebase with small focused units and tests that describe behaviour fits into an agent's context cleanly. One where complexity has accumulated unchecked — regardless of how it got there — does not.</p>
<p>Tests also do something specific for agents that code alone can't: they describe intended behaviour without requiring the agent to read the implementation. A test called <code>test_chat_service_returns_error_on_empty_prompt</code> tells the agent more in one line than several hundred lines of service code could. When an agent needs to understand a boundary, it reads the tests. Targeted context. Problem contained.</p>
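<p>A sketch of what might sit behind a name like that, assuming a hypothetical <code>ChatService</code> with a <code>reply</code> method (neither comes from a real codebase):</p>

```python
class ChatService:  # hypothetical minimal service, just enough for the test to run
    def reply(self, prompt):
        if not prompt.strip():
            return {"ok": False, "error": "prompt must not be empty"}
        return {"ok": True, "text": "echo: " + prompt}


def test_chat_service_returns_error_on_empty_prompt():
    # The boundary is stated here: blank input yields an error response, not an exception.
    response = ChatService().reply("   ")
    assert response["ok"] is False
    assert "empty" in response["error"]


test_chat_service_returns_error_on_empty_prompt()
```

<p>An agent that reads only the test learns the contract for blank input without opening the service at all.</p>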
<p>The cost angle is real too. Context is billed by the token. An agent flailing around a poorly structured codebase — re-reading files, tracing implicit dependencies, inferring conventions that should be explicit — is burning money before it's produced anything. Good structure isn't just clean, it's cheap.</p>
<p>This also connects to the current conversation around "vibe-coding" and agents generating code freely. From what I've seen, the concern isn't really about whether the agent can write working code — it often can. The concern is whether the codebase it's writing into has enough structure to keep the output coherent over time. Architecture tests help here too: codify the rules, and both human and agent get immediate feedback when something drifts from the intended shape.</p>
<p>The agentic era didn't invent a new problem. It gave us a new, very legible way to feel the cost of one we'd been politely ignoring for years.</p>
<p>TDD and XP have always pushed toward properties (small units, explicit interfaces, behaviour-as-tests) that turn out to be just as valuable for agents as they are for humans. The reasons stack up: less context spent on orientation, fewer tokens billed, and output that stays coherent over time.</p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="ai-engineering" term="ai-engineering"/>
        <category label="tdd" term="tdd"/>
        <category label="developer-experience" term="developer-experience"/>
        <category label="agile" term="agile"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Blockers Don't Care That You're Using AI]]></title>
        <id>https://works-in-prod.github.io/the-blockers-dont-care-that-youre-using-ai/</id>
        <link href="https://works-in-prod.github.io/the-blockers-dont-care-that-youre-using-ai/"/>
        <updated>2026-05-21T03:00:00.000Z</updated>
        <summary type="html"><![CDATA[I wrote a post back in 2021 about walking skeletons — the idea that before you go deep on features, you ship something thin and deployable end-to-end. Not because it's useful to users, but because it flushes out the real blockers while the cost of finding them is still low. Permissions. Pipelines. Infrastructure assumptions that looked fine on a whiteboard.]]></summary>
        <content type="html"><![CDATA[<p>I wrote a post back in 2021 about walking skeletons — the idea that before you go deep on features, you ship something thin and deployable end-to-end. Not because it's useful to users, but because it flushes out the real blockers while the cost of finding them is still low. Permissions. Pipelines. Infrastructure assumptions that looked fine on a whiteboard.</p>
<p>That advice hasn't aged out. If anything, AI projects have made it more relevant, not less.</p>
<p>Here's the thing about AI systems specifically: the surface area for things to silently go wrong is larger. You've got model integrations, inference infrastructure, data pipelines, prompt management, evaluation loops, and whatever cloud hoops your organisation has decided to add on top. Any one of those can be perfectly fine in isolation and a disaster when wired together in a real environment. The feedback loop — getting something running end-to-end early — serves exactly the same purpose it always did. You learn what's actually broken before you've written ten thousand lines of feature code around it.</p>
<p>I saw this recently on an AI project. We pushed to establish the skeleton early, before anyone could argue we weren't ready. And yes, we hit the usual suspects: IAM permissions that looked correct until they didn't, model API access that needed a different approval path than expected, tooling that worked locally and had strong opinions about containers. None of it was AI-specific. I've been burned by the same class of problems in every non-trivial project I've worked on since before "AI engineering" was a job title.</p>
<p>What was different was the speed at which we got through it. That's where the AI era genuinely does change things — not the nature of the blockers, but the time between "we found the problem" and "we fixed it." Digging through IAM policies, drafting the right internal request, figuring out the correct incantation for whichever cloud service had opinions today — all of it moved faster with AI tooling alongside. It's a multiplier on the resolution side, not the discovery side.</p>
<p>Which is worth saying plainly: AI tooling doesn't help you find blockers you never looked for. The skeleton is still the mechanism for surfacing them. You still have to commit to the discipline of doing it early, before the temptation to just build features wins.</p>
<p>That temptation is strong. AI tooling makes feature development feel fast. You can go from idea to working prototype in a morning. That speed makes it very easy to go deep before you've established whether any of it will actually run in production. And then you've got a lot of impressive-looking code and a pipeline that doesn't exist yet.</p>
<p>The honest caveat: this is still a discipline problem more than a tooling problem. AI makes the fixing faster, but it can't make you look for trouble before you think you need to. The teams that skipped the walking skeleton before will probably still skip it now — just with faster excuses.</p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="ai-engineering" term="ai-engineering"/>
        <category label="software-delivery" term="software-delivery"/>
        <category label="walking-skeleton" term="walking-skeleton"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Metric Your Users Feel Before You Measure It]]></title>
        <id>https://works-in-prod.github.io/the-metric-your-users-feel-before-you-measure-it/</id>
        <link href="https://works-in-prod.github.io/the-metric-your-users-feel-before-you-measure-it/"/>
        <updated>2026-05-21T02:00:00.000Z</updated>
        <summary type="html"><![CDATA[Working on a streaming chat product taught me something: the standard latency metrics don't really describe what users experience. They're not waiting for a page to load or an API to return a JSON blob. They're watching tokens appear — and what they feel before anything appears is the thing most teams aren't measuring.]]></summary>
        <content type="html"><![CDATA[<p>Working on a streaming chat product taught me something: the standard latency metrics don't really describe what users experience. They're not waiting for a page to load or an API to return a JSON blob. They're watching tokens appear — and what they feel before anything appears is the thing most teams aren't measuring.</p>
<p>That thing is time-to-first-token. TTFT.</p>
<p>I ran into this while load testing a streaming chat endpoint. The obvious thing to reach for is <code>http_req_duration</code> — it's right there in k6, it captures how long the request took, job done. Except it isn't. For a streaming LLM response, <code>http_req_duration</code> captures the entire stream from first byte sent to last byte received. If your model takes two seconds to start streaming and then streams for eight seconds, your p95 latency looks like ten seconds. That tells you almost nothing about whether the product feels responsive.</p>
<p>What matters to the person using a chat interface is: how long until <em>something</em> appears? A response that starts in 800ms and streams for thirty seconds feels fast. A response that sits blank for four seconds then dumps everything at once feels broken — even if the total duration is shorter.</p>
<p>That gap between "request sent" and "first content chunk received" is TTFT, and it's the metric that actually describes streaming UX.</p>
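<p>To make the distinction concrete, here is a minimal sketch in plain JavaScript (no k6 required) that derives both numbers from the same timestamps. The <code>summarise</code> helper and its field names are hypothetical; the two sample responses are the ones described above.</p>
<div class="language-javascript codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-javascript codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">// Hypothetical helper: timestamps are milliseconds since the request was sent.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">function summarise({ firstChunkAt, lastChunkAt }) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  return {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    ttftMs: firstChunkAt, // wait before anything appears</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    durationMs: lastChunkAt, // roughly what http_req_duration covers for a stream</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  };</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// Starts in 800ms, then streams for thirty seconds.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">const streamed = summarise({ firstChunkAt: 800, lastChunkAt: 30800 });</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// Sits blank for four seconds, then delivers everything at once.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">const dumped = summarise({ firstChunkAt: 4000, lastChunkAt: 4000 });</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">console.log(streamed.ttftMs, streamed.durationMs);</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">console.log(dumped.ttftMs, dumped.durationMs);</span><br></div></code></pre></div></div>
<p>The first response has the larger total duration but the smaller TTFT, so ranking the two by total duration alone gives the opposite answer to how they feel.</p>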
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="measuring-it">Measuring it<a href="https://works-in-prod.github.io/the-metric-your-users-feel-before-you-measure-it/#measuring-it" class="hash-link" aria-label="Direct link to Measuring it" title="Direct link to Measuring it" translate="no">​</a></h2>
<p>Standard k6 doesn't parse SSE streams, so you need a custom binary built with <a href="https://github.com/phymbert/xk6-sse" target="_blank" rel="noopener noreferrer" class="">xk6-sse</a>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">xk6 build --with github.com/phymbert/xk6-sse</span><br></div></code></pre></div></div>
<p>Then define a custom <code>Trend</code> metric and record it the moment the first content event arrives:</p>
<div class="language-javascript codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-javascript codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword module" style="color:#00009f">import</span><span class="token plain"> </span><span class="token imports">sse</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">from</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"k6/x/sse"</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword module" style="color:#00009f">import</span><span class="token plain"> </span><span class="token imports punctuation" style="color:#393A34">{</span><span class="token imports"> </span><span class="token imports maybe-class-name">Trend</span><span class="token imports"> </span><span class="token imports punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">from</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"k6/metrics"</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> ttft </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span 
class="token keyword" style="color:#00009f">new</span><span class="token plain"> </span><span class="token class-name">Trend</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"ttft_s"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">false</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword module" style="color:#00009f">export</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">default</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">function</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> start </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token known-class-name class-name">Date</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">now</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation"
style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">let</span><span class="token plain"> firstTokenRecorded </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">false</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  sse</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">url</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> params</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token parameter">client</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token arrow operator" style="color:#393A34">=&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">on</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"event"</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token parameter">event</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token arrow operator" style="color:#393A34">=&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token keyword control-flow" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">!</span><span class="token plain">firstTokenRecorded </span><span class="token operator" style="color:#393A34">&amp;&amp;</span><span class="token plain"> event</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">data</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">!==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"[DONE]"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ttft</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">add</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">(</span><span class="token known-class-name class-name">Date</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" 
style="color:#d73a49">now</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token plain"> start</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1000</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        firstTokenRecorded </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" 
style="color:#393A34">}</span><br></div></code></pre></div></div>
<p>Critically: read the stream to completion even after you've recorded TTFT. Dropping the connection early skews the load profile — the server is still doing work and your test stops accounting for it.</p>
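<p>As a rough illustration of "record once, keep reading", here is a transport-agnostic sketch in plain JavaScript. The <code>measureStream</code> helper and the fake stream are hypothetical stand-ins, not part of k6 or xk6-sse:</p>
<div class="language-javascript codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-javascript codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">// Transport-agnostic sketch: events is any async iterable of { data } objects.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// (Hypothetical shape; with xk6-sse the same logic would sit in the "event" handler.)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">async function measureStream(events, now = Date.now) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  const start = now(); // in a real test, take this when the request is sent</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  let ttftMs = null;</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  let eventCount = 0;</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  for await (const event of events) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    eventCount += 1;</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    if (ttftMs === null &amp;&amp; event.data !== "[DONE]") {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      ttftMs = now() - start; // recorded once, on the first content event</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    // No break or early return: keep reading until the stream ends.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  return { ttftMs, eventCount };</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// A fake stream standing in for the real endpoint.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">async function* fakeStream() {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  yield { data: "Hello" };</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  yield { data: " world" };</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  yield { data: "[DONE]" };</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">measureStream(fakeStream()).then((result) =&gt; console.log(result.eventCount));</span><br></div></code></pre></div></div>
<p>The loop has no exit path other than the end of the stream, so every event is consumed even though TTFT is captured on the first one.</p>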
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="load-profile-design">Load profile design<a href="https://works-in-prod.github.io/the-metric-your-users-feel-before-you-measure-it/#load-profile-design" class="hash-link" aria-label="Direct link to Load profile design" title="Direct link to Load profile design" translate="no">​</a></h2>
<p>For observational load testing, a ramp-and-hold pattern gives you clean steady-state numbers to actually reason about:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">VUs</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">100 ┤               ▄▄▄▄▄▄▄</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"> 50 ┤         ▄▄▄▄▄▀       ▀▄▄▄▄▄</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"> 10 ┤   ▄▄▄▄▄▀                   ▀▄▄▄▄▄</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  0 ┤──▀                               ▀──</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      ramp  hold   ramp   hold   ramp  hold</span><br></div></code></pre></div></div>
<p>During a ramp phase, concurrency is in transition — exclude those samples from headline stats. During a hold phase, you have stable concurrency and predictable sample counts. Tag every sample with its phase and VU target so you can filter cleanly in post-processing.</p>
<p>Issuing one request per VU per hold window also gives you deterministic sample counts: 50 VUs, each sending a single request during a 60s hold, produce exactly 50 samples. That makes runs reproducible and comparable.</p>
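<p>The post-processing step can be sketched in plain JavaScript. This assumes each sample was exported with a <code>phase</code> and <code>vus</code> tag; the field names, the sample values, and the nearest-rank percentile are illustrative choices, and k6's own summary may compute percentiles slightly differently:</p>
<div class="language-javascript codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-javascript codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">// Hypothetical sample shape, tagged at record time. Values are made up for illustration.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">const samples = [</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  { phase: "ramp", vus: 10, ttftS: 0.61 },</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  { phase: "hold", vus: 10, ttftS: 0.58 },</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  { phase: "hold", vus: 10, ttftS: 0.64 },</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  { phase: "ramp", vus: 50, ttftS: 0.97 },</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  { phase: "hold", vus: 50, ttftS: 1.12 },</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  { phase: "hold", vus: 50, ttftS: 1.31 },</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">];</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">function percentile(values, p) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  const sorted = [...values].sort((a, b) =&gt; a - b);</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  return sorted[rank - 1];</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// Keep only hold-phase samples and group them by VU target.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">function p95ByVuTarget(all) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  const groups = new Map();</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  for (const s of all) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    if (s.phase !== "hold") continue; // ramp samples stay out of headline stats</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    if (!groups.has(s.vus)) groups.set(s.vus, []);</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    groups.get(s.vus).push(s.ttftS);</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  const result = {};</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  for (const [vus, values] of groups) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    result[vus] = { count: values.length, p95: percentile(values, 95) };</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  return result;</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">console.log(p95ByVuTarget(samples));</span><br></div></code></pre></div></div>
<p>Reporting the sample count next to each p95 makes it obvious when a hold window produced fewer requests than expected.</p>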
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-you-actually-learn">What you actually learn<a href="https://works-in-prod.github.io/the-metric-your-users-feel-before-you-measure-it/#what-you-actually-learn" class="hash-link" aria-label="Direct link to What you actually learn" title="Direct link to What you actually learn" translate="no">​</a></h2>
<p>Here's where it gets interesting. Once you have TTFT as a real metric, you start seeing things that <code>http_req_duration</code> completely hides.</p>
<p>What we found was that different models have quite different characteristics. Some are fast to start and slow to finish. Some are the opposite. Prompt caching had a measurable effect in our tests — a cache hit on a large system prompt shaved hundreds of milliseconds off first token time, and we wouldn't have seen that signal at all if we'd only been watching total duration.</p>
<p>We also saw how concurrency affects <em>perceived</em> responsiveness differently than it affects throughput. At low VU counts the TTFT p95 looked fine. Add more concurrent users and TTFT degraded before throughput did — which means users start feeling slowness before the dashboards register a problem. Your mileage will vary depending on model and infrastructure, but it's worth checking.</p>
<p>These are the kinds of insights that only exist once you're measuring the right thing. Total request duration isn't the wrong metric — it's just the wrong <em>first</em> metric for a product where the UX is a stream.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="set-a-concrete-target-before-you-test">Set a concrete target before you test<a href="https://works-in-prod.github.io/the-metric-your-users-feel-before-you-measure-it/#set-a-concrete-target-before-you-test" class="hash-link" aria-label="Direct link to Set a concrete target before you test" title="Direct link to Set a concrete target before you test" translate="no">​</a></h2>
<p>Before running at scale, decide what good looks like. What is an acceptable p95 TTFT at your expected concurrency? Write it down before you look at the numbers — otherwise you'll rationalise whatever you find.</p>
<p>Your target will depend on your model, your infrastructure, your users' expectations, and honestly, what your provider can actually deliver under load. The test reveals that; it doesn't guarantee it. But without a target, you're just generating numbers.</p>
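<p>One way to write the target down is as a k6 threshold, so the run itself reports pass or fail against it. A minimal sketch, assuming the <code>ttft_s</code> Trend from earlier; the 1.5 second figure is a placeholder, not a recommendation:</p>
<div class="language-javascript codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-javascript codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">// Hypothetical target in seconds. Choose your own before the run.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">const TTFT_P95_TARGET_S = 1.5;</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// k6 reads pass/fail criteria from options.thresholds, keyed by metric name.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">// In a k6 script this object would be declared as: export const options = { ... }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">const options = {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  thresholds: {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    // "ttft_s" is the custom Trend defined earlier, recorded in seconds.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    ttft_s: [`p(95)&lt;${TTFT_P95_TARGET_S}`],</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  },</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">};</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">console.log(options.thresholds.ttft_s[0]);</span><br></div></code></pre></div></div>
<p>With a threshold in place, k6 marks the run as failed when the p95 exceeds the target, which makes the number harder to rationalise after the fact.</p>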
<hr>
<p><em>TTFT is one dimension. Token throughput — how fast the stream itself moves once started — is another. Both matter, and they can point in opposite directions. Worth measuring separately.</em></p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="ai-engineering" term="ai-engineering"/>
        <category label="observability" term="observability"/>
        <category label="performance" term="performance"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Stop Arguing With Your Terminal About Python Versions]]></title>
        <id>https://works-in-prod.github.io/stop-arguing-with-your-terminal-about-python-versions/</id>
        <link href="https://works-in-prod.github.io/stop-arguing-with-your-terminal-about-python-versions/"/>
        <updated>2026-05-21T01:00:00.000Z</updated>
        <summary type="html"><![CDATA[Project setup should be boring. Not in a "this is beneath me" way — in a "this takes thirty seconds and I never think about it" way.]]></summary>
        <content type="html"><![CDATA[<p>Project setup should be boring. Not in a "this is beneath me" way — in a "this takes thirty seconds and I never think about it" way.</p>
<p>Most of the time, it isn't.</p>
<p>You clone a repo you haven't touched in six months. The README says "install Python 3.11". You have 3.12. Something breaks. You remember there's a <code>.python-version</code> file somewhere, or maybe a <code>requires-python</code> in <code>pyproject.toml</code>. You spend twenty minutes figuring out which version manager you're even supposed to be using for this project before you've written a single line of code.</p>
<p>This isn't a Python problem. Terraform has <code>tfenv</code>. Node has <code>nvm</code> or <code>volta</code> or <code>.nvmrc</code>. Every language brings its own version manager, its own config file format, its own way of silently using the wrong version. And that's before you even get to figuring out how to run things — is it <code>make test</code>? <code>./scripts/test.sh</code>? Some npm script buried in a <code>package.json</code>? Nobody knows. You ask Slack.</p>
<p>I got tired of this and started using <a href="https://mise.jdx.dev/" target="_blank" rel="noopener noreferrer" class="">mise</a>. It's a single tool that handles both problems: pinned runtimes and discoverable tasks, for any language, in one file.</p>
<p>A Python service looks like this:</p>
<div class="language-toml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-toml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">[tools]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">python = "3.12.3"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[tasks.test]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">description = "Run the test suite"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">run = "pytest"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[tasks.verify]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">description = "Run all checks before pushing"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">depends = ["test", "build"]</span><br></div></code></pre></div></div>
<p>Run <code>mise install</code> and you get exactly that Python version. Run <code>mise tasks</code> and you see everything the project knows how to do. Run <code>mise run verify</code> before pushing. That's it.</p>
<p>The part I find most satisfying is that <code>mise run &lt;task&gt;</code> becomes a stable interface that hides whatever's behind it. I had a project that needed a custom k6 binary with SSE support for load testing a streaming API. Building it requires Go and a tool called <code>xk6</code>, which most people have never heard of. With mise, that's just:</p>
<div class="language-toml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-toml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">[tools]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">go = "1.22.3"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[tasks.build]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">description = "Build k6 with xk6-sse extension"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">run = "xk6 build --with github.com/phymbert/xk6-sse"</span><br></div></code></pre></div></div>
<p>Now <code>mise run build</code> works for everyone — the developer who knows what xk6 is, the one who doesn't, and the CI job. Nobody has to know what's behind it. When I added another extension later, I changed one line. The interface didn't move.</p>
<p>Speaking of CI — this is where the real payoff is. A GitHub Actions workflow for a mise project looks like:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">jobs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">verify</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">runs-on</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ubuntu</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">latest</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">steps</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">uses</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> actions/checkout@v4</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" 
style="color:#00a4db">uses</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> jdx/mise</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">action@v4</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">run</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> mise run verify</span><br></div></code></pre></div></div>
<p><code>mise-action</code> reads <code>mise.toml</code>, installs the pinned versions, and puts them on <code>PATH</code>. Then <code>mise run verify</code> runs the exact same thing you run locally. No separate version install steps. No drift between what CI checks and what you check. This is the thing that makes it worth the setup cost — CI and local are no longer two separate mental models.</p>
<p>The one thing mise can't do is install itself. You need it on the machine before any of this works. I solve that with <a href="https://www.chezmoi.io/" target="_blank" rel="noopener noreferrer" class="">Chezmoi</a>, a dotfile manager that runs once on a fresh machine. A <code>run_once_install-mise.sh</code> script does the bootstrap:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">#!/bin/sh</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">curl https://mise.run | sh</span><br></div></code></pre></div></div>
<p>Then the shell hook in <code>~/.zshrc</code> (also managed by Chezmoi) activates mise per directory:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">eval "$(mise activate zsh)"</span><br></div></code></pre></div></div>
<p>Chezmoi sets up the machine, mise sets up each project. Neither knows the other exists. You go from a blank laptop to a running project without reading a setup guide — which is the point.</p>
<p>It won't fix an undocumented deployment process or a service that can't run locally. It encodes what's already known. And if your team is already settled on <code>nvm</code> + <code>make</code> for a single-language, single-runtime project, the migration cost might not be worth it. The value really compounds when you're working across multiple projects or switching between them regularly — which, in my experience, is most of the time.</p>
<hr>
<p><em><code>mise</code> replaces <code>pyenv</code>, <code>nvm</code>, <code>rbenv</code>, <code>tfenv</code>, <code>asdf</code>, and most other per-language version managers. If you're on <code>asdf</code> already, migration is painless — <code>mise</code> reads <code>.tool-versions</code> files natively.</em></p>]]></content>
        <author>
            <name>Danish Javed</name>
            <uri>https://github.com/ambersariya</uri>
        </author>
        <category label="tooling" term="tooling"/>
        <category label="developer-experience" term="developer-experience"/>
        <category label="mise" term="mise"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[VS Code Snippet To Add Markdown Frontmatter]]></title>
        <id>https://works-in-prod.github.io/vscode-snippet-to-add-markdown-frontmatter/</id>
        <link href="https://works-in-prod.github.io/vscode-snippet-to-add-markdown-frontmatter/"/>
        <updated>2022-05-19T11:46:52.000Z</updated>
        <summary type="html"><![CDATA[1. Click on settings for VSCode]]></summary>
        <content type="html"><![CDATA[<ol>
<li class="">Click on settings for VSCode</li>
<li class="">Click on "User Snippets"</li>
<li class="">Click on "New Global Snippets File..."</li>
<li class="">Add the following JSON. Its <code>scope</code> field limits the snippet to Markdown files.</li>
</ol>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"Add Docusaurus blog frontmatter"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"body"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"---"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"draft: true"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"modified: ${CURRENT_YEAR}-${CURRENT_MONTH}-${CURRENT_DATE}T${CURRENT_HOUR}:${CURRENT_MINUTE}:${CURRENT_SECOND}.000Z"</span><span 
class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"date: ${CURRENT_YEAR}-${CURRENT_MONTH}-${CURRENT_DATE}T${CURRENT_HOUR}:${CURRENT_MINUTE}:${CURRENT_SECOND}.000Z"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"title: ${TM_FILENAME_BASE/(\\w.*)/${1:/capitalized}/}"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"slug: ${TM_FILENAME_BASE/([\\w-]+$)|([\\w-]+)|([-\\s]+)|([^\\w]+)/${1:/downcase}${2:/downcase}${2:+-}/gm}"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"---"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"description"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Create Blogpost Frontmatter"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"scope"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"markdown,mdx,md"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"prefix"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"blog"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"draft blog"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"frontmatter"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"add frontmatter"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>]]></content>
        <category label="docusaurus" term="docusaurus"/>
        <category label="blog" term="blog"/>
        <category label="how-to" term="how-to"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Establishing A Walking Skeleton For Projects]]></title>
        <id>https://works-in-prod.github.io/establishing-a-walking-skeleton-for-projects/</id>
        <link href="https://works-in-prod.github.io/establishing-a-walking-skeleton-for-projects/"/>
        <updated>2021-09-16T11:56:19.338Z</updated>
        <summary type="html"><![CDATA[I've been reading the excellent book Growing Object-Oriented Software, Guided By Tests and there's so much that resonated with me about starting work on a new project.]]></summary>
        <content type="html"><![CDATA[<p>I've been reading the excellent book <a href="https://www.goodreads.com/en/book/show/4268826-growing-object-oriented-software-guided-by-tests" target="_blank" rel="noopener noreferrer" title="Growing Object-Oriented Software, Guided By Tests" class="">Growing Object-Oriented Software, Guided By Tests</a> and there's so much that resonated with me about starting work on a new project.</p>
<p>As with anything new, give developers some shiny new something to work on and there's always the temptation to dive right in and get started with code. This often means you're working from the inside out of a problem space, and some operational details get overlooked. When we're done solving that problem, releasing it or pushing it to production often turns out to be a problem nobody had anticipated.</p>
<p>I recently experienced this on a project where we'd resorted to building the application locally, with the plan to put it online later. We had an idea of things like tech limitations and choices at the time, and deferring that decision seemed right, but it came back to bite us when we wanted to release the first feature.</p>
<p>We hit one roadblock after another: existing security policies, technology choices, and a release process that was already in place while we were trying something new. The whole thing cost us a couple of months of back and forth between dev/ops/admin folks.</p>
<p>So if I could tell my past self, I would say, release early and release often even if it means releasing the project skeleton in a hello world state.</p>
<p>In the context of the book I've been reading, establishing a walking skeleton is hugely important.</p>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Journey To The Centre Of The Stack]]></title>
        <id>https://works-in-prod.github.io/journey-to-the-centre-of-the-stack/</id>
        <link href="https://works-in-prod.github.io/journey-to-the-centre-of-the-stack/"/>
        <updated>2020-11-30T11:00:00.000Z</updated>
        <summary type="html"><![CDATA[Containerising legacy software has always been a journey into the unknown. What's changed is who you take with you.]]></summary>
        <content type="html"><![CDATA[<p>I first wrote this post in 2020 after spending several weeks containerising a legacy application I hadn't built and didn't fully understand. The experience was mostly archaeology — reading old config files, tracing hardcoded paths, figuring out what half a dozen processes actually did before touching anything. By the time I had a working Docker image, I'd earned it.</p>
<p>I'm updating it now because the journey has changed, and I think it's worth being honest about how.</p>
<p>The destination is the same. Legacy modernisation still means diving into unfamiliar depth, finding the load-bearing assumptions nobody documented, and making a series of architectural decisions that will outlive the sprint you're in. None of that has gone away.</p>
<p>What's changed is the discovery phase. The part where you spend half a day grepping through twelve config files to find every hardcoded <code>/tmp</code> path. The part where you read three hundred lines of an entrypoint script to understand what order things start in. The part where you're trying to build a mental model of a system from first principles because the person who built it left two years ago.</p>
<p>That part is cheaper now. Not free — cheaper. And for senior engineers in particular, that matters more than it might sound.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-cognitive-load-is-the-real-cost">The cognitive load is the real cost<a href="https://works-in-prod.github.io/journey-to-the-centre-of-the-stack/#the-cognitive-load-is-the-real-cost" class="hash-link" aria-label="Direct link to The cognitive load is the real cost" title="Direct link to The cognitive load is the real cost" translate="no">​</a></h2>
<p>When you're working in unfamiliar legacy code, there's a ceiling on how much you can think about architecture while simultaneously trying to understand what you're looking at. The mental budget goes to comprehension first and decision-making second.</p>
<p>AI tooling shifts that balance. You can ask an agent to map the dependency graph, find all the places a config value is used, summarise what a given module does, or trace what happens to a file after it's uploaded. It doesn't always get this perfectly right, but it gets you oriented faster. And orientation is the precondition for good architectural thinking.</p>
<p>The senior engineer's job in a legacy modernisation isn't to read every file — it's to understand the system well enough to make the right calls. AI handles more of the reading. You do more of the deciding. That's a reasonable trade.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-still-requires-a-human">What still requires a human<a href="https://works-in-prod.github.io/journey-to-the-centre-of-the-stack/#what-still-requires-a-human" class="hash-link" aria-label="Direct link to What still requires a human" title="Direct link to What still requires a human" translate="no">​</a></h2>
<p>This is the part worth being direct about: you cannot vibe code your way through a legacy containerisation.</p>
<p>Legacy systems have accumulated tradeoffs that aren't visible in the code itself. A hardcoded path exists for a reason. Session storage lives in a particular place because of a deployment constraint nobody remembered to remove. An environment variable has a default that only works in production because of how the CI pipeline was wired up five years ago.</p>
<p>An agent will find the path, tell you what it does, maybe suggest you move it to an env var. What it can't tell you is whether that change will break the cron job on the production server that six business processes depend on — the one that isn't in any of the tests because it predates the testing culture.</p>
<p>AI tools don't have access to the organisational context: the deployment constraints, the team agreements, the compliance requirements, the reason something was done a particular way three years ago. That knowledge lives in people, not code. And in legacy systems, it's often the most important knowledge there is.</p>
<p>That's still yours. The judgment about which changes are safe, which tradeoffs are real, which "quick wins" are landmines with a friendly face — that's the work. AI lowers the cost of getting to the point where you can make those calls. It doesn't make the calls for you.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-practical-shape-of-it-now">The practical shape of it now<a href="https://works-in-prod.github.io/journey-to-the-centre-of-the-stack/#the-practical-shape-of-it-now" class="hash-link" aria-label="Direct link to The practical shape of it now" title="Direct link to The practical shape of it now" translate="no">​</a></h2>
<p>For what it's worth, here's roughly how I'd approach a legacy containerisation today, with AI tooling in the picture:</p>
<p><strong>Discovery first.</strong> Ask the agent to map the application — what processes run, what config files exist, what external services are referenced, what file paths are hardcoded. Treat this as a starting point for your own investigation, not a definitive answer. Legacy codebases often have behaviour that doesn't show up in a static read.</p>
<p><strong>Identify the decisions, not just the tasks.</strong> The genuine work in a containerisation is a small number of architectural choices — how to handle sessions, where persistent storage lives, how secrets are managed, how the application handles multiple running instances. Everything else is mechanics. Get to the decisions faster and spend your time there.</p>
<p><strong>Keep the base image simple.</strong> If you have multiple applications sharing a common runtime, extract a base image. Agent tooling is good at spotting what's common across applications — use it for that comparison work.</p>
<p><strong>Externalise everything that changes between environments.</strong> File paths, URLs, secrets, feature flags — environment variables. Not because it's clever, but because it's the minimum requirement for any container to be operationally useful. This hasn't changed.</p>
<p><strong>Test incrementally.</strong> Don't wait until the Dockerfile is complete to run the application. Run it as early as possible, find the first thing that breaks, fix it, repeat. The agent can help write tests for the areas you're refactoring, but you need to know which areas matter enough to test.</p>
<p><strong>Understand what you're committing to.</strong> Every dependency you add to the container — a system library, an OCR engine, a local model — is infrastructure you're now responsible for. The Docling post on this blog exists because we learnt that lesson the long way.</p>
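<p>As a rough illustration of the "externalise everything" step above, here is a minimal Node.js sketch. The variable names (<code>UPLOAD_DIR</code>, <code>DATABASE_URL</code>, <code>FEATURE_NEW_UPLOADS</code>) are hypothetical and not taken from any real application; the point is only that the container fails fast when a setting is missing instead of silently falling back to a hardcoded path.</p>

```javascript
// Minimal sketch: read everything that differs between environments from
// environment variables. All variable names here are hypothetical.
function loadConfig(env) {
  const required = ["UPLOAD_DIR", "DATABASE_URL"];
  const missing = required.filter(function (name) {
    return !env[name];
  });
  // Fail fast at startup instead of falling back to a hardcoded path.
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  return {
    uploadDir: env.UPLOAD_DIR,
    databaseUrl: env.DATABASE_URL,
    // Feature flags arrive as strings, so compare explicitly.
    newUploads: env.FEATURE_NEW_UPLOADS === "true",
  };
}
```

<p>At startup this would typically be called once as <code>loadConfig(process.env)</code>, so the same image can run unchanged in every environment.</p>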
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-title-still-fits">The title still fits<a href="https://works-in-prod.github.io/journey-to-the-centre-of-the-stack/#the-title-still-fits" class="hash-link" aria-label="Direct link to The title still fits" title="Direct link to The title still fits" translate="no">​</a></h2>
<p>The journey to the centre of the stack is still a journey. The technology has changed, the tooling has improved, and you have better company on the way down than you did five years ago. But the stack is still there, and the centre of it still contains the decisions that matter.</p>
<p>The difference is that you can now spend more of your cognitive budget on the part that actually requires experience to get right.</p>
<p>That seems like a reasonable upgrade.</p>]]></content>
        <category label="docker" term="docker"/>
        <category label="legacy-software" term="legacy-software"/>
        <category label="modernisation" term="modernisation"/>
        <category label="ai-engineering" term="ai-engineering"/>
        <category label="developer-experience" term="developer-experience"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[JSON Web Tokens]]></title>
        <id>https://works-in-prod.github.io/JSON-WEB-TOKEN/</id>
        <link href="https://works-in-prod.github.io/JSON-WEB-TOKEN/"/>
        <updated>2017-02-28T11:45:44.128Z</updated>
        <summary type="html"><![CDATA[Repost from https://medium.com/@ambersariya/jwt-json-web-token-cd90ef7a7a66]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-is-it">What is it?<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#what-is-it" class="hash-link" aria-label="Direct link to What is it?" title="Direct link to What is it?" translate="no">​</a></h2>
<blockquote>
<p><em>JSON Web Token (JWT) is a compact, URL-safe means of representing claims to be transferred between two parties. The claims in a JWT are encoded as a JSON object that is used as the payload of a JSON Web Signature (JWS) structure or as the plaintext of a JSON Web Encryption (JWE) structure, enabling the claims to be digitally signed or integrity protected with a Message Authentication Code (MAC) and/or encrypted.</em></p>
</blockquote>
<p>JSON Web Tokens are an open, industry-standard <a href="https://tools.ietf.org/html/rfc7519" target="_blank" rel="noopener noreferrer" class=""><strong>RFC 7519</strong></a> method for representing claims securely between two parties. See here: <a href="https://jwt.io/" target="_blank" rel="noopener noreferrer" title="https://jwt.io" class="">https://jwt.io</a></p>
<p>In this context, "claim" can be something like a "command", a one-time authorization, or basically any other scenario that you can word as:</p>
<blockquote>
<p><em>Hello Server B, Server A told me that I could "<strong>claim goes here</strong>", and here’s the (cryptographic) proof.</em></p>
</blockquote>
<p>Before we dive into this further, I’d like to define some terms we use in the realm of authentication.</p>
<blockquote>
<p><strong><em>Authentication</em></strong> <em>— Proving who you are</em></p>
<p><strong><em>Authorization</em></strong> <em>— Being granted access to resources</em></p>
<p><strong><em>Token</em></strong> <em>— medium used to persist authentication and get authorization</em></p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="so-what-does-it-look-like">So, what does It Look Like?<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#so-what-does-it-look-like" class="hash-link" aria-label="Direct link to So, what does It Look Like?" title="Direct link to So, what does It Look Like?" translate="no">​</a></h2>
<p>Well, it looks like yet another confusing-looking string:</p>
<p>eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9.TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ</p>
<p>Upon closer inspection, you’ll see that this JWT consists of three parts separated by dots (<code>.</code>), which are:</p>
<ul>
<li class="">
<p>Header</p>
</li>
<li class="">
<p>Payload</p>
</li>
<li class="">
<p>Signature</p>
</li>
</ul>
<p><code>Header.Payload.Signature</code></p>
<p>So, let’s break it down a little:</p>
<div class="language-js codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-js codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">// header</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">// payload</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">// signature</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access maybe-class-name">TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="header">Header<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#header" class="hash-link" aria-label="Direct link to Header" title="Direct link to Header" translate="no">​</a></h3>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">HS256 indicates that this token is signed using HMAC-SHA256.</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"alg"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"HS256"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"typ"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"JWT"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="claimspayload">Claims/Payload<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#claimspayload" class="hash-link" aria-label="Direct link to Claims/Payload" title="Direct link to Claims/Payload" translate="no">​</a></h3>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">The payload contains the claims that we wish to make</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"sub"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1234567890"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"John Doe"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"admin"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
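<p>To see that the header and payload really are just encoded JSON, here is a small Node.js sketch that decodes the first two segments of the example token above. The helper name <code>decodeSegment</code> is my own. Note that this only decodes; it does not check the signature, so the result must not be trusted yet.</p>

```javascript
// Decode one base64url-encoded JWT segment into a JavaScript object.
// This only decodes; it does NOT verify the signature.
function decodeSegment(segment) {
  // base64url uses "-" and "_" in place of "+" and "/" and drops "=" padding.
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const json = Buffer.from(base64, "base64").toString("utf8");
  return JSON.parse(json);
}

const token =
  "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9" +
  ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9" +
  ".TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ";

const [headerPart, payloadPart] = token.split(".");
const header = decodeSegment(headerPart);
const payload = decodeSegment(payloadPart);

console.log(header);  // { alg: 'HS256', typ: 'JWT' }
console.log(payload); // { sub: '1234567890', name: 'John Doe', admin: true }
```

<p>Anyone holding the token can do this, which is why a signed JWT should never carry secrets in its payload.</p>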
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="signature">Signature<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#signature" class="hash-link" aria-label="Direct link to Signature" title="Direct link to Signature" translate="no">​</a></h3>
<p>We use the following formula to calculate the signature (the encoding step is base64url, the URL-safe, unpadded variant of Base64):</p>
<div class="language-js codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-js codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token constant" style="color:#36acaa">HMACSHA256</span><span class="token punctuation" style="color:#393A34">(</span><span class="token function" style="color:#d73a49">encodeBase64</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">header</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"."</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">encodeBase64</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">payload</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> secret</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>This then gives us something like:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI</span><br></div></code></pre></div></div>
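<p>The signature formula above can be sketched with Node's built-in <code>crypto</code> module. This is a minimal illustration for HS256 only, not a replacement for a maintained JWT library; the function names (<code>base64url</code>, <code>sign</code>, <code>verify</code>) and the secret <code>"example-secret"</code> are my own placeholders.</p>

```javascript
const crypto = require("crypto");

// Encode a string or Buffer as base64url (no padding), as JWTs require.
function base64url(input) {
  return Buffer.from(input)
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Build a signed HS256 token from header and payload objects.
function sign(header, payload, secret) {
  const signingInput =
    base64url(JSON.stringify(header)) + "." + base64url(JSON.stringify(payload));
  const signature = base64url(
    crypto.createHmac("sha256", secret).update(signingInput).digest()
  );
  return signingInput + "." + signature;
}

// Recompute the signature and compare it in constant time.
function verify(token, secret) {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const expected = base64url(
    crypto.createHmac("sha256", secret).update(parts[0] + "." + parts[1]).digest()
  );
  const a = Buffer.from(expected);
  const b = Buffer.from(parts[2]);
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return a.length === b.length ? crypto.timingSafeEqual(a, b) : false;
}

const token = sign({ alg: "HS256", typ: "JWT" }, { sub: "1234567890" }, "example-secret");
console.log(verify(token, "example-secret")); // true
console.log(verify(token, "wrong-secret"));   // false
```

<p>The constant-time comparison is a deliberate choice: a plain string comparison can leak, through timing, how much of a forged signature was correct.</p>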
<p>Let’s expand on the claims section of JWT. The following claims are part of the RFC document:</p>
<ul>
<li class=""><strong>iss</strong>: the issuer of the token, e.g. auth.example.com</li>
<li class=""><strong>sub</strong>: the subject of the token, e.g. auth</li>
<li class=""><strong>aud</strong>: the audience, i.e. who can use this token, e.g. ['client1.example.com','client2.example.com']</li>
<li class=""><strong>exp</strong>: the expiration time as a Unix timestamp, e.g. 1488192525</li>
<li class=""><strong>nbf</strong>: "not before", a Unix timestamp before which the token must not be accepted, e.g. 1488192825 to delay use until 300 seconds (5 minutes) after the token was issued at 1488192525</li>
<li class=""><strong>iat</strong>: "issued at", a Unix timestamp, e.g. 1488192525</li>
<li class=""><strong>jti</strong>: JWT ID, a unique id. This can be used to prevent a token from being replayed, e.g. "xa443D"</li>
</ul>
<p>The key names are case sensitive and have been kept small to keep the JSON payload compact.</p>
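<p>As a sketch of how these registered claims might be put together when issuing a token: RFC 7519 defines <code>exp</code>, <code>nbf</code> and <code>iat</code> as NumericDate values, i.e. seconds since the Unix epoch. The function name <code>buildClaims</code> and the issuer, subject and audience values below are hypothetical examples.</p>

```javascript
const crypto = require("crypto");

// Build a claims object using the registered claim names from RFC 7519.
// exp, nbf and iat are NumericDate values: seconds since the Unix epoch.
function buildClaims(nowSeconds, lifetimeSeconds) {
  return {
    iss: "auth.example.com",        // hypothetical issuer
    sub: "user-1234",               // hypothetical subject
    aud: ["client1.example.com"],   // hypothetical audience
    iat: nowSeconds,                // issued at
    nbf: nowSeconds,                // usable immediately
    exp: nowSeconds + lifetimeSeconds,
    // Random identifier so a server can detect a replayed token.
    jti: crypto.randomBytes(16).toString("hex"),
  };
}

// Date.now() returns milliseconds, so convert to whole seconds.
const claims = buildClaims(Math.floor(Date.now() / 1000), 300);
console.log(claims.exp - claims.iat); // 300
```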
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-does-the-authentication-flow-work">How does the Authentication Flow work?<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#how-does-the-authentication-flow-work" class="hash-link" aria-label="Direct link to How does the Authentication Flow work?" title="Direct link to How does the Authentication Flow work?" translate="no">​</a></h2>
<p>When the user successfully logs in with their credentials, a JSON Web Token is returned and must be saved locally (typically in local storage, but cookies can also be used), instead of the traditional approach of creating a session on the server and returning a cookie.</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">POST /login</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"email"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"username@example-domain.com"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"password"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"5£cUr3PA$$W0rd!"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">Response 201 Created</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"token"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9.TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<p>Any subsequent calls to the API would typically send the Authorization header using the Bearer schema.</p>
<div class="language-js codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-js codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token literal-property property" style="color:#36acaa">Authorization</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Bearer myToken"</span><br></div></code></pre></div></div>
<p>Therefore the content of the header should look like the following.</p>
<div class="language-js codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">GET /</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-js codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain"># </span><span class="token maybe-class-name">Headers</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token literal-property property" style="color:#36acaa">Authorization</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9.TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ"</span><br></div></code></pre></div></div>
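<p>On the receiving side, the first step is pulling the token out of that header. A minimal sketch, assuming the raw header value is available as a string; the function name <code>extractBearerToken</code> is my own and not part of any framework.</p>

```javascript
// Extract the token from an "Authorization: Bearer TOKEN" header value.
// Returns null when the header is missing or uses a different scheme.
function extractBearerToken(authorizationHeader) {
  if (typeof authorizationHeader !== "string") return null;
  // The scheme name is case-insensitive; the token itself has no spaces.
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader.trim());
  return match ? match[1] : null;
}

console.log(extractBearerToken("Bearer abc.def.ghi")); // abc.def.ghi
console.log(extractBearerToken("Basic dXNlcjpwYXNz")); // null
```

<p>Returning <code>null</code> rather than throwing keeps the "no credentials" case easy to map to a 401 response in whatever framework is in use.</p>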
<p>This is a stateless authentication mechanism, as the user state is never saved in server memory. The server’s protected routes check for a valid JWT in the Authorization header and, if one is present, allow the request through. The checks include:</p>
<ul>
<li class="">signature valid?</li>
<li class="">client allowed? (<code>aud</code>)</li>
<li class="">expected issuer? (<code>iss</code>)</li>
<li class="">can this token be used yet? (<code>nbf</code>)</li>
</ul>
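<p>The claim checks in the list above can be sketched as follows. This assumes the signature has already been verified and the payload decoded; the function name <code>validateClaims</code>, the shape of the <code>expected</code> object and the example values are my own. I have also added an <code>exp</code> check, which the list does not mention but which a server would normally perform.</p>

```javascript
// Check the registered claims of a payload whose signature has ALREADY been
// verified. Returns a list of problems; an empty list means the claims pass.
function validateClaims(payload, expected, nowSeconds) {
  const problems = [];
  // "aud" may be a single string or an array of strings.
  const audiences = Array.isArray(payload.aud) ? payload.aud : [payload.aud];
  if (!audiences.includes(expected.audience)) {
    problems.push("audience not allowed");
  }
  if (payload.iss !== expected.issuer) {
    problems.push("unexpected issuer");
  }
  // nbf and exp are seconds since the Unix epoch.
  if (typeof payload.nbf === "number") {
    if (payload.nbf > nowSeconds) problems.push("token not valid yet");
  }
  if (typeof payload.exp === "number") {
    if (nowSeconds >= payload.exp) problems.push("token expired");
  }
  return problems;
}

const expected = { issuer: "auth.example.com", audience: "client1.example.com" };
const payload = {
  iss: "auth.example.com",
  aud: ["client1.example.com", "client2.example.com"],
  nbf: 1488192525,
  exp: 1488192825,
};
console.log(validateClaims(payload, expected, 1488192600)); // []
console.log(validateClaims(payload, expected, 1488193000)); // [ 'token expired' ]
```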
<p>As JWTs are self-contained, all the necessary information is there, reducing the need to go back and forth to the database. This allows us to rely fully on stateless data APIs and even make requests to downstream services. Because the token travels in the Authorization header rather than in a cookie, cookie scoping across domains is not a concern, although each API still has to allow the calling origin through Cross-Origin Resource Sharing (CORS).</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="making-a-case-for-jwt">Making a case for JWT<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#making-a-case-for-jwt" class="hash-link" aria-label="Direct link to Making a case for JWT" title="Direct link to Making a case for JWT" translate="no">​</a></h2>
<ul>
<li class=""><strong>Portability</strong>: they work across many different platforms, having implementations in various programming languages.</li>
<li class=""><strong>Compact</strong>: Because of its small size, it can be sent through a URL, a POST parameter, or inside an HTTP header. Its small size also makes transmission fast.</li>
<li class=""><strong>Self-contained:</strong> The payload contains all the required information about the user, to avoid querying the database more than once.</li>
<li class=""><strong>Control:</strong> Allows fine grained control over types of permissions. You can specify detailed access control information within <em>the token itself</em> as part of its payload. For instance, in the same way that you can create AWS security policies with very specific permissions, you can limit the token to only give read/write access to a single resource. In contrast, API Keys tend to have a coarse all-or-nothing access.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="problems-with-jwt">Problems with JWT<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#problems-with-jwt" class="hash-link" aria-label="Direct link to Problems with JWT" title="Direct link to Problems with JWT" translate="no">​</a></h2>
<ul>
<li class="">Cannot be used in place of Sessions &amp; Cookies. If we want to use them in such a manner, then stick with Sessions and Cookies.</li>
<li class="">Data goes stale. For instance, an admin whose access has been revoked can keep using their JWT until it expires, because the token was generated and verifies correctly with the secret key.</li>
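<li class="">There’s a critical class of vulnerability around the <code>alg</code> header, especially with asymmetric keys. The header tells the server which algorithm was used, and the attacker controls the header. A library that trusts it can be tricked, for example into verifying an RS256 token as HS256 with the public key as the HMAC secret, or into accepting <code>alg: none</code>. The server should already know which algorithm it expects and must reject anything else.</li>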
</ul>
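<p>The last two problems can be reduced, though not removed, on the verifying side. The sketch below is one possible approach, assuming HS256 and the Python standard library: the server pins the algorithm it expects rather than obeying the token's header, and it rejects tokens past their <code>exp</code> claim so that a revoked user's token stops working after a short window. The function names and the <code>make_token</code> demo helper are hypothetical.</p>

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(header: dict, claims: dict, secret: bytes) -> str:
    """Demo helper: HMAC-SHA256-sign the parts, whatever the header says."""
    signing_input = ".".join(
        b64url(json.dumps(part).encode("utf-8")) for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_hs256(token: str, secret: bytes, now: float) -> dict:
    """Return the claims, or raise ValueError if the token must be rejected."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # The server pins the algorithm; the header is checked, never obeyed.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    # A token with no `exp` claim is treated as already expired.
    if now >= claims.get("exp", 0):
        raise ValueError("token expired")
    return claims
```

<p>In a real service the caller would pass the current time, for example <code>time.time()</code>, as <code>now</code>. Immediate revocation would still need a server-side check such as a denylist, which gives up part of the self-contained benefit.</p>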
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>JSON Web Tokens offer many advantages, but not without drawbacks. If you work on an extremely large-scale application, sessions could be the appropriate choice. It is completely reasonable to combine sessions and JWTs: each has its own purpose, and sometimes you need both. Just don’t use JWTs for <em>persistent</em> data.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="further-reading">Further Reading<a href="https://works-in-prod.github.io/JSON-WEB-TOKEN/#further-reading" class="hash-link" aria-label="Direct link to Further Reading" title="Direct link to Further Reading" translate="no">​</a></h2>
<p>Thanks to the following:</p>
<ul>
<li class=""><a href="https://tools.ietf.org/html/rfc7519" target="_blank" rel="noopener noreferrer" title="https://tools.ietf.org/html/rfc7519" class="">https://tools.ietf.org/html/rfc7519</a></li>
<li class=""><a href="http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html#RegisteredClaimName" target="_blank" rel="noopener noreferrer" title="http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html#RegisteredClaimName" class="">http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html#RegisteredClaimName</a></li>
<li class=""><a href="https://auth0.com/blog/critical-vulnerabilities-in-json-web-token-libraries/" target="_blank" rel="noopener noreferrer" title="https://auth0.com/blog/critical-vulnerabilities-in-json-web-token-libraries/" class="">https://auth0.com/blog/critical-vulnerabilities-in-json-web-token-libraries/</a></li>
<li class=""><a href="https://auth0.com/learn/json-web-tokens/" target="_blank" rel="noopener noreferrer" title="https://auth0.com/learn/json-web-tokens/" class="">https://auth0.com/learn/json-web-tokens/</a></li>
<li class=""><a href="https://www.slideshare.net/lcobucci/jwt-to-authentication-and-beyond" target="_blank" rel="noopener noreferrer" title="https://www.slideshare.net/lcobucci/jwt-to-authentication-and-beyond" class="">https://www.slideshare.net/lcobucci/jwt-to-authentication-and-beyond</a></li>
<li class=""><a href="https://www.slideshare.net/a_z_e_t/javascript-object-signing-encryption" target="_blank" rel="noopener noreferrer" title="https://www.slideshare.net/a_z_e_t/javascript-object-signing-encryption" class="">https://www.slideshare.net/a_z_e_t/javascript-object-signing-encryption</a></li>
<li class=""><a href="http://christhorntonsf.com/secure-your-apis-with-jwt/" target="_blank" rel="noopener noreferrer" title="http://christhorntonsf.com/secure-your-apis-with-jwt/" class="">http://christhorntonsf.com/secure-your-apis-with-jwt/</a></li>
<li class=""><a href="http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/" target="_blank" rel="noopener noreferrer" title="http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/" class="">http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/</a></li>
</ul>]]></content>
        <category label="jwt" term="jwt"/>
        <category label="auth" term="auth"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Blogging Like a Hacker]]></title>
        <id>https://works-in-prod.github.io/blogging-like-a-hacker/</id>
        <link href="https://works-in-prod.github.io/blogging-like-a-hacker/"/>
        <updated>2017-01-29T02:01:12.000Z</updated>
        <summary type="html"><![CDATA[First post]]></summary>
        <content type="html"><![CDATA[<p>Hello World 🌏</p>
<p>This is my first post. I hope there's a lot more I can write, but for now, this is me getting started with blogging.</p>
<p>I am an experienced software developer from the UK. I started my first full-time job in 2011, but I never thought to share my thoughts and experience. Through this blog, I hope to channel my thoughts and pay forward the knowledge, in the same way that I have found other bloggers' posts useful.</p>
<p>For now, I have a lot to learn about GitHub Pages, but I shall be adding more content over time.</p>
<p>Stay tuned. ⚠️ 🚧</p>]]></content>
    </entry>
</feed>