Blog

Field notes from production. Deep dives into Kubernetes, service mesh debugging, cloud-native infrastructure, and the pursuit of fully autonomous development workflows.

review.lens(correctness | security | perf | concurrency) verify.independent(n=3).vote() >= majority severity.classify(CVSS): Critical | High | Med | Low assume_hostile(input) && hunt(silent_failures)
RED
security code-review adversarial audit reliability
Audit Your Own Code As If You Were the Attacker
Reviewing your own code to confirm that it's fine finds typos, not design flaws. Adversarial review flips the goal: it assumes the code is broken and tries to prove it. Multiple lenses, independent verification, and why 'zero criticals' isn't security.
browser --httpOnly cookie--> BFF proxy BFF --Bearer (server-side)--> API refresh.onError(401).singleFlight() oidc.callback: code -> cookies (no token in URL)
BFF
authentication security bff nextjs jwt web
The JWT Should Never Touch the Browser: the BFF Pattern
Stashing the access token in localStorage hands it to the first XSS exploit that comes along. The Backend-for-Frontend pattern keeps the JWT on the server and leaves only an httpOnly cookie in the browser that JavaScript can't read. How it works, step by step.
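As a rough illustration of the refresh.onError(401).singleFlight() idea in the banner, here is a minimal sketch of a single-flight token refresh on the BFF side. It assumes an asyncio-based server; the class name SingleFlightRefresher and the refresh_fn callback are hypothetical and do not come from the post itself.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SingleFlightRefresher:
    """Collapses concurrent refresh requests into one upstream call.

    Hypothetical helper: refresh_fn stands in for whatever call the BFF
    makes to the identity provider to exchange a refresh token.
    """

    def __init__(self, refresh_fn: Callable[[], Awaitable[str]]) -> None:
        self._refresh_fn = refresh_fn
        self._inflight: Optional[asyncio.Task] = None

    async def refresh(self) -> str:
        # The first caller starts the refresh; later callers that arrive
        # while it is still running await the same task.
        if self._inflight is None:
            self._inflight = asyncio.create_task(self._run())
        return await self._inflight

    async def _run(self) -> str:
        try:
            return await self._refresh_fn()
        finally:
            # Clear the slot so the next 401 triggers a fresh refresh.
            self._inflight = None
```

If several proxied requests receive a 401 at the same moment, each one would await refresher.refresh() and only one request would reach the identity provider.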
loop.detect(identical | alternating | target-repeat) monitor.heartbeat(turn) => stale ? kill : continue gate.require(toolCalls > 0) else FAILED retry.classify(error) => retry | escalate | abort
AI
ai-agents llm reliability production observability
The 4 Failure Modes of AI Agents in Production (and How to Mitigate Them)
Leave an agent loop running unsupervised and the runaway token bill and corrupted state show up on their own. The four failures that recur in every agentic system (loops, stuck turns, no-op turns, and misclassified errors) and the engineering patterns that contain them.
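The loop.detect(identical | alternating | target-repeat) line in the banner can be sketched roughly as follows. This is a minimal illustration, not the post's implementation: the LoopDetector class, its window and threshold defaults, and the reading of "target-repeat" as "different tools hitting the same target over and over" are all assumptions.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Call = Tuple[str, str]  # (tool name, target), both hypothetical fields


class LoopDetector:
    """Flags three loop shapes in an agent's recent tool calls."""

    def __init__(self, window: int = 6, threshold: int = 3) -> None:
        self.history: Deque[Call] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, tool: str, target: str) -> Optional[str]:
        """Store one call and return the loop type detected, if any."""
        self.history.append((tool, target))
        calls = list(self.history)
        recent = calls[-self.threshold:]
        if len(recent) < self.threshold:
            return None
        # identical: the exact same call repeated back to back
        if len(set(recent)) == 1:
            return "identical"
        # alternating: A, B, A, B with A != B
        last4 = calls[-4:]
        if (len(last4) == 4 and last4[0] == last4[2]
                and last4[1] == last4[3] and last4[0] != last4[1]):
            return "alternating"
        # target-repeat (assumed meaning): same target, varying tools
        if len({target for _, target in recent}) == 1:
            return "target-repeat"
        return None
```

A supervisor would call record() once per tool call and stop or escalate the turn as soon as it returns something other than None.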
task.state: ToDo -> InProgress -> InReview -> Done review.gate(QA) && review.gate(Security) transition.allow(Done) iff gates.passed && merged else escalate(lead) -> escalate(human)
GOV
ai-agents orchestration governance workflow reliability
Governing AI Agents: Why 'Done' Has to Be Earned
An autonomous agent that approves its own work isn't autonomy, it's a time bomb. How to govern teams of agents with separation of duties, independent review gates, and a 'done' invariant that nobody gets to skip.
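To make the 'done' invariant concrete, here is a minimal sketch of the rule in the banner: a task reaches Done only if both review gates passed and the change is merged, otherwise it escalates. The Task fields and the function names are hypothetical, and the escalation order (first failure goes to the lead, later failures to a human) is one assumed reading of escalate(lead) -> escalate(human).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class State(Enum):
    TODO = "ToDo"
    IN_PROGRESS = "InProgress"
    IN_REVIEW = "InReview"
    DONE = "Done"


REQUIRED_GATES = frozenset({"QA", "Security"})


@dataclass
class Task:
    author: str
    state: State = State.IN_REVIEW
    merged: bool = False
    approvals: Dict[str, str] = field(default_factory=dict)  # gate -> reviewer
    escalations: int = 0


def approve(task: Task, gate: str, reviewer: str) -> None:
    """Record a gate approval, enforcing separation of duties."""
    if gate not in REQUIRED_GATES:
        raise ValueError(f"unknown gate: {gate}")
    if reviewer == task.author:
        raise PermissionError("authors cannot approve their own work")
    task.approvals[gate] = reviewer


def try_finish(task: Task) -> str:
    """Move to Done only if every gate passed and the change is merged."""
    if task.state is not State.IN_REVIEW:
        raise ValueError("only tasks in review can be finished")
    if REQUIRED_GATES.issubset(task.approvals) and task.merged:
        task.state = State.DONE
        return "done"
    # Assumed policy: first failure goes to the lead, later ones to a human.
    task.escalations += 1
    return "escalate:lead" if task.escalations == 1 else "escalate:human"
```

The point of routing every transition through try_finish is that no agent, including the author, has another code path that sets the state to Done.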
kubeadm init --pod-network-cidr=10.244.0.0/16 kubectl apply -f calico-ebpf.yaml istioctl install --set profile=ambient helm install cert-manager jetstack/cert-manager
K8S
kubernetes k8s calico istio devops homelab
Production-Grade Kubernetes on a Single Server: The Complete Guide
How to build a full Kubernetes cluster on a single server with kubeadm, Calico eBPF, Istio ambient, cert-manager, Grafana, Loki, and the full security stack. No shortcuts.
pveceph install pveceph osd create /dev/sda ceph osd pool create ssd-pool 64 ceph osd crush rule create-replicated ssd-rule default host ssd
PVE
proxmox ceph homelab storage backup infrastructure
Proxmox + Ceph + PBS: Complete Guide for a Production-Ready Homelab
How to set up Proxmox with Ceph using separate SSD and HDD pools, CRUSH rules by device class, and Proxmox Backup Server in a VM. From installation to a setup that actually works.