Chapter 9 — Cost Engineering for AI Workloads
"URGENT: Azure bill — $127,000 — please explain."
📖 This chapter is available in the full book
It's Monday morning. You're halfway through your coffee when an email from finance lands with the subject line: "URGENT: Azure bill — $127,000 — please explain." Last month's forecast was $42,000. Two ND96isr_H100_v5 VMs jump off the screen — provisioned three weeks ago for a "quick experiment" and never shut down. At roughly $98/hour each, running 24/7 for three weeks, that's nearly $99,000 in idle GPU time.
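The arithmetic behind that number is worth making explicit, because it's the same back-of-the-envelope calculation you'll reach for whenever a surprise line item appears. A minimal sketch (the $98/hour rate is the approximate pay-as-you-go figure from the scenario above, not an official Azure price):

```python
# Back-of-the-envelope cost of forgotten GPU VMs.
# Rate is an assumption taken from the scenario, not a quoted Azure price.
HOURLY_RATE_USD = 98.0      # approx. pay-as-you-go rate per ND96isr_H100_v5 VM
NUM_VMS = 2                 # the two forgotten "quick experiment" VMs
HOURS = 24 * 7 * 3          # running 24/7 for three weeks

idle_cost = HOURLY_RATE_USD * NUM_VMS * HOURS
print(f"Idle GPU spend: ${idle_cost:,.0f}")  # → Idle GPU spend: $98,784
```

Note how quickly the multiplication compounds: a rate that sounds tolerable by the hour becomes five figures per week per VM, which is why per-hour intuition alone is a poor guardrail.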
The ML engineer who provisioned those VMs wasn't being reckless — they were iterating fast, which is exactly what you want. The failure wasn't human; it was systemic. No auto-shutdown policy, no budget alerts, no tagging to trace the VMs back to a project or owner.
This chapter gives you the frameworks, formulas, and operational practices to make sure that email never lands in your inbox.
What You'll Learn in This Chapter
- The $127,000 Monday Morning
- Why AI Cost Engineering Is Different
- GPU Cost Modeling
- Spot and Low-Priority VMs for Training
- Right-Sizing Strategies
- Azure OpenAI Cost Optimization
- FinOps Practices for AI
- Cost Attribution in Shared Clusters (AKS)