Chapter 9 — Cost Engineering for AI Workloads

"URGENT: Azure bill — $127,000 — please explain."


📖 This chapter is available in the full book

It's Monday morning. You're halfway through your coffee when an email from finance lands with the subject line: "URGENT: Azure bill — $127,000 — please explain." Last month's forecast was $42,000. Two ND96isr_H100_v5 VMs jump off the screen, provisioned three weeks ago for a "quick experiment" and never shut down. At roughly $98/hour each, running 24/7 for three weeks (504 hours), that's about $49,000 per VM, or close to $99,000 in idle GPU time for the pair.
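The idle-cost arithmetic in this scenario is simple enough to check by hand, and it is worth scripting so it can be rerun with other rates and durations. The sketch below assumes a flat pay-as-you-go hourly rate with no discounts, reservations, or storage and networking charges; the function name `idle_cost` is illustrative, not part of any Azure tooling.

```python
def idle_cost(hourly_rate: float, vm_count: int, days: float) -> float:
    """Cost of VMs left running around the clock at a flat hourly rate."""
    hours = days * 24  # 24/7 operation, no shutdown windows
    return hourly_rate * vm_count * hours


# Figures from the scenario: two VMs, roughly $98/hour each, three weeks.
total = idle_cost(hourly_rate=98.0, vm_count=2, days=21)
print(f"Idle GPU spend: ${total:,.0f}")
```

Changing `days` to 1 shows what a single forgotten day costs, which is a useful number to have when arguing for an auto-shutdown policy.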

The ML engineer who provisioned those VMs wasn't being reckless — they were iterating fast, which is exactly what you want. The failure wasn't human; it was systemic. No auto-shutdown policy, no budget alerts, no tagging to trace the VMs back to a project or owner.
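Two of the missing guardrails named here, ownership tags and auto-shutdown, can be pictured as a simple audit over a VM inventory. The following is a minimal sketch, not an Azure integration: it calls no Azure API, and the `VmRecord` type, the required tag names, the three-day threshold, and the sample VM name and date are all hypothetical values chosen for illustration. Budget alerts are left out of the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

REQUIRED_TAGS = ("owner", "project")  # assumed tagging convention
MAX_UNATTENDED = timedelta(days=3)    # assumed review threshold


@dataclass
class VmRecord:
    """Hypothetical inventory row; a real one would come from an export."""
    name: str
    started_at: datetime
    tags: dict[str, str] = field(default_factory=dict)
    auto_shutdown: bool = False


def audit(vm: VmRecord, now: datetime) -> list[str]:
    """Return findings for one VM; an empty list means it passes."""
    findings = []
    missing = [t for t in REQUIRED_TAGS if not vm.tags.get(t)]
    if missing:
        findings.append(f"missing tags: {', '.join(missing)}")
    if not vm.auto_shutdown:
        findings.append("no auto-shutdown schedule")
    running_for = now - vm.started_at
    if running_for > MAX_UNATTENDED:
        findings.append(f"running for {running_for.days} days")
    return findings


# Hypothetical example: an untagged VM started three weeks ago.
now = datetime(2025, 1, 27, 9, 0, tzinfo=timezone.utc)
vm = VmRecord(name="gpu-experiment-01", started_at=now - timedelta(days=21))
for finding in audit(vm, now):
    print(f"{vm.name}: {finding}")
```

The point of the design is that every finding is a property of the platform configuration rather than of the engineer's behavior, which matches the systemic framing of the failure.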

This chapter gives you the frameworks, formulas, and operational practices to make sure that email never lands in your inbox.

What You'll Learn in This Chapter

  • The $127,000 Monday Morning
  • Why AI Cost Engineering Is Different
  • GPU Cost Modeling
  • Spot and Low-Priority VMs for Training
  • Right-Sizing Strategies
  • Azure OpenAI Cost Optimization
  • FinOps Practices for AI
  • Cost Attribution in Shared Clusters (AKS)