Chapter 9 — Cost Engineering for AI Workloads Full Book¶

"URGENT: Azure bill — $127,000 — please explain."

📖 This chapter is available in the full book¶

It's Monday morning. You're halfway through your coffee when an email from finance lands with the subject line: "URGENT: Azure bill — $127,000 — please explain." Last month's forecast was $42,000. Two ND96isr_H100_v5 VMs jump off the screen — provisioned three weeks ago for a "quick experiment" and never shut down. At roughly $98/hour each, running 24/7 for three weeks, that's approximately $33,000 in idle GPU time.

The ML engineer who provisioned those VMs wasn't being reckless — they were iterating fast, which is exactly what you want. The failure wasn't human; it was systemic. No auto-shutdown policy, no budget alerts, no tagging to trace the VMs back to a project or owner.

This chapter gives you the frameworks, formulas, and operational practices to make sure that email never lands in your inbox.

What You'll Learn in This Chapter¶

The $127,000 Monday Morning
Why AI Cost Engineering Is Different
GPU Cost Modeling
Spot and Low-Priority VMs for Training
Right-Sizing Strategies
Azure OpenAI Cost Optimization
FinOps Practices for AI
Cost Attribution in Shared Clusters (AKS)

Get the Full Book Read Free Chapters