Technical FAQ Full Book

"Practical answers for infrastructure engineers working with AI on Azure."


📖 This extra is available in the full book

Every answer cross-references the relevant chapter so you can dive deeper when needed. The FAQ covers 20 of the most common questions, from GPU memory math to model deployment strategies.

Questions Covered

  • Can I run AI workloads without a GPU?
  • What's the difference between training and inference from an infra perspective?
  • How do I calculate whether my model fits in GPU memory?
  • What causes GPU OOM errors and how do I fix them?
  • How should I set up auto-scaling for GPU inference?
  • What is a model registry and why should infra engineers care?
  • How do I monitor GPU workloads effectively?
  • How do I secure AI inference endpoints?
  • What are Spot VMs and when should I use them for AI?
  • How do I estimate and control Azure OpenAI costs?
  • What's the difference between PTU and Standard deployments?
  • How do I implement multi-tenancy for AI workloads on AKS?
  • How do I troubleshoot GPU driver issues on Azure VMs?
  • How do I handle Azure OpenAI 429 (throttling) errors?
  • What storage backend should I use for model files and training data?
  • How do I implement blue-green deployments for ML models?
  • How do I right-size GPU VMs for inference?
  • What should I include in an AI workload runbook?
  • How do I handle GPU quota limitations on Azure?
  • What's the recommended learning path for infra engineers getting into AI?
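As a taste of the GPU memory math the FAQ covers, here is a rough back-of-envelope sketch: inference memory is roughly parameter count times bytes per parameter, plus runtime overhead. The 20% overhead factor below is an illustrative assumption; real overhead depends on the serving runtime, batch size, and KV cache.

```python
def model_memory_gb(num_params: float,
                    bytes_per_param: int = 2,
                    overhead: float = 1.2) -> float:
    """Rough GPU memory estimate for inference.

    weights = num_params * bytes_per_param (2 bytes for fp16/bf16),
    multiplied by an assumed ~20% overhead for activations and
    runtime buffers. A sketch, not a sizing guarantee.
    """
    return num_params * bytes_per_param * overhead / 1e9

# A 7B-parameter model served in fp16 (2 bytes per parameter):
print(f"{model_memory_gb(7e9):.1f} GB")  # 16.8 GB -> comfortably fits a 24 GB GPU
```

The same arithmetic explains why the model alone rules out smaller cards: a 7B fp16 model's weights are already 14 GB before any overhead, so a 16 GB GPU leaves little headroom for batching.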