Technical FAQ Full Book
"Practical answers for infrastructure engineers working with AI on Azure."
This FAQ covers 20 of the most common questions infrastructure engineers ask when working with AI on Azure, from GPU memory math to model deployment strategies. Every answer cross-references the relevant chapter so you can dive deeper when needed.
Questions Covered
- Can I run AI workloads without a GPU?
- What's the difference between training and inference from an infra perspective?
- How do I calculate whether my model fits in GPU memory?
- What causes GPU OOM errors and how do I fix them?
- How should I set up auto-scaling for GPU inference?
- What is a model registry and why should infra engineers care?
- How do I monitor GPU workloads effectively?
- How do I secure AI inference endpoints?
- What are Spot VMs and when should I use them for AI?
- How do I estimate and control Azure OpenAI costs?
- What's the difference between PTU and Standard deployments?
- How do I implement multi-tenancy for AI workloads on AKS?
- How do I troubleshoot GPU driver issues on Azure VMs?
- How do I handle Azure OpenAI 429 (throttling) errors?
- What storage backend should I use for model files and training data?
- How do I implement blue-green deployments for ML models?
- How do I right-size GPU VMs for inference?
- What should I include in an AI workload runbook?
- How do I handle GPU quota limitations on Azure?
- What's the recommended learning path for infra engineers getting into AI?
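The GPU memory question above lends itself to a quick back-of-the-envelope sketch. This is an illustrative rule of thumb, not the book's method: the function name, the 20% overhead factor for activations and KV cache, and the precision values are all assumptions. The core idea is that weight memory is roughly parameters times bytes per parameter.

```python
def fits_in_gpu(num_params_b: float, bytes_per_param: int,
                gpu_mem_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: does a model of num_params_b billion parameters,
    stored at bytes_per_param precision, likely fit in gpu_mem_gb of VRAM?

    overhead is a hypothetical ~20% allowance for activations and KV cache.
    """
    weight_gb = num_params_b * bytes_per_param  # 1B params at 1 byte ~= 1 GB
    return weight_gb * overhead <= gpu_mem_gb

# 7B model in FP16 (2 bytes/param) on a 24 GB GPU: 7*2*1.2 = 16.8 GB
print(fits_in_gpu(7, 2, 24))   # True
# 13B model in FP16 on the same GPU: 13*2*1.2 = 31.2 GB
print(fits_in_gpu(13, 2, 24))  # False
```

Real deployments also need room for the CUDA context and framework buffers, so treat this as a first filter, not a sizing guarantee.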