Technical FAQ Full Book

"Practical answers for infrastructure engineers working with AI on Azure."


📖 This extra is available in the full book

Every answer cross-references the relevant chapter so you can dive deeper when needed. The FAQ covers 20 of the most common questions, from GPU memory math to model deployment strategies.

Questions Covered

  • Can I run AI workloads without a GPU?
  • What's the difference between training and inference from an infra perspective?
  • How do I calculate whether my model fits in GPU memory?
  • What causes GPU OOM errors and how do I fix them?
  • How should I set up auto-scaling for GPU inference?
  • What is a model registry and why should infra engineers care?
  • How do I monitor GPU workloads effectively?
  • How do I secure AI inference endpoints?
  • What are Spot VMs and when should I use them for AI?
  • How do I estimate and control Azure OpenAI costs?
  • What's the difference between PTU and Standard deployments?
  • How do I implement multi-tenancy for AI workloads on AKS?
  • How do I troubleshoot GPU driver issues on Azure VMs?
  • How do I handle Azure OpenAI 429 (throttling) errors?
  • What storage backend should I use for model files and training data?
  • How do I implement blue-green deployments for ML models?
  • How do I right-size GPU VMs for inference?
  • What should I include in an AI workload runbook?
  • How do I handle GPU quota limitations on Azure?
  • What's the recommended learning path for infra engineers getting into AI?
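As a taste of the GPU memory math the FAQ covers, here is a rough back-of-envelope sketch: inference memory is roughly parameter count times bytes per parameter, plus runtime overhead. The 20% overhead factor below is an illustrative assumption; real overhead depends on the serving runtime, batch size, and KV cache.

```python
def model_memory_gb(num_params: float,
                    bytes_per_param: int = 2,
                    overhead: float = 1.2) -> float:
    """Rough GPU memory estimate for inference.

    weights = num_params * bytes_per_param (2 bytes for fp16/bf16),
    multiplied by an assumed ~20% overhead for activations and
    runtime buffers. A sketch, not a sizing guarantee.
    """
    return num_params * bytes_per_param * overhead / 1e9

# A 7B-parameter model served in fp16 (2 bytes per parameter):
print(f"{model_memory_gb(7e9):.1f} GB")  # 16.8 GB -> comfortably fits a 24 GB GPU
```

The same arithmetic explains why the model alone rules out smaller cards: a 7B fp16 model's weights are already 14 GB before any overhead, so a 16 GB GPU leaves little headroom for batching.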