AI Infrastructure Console
LLM VRAM Calculator
Fetch model config from Hugging Face, parse architecture parameters, and compute realistic VRAM usage for deployment and LoRA workflows.
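The core estimate behind such a calculator can be sketched in a few lines: weight memory scales with parameter count times bits per weight, and the KV cache scales with layers, KV heads, head dimension, and context length. This is a hedged sketch, not the tool's actual internals; all names (`estimate_vram_gb`, `overhead_frac`, the Llama-3-8B-like shapes) are illustrative assumptions.

```python
def estimate_vram_gb(
    num_params: float,       # total parameter count, e.g. 8.03e9
    bits_per_weight: int,    # 16 for fp16/bf16, 8 or 4 for quantized
    num_layers: int,
    num_kv_heads: int,       # grouped-query attention uses fewer KV heads
    head_dim: int,
    context_len: int,
    kv_bits: int = 16,
    overhead_frac: float = 0.10,  # CUDA context, fragmentation, activations
) -> float:
    weights = num_params * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, per KV head, per token
    kv = 2 * num_layers * num_kv_heads * head_dim * context_len * kv_bits / 8
    total = (weights + kv) * (1 + overhead_frac)
    return total / 1024**3

# Llama-3-8B-like shapes at an 8192-token context, fp16 weights
print(round(estimate_vram_gb(8.03e9, 16, 32, 8, 128, 8192), 1))
```

The 10% overhead factor is a rough placeholder; real runtimes differ, which is why the matrix below breaks totals out per runtime and quantization.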
1. Model Source
Quick Picks
Recent Models
Needed only for gated models such as Llama 3. The token is stored in localStorage only and is never sent to any server other than Hugging Face, via direct requests from your browser.
2. Scenario
If you are unsure about some architecture parameters, you can leave them empty. We will estimate them when possible, but accuracy may decrease.
Context Length 8192 tokens
Architecture Signal
Auto-detect from head ratio
MoE Signal
Shared Parameters Ratio
Dense mode (MoE disabled)
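The dense-vs-MoE distinction matters because all expert weights must be resident in VRAM, while each token only computes through the shared parameters plus its routed experts. A hedged sketch of that accounting, using a hypothetical 100B-parameter MoE and an assumed shared-ratio convention (fraction of parameters always active):

```python
def moe_param_split(total_params, shared_ratio, num_experts, active_experts):
    """Return (vram_resident_params, active_params_per_token)."""
    shared = total_params * shared_ratio       # attention, embeddings, router
    expert_total = total_params - shared       # spread across all experts
    expert_active = expert_total * active_experts / num_experts
    # Every expert sits in VRAM, but per-token compute touches only
    # the shared part plus the top-k routed experts.
    return total_params, shared + expert_active

# Hypothetical MoE: 100B total, 64 experts, top-8 routing, 20% shared
resident, active = moe_param_split(100e9, 0.20, 64, 8)
print(round(resident / 1e9, 1), round(active / 1e9, 1))  # → 100.0 30.0
```

In dense mode the split collapses: resident and active parameter counts are equal, which is what the "Dense vs MoE" column in the matrix below compares.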
3. VRAM Matrix + Hardware Match
| Runtime | Quantization | Weights | KV / Activations | Total | Dense vs MoE |
|---|---|---|---|---|---|
Hardware Match
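The hardware-match step reduces to comparing the estimated total against each card's usable VRAM. A minimal sketch, assuming an illustrative GPU table and a 90% headroom margin (both are placeholders, not the tool's actual data):

```python
# Illustrative (name, VRAM in GB) pairs; not an exhaustive or authoritative list
GPUS = [
    ("RTX 4090", 24), ("RTX 5090", 32), ("A100 80GB", 80),
    ("H100 80GB", 80), ("L40S", 48),
]

def matching_gpus(required_gb, headroom=0.9):
    # Only count 90% of a card's VRAM to leave room for allocator
    # fragmentation and activation spikes.
    return [name for name, gb in GPUS if gb * headroom >= required_gb]

print(matching_gpus(30.0))  # → ['A100 80GB', 'H100 80GB', 'L40S']
```

A fuller version would also consider multi-GPU sharding and per-runtime overhead, but the filter above captures the core check.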