AI Infrastructure Console
LLM VRAM Calculator
Fetch model config from Hugging Face, parse architecture parameters, and compute realistic VRAM usage for deployment and LoRA workflows.
1. Model Source
Needed only for gated models such as Llama 3. The token is stored only in localStorage and is never sent anywhere except in direct Hugging Face requests made from your browser.
2. Scenario
If you are unsure about some architecture parameters, leave them empty; we will estimate them where possible, though accuracy may decrease.
Context Length: 8192 tokens
Architecture Signal
Auto-detect from head ratio
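The "head ratio" referred to above is the ratio of key/value heads to attention heads, which distinguishes multi-head (MHA), grouped-query (GQA), and multi-query (MQA) attention and directly scales the KV cache. A minimal sketch, assuming standard `config.json` field names (`num_key_value_heads` may be absent on pure MHA models):

```python
def attention_variant(config: dict) -> str:
    """Classify the attention scheme from the head ratio."""
    n_heads = config["num_attention_heads"]
    kv_heads = config.get("num_key_value_heads", n_heads)  # MHA if absent
    if kv_heads == n_heads:
        return "MHA"
    if kv_heads == 1:
        return "MQA"
    return "GQA"

def kv_cache_bytes(config: dict, context_len: int, batch: int = 1,
                   dtype_bytes: int = 2) -> int:
    """KV cache size: 2 tensors (K and V) per layer, one slot per KV head
    per token, at dtype_bytes per element (2 for FP16/BF16)."""
    n_layers = config["num_hidden_layers"]
    n_heads = config["num_attention_heads"]
    kv_heads = config.get("num_key_value_heads", n_heads)
    head_dim = config.get("head_dim", config["hidden_size"] // n_heads)
    return 2 * n_layers * kv_heads * head_dim * dtype_bytes * context_len * batch
```

With a Llama-3-8B-style config (32 layers, 32 heads, 8 KV heads, hidden size 4096), an 8192-token context costs exactly 1 GiB of FP16 KV cache.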
3. VRAM Matrix + Hardware Match
| Runtime | Quantization | Weights | KV / Activations | Total |
|---|---|---|---|---|
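The Total column is, in essence, quantized weight size plus KV cache plus a runtime overhead allowance. A sketch of that arithmetic; the 5% overhead factor below is an assumption for illustration, not a measured value:

```python
GIB = 1024 ** 3

def weight_bytes(n_params: int, bits_per_weight: float) -> int:
    """Weight memory at a given quantization, e.g. 16 bits for FP16
    or roughly 4.5 bits for a 4-bit k-quant format."""
    return int(n_params * bits_per_weight / 8)

def total_vram_gib(n_params: int, bits_per_weight: float, kv_bytes: int,
                   overhead: float = 1.05) -> float:
    """Total VRAM = (weights + KV cache) * overhead, in GiB.
    The overhead factor covers activations and runtime buffers
    and is an illustrative assumption here."""
    return (weight_bytes(n_params, bits_per_weight) + kv_bytes) * overhead / GIB
```

For an 8B-parameter model at FP16 with a 1 GiB KV cache, the pre-overhead total is about 15.9 GiB, which is why such models are usually paired with 24 GB cards or quantized down for 16 GB ones.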
Hardware Match