
LLM VRAM Calculator

Fetch model config from Hugging Face, parse architecture parameters, and compute realistic VRAM usage for deployment and LoRA workflows.

1. Model Source


Needed only for gated models such as Llama 3. The token is stored only in your browser's localStorage and is never sent to any server other than Hugging Face, which your browser contacts directly.

2. Scenario

If you are unsure about some architecture parameters, leave them empty. We will estimate them when possible, but accuracy may decrease.
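A minimal sketch of how missing fields might be estimated, assuming conventional transformer defaults (no GQA means K/V heads equal attention heads; head_dim is hidden_size divided by head count; FFN width is roughly 4x hidden_size). These fallbacks are assumptions, not the calculator's exact rules.

```python
def fill_defaults(cfg: dict) -> dict:
    """Estimate architecture fields a config may omit (accuracy may drop)."""
    out = dict(cfg)
    heads = out["num_attention_heads"]
    # Without grouped-query attention, every head has its own K/V head.
    out.setdefault("num_key_value_heads", heads)
    # head_dim is conventionally hidden_size / num_attention_heads.
    out.setdefault("head_dim", out["hidden_size"] // heads)
    # Many transformer FFNs use roughly 4x hidden_size.
    out.setdefault("intermediate_size", 4 * out["hidden_size"])
    return out
```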

Context Length 8192 tokens
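Context length drives the KV cache, which grows linearly with sequence length. The standard formula caches one key and one value vector per layer, per KV head, per token; the sketch below uses it with a Llama-3-8B-like shape (32 layers, 8 KV heads, head_dim 128) as the worked example.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, dtype_bytes: int = 2, batch: int = 1) -> int:
    """Bytes to cache keys and values for every layer at full context.
    The leading 2 accounts for storing both K and V."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * dtype_bytes * batch

# Llama-3-8B-like shape at the 8192-token context above, fp16 cache:
# 2 * 32 * 8 * 128 * 8192 * 2 bytes = 1 GiB
```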

Architecture Signal

Auto-detect from head ratio
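One plausible reading of "detect from head ratio" is classifying the attention layout by comparing attention heads to KV heads, as sketched below; this is an assumption about the heuristic, not the tool's verified logic.

```python
def attention_variant(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Classify the attention layout from the head ratio."""
    if num_key_value_heads == 1:
        return "MQA"  # multi-query: one K/V head shared by all query heads
    if num_key_value_heads < num_attention_heads:
        return "GQA"  # grouped-query: groups of query heads share K/V heads
    return "MHA"      # multi-head: every query head has its own K/V head
```

The variant matters for VRAM because the KV cache scales with the number of KV heads, not query heads.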

MoE Signal

Shared Parameters Ratio 25%

Dense mode (MoE disabled)
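When MoE is enabled, the shared-parameters ratio above splits the weights into an always-active shared part and a routed part, of which only the top-k experts fire per token. A rough estimate, under that assumed split (the formula is illustrative, not the calculator's exact model):

```python
def active_params(total_params: float, shared_ratio: float,
                  num_experts: int, experts_per_token: int) -> float:
    """Parameters touched per token in a MoE model (rough estimate).

    shared_ratio: fraction of weights (attention, embeddings, shared FFN)
    used by every token; the remainder is split evenly across experts,
    and only the routed top-k experts fire per token.
    """
    shared = total_params * shared_ratio
    routed = total_params * (1 - shared_ratio) * experts_per_token / num_experts
    return shared + routed

# 100B total, 25% shared, 8 of 64 experts per token -> 34.375B active
```

Note that all experts still occupy VRAM; the active count mainly affects compute and bandwidth, not resident weight memory.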

3. VRAM Matrix + Hardware Match

Columns: Runtime · Quantization · Weights · KV / Activations · Total · Dense vs MoE

Hardware Match
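The matrix total can be sketched as weights at the chosen quantization plus the KV cache, with a flat overhead fraction for activations and fragmentation, then matched against GPU memory. The overhead fraction, byte-per-parameter table, and GPU list below are illustrative assumptions (the memory sizes themselves, e.g. 24 GB for the RTX 4090, are real).

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}  # common quantizations

# (name, memory in GiB) - illustrative subset, not an exhaustive catalog
GPUS = [("RTX 4090", 24), ("A100 80GB", 80), ("H100 80GB", 80)]

def total_vram_gib(params: float, quant: str, kv_bytes: float,
                   overhead_frac: float = 0.10) -> float:
    """Weights + KV cache, plus a flat fraction for activations/overhead."""
    weights = params * BYTES_PER_PARAM[quant]
    return (weights + kv_bytes) * (1 + overhead_frac) / 2**30

def hardware_match(needed_gib: float):
    """Smallest GPU in the table whose memory covers the estimate."""
    fits = [g for g in GPUS if g[1] >= needed_gib]
    return min(fits, key=lambda g: g[1]) if fits else None
```

For example, an 8B-parameter model at int4 with a 1 GiB KV cache needs roughly 5.2 GiB under these assumptions and fits the smallest card in the table.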
