AI Infrastructure Console

LLM VRAM Calculator

Fetch model config from Hugging Face, parse architecture parameters, and compute realistic VRAM usage for deployment and LoRA workflows.
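Given a fetched config.json, the weight footprint can be estimated from the architecture fields alone. A minimal sketch for a Llama-style decoder, using the standard Hugging Face config keys; the sample values below approximate Llama-3-8B and are illustrative, not fetched:

```python
def estimate_params(cfg: dict) -> int:
    """Rough parameter count for a Llama-style decoder from config.json fields."""
    h = cfg["hidden_size"]
    layers = cfg["num_hidden_layers"]
    vocab = cfg["vocab_size"]
    inter = cfg["intermediate_size"]
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)
    head_dim = h // heads
    attn = h * h + 2 * h * kv_heads * head_dim + h * h  # Q, K, V, O projections
    mlp = 3 * h * inter                                 # gate, up, down projections
    per_layer = attn + mlp + 2 * h                      # plus two RMSNorm weight vectors
    # embeddings + decoder layers + final norm + untied lm_head
    return vocab * h + layers * per_layer + h + vocab * h

llama3_8b = {"hidden_size": 4096, "num_hidden_layers": 32, "vocab_size": 128256,
             "intermediate_size": 14336, "num_attention_heads": 32,
             "num_key_value_heads": 8}
params = estimate_params(llama3_8b)
print(f"{params / 1e9:.2f} B params, ~{params * 2 / 2**30:.1f} GiB in FP16")
# → 8.03 B params, ~15.0 GiB in FP16
```

The count lands on the familiar 8.03 B figure for Llama-3-8B, which is a useful sanity check that the per-layer arithmetic matches the real architecture.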

1. Model Source

Needed only for gated models such as Llama 3. The token is stored only in your browser's localStorage and is sent only with requests made directly from your browser to Hugging Face, never to any other server.
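The request shape can be sketched as follows. `resolve/main/config.json` is the Hub's raw-file URL scheme, and the Bearer header is attached only when a token is present; how the token is obtained and stored is outside this sketch:

```python
def hf_config_url(repo_id: str) -> str:
    """Direct URL for a repo's config.json on the Hugging Face Hub."""
    return f"https://huggingface.co/{repo_id}/resolve/main/config.json"

def hf_headers(token=None) -> dict:
    """Bearer auth is only needed for gated repos (e.g. Llama 3)."""
    return {"Authorization": f"Bearer {token}"} if token else {}

print(hf_config_url("meta-llama/Meta-Llama-3-8B"))
print(hf_headers("hf_example_token"))
```

Anonymous requests simply omit the header, so public models need no token at all.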

2. Scenario

If you are unsure of some architecture parameters, leave them empty. The calculator estimates them where possible, though accuracy may decrease.
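One way to sketch that fallback logic. The defaults below are assumptions (no grouped-query attention, square head dimension, a 4x MLP expansion), which is exactly why accuracy can drop when fields are missing:

```python
def fill_defaults(cfg: dict) -> dict:
    """Best-effort defaults for missing architecture fields; accuracy may drop."""
    out = dict(cfg)
    heads = out["num_attention_heads"]
    out.setdefault("num_key_value_heads", heads)                  # assume MHA
    out.setdefault("head_dim", out["hidden_size"] // heads)       # assume square heads
    out.setdefault("intermediate_size", 4 * out["hidden_size"])   # common MLP ratio
    return out

sparse = {"hidden_size": 4096, "num_attention_heads": 32}
print(fill_defaults(sparse))
```

Fields the user does supply always win, since `setdefault` never overwrites an existing key.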

Context Length: 8192 tokens
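Context length drives the KV-cache term: keys and values are two tensors per layer, each of shape [batch, kv_heads, seq_len, head_dim]. A sketch at FP16 (2 bytes per element), using the Llama-3-8B shapes as an illustrative input:

```python
def kv_cache_bytes(cfg: dict, seq_len: int = 8192, batch: int = 1,
                   dtype_bytes: int = 2) -> int:
    """KV cache size: 2 (K and V) x layers x batch x kv_heads x seq_len x head_dim."""
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)
    head_dim = cfg["hidden_size"] // heads
    return 2 * cfg["num_hidden_layers"] * batch * kv_heads * seq_len * head_dim * dtype_bytes

llama3_8b = {"hidden_size": 4096, "num_hidden_layers": 32,
             "num_attention_heads": 32, "num_key_value_heads": 8}
print(kv_cache_bytes(llama3_8b) / 2**30, "GiB")  # → 1.0 GiB at 8192 tokens
```

Note the cache scales linearly with both sequence length and batch size, and GQA (8 KV heads instead of 32) cuts it by 4x compared with full multi-head attention.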

Architecture Signal

Auto-detect from head ratio
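The head ratio referred to here is num_attention_heads to num_key_value_heads, which distinguishes the standard attention variants. A minimal classification sketch:

```python
def attention_type(cfg: dict) -> str:
    """Classify attention from the query-head to KV-head ratio."""
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)
    if kv_heads == heads:
        return "MHA"                          # multi-head: one KV head per query head
    if kv_heads == 1:
        return "MQA"                          # multi-query: a single shared KV head
    return f"GQA ({heads // kv_heads}:1)"     # grouped-query: queries share KV groups

print(attention_type({"num_attention_heads": 32, "num_key_value_heads": 8}))  # → GQA (4:1)
```

The distinction matters for the matrix above all through the KV-cache term, which shrinks in proportion to the ratio.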

3. VRAM Matrix + Hardware Match

Runtime | Quantization | Weights | KV / Activations | Total

Hardware Match
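The matrix rows and the hardware match can be sketched together. The GPU list, the quantization table, and the 10% runtime overhead factor are illustrative assumptions, not measurements; real runtimes add framework-specific activation and fragmentation overhead on top:

```python
QUANT_BITS = {"FP16": 16, "INT8": 8, "INT4": 4}
GPUS_GIB = {"RTX 4090": 24, "A100 80GB": 80}  # hypothetical hardware list

def total_vram_gib(params: int, kv_bytes: int, bits: int,
                   overhead: float = 1.10) -> float:
    """Weights at the given bit width, plus KV cache, plus a flat overhead factor."""
    weight_bytes = params * bits / 8
    return (weight_bytes + kv_bytes) * overhead / 2**30

def matching_gpus(need_gib: float, gpus: dict) -> list:
    """GPUs whose VRAM covers the requirement."""
    return [name for name, gib in gpus.items() if gib >= need_gib]

params, kv = 8_030_261_248, 2**30  # ~Llama-3-8B weights, 1 GiB KV @ 8192 tokens
for name, bits in QUANT_BITS.items():
    need = total_vram_gib(params, kv, bits)
    print(f"{name}: {need:.1f} GiB -> {matching_gpus(need, GPUS_GIB)}")
```

A flat multiplier is a deliberate simplification: it keeps the matrix legible while staying conservative enough that a "fits" verdict is usually trustworthy.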