AI Infrastructure Console

LLM VRAM Calculator

Fetch model config from Hugging Face, parse architecture parameters, and compute realistic VRAM usage for deployment and LoRA workflows.
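Given a fetched config.json, the weight footprint can be estimated from the architecture fields alone. A minimal sketch for a Llama-style decoder, using the standard Hugging Face config keys; the sample values below approximate Llama-3-8B and are illustrative, not fetched:

```python
def estimate_params(cfg: dict) -> int:
    """Rough parameter count for a Llama-style decoder from config.json fields."""
    h = cfg["hidden_size"]
    layers = cfg["num_hidden_layers"]
    vocab = cfg["vocab_size"]
    inter = cfg["intermediate_size"]
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)
    head_dim = h // heads
    attn = h * h + 2 * h * kv_heads * head_dim + h * h  # Q, K, V, O projections
    mlp = 3 * h * inter                                 # gate, up, down projections
    per_layer = attn + mlp + 2 * h                      # plus two RMSNorm weight vectors
    # embeddings + decoder layers + final norm + untied lm_head
    return vocab * h + layers * per_layer + h + vocab * h

llama3_8b = {"hidden_size": 4096, "num_hidden_layers": 32, "vocab_size": 128256,
             "intermediate_size": 14336, "num_attention_heads": 32,
             "num_key_value_heads": 8}
params = estimate_params(llama3_8b)
print(f"{params / 1e9:.2f} B params, ~{params * 2 / 2**30:.1f} GiB in FP16")
# → 8.03 B params, ~15.0 GiB in FP16
```

The count lands on the familiar 8.03 B figure for Llama-3-8B, which is a useful sanity check that the per-layer arithmetic matches the real architecture.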

1. Model Source

Needed only for gated models such as Llama 3. The token is stored only in your browser's localStorage and is sent only with requests made directly from your browser to Hugging Face, never to any other server.
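The request shape can be sketched as follows. `resolve/main/config.json` is the Hub's raw-file URL scheme, and the Bearer header is attached only when a token is present; how the token is obtained and stored is outside this sketch:

```python
def hf_config_url(repo_id: str) -> str:
    """Direct URL for a repo's config.json on the Hugging Face Hub."""
    return f"https://huggingface.co/{repo_id}/resolve/main/config.json"

def hf_headers(token=None) -> dict:
    """Bearer auth is only needed for gated repos (e.g. Llama 3)."""
    return {"Authorization": f"Bearer {token}"} if token else {}

print(hf_config_url("meta-llama/Meta-Llama-3-8B"))
print(hf_headers("hf_example_token"))
```

Anonymous requests simply omit the header, so public models need no token at all.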

2. Scenario

If you are unsure of some architecture parameters, leave them empty. The calculator estimates them where possible, though accuracy may decrease.
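One way to sketch that fallback logic. The defaults below are assumptions (no grouped-query attention, square head dimension, a 4x MLP expansion), which is exactly why accuracy can drop when fields are missing:

```python
def fill_defaults(cfg: dict) -> dict:
    """Best-effort defaults for missing architecture fields; accuracy may drop."""
    out = dict(cfg)
    heads = out["num_attention_heads"]
    out.setdefault("num_key_value_heads", heads)                  # assume MHA
    out.setdefault("head_dim", out["hidden_size"] // heads)       # assume square heads
    out.setdefault("intermediate_size", 4 * out["hidden_size"])   # common MLP ratio
    return out

sparse = {"hidden_size": 4096, "num_attention_heads": 32}
print(fill_defaults(sparse))
```

Fields the user does supply always win, since `setdefault` never overwrites an existing key.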

Context Length: 8192 tokens
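Context length drives the KV-cache term: keys and values are two tensors per layer, each of shape [batch, kv_heads, seq_len, head_dim]. A sketch at FP16 (2 bytes per element), using the Llama-3-8B shapes as an illustrative input:

```python
def kv_cache_bytes(cfg: dict, seq_len: int = 8192, batch: int = 1,
                   dtype_bytes: int = 2) -> int:
    """KV cache size: 2 (K and V) x layers x batch x kv_heads x seq_len x head_dim."""
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)
    head_dim = cfg["hidden_size"] // heads
    return 2 * cfg["num_hidden_layers"] * batch * kv_heads * seq_len * head_dim * dtype_bytes

llama3_8b = {"hidden_size": 4096, "num_hidden_layers": 32,
             "num_attention_heads": 32, "num_key_value_heads": 8}
print(kv_cache_bytes(llama3_8b) / 2**30, "GiB")  # → 1.0 GiB at 8192 tokens
```

Note the cache scales linearly with both sequence length and batch size, and GQA (8 KV heads instead of 32) cuts it by 4x compared with full multi-head attention.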

Architecture Signal

Auto-detect from head ratio
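The head ratio referred to here is num_attention_heads to num_key_value_heads, which distinguishes the standard attention variants. A minimal classification sketch:

```python
def attention_type(cfg: dict) -> str:
    """Classify attention from the query-head to KV-head ratio."""
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)
    if kv_heads == heads:
        return "MHA"                          # multi-head: one KV head per query head
    if kv_heads == 1:
        return "MQA"                          # multi-query: a single shared KV head
    return f"GQA ({heads // kv_heads}:1)"     # grouped-query: queries share KV groups

print(attention_type({"num_attention_heads": 32, "num_key_value_heads": 8}))  # → GQA (4:1)
```

The distinction matters for the matrix above all through the KV-cache term, which shrinks in proportion to the ratio.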

3. VRAM Matrix + Hardware Match

Runtime | Quantization | Weights | KV / Activations | Total

Hardware Match
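The matrix rows and the hardware match can be sketched together. The GPU list, the quantization table, and the 10% runtime overhead factor are illustrative assumptions, not measurements; real runtimes add framework-specific activation and fragmentation overhead on top:

```python
QUANT_BITS = {"FP16": 16, "INT8": 8, "INT4": 4}
GPUS_GIB = {"RTX 4090": 24, "A100 80GB": 80}  # hypothetical hardware list

def total_vram_gib(params: int, kv_bytes: int, bits: int,
                   overhead: float = 1.10) -> float:
    """Weights at the given bit width, plus KV cache, plus a flat overhead factor."""
    weight_bytes = params * bits / 8
    return (weight_bytes + kv_bytes) * overhead / 2**30

def matching_gpus(need_gib: float, gpus: dict) -> list:
    """GPUs whose VRAM covers the requirement."""
    return [name for name, gib in gpus.items() if gib >= need_gib]

params, kv = 8_030_261_248, 2**30  # ~Llama-3-8B weights, 1 GiB KV @ 8192 tokens
for name, bits in QUANT_BITS.items():
    need = total_vram_gib(params, kv, bits)
    print(f"{name}: {need:.1f} GiB -> {matching_gpus(need, GPUS_GIB)}")
```

A flat multiplier is a deliberate simplification: it keeps the matrix legible while staying conservative enough that a "fits" verdict is usually trustworthy.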