
LLM VRAM Calculator

Fetch model config from Hugging Face, parse architecture parameters, and compute realistic VRAM usage for deployment and LoRA workflows.

1. Model Source


Needed only for gated models such as Llama 3. The token is stored only in your browser's localStorage and is never sent to any server other than Hugging Face, which your browser contacts directly.

2. Scenario

If you are unsure about some architecture parameters, leave them empty. We will estimate them when possible, but accuracy may decrease.
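A minimal sketch of how missing fields might be estimated, assuming conventional transformer defaults (no GQA means K/V heads equal attention heads; head_dim is hidden_size divided by head count; FFN width is roughly 4x hidden_size). These fallbacks are assumptions, not the calculator's exact rules.

```python
def fill_defaults(cfg: dict) -> dict:
    """Estimate architecture fields a config may omit (accuracy may drop)."""
    out = dict(cfg)
    heads = out["num_attention_heads"]
    # Without grouped-query attention, every head has its own K/V head.
    out.setdefault("num_key_value_heads", heads)
    # head_dim is conventionally hidden_size / num_attention_heads.
    out.setdefault("head_dim", out["hidden_size"] // heads)
    # Many transformer FFNs use roughly 4x hidden_size.
    out.setdefault("intermediate_size", 4 * out["hidden_size"])
    return out
```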

Context Length 8192 tokens
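Context length drives the KV cache, which grows linearly with sequence length. The standard formula caches one key and one value vector per layer, per KV head, per token; the sketch below uses it with a Llama-3-8B-like shape (32 layers, 8 KV heads, head_dim 128) as the worked example.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, dtype_bytes: int = 2, batch: int = 1) -> int:
    """Bytes to cache keys and values for every layer at full context.
    The leading 2 accounts for storing both K and V."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * dtype_bytes * batch

# Llama-3-8B-like shape at the 8192-token context above, fp16 cache:
# 2 * 32 * 8 * 128 * 8192 * 2 bytes = 1 GiB
```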

Architecture Signal

Auto-detect from head ratio
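One plausible reading of "detect from head ratio" is classifying the attention layout by comparing attention heads to KV heads, as sketched below; this is an assumption about the heuristic, not the tool's verified logic.

```python
def attention_variant(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Classify the attention layout from the head ratio."""
    if num_key_value_heads == 1:
        return "MQA"  # multi-query: one K/V head shared by all query heads
    if num_key_value_heads < num_attention_heads:
        return "GQA"  # grouped-query: groups of query heads share K/V heads
    return "MHA"      # multi-head: every query head has its own K/V head
```

The variant matters for VRAM because the KV cache scales with the number of KV heads, not query heads.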

MoE Signal

Shared Parameters Ratio 25%

Dense mode (MoE disabled)
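When MoE is enabled, the shared-parameters ratio above splits the weights into an always-active shared part and a routed part, of which only the top-k experts fire per token. A rough estimate, under that assumed split (the formula is illustrative, not the calculator's exact model):

```python
def active_params(total_params: float, shared_ratio: float,
                  num_experts: int, experts_per_token: int) -> float:
    """Parameters touched per token in a MoE model (rough estimate).

    shared_ratio: fraction of weights (attention, embeddings, shared FFN)
    used by every token; the remainder is split evenly across experts,
    and only the routed top-k experts fire per token.
    """
    shared = total_params * shared_ratio
    routed = total_params * (1 - shared_ratio) * experts_per_token / num_experts
    return shared + routed

# 100B total, 25% shared, 8 of 64 experts per token -> 34.375B active
```

Note that all experts still occupy VRAM; the active count mainly affects compute and bandwidth, not resident weight memory.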

3. VRAM Matrix + Hardware Match

Columns: Runtime · Quantization · Weights · KV / Activations · Total · Dense vs MoE

Hardware Match
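The matrix total can be sketched as weights at the chosen quantization plus the KV cache, with a flat overhead fraction for activations and fragmentation, then matched against GPU memory. The overhead fraction, byte-per-parameter table, and GPU list below are illustrative assumptions (the memory sizes themselves, e.g. 24 GB for the RTX 4090, are real).

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}  # common quantizations

# (name, memory in GiB) - illustrative subset, not an exhaustive catalog
GPUS = [("RTX 4090", 24), ("A100 80GB", 80), ("H100 80GB", 80)]

def total_vram_gib(params: float, quant: str, kv_bytes: float,
                   overhead_frac: float = 0.10) -> float:
    """Weights + KV cache, plus a flat fraction for activations/overhead."""
    weights = params * BYTES_PER_PARAM[quant]
    return (weights + kv_bytes) * (1 + overhead_frac) / 2**30

def hardware_match(needed_gib: float):
    """Smallest GPU in the table whose memory covers the estimate."""
    fits = [g for g in GPUS if g[1] >= needed_gib]
    return min(fits, key=lambda g: g[1]) if fits else None
```

For example, an 8B-parameter model at int4 with a 1 GiB KV cache needs roughly 5.2 GiB under these assumptions and fits the smallest card in the table.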
