Precise memory estimation for modern language models
Model weights: Parameters (in billions) × Bytes per parameter (quantization) × 10⁹ / 1024³
Activations (rough): Batch × Context Length × Hidden Size × Layers × 2 bytes (FP16) / 1024³
KV cache: Batch × Seq Length × Layers × 2 (K and V) × Head Dim × KV Heads × 2 bytes / 1024³
Hidden size (if unknown): ≈ √(Parameters / (6 × Layers)), or Heads × Head Dim
Total VRAM: (Weights + Activations + KV Cache) × 1.20 (20% overhead)
Note: These are estimates. Actual VRAM usage depends on the implementation, runtime optimizations (e.g. paged attention, quantized KV cache), and framework overhead.
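The formulas above can be sketched as a small Python function. The function and parameter names are my own; the example values approximate a Llama-3-8B-style configuration (32 layers, hidden size 4096, 8 KV heads, head dim 128, FP16) and are illustrative, not authoritative.

```python
GIB = 1024 ** 3  # bytes per GiB

def estimate_vram_gib(params_b, bytes_per_param, batch, seq_len,
                      layers, hidden, kv_heads, head_dim,
                      overhead=1.20):
    """Rough VRAM estimate in GiB, per the formulas above."""
    # Model weights: billions of params × bytes each
    weights = params_b * 1e9 * bytes_per_param / GIB
    # Activations: batch × context × hidden × layers × 2 bytes (FP16)
    activations = batch * seq_len * hidden * layers * 2 / GIB
    # KV cache: 2 tensors (K and V) of head_dim × kv_heads per layer, 2 bytes each
    kv_cache = batch * seq_len * layers * 2 * head_dim * kv_heads * 2 / GIB
    # Apply the 20% overhead factor to the sum
    return (weights + activations + kv_cache) * overhead

# Llama-3-8B-like config, FP16, batch 1, 8K context (assumed values):
total = estimate_vram_gib(params_b=8, bytes_per_param=2, batch=1,
                          seq_len=8192, layers=32, hidden=4096,
                          kv_heads=8, head_dim=128)
print(f"{total:.1f} GiB")  # ≈ 21.5 GiB
```

Swapping `bytes_per_param=2` for `0.5` (4-bit quantization) shows how quantizing the weights alone shrinks the first term while leaving the activation and KV-cache terms untouched.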