Training Configuration

Model Settings

Enter the model size as a number with an optional suffix: 7B, 7000M, or 7000000000.
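
A minimal sketch of how such an input could be parsed; the function name and implementation below are assumptions for illustration, not the calculator's actual code:

```python
def parse_model_size(text: str) -> int:
    """Parse a parameter count such as '7B', '7000M', or '7000000000'."""
    multipliers = {"B": 10**9, "M": 10**6}
    text = text.strip().upper()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)

# All three spellings resolve to the same parameter count.
assert parse_model_size("7B") == parse_model_size("7000M") == parse_model_size("7000000000")
```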

Mixture of Experts (MoE)

Training Settings

Parallelism

Effective GPUs: 8
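
The effective GPU count is normally the product of the configured parallelism degrees (data x tensor x pipeline). A small sketch, assuming those three axes are the configurable ones:

```python
def effective_gpus(data_parallel: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Total GPUs required is the product of the parallelism degrees."""
    return data_parallel * tensor_parallel * pipeline_parallel

# e.g. DP=2, TP=4, PP=1 -> 8 effective GPUs
assert effective_gpus(2, 4, 1) == 8
```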

Training Engine

Hardware

Results

Memory Breakdown

Per GPU: -- GB
Total All GPUs: -- GB
CPU Memory: -- GB

Component Breakdown

Model Parameters: -- GB
Gradients: -- GB
Optimizer States: -- GB
Activations: -- GB
Overhead: -- GB
(Chart: per-component memory split: Params, Grads, Optimizer, Activations)
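
The calculator does not show its exact formulas here, so the following is only a rough sketch of how these components are commonly estimated: mixed-precision training with an Adam-style optimizer (2 bytes each per parameter for weights and gradients, 12 bytes per parameter of fp32 optimizer state), a flat overhead fraction, and fully sharded states so per-GPU memory is the total divided by GPU count. The function name, the 10% overhead default, and the sharding assumption are all assumptions, not the tool's published method:

```python
def memory_breakdown_gb(params: int,
                        num_gpus: int = 8,
                        activations_gb: float = 0.0,
                        overhead_frac: float = 0.10) -> dict:
    """Rough per-component memory estimate in GiB for mixed-precision Adam training."""
    GIB = 1024 ** 3
    weights = params * 2 / GIB    # fp16/bf16 model parameters
    grads = params * 2 / GIB      # fp16/bf16 gradients
    optim = params * 12 / GIB     # fp32 master weights + Adam momentum + variance
    static = weights + grads + optim + activations_gb
    overhead = static * overhead_frac
    total = static + overhead
    return {
        "Model Parameters": weights,
        "Gradients": grads,
        "Optimizer States": optim,
        "Activations": activations_gb,   # depends on batch size, sequence length, checkpointing
        "Overhead": overhead,
        "Total All GPUs": total,
        "Per GPU": total / num_gpus,     # assumes states are fully sharded (e.g. ZeRO-3)
        "CPU Memory": 0.0,               # nonzero only if offloading (e.g. ZeRO-Offload) is enabled
    }
```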

Feasibility

Status: --
Utilization: --%
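
Feasibility can then be a straightforward comparison of the required per-GPU memory against device capacity; the status labels and the 80 GB default below are illustrative assumptions:

```python
def feasibility(per_gpu_required_gb: float, gpu_memory_gb: float = 80.0) -> tuple[str, float]:
    """Return a status string and the utilization percentage for a single GPU."""
    utilization = 100.0 * per_gpu_required_gb / gpu_memory_gb
    status = "OK" if utilization <= 100.0 else "Exceeds GPU memory"
    return status, utilization

# e.g. 62 GB required on an 80 GB GPU -> ("OK", 77.5)
print(feasibility(62.0))
```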

Formula Explanation

Run a calculation to see the formula breakdown.
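
For reference, a commonly cited approximation for mixed-precision Adam training is about 16 bytes per parameter (2 for weights, 2 for gradients, 12 for optimizer states) before activations and overhead. A hedged worked instance for a 7B model:

```python
params = 7_000_000_000
bytes_per_param = 2 + 2 + 12                  # weights + gradients + fp32 Adam states
print(params * bytes_per_param / 1024 ** 3)   # ~104 GiB (~112 GB in decimal units)
```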