Training Configuration
Model Settings
Enter the model size as a number with an optional suffix: 7B, 7000M, or 7000000000
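For reference, a minimal sketch of how a size string in this format can be interpreted. The suffix values (B = 10^9 parameters, M = 10^6) are assumptions about the convention, not taken from the tool's source:

```python
def parse_model_size(text: str) -> int:
    """Parse a parameter count such as '7B', '7000M', or '7000000000'.

    Suffix handling is an assumption: B = 1e9 parameters, M = 1e6.
    """
    text = text.strip().upper()
    multipliers = {"B": 1_000_000_000, "M": 1_000_000}
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)

# All three notations resolve to the same count: 7,000,000,000 parameters.
assert parse_model_size("7B") == parse_model_size("7000M") == parse_model_size("7000000000")
```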
Mixture of Experts (MoE)
Training Settings
Parallelism
Effective GPUs: 8
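The effective GPU count is presumably the product of the configured parallelism degrees. A minimal sketch under that assumption (the function name and the choice of data, tensor, and pipeline parallelism as the factors are illustrative):

```python
def effective_gpus(data_parallel: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Total GPUs implied by the parallelism settings.

    Assumption: the degrees compose multiplicatively, e.g. DP=2, TP=2, PP=2 -> 8 GPUs.
    """
    return data_parallel * tensor_parallel * pipeline_parallel

print(effective_gpus(2, 2, 2))  # 8
```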
Training Engine
Hardware
Results
Memory Breakdown
Per GPU: -- GB
Total (All GPUs): -- GB
CPU Memory: -- GB
Component Breakdown
Model Parameters: -- GB
Gradients: -- GB
Optimizer States: -- GB
Activations: -- GB
Overhead: -- GB
Chart: per-component share of GPU memory (Params, Grads, Opt, Act)
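For orientation, a back-of-the-envelope sketch of how components like these are commonly estimated for mixed-precision Adam training with optional ZeRO-style sharding. The byte counts, sharding rules, activation input, and overhead fraction are assumptions; the calculator's own formulas may differ:

```python
def estimate_memory_gb(
    num_params: int,
    num_gpus: int = 8,
    zero_stage: int = 0,          # assumption: ZeRO-style sharding of optimizer/grad/param states
    bytes_param: int = 2,         # bf16/fp16 weights
    bytes_grad: int = 2,          # bf16/fp16 gradients
    bytes_optim: int = 12,        # Adam: fp32 master copy plus two fp32 moments
    activations_gb: float = 0.0,  # depends on batch size, sequence length, recomputation
    overhead_frac: float = 0.10,  # fragmentation, CUDA context, buffers (assumed)
) -> dict:
    """Rough per-GPU memory breakdown in GB. Illustrative only."""
    gib = 1024 ** 3
    params = num_params * bytes_param / gib
    grads = num_params * bytes_grad / gib
    optim = num_params * bytes_optim / gib
    # ZeRO-1 shards optimizer states, ZeRO-2 adds gradients, ZeRO-3 adds parameters.
    if zero_stage >= 1:
        optim /= num_gpus
    if zero_stage >= 2:
        grads /= num_gpus
    if zero_stage >= 3:
        params /= num_gpus
    subtotal = params + grads + optim + activations_gb
    return {
        "Model Parameters": params,
        "Gradients": grads,
        "Optimizer States": optim,
        "Activations": activations_gb,
        "Overhead": subtotal * overhead_frac,
    }

# Example: 7B parameters, 8 GPUs, ZeRO-2, roughly 20 GB of activations per GPU.
for name, gb in estimate_memory_gb(7_000_000_000, zero_stage=2, activations_gb=20).items():
    print(f"{name}: {gb:.1f} GB")
```

Activation memory is the hardest term to estimate, since it depends on batch size, sequence length, hidden size, and whether activation recomputation is enabled, so the sketch takes it as an explicit input rather than deriving it.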
Feasibility
Status: --
Utilization: --%
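A sketch of how a status and utilization figure could be derived from the per-GPU total and the GPU's memory capacity. The thresholds and status labels here are assumed conventions, not the tool's actual logic:

```python
def feasibility(required_gb: float, gpu_memory_gb: float) -> tuple[str, float]:
    """Return a status label and memory utilization percentage.

    The thresholds and labels are assumed conventions.
    """
    utilization = 100.0 * required_gb / gpu_memory_gb
    if utilization > 100.0:
        status = "Does not fit"
    elif utilization > 90.0:
        status = "Tight fit"
    else:
        status = "Fits"
    return status, utilization

# e.g. the per-GPU total from the sketch above on a hypothetical 80 GB GPU
status, util = feasibility(required_gb=48.9, gpu_memory_gb=80.0)
print(f"Status: {status}, Utilization: {util:.0f}%")
```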
Formula Explanation
Run a calculation to see the formula breakdown.