Dystrio

GPU Placement Advisor for PyTorch/NCCL Workloads

📖 How It Works

What is Dystrio?

Dystrio analyzes your PyTorch distributed training communication patterns and generates Kubernetes pod affinity rules to co-locate GPUs that talk the most.

How do I get a PyTorch trace?

Add this to your training script:

from torch.profiler import profile, ProfilerActivity

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
    with_stack=True
) as prof:
    # Your training step here -- include the backward pass, since
    # under DDP the NCCL collectives (gradient all-reduce) fire
    # during backward, not forward
    output = model(inputs)
    output.sum().backward()

prof.export_chrome_trace("trace.json")

Upload the resulting trace.json file.
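Before uploading, it can help to sanity-check that the trace actually captured communication. The helper below is not part of Dystrio, just a convenience sketch: it counts Chrome-trace events whose names mention NCCL.

```python
import json

def count_nccl_events(trace_path):
    """Count events in an exported Chrome trace whose name mentions
    NCCL (e.g. ncclAllReduce). Zero suggests the profiled step ran
    no collectives, and the trace won't be useful for placement."""
    with open(trace_path) as f:
        events = json.load(f).get("traceEvents", [])
    return sum("nccl" in e.get("name", "").lower() for e in events)
```

If this returns 0 for your trace.json, re-check that the profiled step included a backward pass on a distributed model.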

What is Session ID / Multi-Run?

Single run: Leave Session ID empty. You'll get recommendations based on one trace.

Multi-run (recommended): Use the same Session ID across multiple uploads. Dystrio tracks which communication patterns are stable vs noisy, giving you higher-confidence recommendations.

Example: Upload 3 traces from different training runs with Session ID "llama-70b-training" → Dystrio identifies consistent patterns and escalates confidence from LOW → HIGH.

How do I use the output?

  1. Copy the generated Kubernetes YAML
  2. Add the affinity: block to your Pod spec
  3. Deploy – Kubernetes will schedule communicating pods together
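The generated YAML is a standard Kubernetes pod-affinity block of roughly this shape. The label and topologyKey below are placeholders for illustration; use the values Dystrio actually emits for your workload.

```yaml
# Illustrative shape only -- paste the YAML Dystrio generates.
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: llama-70b-training   # placeholder label
          topologyKey: kubernetes.io/hostname
```

A preferred (soft) rule lets the scheduler fall back to other nodes when co-location isn't possible, rather than leaving pods unschedulable.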