Task Guide

Use this page when you know what you want to do, but not yet which command or example should be your starting point.

First run

  • Read Quickstart.
  • Generate a starting spec from the minimal-batch template: hpc-compose init --template minimal-batch --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml.
  • Run hpc-compose submit --watch -f compose.yaml.
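
The init and submit steps above can be wrapped in a small shell function. This is a minimal sketch, assuming hpc-compose is on PATH; the function name first_run and its argument order are illustrative and not part of hpc-compose, while the flags are the ones listed above.

```shell
# Hypothetical helper: generate a spec from the minimal-batch template, then submit it.
# Usage: first_run APP_NAME CACHE_DIR [OUTPUT_FILE]
first_run() {
  app_name="$1"
  cache_dir="$2"
  output="${3:-compose.yaml}"

  # Write a starter spec; stop here if init fails.
  hpc-compose init --template minimal-batch \
    --name "$app_name" \
    --cache-dir "$cache_dir" \
    --output "$output" || return 1

  # Submit the generated spec and watch the run.
  hpc-compose submit --watch -f "$output"
}
```

For example, first_run my-app "/shared/$USER/hpc-compose-cache" reproduces the two commands from the list with compose.yaml as the output file.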

Migrate from Docker Compose

  • Read Docker Compose Migration.
  • Replace build: with image: plus x-enroot.prepare.commands.
  • Replace service-name networking with 127.0.0.1 or explicit allocation metadata where appropriate.
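
As a rough illustration of replacing build: with image: plus x-enroot.prepare.commands, the sketch below writes a hypothetical migrated service to a file. The helper name, the service name app, the base image, the pip command, and the exact nesting of x-enroot.prepare.commands under the service are all assumptions made for illustration; Docker Compose Migration has the authoritative layout.

```shell
# Hypothetical helper: write an example migrated service definition to a file.
# Where a Docker Compose file had a build section, this names a base image and
# lists the build-time steps under x-enroot.prepare.commands (assumed layout).
# Usage: write_migrated_example [OUTPUT_FILE]
write_migrated_example() {
  cat > "${1:-compose.example.yaml}" <<'EOF'
services:
  app:
    image: python:3.11-slim
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir numpy
EOF
}
```

Running hpc-compose validate -f on the resulting file is a reasonable way to check whether the assumed layout matches your installed version.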

Single-node multi-service app

Multi-node distributed training

  • Start from multi-node-torchrun.yaml or multi-node-mpi.yaml.
  • Run helper services on the primary node only, and run the distributed job as the single allocation-wide step.
  • Use allocation metadata such as HPC_COMPOSE_PRIMARY_NODE instead of Docker-style service discovery.
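
Inside the distributed service, the primary node's address can be read from the environment rather than from a service name. The sketch below assumes HPC_COMPOSE_PRIMARY_NODE holds a hostname reachable from every node in the allocation; the helper name rdzv_endpoint and the default port 29500 are illustrative choices, not something hpc-compose defines.

```shell
# Hypothetical helper: build a host:port rendezvous endpoint from allocation metadata.
# Usage: rdzv_endpoint [PORT]
rdzv_endpoint() {
  port="${1:-29500}"
  # Fail with a clear message when the variable is missing,
  # for example when run outside an allocation.
  if [ -z "${HPC_COMPOSE_PRIMARY_NODE:-}" ]; then
    echo "HPC_COMPOSE_PRIMARY_NODE is not set" >&2
    return 1
  fi
  printf '%s:%s\n' "$HPC_COMPOSE_PRIMARY_NODE" "$port"
}
```

A torchrun launch could then pass --rdzv_backend=c10d --rdzv_endpoint="$(rdzv_endpoint)" instead of relying on Docker-style service discovery.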

Checkpoint and resume workflows

  • Start from training-checkpoints.yaml when you only need artifact output.
  • Start from training-resume.yaml when the run should resume from shared storage across retries or later submissions.
  • Keep the canonical resume source in x-slurm.resume.path, not in exported artifact bundles.

LLM serving workflows

Debug cluster readiness

  • Run hpc-compose validate -f compose.yaml.
  • Run hpc-compose inspect --verbose -f compose.yaml.
  • Run hpc-compose preflight -f compose.yaml.
  • Read the troubleshooting sections in Runbook.
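
The three checks above can be chained so that a later check only runs once the earlier one passes. This is a minimal sketch, assuming each subcommand exits non-zero on failure; the function name check_readiness is hypothetical.

```shell
# Hypothetical helper: run the readiness checks in order, stopping at the first failure.
# Usage: check_readiness [COMPOSE_FILE]
check_readiness() {
  compose_file="${1:-compose.yaml}"

  hpc-compose validate -f "$compose_file" || { echo "validate failed" >&2; return 1; }
  hpc-compose inspect --verbose -f "$compose_file" || { echo "inspect failed" >&2; return 1; }
  hpc-compose preflight -f "$compose_file" || { echo "preflight failed" >&2; return 1; }
}
```

Running the cheap checks first means a spec error surfaces before any cluster-level check is attempted.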

Cache and artifact management

  • Use hpc-compose cache list to inspect imported/prepared artifacts.
  • Use hpc-compose cache inspect -f compose.yaml to see per-service reuse expectations.
  • Use hpc-compose artifacts -f compose.yaml after a run to export tracked payloads.

Automation and scripting with JSON output

  • Prefer --format json for machine-readable output on validate, render, prepare, preflight, inspect, status, stats, artifacts, and cache subcommands.
  • Use hpc-compose stats --format jsonl or --format csv when downstream tooling wants row-oriented metrics.
  • Treat --json as a compatibility alias on older machine-readable commands; new automation should prefer --format json.
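
For automation, the machine-readable output can be captured to files for later processing. The sketch below is one way to do that under several assumptions: the commands write their report to standard output, they exit non-zero on failure, and stats accepts -f like the other subcommands (check hpc-compose stats --help). The function name collect_reports and the output file names are hypothetical.

```shell
# Hypothetical helper: capture machine-readable reports for later processing.
# Usage: collect_reports [COMPOSE_FILE] [OUT_DIR]
collect_reports() {
  compose_file="${1:-compose.yaml}"
  out_dir="${2:-reports}"
  mkdir -p "$out_dir" || return 1

  # One JSON report per command.
  hpc-compose validate --format json -f "$compose_file" > "$out_dir/validate.json" || return 1
  hpc-compose preflight --format json -f "$compose_file" > "$out_dir/preflight.json" || return 1

  # Row-oriented metrics in JSON Lines form (passing -f here is an assumption).
  hpc-compose stats --format jsonl -f "$compose_file" > "$out_dir/stats.jsonl"
}
```

The sketch uses --format json rather than --json, in line with the guidance above for new automation.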