Task Guide

Use this page when you know what you want to do, but not yet which command or example should be your starting point.

First run

  • Read Quickstart.
  • Run hpc-compose evolve --output compose.yaml if you want a guided progression from minimal through multi-node-placement.
  • Run hpc-compose new --list-templates if you want to inspect the built-in starter templates before choosing one.
  • Start from minimal-batch with hpc-compose new --template minimal-batch --name my-app --output compose.yaml.
  • Before running on a cluster, configure a shared cache with hpc-compose setup --cache-dir '<shared-cache-dir>' or explicit x-slurm.cache_dir. If you copy a repository example that uses CACHE_DIR, override it for your cluster before running.
  • Run hpc-compose plan -f compose.yaml before the first real run. Add --show-script when you want to inspect the generated launcher without writing a file.
  • Run hpc-compose up -f compose.yaml only from a supported Linux Slurm submission host.
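
The first-run steps above can be chained into one helper that stops at the first failure. This is a minimal sketch, not part of hpc-compose: the function name first_run and the HPC_COMPOSE override variable are hypothetical, and it assumes setup --cache-dir runs without prompting. The commands and flags themselves are the ones listed above.

```shell
# HPC_COMPOSE is a hypothetical override hook for this sketch (not an
# hpc-compose feature); it defaults to the real binary on PATH.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# first_run NAME CACHE_DIR
# Scaffold compose.yaml from the minimal-batch template, point the project
# at a shared cache, then print the plan with the generated launcher.
# Each step runs only if the previous one succeeded.
first_run() {
    name="${1:-my-app}"
    cache_dir="${2:?usage: first_run NAME CACHE_DIR}"
    "$HPC_COMPOSE" new --template minimal-batch --name "$name" --output compose.yaml &&
    "$HPC_COMPOSE" setup --cache-dir "$cache_dir" &&
    "$HPC_COMPOSE" plan -f compose.yaml --show-script
}
```

Call it as first_run my-app '<shared-cache-dir>', review the plan, and then run hpc-compose up -f compose.yaml yourself from a supported Linux Slurm submission host. The submit step is left out of the helper so that nothing is submitted before the plan has been read.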

Remember directory/data/env settings once

  • Run hpc-compose setup to create or update the project-local settings file (.hpc-compose/settings.toml).
  • Use hpc-compose --profile dev up so that the compose path, env files, env vars, and binary paths come from the selected profile.
  • Run hpc-compose context --format json to inspect resolved paths and the source of each value. Interpolation variables are limited to names referenced by the compose file, and sensitive-looking values are redacted unless you add --show-values.
  • Use --settings-file <PATH> when you need an explicit settings file instead of upward discovery.
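
One way to combine these commands is to print the resolved context for a profile before submitting with it. The sketch below is illustrative only: up_with_profile and HPC_COMPOSE are hypothetical names, and it assumes --profile is accepted in front of context the same way it is shown in front of up.

```shell
# HPC_COMPOSE is a hypothetical override hook for this sketch (not an
# hpc-compose feature); it defaults to the real binary on PATH.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# up_with_profile PROFILE
# Show what the profile resolves to, then submit with the same profile.
# Assumption: --profile can precede the context subcommand.
up_with_profile() {
    profile="${1:?usage: up_with_profile PROFILE}"
    "$HPC_COMPOSE" --profile "$profile" context --format json &&
    "$HPC_COMPOSE" --profile "$profile" up
}
```

For example, up_with_profile dev prints the resolved paths and value sources first, and only runs up if that inspection step exits successfully.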

Migrate from Docker Compose

  • Read Docker Compose Migration.
  • Replace build: with image: plus x-runtime.prepare.commands.
  • Replace service-name networking with 127.0.0.1 or explicit allocation metadata where appropriate.

Single-node multi-service app

Multi-node distributed training

Checkpoint and resume workflows

  • Start from training-checkpoints.yaml when you only need artifact output.
  • Start from training-resume.yaml when the run should resume from shared storage across retries or later submissions.
  • Keep the canonical resume source in x-slurm.resume.path, not in exported artifact bundles.

LLM serving workflows

Debug cluster readiness

  • Run hpc-compose validate -f compose.yaml.
  • Run hpc-compose validate -f compose.yaml --strict-env when default interpolation fallbacks should be treated as failures.
  • Run hpc-compose plan --verbose -f compose.yaml.
  • Run hpc-compose preflight -f compose.yaml.
  • Run hpc-compose debug -f compose.yaml --preflight after a failed tracked run.
  • Read Troubleshooting.
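
The pre-submission checks above can be run as a single gate that reports which stage failed. This is a sketch under stated assumptions: check_ready and HPC_COMPOSE are hypothetical names, and it assumes each command exits non-zero on failure. The debug step is left out because it applies after a failed tracked run, not before submission.

```shell
# HPC_COMPOSE is a hypothetical override hook for this sketch (not an
# hpc-compose feature); it defaults to the real binary on PATH.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# check_ready [COMPOSE_FILE]
# Run validate (strict env), plan (verbose), and preflight in order.
# Returns 1 and names the failing stage on stderr at the first failure.
check_ready() {
    file="${1:-compose.yaml}"
    "$HPC_COMPOSE" validate -f "$file" --strict-env ||
        { echo "check_ready: validate failed" >&2; return 1; }
    "$HPC_COMPOSE" plan --verbose -f "$file" ||
        { echo "check_ready: plan failed" >&2; return 1; }
    "$HPC_COMPOSE" preflight -f "$file" ||
        { echo "check_ready: preflight failed" >&2; return 1; }
}
```

A typical use would be check_ready compose.yaml && hpc-compose up -f compose.yaml, so that submission happens only when all three stages pass.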

Cache and artifact management

  • Use hpc-compose cache list to inspect imported/prepared artifacts.
  • Use hpc-compose cache inspect -f compose.yaml to see per-service reuse expectations.
  • Use hpc-compose --profile dev cache prune --age 14 when you want age-based cleanup to follow the active context cache dir.
  • Use hpc-compose cache prune --age 7 --cache-dir '<shared-cache-dir>' when you want a direct cache cleanup that does not depend on compose resolution.
  • Use hpc-compose artifacts -f compose.yaml after a run to export tracked payloads.
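
Because the direct prune form does not depend on compose resolution, a scheduled cleanup may want its own argument checks before calling it. The wrapper below is a sketch: prune_shared_cache and HPC_COMPOSE are hypothetical names, and treating --age as a whole number is an assumption based on the examples above (7 and 14).

```shell
# HPC_COMPOSE is a hypothetical override hook for this sketch (not an
# hpc-compose feature); it defaults to the real binary on PATH.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# prune_shared_cache CACHE_DIR AGE
# Refuse to run unless CACHE_DIR is an existing directory and AGE is a
# non-negative integer (an assumption of this sketch), then prune.
prune_shared_cache() {
    cache_dir="${1:-}"
    age="${2:-}"
    case "$age" in
        ''|*[!0-9]*)
            echo "prune_shared_cache: AGE must be a non-negative integer" >&2
            return 2 ;;
    esac
    if [ ! -d "$cache_dir" ]; then
        echo "prune_shared_cache: not a directory: $cache_dir" >&2
        return 2
    fi
    "$HPC_COMPOSE" cache prune --age "$age" --cache-dir "$cache_dir"
}
```

For example, prune_shared_cache '<shared-cache-dir>' 7 runs the same command as the bullet above, but fails early if the directory is missing or the age is malformed.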

Find and clean tracked runs

  • Use hpc-compose jobs list to scan the current repo tree for tracked runs.
  • Use hpc-compose ps -f compose.yaml when you want a one-shot per-service runtime table.
  • Use hpc-compose watch -f compose.yaml to reconnect to the live watch UI for the latest tracked job.
  • Use hpc-compose jobs list --disk-usage when you need a quick size estimate before deleting old state.
  • Use hpc-compose clean -f compose.yaml --dry-run --age 7 to preview what a cleanup would remove.
  • Use hpc-compose clean -f compose.yaml --all --format json when automation needs a stable cleanup report for one compose context, including effective latest IDs plus stale-pointer diagnostics.
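
Before deleting tracked state, it can help to see the size estimate and the dry-run preview together. The helper below is a sketch that only runs the two read-only commands listed above; preview_cleanup and HPC_COMPOSE are hypothetical names.

```shell
# HPC_COMPOSE is a hypothetical override hook for this sketch (not an
# hpc-compose feature); it defaults to the real binary on PATH.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# preview_cleanup [COMPOSE_FILE] [AGE]
# Show disk usage for tracked runs, then preview what clean would remove.
# Nothing is deleted because clean is invoked with --dry-run.
preview_cleanup() {
    file="${1:-compose.yaml}"
    age="${2:-7}"
    "$HPC_COMPOSE" jobs list --disk-usage &&
    "$HPC_COMPOSE" clean -f "$file" --dry-run --age "$age"
}
```

Run preview_cleanup compose.yaml 7, check the preview, and then repeat the clean command yourself with the options you actually want once the list looks right.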

Automation and scripting with JSON output

  • Prefer --format json for machine-readable output on non-streaming commands such as new, plan, validate, render, prepare, preflight, config, inspect, debug, status, ps, stats, score, artifacts, down, cancel, setup, cache, clean, and context. For up, --format json requires --detach or --dry-run.
  • Include context --format json when automation needs resolved compose path, binaries, referenced interpolation vars, and runtime path roots.
  • Use hpc-compose stats --format jsonl or --format csv when downstream tooling wants row-oriented metrics.
  • Treat --json as a compatibility alias on older machine-readable commands; new automation should prefer --format json. Streaming commands such as logs --follow, watch, and completions keep their native text or script output.
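
When automation stores JSON reports on disk, writing to a temporary file first keeps downstream tools from reading a partial report after a failed command. The helper below is a sketch of that pattern: json_report and HPC_COMPOSE are hypothetical names, it assumes --format json may be appended after the subcommand's other arguments, and it makes no assumption about the fields inside the JSON.

```shell
# HPC_COMPOSE is a hypothetical override hook for this sketch (not an
# hpc-compose feature); it defaults to the real binary on PATH.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# json_report OUTFILE SUBCOMMAND [ARGS...]
# Run a non-streaming subcommand with --format json and publish the
# output to OUTFILE only if the command exits successfully.
json_report() {
    out="${1:?usage: json_report OUTFILE SUBCOMMAND [ARGS...]}"
    shift
    tmp="$out.tmp.$$"
    if "$HPC_COMPOSE" "$@" --format json > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, json_report context.json context captures the resolved context, and json_report plan.json plan -f compose.yaml captures the plan. Streaming commands such as logs --follow and watch are not suitable for this wrapper.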