hpc-compose

Compose-style multi-service workflows, compiled into one inspectable Slurm job.

hpc-compose gives research and HPC teams a small YAML authoring model for services, startup order, readiness checks, runtime backends, logs, artifacts, and follow-up commands.

services:
  app:
    image: python:3.12-slim
    command: python train.py

$ hpc-compose plan --show-script -f compose.yaml
spec is valid
service order: app
#SBATCH --job-name=my-app

Use hpc-compose when you want Docker Compose-style authoring on Slurm without adding Kubernetes, a long-running control plane, or custom cluster-side services.

Start with the Support Matrix before planning a real runtime workflow. Linux is the maintained runtime target; macOS is intended for authoring, validation, rendering, and inspection.

Safe First Path

These commands are safe to run from a laptop, workstation, or login node: hpc-compose new only writes a local starter spec, and hpc-compose plan is purely static, so neither submits anything to Slurm:

hpc-compose new --template minimal-batch --name my-app --output compose.yaml
hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

For real cluster runs, configure a cache path that is visible from both the Slurm submission host and the compute nodes. You can set it with x-slurm.cache_dir in the spec, with hpc-compose setup --cache-dir, or through [defaults.cache] / [profiles.<name>.cache] settings. From a source checkout, you can also inspect the checked-in examples, for example with hpc-compose plan -f examples/minimal-batch.yaml.
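Whether a path is actually shared depends on your site's filesystems. As a rough local sanity check before you write a cache_dir value, the hypothetical Python helper below flags paths that are relative or that sit under directories commonly local to each node (/tmp, /var/tmp, /dev/shm). The prefix list and the helper name cache_dir_problems are assumptions for illustration, not part of hpc-compose; adjust the list for your cluster.

```python
from pathlib import PurePosixPath

# Hypothetical list of prefixes that are often node-local on Slurm clusters.
# This is a heuristic, not a check that hpc-compose performs; adjust per site.
NODE_LOCAL_PREFIXES = ("/tmp", "/var/tmp", "/dev/shm")


def cache_dir_problems(path: str) -> list[str]:
    """Return reasons why `path` may not be visible from compute nodes."""
    p = PurePosixPath(path)
    problems = []
    if not p.is_absolute():
        problems.append("path is relative, so it depends on each process's working directory")
    for prefix in NODE_LOCAL_PREFIXES:
        root = PurePosixPath(prefix)
        # Match the prefix itself or anything beneath it, but not e.g. /tmpfoo.
        if p == root or root in p.parents:
            problems.append(f"{prefix} is usually local to each node")
    return problems


if __name__ == "__main__":
    # /scratch/me/hpc-compose-cache is a made-up example of a shared path.
    for candidate in ("/scratch/me/hpc-compose-cache", "/tmp/cache", "cache"):
        print(candidate, cache_dir_problems(candidate) or "ok")
```

An empty result only means no obvious red flag was found; it does not prove that compute nodes can see the directory.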

Expected output from the plan commands above includes:

spec is valid
service order: app
Rendered script:

Run hpc-compose up -f compose.yaml only on a supported Linux Slurm submission host where the runtime backend selected by your spec is available. If the run fails, start diagnosing with hpc-compose debug -f compose.yaml --preflight.
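If you script submissions, the same up-then-diagnose flow can be wrapped in a few lines. This is a minimal Python sketch that only shells out to the two commands named above; it assumes hpc-compose is on PATH, and the function name up_or_preflight and its binary parameter are hypothetical.

```python
import subprocess
import sys


def up_or_preflight(spec: str = "compose.yaml", binary: str = "hpc-compose") -> int:
    """Run `up`; if it exits non-zero, run `debug --preflight` to help diagnose.

    Returns the exit code of `up`, so callers still see the original failure.
    """
    up = subprocess.run([binary, "up", "-f", spec])
    if up.returncode != 0:
        print(f"up failed with exit code {up.returncode}; running preflight checks", file=sys.stderr)
        # The preflight result is informational; its exit code is not returned.
        subprocess.run([binary, "debug", "-f", spec, "--preflight"])
    return up.returncode


if __name__ == "__main__":
    sys.exit(up_or_preflight())
```

Returning the exit code of up rather than of debug keeps a failed submission visible to the calling script or CI job.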

Download the asciinema-style quickstart demo cast if you want the same flow as a terminal recording.

Terms To Know

Term             Meaning
spec             The YAML file that describes services, runtime backend, and Slurm settings.
allocation       The Slurm job allocation where all planned services run.
runtime backend  The mechanism used to launch services: Pyxis/Enroot, Apptainer, Singularity, or host.
preflight        Checks that inspect local tools, paths, backend support, and optional cluster profiles before a run.
prepare          The login-node image import/customization phase that runs before services start on compute nodes.
tracked job      Metadata under .hpc-compose/<job-id>/ that lets status, ps, watch, logs, stats, and artifacts reconnect later.
x-slurm          The spec section for Slurm settings and hpc-compose runtime extensions.
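Because tracked-job metadata lives on disk under .hpc-compose/<job-id>/, listing that directory shows which jobs you can reconnect to. A minimal sketch, assuming you pass the path of the .hpc-compose directory as root and that each numeric subdirectory name is a Slurm job ID; the helper name tracked_job_ids is hypothetical and does not read the metadata files themselves.

```python
from pathlib import Path


def tracked_job_ids(root: str = ".hpc-compose") -> list[str]:
    """List job IDs that have a metadata directory under `root`.

    Assumes the .hpc-compose/<job-id>/ layout and numeric Slurm job IDs;
    other files and directories are ignored.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    ids = [entry.name for entry in base.iterdir() if entry.is_dir() and entry.name.isdigit()]
    # Sort numerically so job 45 comes before job 123.
    return sorted(ids, key=int)


if __name__ == "__main__":
    for job_id in tracked_job_ids():
        print(job_id)
```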

What It Is For

  • model serving plus helper services inside one Slurm allocation
  • data and ETL pipelines with startup ordering or stage-completion dependencies
  • training jobs with checkpoint export, artifact tracking, and resume-aware reruns
  • explicit multi-node launch patterns that still fit inside one allocation
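Startup ordering between services is a dependency graph, and a topological sort is the standard way to turn such a graph into a start order. The sketch below illustrates the idea; it is not hpc-compose's implementation. It assumes Compose-style depends_on entries inside each service (either the list form or the mapping form), and startup_order is a hypothetical helper name.

```python
from graphlib import TopologicalSorter


def startup_order(services: dict[str, dict]) -> list[str]:
    """Order services so each one starts after everything it depends on.

    `services` mirrors the `services:` mapping of a spec. `depends_on` may be
    a list of names or a mapping keyed by name; set() yields the names either way.
    """
    graph = {name: set((cfg or {}).get("depends_on", [])) for name, cfg in services.items()}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    spec_services = {
        "app": {"command": "python train.py", "depends_on": ["db", "cache"]},
        "db": {},
        "cache": {"depends_on": ["db"]},
    }
    print("service order:", ", ".join(startup_order(spec_services)))
```

A dependency cycle has no valid start order, which is why static_order() refuses it instead of picking an arbitrary sequence.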

What It Is Not

hpc-compose is not a full Docker Compose runtime and is not a general cluster orchestrator.

Unsupported Compose keys and features include:

  • build
  • ports
  • networks / network_mode
  • the Compose restart key (Docker-style restart policy)
  • deploy
  • dynamic node bin packing
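hpc-compose plan validates the spec itself. When porting an existing Compose file, though, a quick pre-scan for the service keys listed above shows how much needs rewriting. A minimal sketch, assuming the file has already been loaded into a Python dict (for example with a YAML library); unsupported_keys is a hypothetical helper, and dynamic bin packing is a scheduling behavior rather than a key, so it is not checked.

```python
# Service-level Compose keys that the list above marks as unsupported.
UNSUPPORTED_SERVICE_KEYS = {"build", "ports", "networks", "network_mode", "restart", "deploy"}


def unsupported_keys(spec: dict) -> list[str]:
    """Return "service.key" strings for unsupported keys found in `spec`."""
    found = []
    for name, service in (spec.get("services") or {}).items():
        # A service written as `app:` with no body loads as None.
        for key in sorted(UNSUPPORTED_SERVICE_KEYS.intersection(service or {})):
            found.append(f"{name}.{key}")
    return found


if __name__ == "__main__":
    ported = {
        "services": {
            "app": {"image": "python:3.12-slim", "ports": ["8000:8000"], "restart": "always"},
            "worker": {"build": "."},
        }
    }
    for item in unsupported_keys(ported):
        print("unsupported:", item)
```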

For exact boundaries, read Execution Model, Supported Slurm Model, and Spec Reference.

Suggested reading order:

  1. Quickstart for the shortest safe path.
  2. Examples to choose a starting spec.
  3. Runtime Backends before changing runtime.backend.
  4. Runbook when adapting a real workload on a cluster.
  5. Troubleshooting when the first cluster run fails.

Reference