hpc-compose
Compose-style multi-service workflows, compiled into one inspectable Slurm job.
hpc-compose gives research and HPC teams a small YAML authoring model for services, startup order, readiness checks, runtime backends, logs, artifacts, and follow-up commands.
services:
  app:
    image: python:3.12-slim
    command: python train.py
$ hpc-compose plan --show-script -f compose.yaml
spec is valid
service order: app
#SBATCH --job-name=my-app
Use hpc-compose when you want Docker Compose-style authoring on Slurm without adding Kubernetes, a long-running control plane, or custom cluster-side services.
Start with the Support Matrix before planning a real runtime workflow. Linux is the maintained runtime target; macOS is intended for authoring, validation, rendering, and inspection.
Safe First Path
These commands are safe to run from a laptop, workstation, or login node: hpc-compose new only writes a local starter spec, and hpc-compose plan is purely static.
hpc-compose new --template minimal-batch --name my-app --output compose.yaml
hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml
For real cluster runs, configure a cache path that is visible from both the Slurm submission host and the compute nodes. Set it in x-slurm.cache_dir, with hpc-compose setup --cache-dir, or in the [defaults.cache] / [profiles.<name>.cache] settings. From a source checkout, you can also inspect the checked-in examples with hpc-compose plan -f examples/minimal-batch.yaml.
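Before settling on a cache path, it can help to confirm that it really is usable from both sides. The following is a minimal sketch, not something hpc-compose provides: the function names are hypothetical, and "visible" is assumed to mean "exists and is writable". The compute-node check uses plain Slurm srun and starts a short job step.

```shell
# Sketch only: these checks are assumptions about what "visible" means,
# not something hpc-compose runs itself.

# Succeeds if the directory exists and is writable on the current host.
cache_dir_ok_here() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Succeeds if a compute node can also see and write to the directory.
# Uses plain Slurm `srun`; run it only where a short job step is acceptable.
cache_dir_ok_on_compute() {
  srun --nodes=1 --ntasks=1 sh -c '[ -d "$1" ] && [ -w "$1" ]' sh "$1"
}

# Example (the path is hypothetical):
#   cache_dir_ok_here /shared/cache && cache_dir_ok_on_compute /shared/cache
```

Run the first check on the submission host and the second from the same host once Slurm is reachable; a path that passes only the first is usually node-local storage.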
Expected output from the plan commands includes:
spec is valid
service order: app
Rendered script:
Run hpc-compose up -f compose.yaml only on a supported Linux Slurm submission host with the runtime backend your spec selects. If it fails, start with hpc-compose debug -f compose.yaml --preflight.
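The order described above (static plan first, submit only from a suitable host, preflight on failure) can be wrapped in a small helper. This is a sketch under stated assumptions: safe_up, looks_like_submission_host, and the HPC_COMPOSE variable are hypothetical names for this example, and "Linux with sbatch on PATH" is only a rough stand-in for a supported submission host.

```shell
# Sketch: run the static plan first, and only submit from a host that looks
# like a Linux Slurm submission host. The host test is an assumption.

# HPC_COMPOSE lets you point at a different binary; it is a hypothetical
# convenience variable for this sketch, not an hpc-compose setting.

looks_like_submission_host() {
  [ "$(uname -s)" = "Linux" ] && command -v sbatch >/dev/null 2>&1
}

safe_up() {
  spec="${1:-compose.yaml}"
  bin="${HPC_COMPOSE:-hpc-compose}"

  # Static validation; safe on a laptop or login node.
  "$bin" plan -f "$spec" || return 1

  if ! looks_like_submission_host; then
    echo "skipping up: not a Linux host with sbatch on PATH" >&2
    return 2
  fi

  # Submit; on failure, run the preflight diagnostics and report the failure.
  if ! "$bin" up -f "$spec"; then
    "$bin" debug -f "$spec" --preflight
    return 3
  fi
}
```

Calling safe_up compose.yaml returns 1 if the plan fails, 2 if the host check fails, and 3 if submission fails after running preflight. The host check does not verify that the runtime backend selected in the spec is available.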
Download the asciinema-style quickstart demo cast if you want the same flow as a terminal recording.
Terms To Know
| Term | Meaning |
|---|---|
| spec | The YAML file that describes services, runtime backend, and Slurm settings. |
| allocation | The Slurm job allocation where all planned services run. |
| runtime backend | The mechanism used to launch services: Pyxis/Enroot, Apptainer, Singularity, or host. |
| preflight | Checks that inspect local tools, paths, backend support, and optional cluster profiles before a run. |
| prepare | The login-node image import/customization phase used before compute-node runtime. |
| tracked job | Metadata under .hpc-compose/<job-id>/ that lets status, ps, watch, logs, stats, and artifacts reconnect later. |
| x-slurm | The spec section for Slurm settings and hpc-compose runtime extensions. |
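To make the tracked job entry concrete, the sketch below lists the job IDs that have a metadata directory. Only the .hpc-compose/<job-id>/ layout comes from the table; list_tracked_jobs is a hypothetical helper, it assumes every subdirectory is a job, and it does not read the metadata itself.

```shell
# Sketch: list job IDs that have tracking metadata.
# Only the .hpc-compose/<job-id>/ layout comes from the docs; the contents of
# each directory are not inspected.
list_tracked_jobs() {
  root="${1:-.hpc-compose}"
  for dir in "$root"/*/; do
    # Skip the unexpanded glob when no job directories exist.
    [ -d "$dir" ] || continue
    basename "$dir"
  done
}
```

Run it from the directory that contains .hpc-compose, or pass the path as the first argument.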
What It Is For
- model serving plus helper services inside one Slurm allocation
- data and ETL pipelines with startup ordering or stage-completion dependencies
- training jobs with checkpoint export, artifact tracking, and resume-aware reruns
- explicit multi-node launch patterns that still fit inside one allocation
What It Is Not
hpc-compose is not a full Docker Compose runtime and is not a general cluster orchestrator.
Unsupported Compose features include:
- build:
- ports
- networks / network_mode
- Compose restart as a Docker key
- deploy
- dynamic node bin packing
For exact boundaries, read Execution Model, Supported Slurm Model, and Spec Reference.
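When porting an existing Compose file, a quick text scan can point out keys from the unsupported list before you run the plan. This is a rough sketch: find_unsupported_keys is a hypothetical helper, it matches key names line by line without parsing YAML, and it cannot tell a Docker-style key from a similarly named key that hpc-compose may accept elsewhere in the spec. Treat hits as prompts to check the Spec Reference; hpc-compose plan remains the authoritative validation.

```shell
# Sketch: flag lines in a spec that look like Compose keys listed as unsupported.
# This is a plain-text heuristic, not a YAML parser and not part of hpc-compose.
find_unsupported_keys() {
  grep -nE '^[[:space:]]*(build|ports|networks|network_mode|restart|deploy):' "$1"
}

# Example:
#   if find_unsupported_keys compose.yaml; then
#     echo "review the lines above before running hpc-compose plan" >&2
#   fi
```

The function prints each matching line with its line number and exits non-zero when nothing matches, following grep's own exit status.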
Read Next
- Quickstart for the shortest safe path.
- Examples to choose a starting spec.
- Runtime Backends before changing runtime.backend.
- Runbook when adapting a real workload on a cluster.
- Troubleshooting when the first cluster run fails.