hpc-compose
hpc-compose turns a Compose-like spec into a single Slurm job that runs one or more services through Enroot and Pyxis.
hpc-compose is intentionally not a full Docker Compose implementation. It focuses on the subset that maps cleanly to one Slurm allocation, plus either single-node services or one allocation-wide distributed service without a separate orchestration layer.
Start Here
- Read Quickstart for the shortest install-and-run path.
- Read Support Matrix to confirm what is officially supported, CI-tested, or only release-built.
- Use Task Guide when you want the shortest path for a specific workflow.
- Read Execution model to understand what runs on the login node, what runs on the compute node, and which paths must be shared.
- Use Runbook when adapting a real workload to a real cluster.
- Use Examples when you want the closest known-good starting point.
- Use Spec reference when you need exact field behavior or validation rules.
- Use Supported Slurm model when you need the product boundary spelled out clearly.
What it is for
- One Slurm allocation per application
- Single-node jobs and constrained multi-node distributed runs
- Optional helper services pinned to the allocation’s primary node
- Remote images such as `redis:7` or existing local `.sqsh` images
- Optional image customization on the login node through `x-enroot.prepare`
- Shared cache management for imported and prepared images
- Readiness-gated startup across dependent services
What it does not support
- Compose `build:`
- `ports`
- custom Docker networks / `network_mode`
- `restart` policies
- `deploy`
- arbitrary multi-node orchestration or partial-node service placement
- mixed string/array `entrypoint` + `command` combinations in ambiguous cases

If you need image customization, use `image:` plus `x-enroot.prepare`, not `build:`.
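As a sketch, a service that previously used a Compose `build:` context becomes a base `image:` plus login-node prepare commands. The nesting of `x-enroot.prepare` under the service and the requirements path shown here are illustrative assumptions; confirm the exact schema in the Spec Reference.

```yaml
services:
  app:
    image: python:3.11-slim          # replaces the old build: context
    command: python main.py
    # Illustrative placement; see the Spec Reference for the exact schema.
    x-enroot:
      prepare:
        commands:
          - pip install -r /opt/app/requirements.txt   # hypothetical path
```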
Fast path
name: hello
x-slurm:
time: "00:10:00"
mem: 4G
services:
app:
image: python:3.11-slim
command: python -c "print('Hello from Slurm!')"
hpc-compose submit --watch -f compose.yaml
submit --watch is the normal run. Break out inspect, preflight, or prepare as the debugging flow when you are validating a new spec for the first time or isolating a failure.
Read next
- Installation for release and source install paths
- Quickstart for the shortest working flow
- Support Matrix for platform and runtime support expectations
- Task Guide for goal-oriented workflow entry points
- Execution model for the login-node / compute-node split
- Runbook for real-cluster setup and debugging
- Examples for example selection and adaptation
- Spec Reference for the supported Compose subset
- Supported Slurm model for the first-class / pass-through / out-of-scope boundary
- Docker Compose Migration for feature mapping and conversion guidance
Installation
One-line installer
For supported Linux and macOS targets, the repo now ships a small installer script that picks the newest release and the matching archive for your machine:
curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | sh
By default this installs hpc-compose into ~/.local/bin and verifies the published SHA-256 checksum before placing the binary.
Installer availability does not imply full runtime support. Check the Support Matrix before assuming that a platform can run submission, prepare, or watch workflows end to end.
Useful overrides:
curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | env HPC_COMPOSE_INSTALL_DIR=/usr/local/bin sh
curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | env HPC_COMPOSE_VERSION=v0.1.12 sh
Supported targets match the release workflow:
- Linux x86_64
- Linux arm64
- macOS x86_64
- macOS arm64
Windows release archives are also published, but Windows is not part of the installer path and is not an officially supported runtime target.
Download a release build manually
Prebuilt archives are published on the project’s GitHub Releases.
Typical flow on Linux or macOS:
curl -L https://github.com/NicolasSchuler/hpc-compose/releases/latest/download/hpc-compose-v0.1.12-x86_64-unknown-linux-musl.tar.gz -o hpc-compose.tar.gz
tar -xzf hpc-compose.tar.gz
./hpc-compose --help
Pick the archive that matches your platform from the release page. Linux x86_64 releases use a musl target to avoid common cluster glibc mismatches.
Build from source
Requirements:
- Rust stable toolchain
- A normal local build machine for the CLI itself
- Slurm/Enroot tools only when you actually run `preflight`, `prepare`, or `submit`
git clone https://github.com/NicolasSchuler/hpc-compose.git
cd hpc-compose
cargo build --release
./target/release/hpc-compose --help
Local docs commands
The repo ships two documentation layers:
- `mdbook` for the user manual
- `cargo doc` for contributor-facing crate internals
Useful commands:
mdbook build docs
mdbook serve docs
cargo doc --no-deps
Verification
Before using a local build on a cluster workflow, validate the binary and one example spec:
target/release/hpc-compose validate -f examples/minimal-batch.yaml
target/release/hpc-compose inspect --verbose -f examples/minimal-batch.yaml
Quickstart
This is the shortest install-and-run path from an empty shell to a submitted job.
1. Install a release binary
curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | sh
The installer selects the newest published release for the current Linux or macOS machine and installs hpc-compose into ~/.local/bin by default. Check the Support Matrix before assuming that a platform can run full cluster workflows.
2. Initialize a starter spec
hpc-compose init \
--template minimal-batch \
--name my-app \
--cache-dir /shared/$USER/hpc-compose-cache \
--output compose.yaml
If you already know the closest shipped example, copy it directly instead. The Examples page is the fastest way to choose one.
3. Normal run
hpc-compose submit --watch -f compose.yaml
submit --watch is the normal run. It runs preflight, prepares missing artifacts, renders the batch script, submits it through sbatch, then follows scheduler state and tracked logs.
4. Debugging flow
hpc-compose validate -f compose.yaml
hpc-compose inspect --verbose -f compose.yaml
hpc-compose preflight -f compose.yaml
hpc-compose prepare -f compose.yaml
Use the debugging flow when you want to confirm:
- service order
- normalized image references
- cache artifact paths
- whether prepare steps will rebuild every submit
Warning
inspect --verbose prints resolved environment values and final mount mappings. Treat its output as sensitive when the spec contains secrets.
5. Revisit a tracked run later
hpc-compose status -f compose.yaml
hpc-compose stats -f compose.yaml
hpc-compose logs -f compose.yaml --follow
From a source checkout
If you are running from a local checkout instead of an installed binary:
cargo build --release
target/release/hpc-compose init --template minimal-batch --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml
target/release/hpc-compose submit --watch -f compose.yaml
Read next
- Use the Execution model page to understand what runs where and which paths must be shared.
- Use the Support Matrix page to confirm what is officially supported versus only release-built.
- Use the Task Guide page when you want a goal-oriented starting point.
- Use the Runbook when adapting a real workload to a real cluster.
- Use the Examples page when you want the closest known-good template.
- Use the Spec Reference when changing fields or validation-sensitive values.
Support Matrix
This page separates what hpc-compose can build, what CI currently exercises, and what is officially supported for real workflows.
Support levels
| Level | Meaning |
|---|---|
| Officially supported | Maintained target for user-facing workflows and issue triage |
| CI-tested | Exercised in the repository’s automated checks today |
| Release-built | Prebuilt archive is published, but that is not a promise of full runtime support |
Officially supported
| Platform | Scope | Notes |
|---|---|---|
| Linux x86_64 | Full CLI and runtime workflows | Requires Slurm client tools plus Enroot and Pyxis on the submission host/cluster |
| Linux arm64 | Full CLI and runtime workflows | Same cluster requirements as Linux x86_64 |
| macOS x86_64 | Authoring and local inspection only | Supported for `init`, `validate`, `inspect`, `render`, and `completions`; not for cluster runtime commands |
| macOS arm64 | Authoring and local inspection only | Same scope as macOS x86_64 |
CI-tested
| Platform | What is tested today |
|---|---|
| Ubuntu 24.04 x86_64 | formatting, clippy, unit/integration tests, docs build, link checks, installer smoke tests, and coverage |
Current CI validates project behavior on Ubuntu. Other published builds should be treated as lower-confidence until corresponding CI coverage exists.
Release-built
| Platform | Status |
|---|---|
| Linux x86_64 | Release archive published |
| Linux arm64 | Release archive published |
| macOS x86_64 | Release archive published |
| macOS arm64 | Release archive published |
| Windows x86_64 | Release archive published, but runtime workflows are not officially supported |
Windows status
Windows archives are published so users can inspect the CLI surface or experiment with non-runtime commands, but Windows is currently release-built only:
- Slurm + Enroot + Pyxis runtime workflows are not an officially supported Windows target.
- Issues that are specific to Windows runtime execution may be closed as out of scope until the support policy changes.
Cluster assumptions for full support
For full runtime support on Linux, the target environment should provide:
- `sbatch`, `srun`, and related Slurm client tools on the submission host
- Pyxis container support in `srun`
- Enroot on the submission host for image import and prepare steps
- shared storage for `x-slurm.cache_dir`
Use Runbook and Execution model before adapting a real workload to a cluster.
Task Guide
Use this page when you know what you want to do, but not yet which command or example should be your starting point.
First run
- Read Quickstart.
- Start from `minimal-batch` with `hpc-compose init --template minimal-batch --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml`.
- Run `hpc-compose submit --watch -f compose.yaml`.
Migrate from Docker Compose
- Read Docker Compose Migration.
- Replace `build:` with `image:` plus `x-enroot.prepare.commands`.
- Replace service-name networking with `127.0.0.1` or explicit allocation metadata where appropriate.
Single-node multi-service app
- Start from app-redis-worker.yaml.
- Add `depends_on` and `readiness` only where ordering really matters.
- Use Execution model to confirm which services can rely on localhost.
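A minimal sketch of the ordering pattern, assuming the standard Compose list form of `depends_on`; the exact `readiness` field shape is defined in the Spec Reference and is deliberately not reproduced here.

```yaml
services:
  redis:
    image: redis:7
    # Add a readiness check here (HTTP, log, or TCP based); the exact field
    # shape is defined in the Spec Reference.
  worker:
    image: python:3.11-slim
    command: python worker.py
    depends_on:
      - redis
```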
Multi-node distributed training
- Start from multi-node-torchrun.yaml or multi-node-mpi.yaml.
- Treat helper services as primary-node-only and the distributed job as the single allocation-wide step.
- Use allocation metadata such as `HPC_COMPOSE_PRIMARY_NODE` instead of Docker-style service discovery.
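A hedged shell sketch of that last point: derive a rendezvous endpoint from the exported metadata instead of service-name DNS. The node name is a stand-in value for this standalone example, and port 29500 is just a conventional torchrun default, not something hpc-compose assigns.

```shell
# In a real multi-node job, hpc-compose exports HPC_COMPOSE_PRIMARY_NODE.
# For this standalone sketch we set a stand-in value.
HPC_COMPOSE_PRIMARY_NODE="node0001"

# Build an explicit non-localhost rendezvous endpoint from the metadata.
RDZV_ENDPOINT="${HPC_COMPOSE_PRIMARY_NODE}:29500"
echo "rendezvous at $RDZV_ENDPOINT"
```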
Checkpoint and resume workflows
- Start from training-checkpoints.yaml when you only need artifact output.
- Start from training-resume.yaml when the run should resume from shared storage across retries or later submissions.
- Keep the canonical resume source in `x-slurm.resume.path`, not in exported artifact bundles.
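A minimal sketch of that configuration, assuming only the documented `x-slurm.resume.path` field; any other resume options belong to the Spec Reference.

```yaml
x-slurm:
  resume:
    # Canonical checkpoint location on shared storage. A restarted run loads
    # from here; exported artifact bundles stay retrieval/provenance output.
    path: /shared/alice/my-app-resume
```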
LLM serving workflows
- Start from llm-curl-workflow.yaml, llm-curl-workflow-workdir.yaml, llama-uv-worker.yaml, or vllm-uv-worker.yaml.
- Use `volumes` for model directories and fast-changing code.
- Use `x-enroot.prepare.commands` for slower-changing dependencies.
Debug cluster readiness
- Run `hpc-compose validate -f compose.yaml`.
- Run `hpc-compose inspect --verbose -f compose.yaml`.
- Run `hpc-compose preflight -f compose.yaml`.
- Read the troubleshooting sections in Runbook.
Cache and artifact management
- Use `hpc-compose cache list` to inspect imported/prepared artifacts.
- Use `hpc-compose cache inspect -f compose.yaml` to see per-service reuse expectations.
- Use `hpc-compose artifacts -f compose.yaml` after a run to export tracked payloads.
Automation and scripting with JSON output
- Prefer `--format json` for machine-readable output on `validate`, `render`, `prepare`, `preflight`, `inspect`, `status`, `stats`, `artifacts`, and `cache` subcommands.
- Use `hpc-compose stats --format jsonl` or `--format csv` when downstream tooling wants row-oriented metrics.
- Treat `--json` as a compatibility alias on older machine-readable commands; new automation should prefer `--format json`.
Execution model
This page explains the few runtime rules that matter most when a Compose mental model meets Slurm, Enroot, and Pyxis.
What runs where
| Stage | Where it runs | What happens |
|---|---|---|
| `validate`, `inspect`, `preflight` | login node or local shell | Parse the spec, resolve paths, and check prerequisites |
| `prepare` | login node or local shell with Enroot access | Import base images and build prepared runtime artifacts |
| `submit` | login node or local shell with Slurm access | Run preflight, prepare missing artifacts, render the batch script, and call `sbatch` |
| Batch script and services | compute-node allocation | Launch the planned services through `srun` and Pyxis |
| `status`, `stats`, `logs`, `artifacts` | login node or local shell | Read tracked metadata and job outputs after submission |
The main consequence is simple: image preparation and validation happen before the job starts, but the containers themselves run later inside the Slurm allocation.
Which paths must be shared
- `x-slurm.cache_dir` must be visible from both the login node and the compute nodes.
- Relative host paths in `volumes`, local image paths, and `x-enroot.prepare.mounts` resolve against the compose file directory.
- Each submitted job writes tracked state under `${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}` on the host.
- That per-job directory is mounted into every container at `/hpc-compose/job`.
- Multi-node jobs also populate `/hpc-compose/job/allocation/{primary_node,nodes.txt}` and export `HPC_COMPOSE_PRIMARY_NODE`, `HPC_COMPOSE_NODE_COUNT`, `HPC_COMPOSE_NODELIST`, and `HPC_COMPOSE_NODELIST_FILE`.
Use /hpc-compose/job for small shared state inside the allocation, such as ready files, request payloads, logs, metrics, or teardown signals.
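A hedged sketch of the ready-file flavor of that coordination. In a real job both sides would use `/hpc-compose/job`; a temp directory stands in here so the sketch is self-contained, and the file name is arbitrary.

```shell
# A temp dir stands in for the shared per-job directory /hpc-compose/job.
JOB_DIR="$(mktemp -d)"

# Producer side (e.g. a warm-up service) signals once it is actually usable.
touch "$JOB_DIR/warmup.ready"

# Consumer side polls for the ready file before starting its own work.
tries=0
until [ -f "$JOB_DIR/warmup.ready" ]; do
  tries=$((tries + 1))
  [ "$tries" -gt 50 ] && { echo "timed out"; break; }
  sleep 1
done
echo "dependency ready"
```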
Warning
Do not put x-slurm.cache_dir under /tmp, /var/tmp, /private/tmp, or /dev/shm. Those paths are not safe for login-node prepare plus compute-node reuse.
Networking inside the allocation
- Single-node services share the host network on one node.
- In a multi-node job, helper services stay on the allocation’s primary node by default.
- The one distributed service spans the full allocation and must use explicit non-localhost coordination.
- `ports`, custom Docker networks, and service-name DNS are not part of the model.
- Use `depends_on` plus `readiness` when a dependent service must wait for real availability rather than process start.
Use 127.0.0.1 only when both sides are intentionally on the same node. For multi-node distributed runs, derive rendezvous addresses from the allocation metadata files or environment variables instead of relying on localhost.
If a service binds its TCP port before it is actually ready, prefer HTTP or log-based readiness over plain TCP readiness.
volumes vs x-enroot.prepare
| Mechanism | Use it for | When it is applied | Reuse behavior |
|---|---|---|---|
| `volumes` | fast-changing source code, model directories, input data, checkpoint paths | at runtime inside the allocation | reads live host content every normal run |
| `x-enroot.prepare.commands` | slower-changing dependencies, tools, and image customization | before submission on the login node | cached until the prepared artifact changes |
Recommended default:
- keep active source trees in `volumes`
- keep slower-changing dependency installation in `x-enroot.prepare.commands`
- use `prepare.mounts` only when the prepare step truly needs host files
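Putting the two mechanisms together, a service might look like this sketch; the per-service nesting of `x-enroot` is an illustrative assumption, so confirm the exact schema in the Spec reference.

```yaml
services:
  app:
    image: python:3.11-slim
    command: python /workspace/src/train.py
    volumes:
      - ./src:/workspace/src      # live source: re-read on every normal run
    # Illustrative nesting; see the Spec reference for the exact schema.
    x-enroot:
      prepare:
        commands:                 # baked into the cached prepared image
          - pip install numpy
```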
Warning
If a mounted file is a symlink, the symlink target must also be visible from inside the mounted directory. Otherwise the path can exist on the host but fail inside the container.
Command vocabulary
- The normal run is `hpc-compose submit --watch -f compose.yaml`.
- The debugging flow is `validate`, `inspect`, `preflight`, and `prepare` run separately when you need more visibility.
Read Runbook for the operational workflow, Examples for starting points, and Spec reference for exact field behavior.
Runbook
This runbook is for adapting hpc-compose to a real workload on a Slurm cluster with Enroot and Pyxis.
Commands below assume hpc-compose is on your PATH. If you are running from a local checkout, replace hpc-compose with target/release/hpc-compose.
All commands accept -f / --file to specify the compose spec path. When omitted, it defaults to compose.yaml in the current directory. (The cache prune --all-unused subcommand requires -f explicitly.)
Read the Execution model page first if you are still orienting on login-node prepare, compute-node runtime, shared cache paths, or localhost networking.
Before you start
Make sure you have:
- a login node with `enroot`, `srun`, and `sbatch` available,
- `scontrol` available when you request `x-slurm.nodes > 1`,
- Pyxis support in `srun` (`srun --help` should mention `--container-image`),
- a shared filesystem path for `x-slurm.cache_dir`,
- any required local source trees or local `.sqsh` images in place,
- registry credentials available if your cluster or registry requires them.
Command cadence
| Command or step | When to use it |
|---|---|
| install or build hpc-compose | once per checkout or upgrade |
| `init` or copy a shipped example | once per new spec |
| `validate` and `inspect` | early while adapting a spec |
| `submit --watch` | normal run |
| `preflight`, `prepare`, `render` | first-time cluster setup checks or the debugging flow |
Normal progression
For a new spec on a real cluster:
- Run `hpc-compose init --template <name> --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml`, or copy the closest shipped example.
- Set `x-slurm.cache_dir` if you need an explicit shared cache path, and adjust any cluster-specific resource settings.
- Run `hpc-compose validate -f compose.yaml` and `hpc-compose inspect --verbose -f compose.yaml` while you are still adapting the file.
- Run `hpc-compose submit --watch -f compose.yaml` for the normal run.
- If that fails, or if you need more visibility, break out `preflight`, `prepare`, `render`, `status`, `stats`, or `logs` separately.
Pick a starting example
| Example | Use it when you need | File |
|---|---|---|
| Dev app | mounted source tree plus a small prepare step | examples/dev-python-app.yaml |
| Redis worker stack | multi-service launch ordering and readiness checks | examples/app-redis-worker.yaml |
| LLM curl workflow | one GPU-backed LLM plus a one-shot curl request from a second service | examples/llm-curl-workflow.yaml |
| LLM curl workflow (home) | the same request flow, but anchored under $HOME/models for direct use on a login node | examples/llm-curl-workflow-workdir.yaml |
| GPU-backed app | one GPU service plus a dependent application | examples/llama-app.yaml |
| llama.cpp + uv worker | llama.cpp serving plus a source-mounted Python worker run through uv | examples/llama-uv-worker.yaml |
| Minimal batch | simplest single-service batch job | examples/minimal-batch.yaml |
| Multi-node MPI | one helper on the primary node plus one allocation-wide distributed step | examples/multi-node-mpi.yaml |
| Multi-node torchrun | allocation-wide GPU training with the primary node as rendezvous | examples/multi-node-torchrun.yaml |
| Training checkpoints | GPU training with checkpoints to shared storage | examples/training-checkpoints.yaml |
| Training resume | GPU training with a shared resume directory and attempt-aware checkpoints | examples/training-resume.yaml |
| Postgres ETL | PostgreSQL plus a Python data processing job | examples/postgres-etl.yaml |
| vLLM serving | vLLM with an in-job Python client | examples/vllm-openai.yaml |
| vLLM + uv worker | vLLM serving with a source-mounted Python worker run through uv | examples/vllm-uv-worker.yaml |
| MPI hello | MPI hello world with Open MPI | examples/mpi-hello.yaml |
| Multi-stage pipeline | two-stage pipeline with file-based handoff | examples/multi-stage-pipeline.yaml |
| Data preprocessing | CPU-heavy NLP preprocessing pipeline | examples/fairseq-preprocess.yaml |
The fastest path is usually to copy the closest example and adapt it instead of starting from scratch.
You can also let hpc-compose scaffold one of these examples directly:
hpc-compose init --template dev-python-app --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml
1. Choose x-slurm.cache_dir early
Set x-slurm.cache_dir to a path that is visible from both the login node and the compute nodes.
x-slurm:
cache_dir: /shared/$USER/hpc-compose-cache
Rules:
- Do not use `/tmp`, `/var/tmp`, `/private/tmp`, or `/dev/shm`.
- If you leave `cache_dir` unset, the default is `$HOME/.cache/hpc-compose`.
- The default is convenient for small or home-directory workflows, but a shared project or workspace path is usually safer on real clusters.
- The important constraint is visibility: `prepare` runs on the login node, but the batch job later reuses those cached artifacts from compute nodes.
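The forbidden-path rule is easy to self-check before editing the spec. This is a plain shell sketch of the documented policy, not something hpc-compose exposes; `preflight` is the authoritative check.

```shell
# Reject the documented unsafe local temp paths for a candidate cache_dir.
check_cache_dir() {
  case "$1" in
    /tmp|/tmp/*|/var/tmp|/var/tmp/*|/private/tmp|/private/tmp/*|/dev/shm|/dev/shm/*)
      echo "unsafe: $1" ;;
    *)
      echo "ok: $1" ;;
  esac
}

check_cache_dir /tmp/cache
check_cache_dir /shared/alice/hpc-compose-cache
```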
2. Adapt the example to your workload
Start with the nearest example and then change:
- `image`
- `command` / `entrypoint`
- `volumes`
- `environment`
- `x-slurm` resource settings
- `x-enroot.prepare` commands for dependencies or tooling
Recommended pattern:
- Put fast-changing application code in `volumes`.
- Put slower-changing dependency installation in `x-enroot.prepare.commands`.
- Add `readiness` to any service that other services truly depend on.
3. Validate the spec
hpc-compose validate -f compose.yaml
Use validate first when you are changing:
- field names,
- `depends_on` shape,
- `command` / `entrypoint` form,
- path values,
- `x-slurm` / `x-enroot` blocks.
If validate fails, fix that before doing anything more expensive.
4. Inspect the normalized plan
hpc-compose inspect -f compose.yaml
hpc-compose inspect --verbose -f compose.yaml
Check:
- service order,
- allocation geometry and each service’s step geometry,
- how images were normalized,
- final host-to-container mount mappings,
- resolved environment values,
- where runtime artifacts will live,
- whether the planner expects a cache hit or miss,
- whether a prepared image will rebuild on every submit because `prepare.mounts` are present.
inspect is the quickest way to confirm that the planner understood your spec the way you intended.
inspect --verbose is a debugging-oriented view and can print secrets from resolved environment values.
5. Normal run: submit the job and watch it
hpc-compose submit --watch -f compose.yaml
submit does the normal end-to-end flow:
- run preflight unless `--no-preflight` is set,
- prepare images unless `--skip-prepare` is set,
- render the script,
- call `sbatch`.
With --watch, submit also:
- records the tracked job metadata under `.hpc-compose/`,
- polls scheduler state with `squeue`/`sacct` when available,
- streams tracked service logs as they appear.
Note
submit treats preflight warnings as non-fatal. If you want warnings to block submission, run preflight --strict separately before submit.
Useful options:
- `--script-out path/to/job.sbatch` keeps a copy of the rendered script.
- When `--script-out` is omitted, the script is written to `<compose-file-dir>/hpc-compose.sbatch`.
- `--force-rebuild` refreshes imported and prepared artifacts during submit.
- `--skip-prepare` reuses existing prepared artifacts.
- `--keep-failed-prep` keeps the Enroot rootfs around when a prepare step fails.
For the shipped examples, submit --watch is usually the only command you need in the normal run. Use the other commands when you need more visibility into planning, environment checks, image preparation, tracked job state, or the generated script.
6. Run preflight checks when you need to debug cluster readiness
hpc-compose preflight -f compose.yaml
hpc-compose preflight --verbose -f compose.yaml
preflight checks:
- required binaries (`enroot`, `srun`, `sbatch`),
- `scontrol` when `x-slurm.nodes > 1`,
- Pyxis container support in `srun`,
- cache directory policy and writability,
- local mount and image paths,
- registry credentials,
- skip-prepare reuse safety when relevant.
If your cluster installs these tools in non-standard locations, pass explicit paths:
hpc-compose preflight -f compose.yaml --enroot-bin /opt/enroot/bin/enroot --srun-bin /usr/local/bin/srun --sbatch-bin /usr/local/bin/sbatch
The same override flags (--enroot-bin, --srun-bin, --sbatch-bin) are available on prepare and submit.
Use strict mode if you want warnings to fail the command:
hpc-compose preflight -f compose.yaml --strict
7. Prepare images on the login node when needed
hpc-compose prepare -f compose.yaml
Use this when you want to:
- build or refresh prepared images before submission,
- confirm cache reuse behavior,
- debug preparation separately from job submission.
Force a refresh of imported and prepared artifacts:
hpc-compose prepare -f compose.yaml --force
8. Render the batch script when you need to inspect it
hpc-compose render -f compose.yaml --output /tmp/job.sbatch
This is useful when:
- debugging generated `srun` arguments,
- checking mounts and environment passing,
- reviewing the launch order and readiness waits.
9. Read logs and submission output
After a successful submit, hpc-compose prints:
- the rendered script path,
- the cache directory,
- one log path per service,
- the tracked metadata location when a numeric Slurm job id was returned.
Use the tracked helpers for later inspection:
hpc-compose status -f compose.yaml
hpc-compose stats -f compose.yaml
hpc-compose stats -f compose.yaml --format csv
hpc-compose stats -f compose.yaml --format jsonl
hpc-compose artifacts -f compose.yaml
hpc-compose artifacts -f compose.yaml --bundle checkpoints --tarball
hpc-compose cancel -f compose.yaml
hpc-compose logs -f compose.yaml
hpc-compose logs -f compose.yaml --service app --follow
status also reports the tracked top-level batch log path so early job failures are visible even when a service log was never created. When services.<name>.x-slurm.failure_policy is used, status includes per-service policy state (failure_policy, restart counters, and last exit code) from tracked runtime state.
For multi-node jobs, status also reports tracked placement geometry (placement_mode, nodes, task counts, and expanded nodelist) for each service.
stats now prefers sampler data from ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics when x-slurm.metrics is enabled. In v1 that sampler can collect:
- GPU snapshots and compute-process rows through `nvidia-smi`
- job-step CPU and memory snapshots through `sstat`
If the sampler is absent, disabled, or only partially available, stats falls back to live sstat. It works best for running jobs, requires the cluster’s jobacct_gather plugin to be enabled for Slurm-side step metrics, and only shows GPU accounting fields from Slurm when the cluster exposes GPU TRES accounting.
In multi-node v1, GPU sampler collection remains primary-node-only. Slurm step metrics still cover the whole step through sstat, but nvidia-smi fan-in across nodes is intentionally out of scope.
Use --format json, --format csv, or --format jsonl when you want machine-friendly output for dashboards, plotting, or experiment tracking. --format json is the preferred interface for validate, render, prepare, preflight, inspect, status, stats, artifacts, and cache subcommands. --json remains supported as a compatibility alias on older machine-readable commands.
Runtime logs live under:
${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/logs/<service>.log
That same per-job directory is also mounted inside every container at /hpc-compose/job. Use it for small cross-service coordination files when a workflow needs shared ephemeral state.
When metrics sampling is enabled, the job also writes:
${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics/
meta.json
gpu.jsonl
gpu_processes.jsonl
slurm.jsonl
Collector failures are best-effort: missing nvidia-smi, missing sstat, or unsupported queries do not fail the batch job itself.
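Since the sampler files are JSONL (one JSON object per line), quick sanity checks need no special tooling. A hedged sketch; a real run writes these under `.hpc-compose/<jobid>/metrics`, so a temp directory and fabricated rows with made-up field names stand in here.

```shell
# Temp dir stands in for .hpc-compose/<jobid>/metrics.
METRICS_DIR="$(mktemp -d)"
printf '%s\n' '{"ts":1,"gpu_util":90}' '{"ts":2,"gpu_util":95}' \
  > "$METRICS_DIR/gpu.jsonl"

# One JSON object per line means the sample count is just the line count.
samples=$(wc -l < "$METRICS_DIR/gpu.jsonl")
echo "gpu samples: $samples"
```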
When x-slurm.artifacts is enabled, teardown collection writes:
${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/artifacts/
manifest.json
payload/...
Use hpc-compose artifacts -f compose.yaml after the job finishes to copy the collected payload into the configured x-slurm.artifacts.export_dir. The export path is resolved relative to the compose file and expands ${SLURM_JOB_ID} from tracked metadata.
If the compose file defines named bundles under x-slurm.artifacts.bundles, hpc-compose artifacts --bundle <name> exports only the selected bundle(s). Named bundles are written under <export_dir>/bundles/<bundle>/, and every export writes provenance JSON under <export_dir>/_hpc-compose/bundles/<bundle>.json. Add --tarball to also create <bundle>.tar.gz archives during export. The bundle name `default` is reserved for top-level x-slurm.artifacts.paths.
Slurm may also write a top-level batch log such as slurm-<jobid>.out, or to the path configured with x-slurm.output. Check that file first when the job fails before any service log appears.
Service names containing non-alphanumeric characters are encoded in the log filename. For example, a service named my.app produces my_x2e_app.log. Prefer [a-zA-Z0-9_-] in service names for readability.
If you used --script-out, keep that script with the job logs when debugging cluster behavior.
When x-slurm.resume is enabled, hpc-compose also:
- mounts the shared resume path into every service at `/hpc-compose/resume`,
- injects `HPC_COMPOSE_RESUME_DIR`, `HPC_COMPOSE_ATTEMPT`, and `HPC_COMPOSE_IS_RESUME`,
- writes attempt-specific runtime outputs under `.hpc-compose/<jobid>/attempts/<attempt>/`,
- keeps `.hpc-compose/<jobid>/{logs,metrics,artifacts,state.json}` pointed at the latest attempt for compatibility.
Use the shared resume directory for the canonical checkpoint a restarted run should load next. Treat exported artifacts as retrieval and provenance output after the attempt finishes, not as the primary live resume source.
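A hedged sketch of attempt-aware startup using the injected variables. The values below are demo stand-ins, treating `HPC_COMPOSE_IS_RESUME` as the literal string `"1"` is an assumption to verify against the Spec Reference, and `latest.ckpt` is a hypothetical filename.

```shell
# Demo stand-ins; in a real job hpc-compose injects these.
HPC_COMPOSE_RESUME_DIR="/hpc-compose/resume"
HPC_COMPOSE_IS_RESUME="1"   # assumed value format; confirm in the Spec Reference

if [ "$HPC_COMPOSE_IS_RESUME" = "1" ]; then
  # latest.ckpt is hypothetical; use your trainer's checkpoint convention.
  CKPT="$HPC_COMPOSE_RESUME_DIR/latest.ckpt"
  echo "resuming from $CKPT"
else
  echo "fresh start"
fi
```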
10. Inspect and prune cache artifacts
List cached artifacts:
hpc-compose cache list
Inspect cache state for the current plan:
hpc-compose cache inspect -f compose.yaml
Inspect a single service:
hpc-compose cache inspect -f compose.yaml --service app
Prune old entries by age (in days):
hpc-compose cache prune --age 14
Prune artifacts not referenced by the current plan:
hpc-compose cache prune --all-unused -f compose.yaml
The two strategies (--age and --all-unused) are mutually exclusive — pick one per invocation.
Use cache inspect when you need to answer questions such as:
- which artifact is being reused,
- whether a prepared image came from a cached manifest,
- whether a service rebuilds on every submit because of prepare mounts.
After upgrading hpc-compose
Cache keys include the tool version, so upgrading hpc-compose invalidates all existing cached artifacts. You will see a full rebuild on the next prepare or submit. To clean up orphaned artifacts after an upgrade:
hpc-compose cache prune --age 0
What changed and what should I run?
| If you changed… | Typical next step |
|---|---|
| YAML planning/runtime settings only | hpc-compose validate -f compose.yaml, hpc-compose inspect --verbose -f compose.yaml, then hpc-compose submit --watch -f compose.yaml |
| The base image, `x-enroot.prepare.commands`, or prepare env | hpc-compose submit --watch --force-rebuild -f compose.yaml for the normal run, or hpc-compose prepare --force -f compose.yaml when debugging prepare separately |
| Only mounted runtime source such as app code under `volumes` | Usually just hpc-compose submit --watch -f compose.yaml |
| Cache entries you no longer want and this plan does not reference | hpc-compose cache prune --all-unused -f compose.yaml |
| hpc-compose itself | Expect cache misses on the next prepare or submit, then optionally prune old entries |
Decision guide
When should I use volumes?
Use volumes for source code or other files you edit frequently.
When should I use x-enroot.prepare.commands?
Use prepare commands for slower-changing dependencies, tools, or image customization that you want baked into a cached runtime image.
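As a sketch of that split (service name, paths, and packages are illustrative):

```yaml
services:
  app:
    image: python:3.11-slim
    # Fast-changing source stays a runtime mount: edits need no rebuild.
    volumes:
      - ./src:/workspace
    working_dir: /workspace
    command: python -m main
    x-enroot:
      prepare:
        # Slower-changing dependencies are baked into the cached runtime image.
        commands:
          - pip install --no-cache-dir numpy pandas
```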
When should I use --skip-prepare?
Only when the prepared artifact already exists and you want to reuse it. `preflight` can warn or fail if reuse is unsafe.
When should I use --force-rebuild or prepare --force?
Use them after changing:
- the base image,
- prepare commands,
- prepare environment,
- tooling or dependencies that should invalidate the cached runtime image.
When should I manually run enroot remove?
Treat manual `enroot remove` as a rare last resort.
Use it only when Enroot state is clearly broken or inconsistent and `hpc-compose prepare --force` plus cache pruning did not fix the problem. In the normal rebuild or refresh path, prefer `submit --force-rebuild`, `prepare --force`, and `cache prune` so hpc-compose stays in charge of artifact state.
Why does my service rebuild every time?
If `x-enroot.prepare.mounts` is non-empty, that service intentionally rebuilds on every `prepare` / `submit`.
Troubleshooting
required binary '...' was not found
Run on a node with the Slurm client tools and Enroot available, or pass the explicit binary path with `--enroot-bin`, `--srun-bin`, or `--sbatch-bin`.
`srun` does not advertise `--container-image`
Pyxis support appears unavailable on that node. Move to a supported login node or cluster environment.
Cache directory errors or warnings
- Errors usually mean the path is not shared or not writable.
- A warning under `$HOME` means the path may work on some clusters, but a shared workspace or project path is safer because prepare happens on the login node and runtime happens on compute nodes.
Missing local mount or image paths
Remember that relative paths resolve from the compose file directory, not from the shell’s current working directory.
A mounted file exists on the host but not inside the container
This is often a symlink issue. If you mount a directory such as `$HOME/models:/models` and `model.gguf` is a symlink whose target lives outside `$HOME/models`, the target may not be visible inside the container. Copy the real file into the mounted directory or mount the directory that contains the symlink target.
Warning
The mount itself can succeed while the symlink target is still invisible inside the container. Check the target path, not just the link path.
Anonymous pull or registry credential warnings
Add the required credentials before relying on private registries or heavily rate-limited public registries.
Services start in the wrong order
Use `depends_on` with `condition: service_healthy` when a dependent must wait for a dependency's readiness probe. Plain list form still means `service_started`.
When a TCP port opens before the service is fully usable, prefer HTTP or log-based readiness over TCP readiness.
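Assuming a hypothetical `api` service with an HTTP health endpoint, the pattern looks like:

```yaml
services:
  api:
    image: python:3.11-slim
    command: python -m api            # hypothetical server module
    readiness:
      type: http
      url: http://127.0.0.1:8080/health
      timeout_seconds: 60
  client:
    image: python:3.11-slim
    command: python -m client         # hypothetical dependent
    depends_on:
      api:
        condition: service_healthy    # waits for the probe, not just launch
```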
Preview a submission without running sbatch
Use `submit --dry-run` to run the full pipeline (preflight, prepare, render) without actually calling `sbatch`. The rendered script is written to disk so you can inspect it:

```shell
hpc-compose submit --dry-run -f compose.yaml
```

Combine with `--skip-prepare` for a pure validation-and-render dry run.
Clean up old job directories
Tracked job metadata and logs accumulate in `.hpc-compose/`. Use `clean` to remove old entries:

```shell
# Remove jobs older than 7 days
hpc-compose clean -f compose.yaml --age 7

# Remove all except the latest tracked job
hpc-compose clean -f compose.yaml --all
```
Shell completions
Generate completions for your shell and source them:

```shell
# bash
hpc-compose completions bash > ~/.local/share/bash-completion/completions/hpc-compose

# zsh
hpc-compose completions zsh > ~/.zfunc/_hpc-compose

# fish
hpc-compose completions fish > ~/.config/fish/completions/hpc-compose.fish
```
Examples
These examples are the fastest way to understand the intended hpc-compose workflows and adapt them to a real application.
For almost every example, the normal run is:

```shell
hpc-compose submit --watch -f examples/<example>.yaml
```
Use the debugging flow (validate, inspect, preflight, prepare) when you are wiring up the example for the first time or isolating a failure.
If you want one of these files written straight to your working directory, use:

```shell
hpc-compose init --template dev-python-app --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml
```
Example matrix
| Example | What it demonstrates | When to start from it |
|---|---|---|
| `app-redis-worker.yaml` | Multiple services, `depends_on`, and TCP readiness checks | You need service startup ordering or a small multi-service stack |
| `dev-python-app.yaml` | Mounted source code plus `x-enroot.prepare.commands` for dependencies | You want an iterative development workflow |
| `llm-curl-workflow.yaml` | End-to-end LLM request flow with a login-node prepare step and a curl client | You want the smallest concrete inference workflow |
| `llm-curl-workflow-workdir.yaml` | Same LLM workflow, but anchored under `$HOME/models` for direct use on a login node | You want the lowest-overhead path from a login-node home directory |
| `llama-app.yaml` | GPU-backed service, mounted model files, dependent app service | You need accelerator resources or a model-serving pattern |
| `llama-uv-worker.yaml` | llama.cpp serving plus a source-mounted Python worker executed through uv | You want the GGUF server + mounted worker pattern |
| `minimal-batch.yaml` | Single service, no dependencies, no GPU, no prepare | You want the simplest possible starting point |
| `multi-node-mpi.yaml` | One primary-node helper plus one allocation-wide distributed CPU step | You want a minimal multi-node pattern without adding orchestration |
| `multi-node-torchrun.yaml` | Allocation-wide torchrun launch using the primary node as rendezvous | You want a multi-node GPU training starting point |
| `training-checkpoints.yaml` | GPU training with checkpoints written to shared storage | You need a batch training workflow with artifact collection |
| `training-resume.yaml` | GPU training with a shared resume directory and attempt-aware checkpoints | You need restart-safe checkpoint semantics across requeues or repeated submissions |
| `postgres-etl.yaml` | PostgreSQL plus a Python data processing job | You need a database-backed batch pipeline |
| `vllm-openai.yaml` | vLLM serving with an in-job Python client | You want vLLM-based inference instead of llama.cpp |
| `vllm-uv-worker.yaml` | vLLM serving plus a source-mounted Python worker executed through uv | You want a common LLM stack with mounted app code |
| `mpi-hello.yaml` | MPI hello world compiled and run with Open MPI | You need an MPI workload |
| `multi-stage-pipeline.yaml` | Two-stage pipeline coordinating through the shared job mount | You need file-based stage-to-stage handoff |
| `fairseq-preprocess.yaml` | CPU-heavy NLP data preprocessing with parallel workers | You need a CPU-bound data preprocessing pipeline |
Which example should I start from?
- Start with `minimal-batch.yaml` if you are new to hpc-compose and want the smallest possible file.
- Start with `multi-node-mpi.yaml` if you need one distributed step plus small helper services on the primary node.
- Start with `multi-node-torchrun.yaml` if you need a torchrun-style rendezvous pattern across multiple nodes.
- Start with `dev-python-app.yaml` if you want a source-mounted development loop.
- Start with `llm-curl-workflow-workdir.yaml` if you want the fastest real-cluster GPU inference example.
- Start with `training-checkpoints.yaml` if you need a GPU training job with checkpoint output.
- Start with `training-resume.yaml` if you need resume-aware checkpoints on shared storage.
- Start with `app-redis-worker.yaml` or `postgres-etl.yaml` if your workload depends on multi-service startup ordering.
Companion notes for the more involved examples live alongside the example assets:

- `examples/llm-curl/README.md`
- `examples/llama-uv-worker/README.md`
- `examples/vllm-uv-worker/README.md`
- `examples/models/README.md`
Adaptation checklist
- Copy the closest example to your own `compose.yaml`, or run `hpc-compose init --template <name> --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml`.
- Set `x-slurm.cache_dir` to a path visible from both the login node and the compute nodes.
- Replace the example `image`, `command`, `environment`, and `volumes` with your workload.
- Keep active source in `volumes` and keep slower-changing dependency installation in `x-enroot.prepare.commands`.
- Add `readiness` to services that must be reachable before dependents continue.
- Adjust top-level or per-service `x-slurm` settings for your cluster.
- Run the debugging flow before the first submit when you need to confirm planning, prerequisites, or cache behavior.
Spec reference
This page describes the Compose subset that hpc-compose accepts today. Unknown or unsupported fields are rejected unless this page explicitly says otherwise.
Top-level shape
```yaml
name: demo
version: "3.9"

x-slurm:
  time: "00:30:00"
  cache_dir: /shared/$USER/hpc-compose-cache

services:
  app:
    image: python:3.11-slim
    command: python -m main
```
Top-level fields
| Field | Shape | Default | Notes |
|---|---|---|---|
| `name` | string | omitted | Used as the Slurm job name when `x-slurm.job_name` is not set. |
| `version` | string | omitted | Accepted for Compose compatibility. Ignored by the planner. |
| `services` | mapping | required | Must contain at least one service. |
| `x-slurm` | mapping | omitted | Top-level Slurm settings and shared runtime defaults. |
x-slurm
These fields live under the top-level x-slurm block.
| Field | Shape | Default | Notes |
|---|---|---|---|
| `job_name` | string | `name` when present | Rendered as `#SBATCH --job-name`. |
| `partition` | string | omitted | Passed through to `#SBATCH --partition`. |
| `account` | string | omitted | Passed through to `#SBATCH --account`. |
| `qos` | string | omitted | Passed through to `#SBATCH --qos`. |
| `time` | string | omitted | Passed through to `#SBATCH --time`. |
| `nodes` | integer | omitted | Slurm allocation node count. Defaults to 1 when omitted. |
| `ntasks` | integer | omitted | Passed through to `#SBATCH --ntasks`. |
| `ntasks_per_node` | integer | omitted | Passed through to `#SBATCH --ntasks-per-node`. |
| `cpus_per_task` | integer | omitted | Top-level Slurm CPU request. |
| `mem` | string | omitted | Passed through to `#SBATCH --mem`. |
| `gres` | string | omitted | Passed through to `#SBATCH --gres`. |
| `gpus` | integer | omitted | Used only when `gres` is not set. |
| `constraint` | string | omitted | Passed through to `#SBATCH --constraint`. |
| `output` | string | omitted | Passed through to `#SBATCH --output`. |
| `error` | string | omitted | Passed through to `#SBATCH --error`. |
| `chdir` | string | omitted | Passed through to `#SBATCH --chdir`. |
| `cache_dir` | string | `$HOME/.cache/hpc-compose` | Must resolve to shared storage visible from the login node and the compute nodes. |
| `metrics` | mapping | omitted | Enables runtime metrics sampling. |
| `artifacts` | mapping | omitted | Enables tracked artifact collection and export metadata. |
| `resume` | mapping | omitted | Enables checkpoint-aware resume semantics with a shared host path mounted into every service. |
| `setup` | list of strings | omitted | Raw shell lines inserted into the generated batch script before service launches. |
| `submit_args` | list of strings | omitted | Extra raw Slurm arguments appended as `#SBATCH ...` lines. |
x-slurm.setup
```yaml
x-slurm:
  setup:
    - module load enroot
    - source /shared/env.sh
```

- Shape: list of strings
- Default: omitted
- Notes:
  - Each line is emitted verbatim into the generated bash script.
  - The script runs under `set -euo pipefail`.
  - Shell quoting and escaping are the user's responsibility.
x-slurm.submit_args
```yaml
x-slurm:
  submit_args:
    - "--mail-type=END"
    - "--mail-user=user@example.com"
    - "--reservation=gpu-reservation"
```

- Shape: list of strings
- Default: omitted
- Notes:
  - Each entry is emitted as `#SBATCH {arg}`.
  - Entries are not validated against Slurm option syntax.
x-slurm.cache_dir
- Shape: string
- Default: `$HOME/.cache/hpc-compose`
- Notes:
  - Relative paths and environment variables are resolved against the compose file directory.
  - Paths under `/tmp`, `/var/tmp`, `/private/tmp`, and `/dev/shm` are rejected.
  - The path must be visible from both the login node and the compute nodes.
Multi-node placement rules
- `x-slurm.nodes > 1` reserves a multi-node allocation.
- Multi-node v1 supports at most one distributed service spanning the full allocation.
- Helper services remain single-node steps and are pinned to the allocation's primary node.
- When a multi-node job has exactly one service, that service defaults to the distributed full-allocation step.
- Distributed services may use `readiness.type: sleep` or `readiness.type: log`, or TCP/HTTP readiness only with an explicit non-local host or URL.
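A minimal sketch of these rules, loosely patterned after the `multi-node-mpi.yaml` example (service names, commands, and the readiness pattern are illustrative):

```yaml
name: multi-node-demo

x-slurm:
  nodes: 4
  time: "00:30:00"

services:
  monitor:
    image: python:3.11-slim
    command: python -m monitor   # helper step: pinned to the primary node
  solver:
    image: python:3.11-slim
    command: python -m solver    # the one distributed, allocation-wide step
    readiness:
      type: log                  # log readiness is always allowed for
      pattern: "solver ready"    # distributed services
    x-slurm:
      nodes: 4                   # must equal the top-level node count
```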
x-slurm.metrics
```yaml
x-slurm:
  metrics:
    interval_seconds: 5
    collectors: [gpu, slurm]
```

- Shape: mapping
- Default: omitted
- Notes:
  - Omitting the block disables runtime metrics sampling.
  - If the block is present and `enabled` is omitted, metrics sampling is enabled.
  - `interval_seconds` defaults to `5` and must be at least `1`.
  - `collectors` defaults to `[gpu, slurm]`.
  - Supported collectors:
    - `gpu` samples device and process telemetry through `nvidia-smi`
    - `slurm` samples job-step CPU and memory data through `sstat`
  - In multi-node v1, `gpu` sampling remains primary-node-only; `slurm` sampling still observes the full distributed step through `sstat`.
  - Sampler files are written under `${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics` on the host and are also visible inside containers at `/hpc-compose/job/metrics`.
  - Collector failures are best-effort and do not fail the batch job.
x-slurm.artifacts
```yaml
x-slurm:
  artifacts:
    collect: always
    export_dir: ./results/${SLURM_JOB_ID}
    paths:
      - /hpc-compose/job/metrics/**
    bundles:
      checkpoints:
        paths:
          - /hpc-compose/job/checkpoints/*.pt
```

- Shape: mapping
- Default: omitted
- Notes:
  - Omitting the block disables tracked artifact collection.
  - `collect` defaults to `always`. Supported values are `always`, `on_success`, and `on_failure`.
  - `export_dir` is required and is resolved relative to the compose file directory when `hpc-compose artifacts` runs.
  - `${SLURM_JOB_ID}` is preserved in `export_dir` until `hpc-compose artifacts` expands it from tracked metadata.
  - `paths` remains supported as the implicit `default` bundle.
  - `bundles` is optional. Bundle names must match `[A-Za-z0-9_-]+`, and `default` is reserved for top-level `paths`.
  - At least one source path must be present in `paths` or `bundles`.
  - Every source path must be an absolute container-visible path rooted at `/hpc-compose/job`.
  - Paths under `/hpc-compose/job/artifacts` are rejected.
  - Collection happens during batch teardown and is best-effort.
  - Collected payloads and `manifest.json` are written under `${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/artifacts/`.
  - `hpc-compose artifacts --bundle <name>` exports only the selected bundle or bundles.
  - `hpc-compose artifacts --tarball` also writes one `<bundle>.tar.gz` archive per exported bundle.
  - Export writes per-bundle provenance metadata under `<export_dir>/_hpc-compose/bundles/<bundle>.json`.
x-slurm.resume
```yaml
x-slurm:
  resume:
    path: /shared/$USER/runs/my-run
```

- Shape: mapping
- Default: omitted
- Notes:
  - Omitting the block disables resume semantics.
  - `path` is required and must be an absolute host path.
  - `/hpc-compose/...` paths are rejected because `path` must point at shared host storage, not a container-visible path.
  - `/tmp` and `/var/tmp` technically validate, but `preflight` warns because those paths are not reliable resume storage.
  - When enabled, hpc-compose mounts `path` into every service at `/hpc-compose/resume`.
  - Services also receive `HPC_COMPOSE_RESUME_DIR`, `HPC_COMPOSE_ATTEMPT`, and `HPC_COMPOSE_IS_RESUME`.
  - The canonical resume source is the shared `path`, not exported artifact bundles.
  - Attempt-specific runtime state moves under `${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/attempts/<attempt>/`, and the top-level `logs`, `metrics`, `artifacts`, and `state.json` paths continue to point at the latest attempt for compatibility.
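As an illustrative sketch, a training command can branch on these values. `train.py` and its flags are hypothetical, and the exact truthy encoding of `HPC_COMPOSE_IS_RESUME` is an assumption here:

```yaml
x-slurm:
  resume:
    path: /shared/$USER/runs/my-run

services:
  train:
    image: python:3.11-slim
    # String-form command, so the variables expand at runtime, not plan time.
    command: >
      sh -c 'if [ -n "$HPC_COMPOSE_IS_RESUME" ] && [ "$HPC_COMPOSE_IS_RESUME" != "0" ];
             then python train.py --resume-from "$HPC_COMPOSE_RESUME_DIR";
             else python train.py --out-dir "$HPC_COMPOSE_RESUME_DIR"; fi'
```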
Allocation metadata inside services
Every service receives:
- `HPC_COMPOSE_PRIMARY_NODE`
- `HPC_COMPOSE_NODE_COUNT`
- `HPC_COMPOSE_NODELIST`
- `HPC_COMPOSE_NODELIST_FILE`

The same data is also written under `/hpc-compose/job/allocation/primary_node` and `/hpc-compose/job/allocation/nodes.txt`.
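For example, a distributed step can derive its rendezvous address from this metadata. The launcher invocation below is illustrative (patterned after torchrun; `train.py` is a hypothetical script):

```yaml
services:
  trainer:
    image: pytorch/pytorch:latest
    x-slurm:
      nodes: 2
    # Runtime shell expansion fills in the allocation metadata.
    command: >
      torchrun --nnodes "$HPC_COMPOSE_NODE_COUNT"
      --rdzv_backend c10d
      --rdzv_endpoint "$HPC_COMPOSE_PRIMARY_NODE:29500"
      train.py
```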
gres and gpus
When both `gres` and `gpus` are set at the same level, `gres` takes priority and `gpus` is ignored.
Service fields
| Field | Shape | Default | Notes |
|---|---|---|---|
| `image` | string | required | Can be a remote image reference or a local `.sqsh` / `.squashfs` path. |
| `command` | string or list of strings | omitted | Shell form or exec form. |
| `entrypoint` | string or list of strings | omitted | Must use the same form as `command` when both are present. |
| `environment` | mapping or list of `KEY=VALUE` strings | omitted | Both forms normalize to key/value pairs. |
| `volumes` | list of `host_path:container_path` strings | omitted | Runtime bind mounts. Host paths resolve against the compose file directory. |
| `working_dir` | string | omitted | Valid only when the service also has an explicit `command` or `entrypoint`. |
| `depends_on` | list or mapping | omitted | Dependency list with `service_started` or `service_healthy` conditions. |
| `readiness` | mapping | omitted | Post-launch readiness gate. |
| `healthcheck` | mapping | omitted | Compose-compatible sugar for a subset of `readiness`. Mutually exclusive with `readiness`. |
| `x-slurm` | mapping | omitted | Per-service Slurm overrides. |
| `x-enroot` | mapping | omitted | Per-service Enroot preparation rules. |
Image rules
Remote images
- Any image reference without an explicit `://` scheme is prefixed with `docker://`.
- Explicit schemes are allowed only for `docker://`, `dockerd://`, and `podman://`.
- Other schemes are rejected.
- Shell variables in the image string are expanded at plan time.
- Unset variables expand to empty strings.

Local images

- Local image paths must point to `.sqsh` or `.squashfs` files.
- Relative paths are resolved against the compose file directory.
- Paths that look like build contexts are rejected.
command and entrypoint
Both fields accept either:
- a string, interpreted as shell form
- a list of strings, interpreted as exec form
Rules:
- If both fields are present, they must use the same form.
- Mixed string/array combinations are rejected.
- If neither field is present, the image default entrypoint and command are used.
- If `working_dir` is set, at least one of `command` or `entrypoint` must also be set.
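Both valid pairings as a sketch: either both fields in exec form, or both in shell form. The service names and the `/entry.sh` script are hypothetical:

```yaml
services:
  app_exec:
    image: python:3.11-slim
    entrypoint: ["python", "-u"]   # exec form
    command: ["-m", "main"]        # must also be exec form
  app_shell:
    image: python:3.11-slim
    entrypoint: /entry.sh          # shell form (hypothetical script)
    command: python -m main        # must also be shell form
```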
environment
Accepted forms:
```yaml
environment:
  APP_ENV: prod
  LOG_LEVEL: info
```

```yaml
environment:
  - APP_ENV=prod
  - LOG_LEVEL=info
```
Rules:
- List items must use `KEY=VALUE` syntax.
- `.env` from the compose file directory is loaded automatically when present.
- Shell environment variables override `.env`; `.env` fills only missing variables.
- `environment` and `x-enroot.prepare.env` values support `$VAR`, `${VAR}`, `${VAR:-default}`, and `${VAR-default}` interpolation.
- Missing variables without defaults are errors.
- Use `$$` for a literal dollar sign in interpolated fields.
- String-form shell snippets are still literal. For example, `$PATH` inside a string-form `command` is not expanded at plan time.
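These forms mirror POSIX shell parameter expansion, so the `:-` versus `-` distinction can be checked directly in a shell. One difference worth remembering: unlike plain shell, hpc-compose treats a missing variable without a default as an error rather than expanding it to an empty string.

```shell
# The :- form treats set-but-empty as unset; the - form does not.
MODE=""
echo "${MODE:-dev}"   # prints "dev": with :-, empty counts as unset
echo "${MODE-dev}"    # prints "":    with -, set-but-empty is kept
unset MODE
echo "${MODE-dev}"    # prints "dev": now the variable is genuinely unset
```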
volumes
Accepted form:
```yaml
volumes:
  - ./app:/workspace
  - /shared/data:/data
```
Rules:
- Host paths are resolved against the compose file directory.
- Runtime mounts are passed through `srun --container-mounts=...`.
- Every service also gets an automatic shared mount at `/hpc-compose/job`, backed by `${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}` on the host.
- `/hpc-compose/job` is reserved and cannot be used as an explicit volume destination.
Warning
If a mounted file is a symlink, the symlink target must also be visible from inside the mounted directory. Otherwise the path can exist on the host but fail inside the container.
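A quick way to spot this before submitting is to scan the mount source for symlinks that resolve outside it. This is a local helper sketch, not an hpc-compose command:

```shell
# List symlinks under a mount source whose targets resolve outside it.
# Such links dangle inside the container even though the mount succeeds.
check_symlinks() {
  root=$(readlink -f "$1")
  find "$root" -type l | while read -r link; do
    target=$(readlink -f "$link")
    case "$target" in
      "$root"/*) ;;                                  # target stays inside
      *) echo "outside mount: $link -> $target" ;;   # invisible in container
    esac
  done
}

# Example: check_symlinks "$HOME/models"
```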
depends_on
Accepted forms:
```yaml
depends_on:
  - redis
```

```yaml
depends_on:
  redis:
    condition: service_started
```

```yaml
depends_on:
  redis:
    condition: service_healthy
```
Rules:
- List form means `condition: service_started`.
- Map form accepts `condition: service_started` and `condition: service_healthy`.
- `service_healthy` requires the dependency service to define `readiness`.
- `service_started` waits only for the dependency process to be launched and still alive.
- `service_healthy` waits for the dependency readiness check to succeed.
readiness
Supported types:
Sleep
```yaml
readiness:
  type: sleep
  seconds: 5
```

- `seconds` is required.
TCP
```yaml
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30
```

- `host` defaults to `127.0.0.1`.
- `timeout_seconds` defaults to `60`.
Log
```yaml
readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60
```

- `timeout_seconds` defaults to `60`.
HTTP
```yaml
readiness:
  type: http
  url: http://127.0.0.1:8080/health
  status_code: 200
  timeout_seconds: 30
```

- `status_code` defaults to `200`.
- `timeout_seconds` defaults to `60`.
- The readiness check polls the URL through `curl`.
healthcheck
`healthcheck` is accepted as migration sugar and is normalized into the readiness model.

```yaml
services:
  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "nc", "-z", "127.0.0.1", "6379"]
      timeout: 30s
```
Rules:
- `healthcheck` and `readiness` are mutually exclusive.
- Supported probe forms are a constrained subset:
  - `["CMD", "nc", "-z", HOST, PORT]`
  - `["CMD-SHELL", "nc -z HOST PORT"]`
  - recognized `curl` probes against `http://` or `https://` URLs
  - recognized `wget --spider` probes against `http://` or `https://` URLs
- `timeout` maps to `timeout_seconds`.
- `disable: true` disables readiness for that service.
- `interval`, `retries`, and `start_period` are parsed but rejected in v1.
- HTTP-style healthchecks normalize to `readiness.type: http` with `status_code: 200`.
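For instance, an `nc`-based healthcheck with `timeout: 30s` against port 6379 is equivalent to writing the readiness gate directly:

```yaml
services:
  redis:
    image: redis:7
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30   # from `timeout: 30s`
```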
Service-level x-slurm
These fields live under `services.<name>.x-slurm`.
| Field | Shape | Default | Notes |
|---|---|---|---|
| `nodes` | integer | omitted | `1` for a helper step, or the full top-level allocation node count for the one distributed service. |
| `ntasks` | integer | omitted | Adds `--ntasks` to that service's `srun`. |
| `ntasks_per_node` | integer | omitted | Adds `--ntasks-per-node` to that service's `srun`. |
| `cpus_per_task` | integer | omitted | Adds `--cpus-per-task` to that service's `srun`. |
| `gpus` | integer | omitted | Adds `--gpus` when `gres` is not set. |
| `gres` | string | omitted | Adds `--gres` to that service's `srun`. Takes priority over `gpus`. |
| `extra_srun_args` | list of strings | omitted | Appended directly to the service's `srun` command. |
| `failure_policy` | mapping | omitted | Per-service failure handling (`fail_job`, `ignore`, `restart_on_failure`). |
services.<name>.x-slurm.failure_policy
```yaml
services:
  worker:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5
```
| Field | Shape | Default | Notes |
|---|---|---|---|
| `mode` | `fail_job` \| `ignore` \| `restart_on_failure` | `fail_job` | `fail_job` keeps fail-fast behavior. `ignore` keeps the job running after non-zero exits. `restart_on_failure` restarts on non-zero exits only. |
| `max_restarts` | integer | `3` when `mode: restart_on_failure` | Required to be at least `1` after defaults are applied. Valid only for `restart_on_failure`. |
| `backoff_seconds` | integer | `5` when `mode: restart_on_failure` | Fixed delay between restart attempts. Required to be at least `1` after defaults are applied. Valid only for `restart_on_failure`. |
Rules:
- In a multi-node allocation, at most one service may resolve to distributed placement.
- Distributed placement requires `services.<name>.x-slurm.nodes` to equal the top-level allocation node count when it is set explicitly.
- Helper services in multi-node jobs are pinned to `HPC_COMPOSE_PRIMARY_NODE`.
- `max_restarts` and `backoff_seconds` are rejected unless `mode: restart_on_failure`.
- Restart attempts count relaunches after the initial launch.
- Restarts trigger only for non-zero exits.
- Services configured with `mode: ignore` cannot be used as dependencies in `depends_on`.
Unknown keys under top-level `x-slurm` or per-service `x-slurm` cause hard errors.
x-enroot.prepare
`x-enroot.prepare` lets a service build a prepared runtime image from its base image before submission.
```yaml
services:
  app:
    image: python:3.11-slim
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir numpy pandas
        mounts:
          - ./requirements.txt:/tmp/requirements.txt
        env:
          PIP_CACHE_DIR: /tmp/pip-cache
        root: true
```
| Field | Shape | Default | Notes |
|---|---|---|---|
| `commands` | list of strings | required when `prepare` is present | Each command runs via `enroot start ... /bin/sh -lc ...`. |
| `mounts` | list of `host_path:container_path` strings | omitted | Visible only during prepare. Relative host paths resolve against the compose file directory. |
| `env` | mapping or list of `KEY=VALUE` strings | omitted | Passed only during prepare. Values support the same interpolation rules as `environment`. |
| `root` | boolean | `true` | Controls whether prepare commands run with `--root`. |
Rules:
- If `x-enroot.prepare` is present, `commands` cannot be empty.
- If `prepare.mounts` is non-empty, the service rebuilds on every `prepare` or `submit`.
- Remote base images are imported under `cache_dir/base`.
- Prepared images are exported under `cache_dir/prepared`.
- Unknown keys under `x-enroot` or `x-enroot.prepare` cause hard errors.
Unsupported Compose keys
These keys are rejected with explicit messages:

- `build`
- `ports`
- `networks`
- `network_mode`
- Compose `restart` (use `services.<name>.x-slurm.failure_policy`)
- `deploy`

Any other unknown key at the service level is also rejected.
Supported Slurm Model
This page makes the hpc-compose Slurm boundary explicit. It is a tool for compiling one Compose-like application into one Slurm allocation with one or more containerized srun steps. It is not a general frontend for the full Slurm command surface.
First-class support
These capabilities are modeled, validated, and intentionally supported by the planner, renderer, and tracked-job workflow.
| Area | Support |
|---|---|
| Allocation model | One Slurm allocation per application |
| Submission flow | validate, inspect, preflight, prepare, render, submit, submit --watch |
| Tracked job workflow | status, stats, logs, cancel, artifacts, clean, cache inspection/pruning |
| Top-level Slurm fields | job_name, partition, account, qos, time, nodes, ntasks, ntasks_per_node, cpus_per_task, mem, gres, gpus, constraint, output, error, chdir |
| Service step fields | nodes, ntasks, ntasks_per_node, cpus_per_task, gres, gpus |
| Multi-node model | Single-node jobs and constrained multi-node runs with at most one distributed service spanning the allocation |
| Runtime orchestration | depends_on, readiness checks, service failure policies, primary-node helper placement |
| Container workflow | Remote images, local .sqsh images, x-enroot.prepare, shared cache handling |
| Job tracking | Scheduler state via squeue/sacct, step stats via sstat, tracked logs, runtime state, metrics, artifacts, resume metadata |
Raw pass-through
These capabilities are usable, but hpc-compose does not model or validate their semantics beyond passing them through to Slurm.
| Mechanism | What it allows |
|---|---|
| `x-slurm.submit_args` | Raw `#SBATCH ...` lines for site-specific flags such as mail settings, reservations, or other submit-time options |
| `services.<name>.x-slurm.extra_srun_args` | Raw `srun` arguments for site-specific launch flags such as MPI or exclusivity settings |
| Existing reservations | Joining an already-created reservation through raw submit args is supported as pass-through |
Pass-through is appropriate when a site-specific flag is useful but does not justify a first-class schema field. It is not a guarantee that hpc-compose understands the operational consequences of that flag.
Unsupported or out of scope
These capabilities are intentionally outside the product seam.
| Area | Status |
|---|---|
| Admin-plane Slurm management | Out of scope |
| `sacctmgr` account administration | Out of scope |
| Reservation creation or lifecycle management | Out of scope |
| Federation / multi-cluster control | Out of scope |
| Generic `scontrol` mutation | Out of scope |
| Broad cluster inspection tools such as a full `sinfo` / `sprio` / `sreport` frontend | Out of scope |
| Arbitrary multi-node orchestration or partial-node service placement | Not supported in v1 |
| Heterogeneous jobs and job arrays as first-class workflow concepts | Not supported in v1 |
| Compose `build`, `ports`, custom networks, `restart`, `deploy` | Not supported |
Non-goals
hpc-compose should not grow into a generic Slurm administration layer. In particular, it will not broaden into sacctmgr, reservation management, federation control, or generic scontrol mutation. Those are real Slurm features, but they do not fit the “one application, one allocation, tracked runtime workflow” seam this tool is built around.
Migrating from Docker Compose
This guide helps you convert an existing docker-compose.yaml into an hpc-compose spec for Slurm clusters with Enroot and Pyxis.
At a glance
| Docker Compose feature | hpc-compose equivalent |
|---|---|
| `image` | `image` (same syntax, auto-prefixed with `docker://`) |
| `command` | `command` (string or list, same syntax) |
| `entrypoint` | `entrypoint` (string or list, same syntax) |
| `environment` | `environment` (map or list, same syntax) |
| `volumes` | `volumes` (host:container bind mounts, same syntax) |
| `depends_on` | `depends_on` (list or map with `condition: service_started` / `service_healthy`) |
| `working_dir` | `working_dir` (requires explicit `command` or `entrypoint`) |
| `build` | Not supported. Use `image` + `x-enroot.prepare.commands` instead. |
| `ports` | Not supported. Use host networking semantics instead. `127.0.0.1` works only when both sides run on the same node. |
| `networks` / `network_mode` | Not supported. There is no Docker-style overlay network or service-name DNS layer. |
| `restart` | Not supported as a Compose key. Use `services.<name>.x-slurm.failure_policy`. |
| `deploy` | Not supported. Use `x-slurm` for resource allocation. |
| `healthcheck` | Supported for a constrained TCP/HTTP subset and normalized into `readiness`; use explicit `readiness` for anything more complex. |
| Resource limits (`cpus`, `mem_limit`) | Use `x-slurm.cpus_per_task`, `x-slurm.mem`, `x-slurm.gpus` |
Side-by-side: web app + Redis
Docker Compose
```yaml
version: "3.9"

services:
  redis:
    image: redis:7
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: redis
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
```
hpc-compose
```yaml
name: my-app

x-slurm:
  job_name: my-app
  time: "01:00:00"
  mem: 8G
  cpus_per_task: 4
  cache_dir: /shared/$USER/hpc-compose-cache

services:
  redis:
    image: redis:7
    command: redis-server --save "" --appendonly no
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30

  app:
    image: python:3.11-slim
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: 127.0.0.1
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir redis fastapi uvicorn
```
Key changes
- `build: .` → `image: python:3.11-slim` + `x-enroot.prepare.commands` for dependencies.
- `ports` → Removed. Services communicate via `127.0.0.1` because they run on the same node.
- `REDIS_HOST: redis` → `REDIS_HOST: 127.0.0.1`. No DNS service names; use localhost.
- `healthcheck` → `readiness` with `type: tcp`.
- Added `x-slurm` block for Slurm resource allocation (time, memory, CPUs).
- Added `x-slurm.cache_dir` for shared image storage.
Key differences
Networking
Docker Compose creates isolated networks where services find each other by name. In hpc-compose, helper services on the same node share the host network directly, and multi-node distributed steps must use explicit rendezvous addresses. Replace service hostnames with 127.0.0.1 only when both sides intentionally stay on one node. For multi-node runs, derive the rendezvous host from /hpc-compose/job/allocation/primary_node or HPC_COMPOSE_PRIMARY_NODE.
Building images
Docker Compose uses build: to run a Dockerfile. hpc-compose uses x-enroot.prepare.commands instead:
```yaml
# Docker Compose
app:
  build:
    context: .
    dockerfile: Dockerfile
```

```yaml
# hpc-compose
app:
  image: python:3.11-slim
  x-enroot:
    prepare:
      commands:
        - pip install --no-cache-dir -r /tmp/requirements.txt
      mounts:
        - ./requirements.txt:/tmp/requirements.txt
```
Prefer volumes for fast-changing source code and x-enroot.prepare.commands for slower-changing dependencies.
Health checks vs readiness
Docker Compose uses healthcheck with a test command, interval, timeout, and retries. hpc-compose now accepts a constrained healthcheck subset and normalizes it into readiness:
# TCP: wait for a port to accept connections
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30

# Log: wait for a pattern in service output
readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60

# Sleep: fixed delay
readiness:
  type: sleep
  seconds: 5
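Conceptually, a `type: tcp` gate amounts to retrying a connect until it succeeds or the timeout expires. The sketch below illustrates that behavior in plain shell (using bash's `/dev/tcp`); it is not hpc-compose's actual probe implementation.

```shell
wait_for_tcp() {
  # Retry a TCP connect to host:port until success or until timeout seconds pass.
  host=$1; port=$2; timeout=$3
  deadline=$(( $(date +%s) + timeout ))
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$(date +%s)" -ge "$deadline" ]; then echo "timeout"; return 1; fi
    sleep 1
  done
  echo "ready"
}
wait_for_tcp 127.0.0.1 1 2 || true   # port 1 is almost certainly closed, so this prints "timeout"
```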
Supported healthcheck migration patterns:
["CMD", "nc", "-z", HOST, PORT]["CMD-SHELL", "nc -z HOST PORT"]- recognized
curlprobes againsthttp://orhttps://URLs - recognized
wget --spiderprobes againsthttp://orhttps://URLs
Still unsupported in v1:
- arbitrary custom command probes
- `interval`
- `retries`
- `start_period`
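As a sketch of the normalization, assuming the recognized `nc -z` pattern above maps its host and port arguments directly (the 30-second timeout shown here is illustrative, not a documented default):

```yaml
# Docker Compose healthcheck in a recognized form
healthcheck:
  test: ["CMD", "nc", "-z", "127.0.0.1", "6379"]

# Equivalent hpc-compose readiness block
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30
```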
Resource allocation
Docker Compose uses `deploy.resources` or top-level `cpus`/`mem_limit`. hpc-compose uses Slurm-native resource settings:
x-slurm:
  time: "02:00:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1
services:
  app:
    x-slurm:
      cpus_per_task: 4
      gpus: 1
Restart policies
Docker Compose supports `restart: always`, `on-failure`, etc. hpc-compose does not accept the Compose `restart:` key, but it does support per-service restart behavior through `services.<name>.x-slurm.failure_policy`.
services:
  app:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5
`restart_on_failure` retries only on non-zero exits. Use `mode: fail_job` (the default) for fail-fast behavior, or `mode: ignore` for non-critical sidecars.
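The rough shape of `restart_on_failure` can be sketched as a retry loop. This is illustrative only, not the script hpc-compose actually renders: `run_service` stands in for the real service command, and the backoff is shortened from the example's 5 seconds to keep the demo fast.

```shell
run_service() { false; }                       # stand-in service that always fails
max_restarts=3; backoff_seconds=1; attempt=0   # backoff shortened for the demo
until run_service; do
  attempt=$((attempt + 1))
  if [ "$attempt" -gt "$max_restarts" ]; then
    echo "giving up after ${max_restarts} restarts"
    break
  fi
  echo "restart ${attempt} after ${backoff_seconds}s"
  sleep "$backoff_seconds"
done
```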
What to do about unsupported features
| Feature | Alternative |
|---|---|
| `build` | Use `image` + `x-enroot.prepare.commands`. Mount build context files with `x-enroot.prepare.mounts` if needed. |
| `ports` | Not needed. Services share `127.0.0.1` on one node. |
| `networks` / `network_mode` | Not needed. All services are on the same host network. |
| `restart` | Use `services.<name>.x-slurm.failure_policy` (`fail_job`, `ignore`, `restart_on_failure`). |
| `deploy` | Use `x-slurm` for resources. |
| Service DNS names | Use `127.0.0.1` for same-node helpers, or explicit host metadata such as `HPC_COMPOSE_PRIMARY_NODE` for distributed runs. |
| Named volumes | Use host-path bind mounts in `volumes`. |
| `.env` file | Supported. `.env` in the compose file directory is loaded automatically. |
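What "loaded automatically" amounts to can be sketched in shell: plain `KEY=VALUE` lines next to the compose file become variables available for interpolation in the spec. The `REDIS_PORT` name here is just an example.

```shell
# Write a minimal .env next to the compose file.
cat > .env <<'EOF'
REDIS_PORT=6379
EOF
set -a; . ./.env; set +a   # source and export every assignment, as an env loader would
echo "REDIS_PORT=${REDIS_PORT}"
```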
Migration checklist
- Remove `build:` — Replace with `image:` pointing to a base image. Move dependency installation to `x-enroot.prepare.commands`.
- Remove `ports:` — Use host-network semantics instead of container port publishing.
- Remove `networks:` / `network_mode:` — There is no Docker-style overlay network or service-name DNS layer.
- Remove Compose `restart:` — Use `services.<name>.x-slurm.failure_policy` when you need per-service restart behavior.
- Remove `deploy:` — Use `x-slurm` for resource allocation.
- Replace service hostnames — Change any service-name references (e.g. `redis`, `postgres`) to `127.0.0.1` for same-node helpers, or to explicit allocation metadata for distributed runs.
- Replace `healthcheck:` — Convert to `readiness:` with `type: tcp`, `type: log`, or `type: sleep`.
- Add `x-slurm:` — Set `time`, `mem`, `cpus_per_task`, and optionally `gpus`, `partition`, `account`.
- Set `cache_dir` — Point `x-slurm.cache_dir` to shared storage visible from login and compute nodes.
- Validate — Run `hpc-compose validate -f compose.yaml` to check the converted spec.
- Inspect — Run `hpc-compose inspect --verbose -f compose.yaml` to confirm the planner understood your intent.
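A quick way to start the checklist is to scan an existing Compose file for the keys that need attention. The sketch below is a hypothetical grep heuristic, not a real YAML parser; the sample file contents are invented for illustration.

```shell
# Sample legacy compose file with several keys the checklist removes.
cat > old-compose.yaml <<'EOF'
services:
  app:
    build: .
    ports:
      - "8080:8080"
    restart: always
EOF
# Flag each top-level-ish occurrence of an unsupported Compose key.
for key in build ports networks network_mode restart deploy healthcheck; do
  if grep -Eq "^[[:space:]]*${key}:" old-compose.yaml; then
    echo "migrate: ${key}"
  fi
done
```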
Related docs
Architecture for Contributors
The CLI is intentionally thin. Most behavior lives in the library crate so the binary, integration tests, and generated rustdoc all describe the same pipeline.
Module map
- `spec`: parse, interpolate, and validate the supported Compose subset
- `planner`: normalize the parsed spec into a deterministic plan
- `preflight`: check login-node prerequisites and cluster policy issues
- `prepare`: import base images and rebuild prepared runtime artifacts
- `render`: generate the final `sbatch` script and service launch commands
- `job`: track submissions, logs, metrics, status, and artifact export
- `cache`: persist cache manifests for imported and prepared images
- `init`: expose the shipped example templates for `hpc-compose init`
Execution flow
1. `ComposeSpec::load` parses YAML, validates supported keys, interpolates variables, and applies semantic validation.
2. `planner::build_plan` resolves paths, command shapes, dependencies, and prepare blocks into a normalized plan.
3. `prepare::build_runtime_plan` computes concrete cache artifact locations.
4. `preflight::run` checks cluster prerequisites before submission.
5. `prepare::prepare_runtime_plan` imports or rebuilds artifacts when needed.
6. `render::render_script` emits the batch script consumed by `sbatch`.
7. `job` persists tracked metadata under `.hpc-compose/` and powers `status`, `stats`, `logs`, `cancel`, and artifact export.
Contributor commands
cargo test
cargo test --test cli
cargo doc --no-deps
mdbook build docs
Documentation split
- Use this mdBook for user-facing workflows, examples, and reference material.
- Use rustdoc for contributor-facing internals and the library module map.
- Keep README short and point readers into the book instead of duplicating long-form guidance.