
hpc-compose


hpc-compose turns a Compose-like spec into a single Slurm job that runs one or more services through Enroot and Pyxis.

hpc-compose is intentionally not a full Docker Compose implementation. It focuses on the subset that maps cleanly to one Slurm allocation, plus either single-node services or one allocation-wide distributed service without a separate orchestration layer.

Start Here

  1. Read Quickstart for the shortest install-and-run path.
  2. Read Support Matrix to confirm what is officially supported, CI-tested, or only release-built.
  3. Use Task Guide when you want the shortest path for a specific workflow.
  4. Read Execution model to understand what runs on the login node, what runs on the compute node, and which paths must be shared.
  5. Use Runbook when adapting a real workload to a real cluster.
  6. Use Examples when you want the closest known-good starting point.
  7. Use Spec reference when you need exact field behavior or validation rules.
  8. Use Supported Slurm model when you need the product boundary spelled out clearly.

What it is for

  • One Slurm allocation per application
  • Single-node jobs and constrained multi-node distributed runs
  • Optional helper services pinned to the allocation’s primary node
  • Remote images such as redis:7 or existing local .sqsh images
  • Optional image customization on the login node through x-enroot.prepare
  • Shared cache management for imported and prepared images
  • Readiness-gated startup across dependent services

What it does not support

  • Compose build:
  • ports
  • custom Docker networks / network_mode
  • restart policies
  • deploy
  • arbitrary multi-node orchestration or partial-node service placement
  • mixed string/array entrypoint + command combinations in ambiguous cases

If you need image customization, use image: plus x-enroot.prepare, not build:.

Fast path

name: hello

x-slurm:
  time: "00:10:00"
  mem: 4G

services:
  app:
    image: python:3.11-slim
    command: python -c "print('Hello from Slurm!')"

hpc-compose submit --watch -f compose.yaml

submit --watch is the normal run. Break out inspect, preflight, or prepare as the debugging flow when you are validating a new spec for the first time or isolating a failure.

Installation

One-line installer

For supported Linux and macOS targets, the repo now ships a small installer script that picks the newest release and the matching archive for your machine:

curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | sh

By default this installs hpc-compose into ~/.local/bin and verifies the published SHA-256 checksum before placing the binary.

Installer availability does not imply full runtime support. Check the Support Matrix before assuming that a platform can run submission, prepare, or watch workflows end to end.

Useful overrides:

curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | env HPC_COMPOSE_INSTALL_DIR=/usr/local/bin sh
curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | env HPC_COMPOSE_VERSION=v0.1.12 sh

Supported targets match the release workflow:

  • Linux x86_64
  • Linux arm64
  • macOS x86_64
  • macOS arm64

Windows release archives are also published, but Windows is not part of the installer path and is not an officially supported runtime target.

Download a release build manually

Prebuilt archives are published on the project’s GitHub Releases.

Typical flow on Linux or macOS:

curl -L https://github.com/NicolasSchuler/hpc-compose/releases/latest/download/hpc-compose-v0.1.12-x86_64-unknown-linux-musl.tar.gz -o hpc-compose.tar.gz
tar -xzf hpc-compose.tar.gz
./hpc-compose --help

Pick the archive that matches your platform from the release page. Linux x86_64 releases use a musl target to avoid common cluster glibc mismatches.

Build from source

Requirements:

  • Rust stable toolchain
  • A normal local build machine for the CLI itself
  • Slurm/Enroot tools only when you actually run preflight, prepare, or submit

git clone https://github.com/NicolasSchuler/hpc-compose.git
cd hpc-compose
cargo build --release
./target/release/hpc-compose --help

Local docs commands

The repo ships two documentation layers:

  • mdbook for the user manual
  • cargo doc for contributor-facing crate internals

Useful commands:

mdbook build docs
mdbook serve docs
cargo doc --no-deps

Verification

Before using a local build on a cluster workflow, validate the binary and one example spec:

target/release/hpc-compose validate -f examples/minimal-batch.yaml
target/release/hpc-compose inspect --verbose -f examples/minimal-batch.yaml

Quickstart

This is the shortest install-and-run path from an empty shell to a submitted job.

1. Install a release binary

curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | sh

The installer selects the newest published release for the current Linux or macOS machine and installs hpc-compose into ~/.local/bin by default. Check the Support Matrix before assuming that a platform can run full cluster workflows.

2. Initialize a starter spec

hpc-compose init \
  --template minimal-batch \
  --name my-app \
  --cache-dir /shared/$USER/hpc-compose-cache \
  --output compose.yaml

If you already know the closest shipped example, copy it directly instead. The Examples page is the fastest way to choose one.

3. Normal run

hpc-compose submit --watch -f compose.yaml

submit --watch is the normal run. It runs preflight, prepares missing artifacts, renders the batch script, submits it through sbatch, then follows scheduler state and tracked logs.

4. Debugging flow

hpc-compose validate -f compose.yaml
hpc-compose inspect --verbose -f compose.yaml
hpc-compose preflight -f compose.yaml
hpc-compose prepare -f compose.yaml

Use the debugging flow when you want to confirm:

  • service order
  • normalized image references
  • cache artifact paths
  • whether prepare steps will rebuild every submit

Warning

inspect --verbose prints resolved environment values and final mount mappings. Treat its output as sensitive when the spec contains secrets.

5. Revisit a tracked run later

hpc-compose status -f compose.yaml
hpc-compose stats -f compose.yaml
hpc-compose logs -f compose.yaml --follow

From a source checkout

If you are running from a local checkout instead of an installed binary:

cargo build --release
target/release/hpc-compose init --template minimal-batch --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml
target/release/hpc-compose submit --watch -f compose.yaml

Next steps

  • Use the Execution model page to understand what runs where and which paths must be shared.
  • Use the Support Matrix page to confirm what is officially supported versus only release-built.
  • Use the Task Guide page when you want a goal-oriented starting point.
  • Use the Runbook when adapting a real workload to a real cluster.
  • Use the Examples page when you want the closest known-good template.
  • Use the Spec Reference when changing fields or validation-sensitive values.

Support Matrix

This page separates what hpc-compose can build, what CI currently exercises, and what is officially supported for real workflows.

Support levels

  • Officially supported: maintained target for user-facing workflows and issue triage
  • CI-tested: exercised in the repository’s automated checks today
  • Release-built: prebuilt archive is published, but that is not a promise of full runtime support

Officially supported

  • Linux x86_64: full CLI and runtime workflows. Requires Slurm client tools plus Enroot and Pyxis on the submission host/cluster.
  • Linux arm64: full CLI and runtime workflows. Same cluster requirements as Linux x86_64.
  • macOS x86_64: authoring and local inspection only. Supported for init, validate, inspect, render, and completions; not for cluster runtime commands.
  • macOS arm64: authoring and local inspection only. Same scope as macOS x86_64.

CI-tested

  • Ubuntu 24.04 x86_64: formatting, clippy, unit/integration tests, docs build, link checks, installer smoke tests, and coverage

Current CI validates project behavior on Ubuntu. Other published builds should be treated as lower-confidence until corresponding CI coverage exists.

Release-built

  • Linux x86_64: release archive published
  • Linux arm64: release archive published
  • macOS x86_64: release archive published
  • macOS arm64: release archive published
  • Windows x86_64: release archive published, but runtime workflows are not officially supported

Windows status

Windows archives are published so users can inspect the CLI surface or experiment with non-runtime commands, but Windows is currently release-built only:

  • Slurm + Enroot + Pyxis runtime workflows are not an officially supported Windows target.
  • Issues that are specific to Windows runtime execution may be closed as out of scope until the support policy changes.

Cluster assumptions for full support

For full runtime support on Linux, the target environment should provide:

  • sbatch, srun, and related Slurm client tools on the submission host
  • Pyxis container support in srun
  • Enroot on the submission host for image import and prepare steps
  • shared storage for x-slurm.cache_dir

Use Runbook and Execution model before adapting a real workload to a cluster.

Task Guide

Use this page when you know what you want to do, but not yet which command or example should be your starting point.

First run

  • Read Quickstart.
  • Start from minimal-batch with hpc-compose init --template minimal-batch --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml.
  • Run hpc-compose submit --watch -f compose.yaml.

Migrate from Docker Compose

  • Read Docker Compose Migration.
  • Replace build: with image: plus x-enroot.prepare.commands.
  • Replace service-name networking with 127.0.0.1 or explicit allocation metadata where appropriate.
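The build: replacement can be sketched as below. The image, package, and exact placement of the x-enroot block are illustrative assumptions; confirm the prepare field layout against the Spec reference.

```yaml
services:
  app:
    # was: build: ./app
    image: python:3.11-slim
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir requests   # bake slow-changing deps into the cached image
```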

Single-node multi-service app

  • Start from app-redis-worker.yaml for multi-service launch ordering and readiness checks.
  • Keep helper traffic on 127.0.0.1, since single-node services share the host network on one node.

Multi-node distributed training

  • Start from multi-node-torchrun.yaml or multi-node-mpi.yaml.
  • Treat helper services as primary-node-only and the distributed job as the single allocation-wide step.
  • Use allocation metadata such as HPC_COMPOSE_PRIMARY_NODE instead of Docker-style service discovery.

Checkpoint and resume workflows

  • Start from training-checkpoints.yaml when you only need artifact output.
  • Start from training-resume.yaml when the run should resume from shared storage across retries or later submissions.
  • Keep the canonical resume source in x-slurm.resume.path, not in exported artifact bundles.

LLM serving workflows

  • Start from vllm-openai.yaml or vllm-uv-worker.yaml for vLLM serving, or llm-curl-workflow.yaml for a one-shot request flow against a GPU-backed LLM.

Debug cluster readiness

  • Run hpc-compose validate -f compose.yaml.
  • Run hpc-compose inspect --verbose -f compose.yaml.
  • Run hpc-compose preflight -f compose.yaml.
  • Read the troubleshooting sections in Runbook.

Cache and artifact management

  • Use hpc-compose cache list to inspect imported/prepared artifacts.
  • Use hpc-compose cache inspect -f compose.yaml to see per-service reuse expectations.
  • Use hpc-compose artifacts -f compose.yaml after a run to export tracked payloads.

Automation and scripting with JSON output

  • Prefer --format json for machine-readable output on validate, render, prepare, preflight, inspect, status, stats, artifacts, and cache subcommands.
  • Use hpc-compose stats --format jsonl or --format csv when downstream tooling wants row-oriented metrics.
  • Treat --json as a compatibility alias on older machine-readable commands; new automation should prefer --format json.

Execution model

This page explains the few runtime rules that matter most when a Compose mental model meets Slurm, Enroot, and Pyxis.

What runs where

  • validate, inspect, preflight (login node or local shell): parse the spec, resolve paths, and check prerequisites
  • prepare (login node or local shell with Enroot access): import base images and build prepared runtime artifacts
  • submit (login node or local shell with Slurm access): run preflight, prepare missing artifacts, render the batch script, and call sbatch
  • batch script and services (compute-node allocation): launch the planned services through srun and Pyxis
  • status, stats, logs, artifacts (login node or local shell): read tracked metadata and job outputs after submission

The main consequence is simple: image preparation and validation happen before the job starts, but the containers themselves run later inside the Slurm allocation.

Which paths must be shared

  • x-slurm.cache_dir must be visible from both the login node and the compute nodes.
  • Relative host paths in volumes, local image paths, and x-enroot.prepare.mounts resolve against the compose file directory.
  • Each submitted job writes tracked state under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID} on the host.
  • That per-job directory is mounted into every container at /hpc-compose/job.
  • Multi-node jobs also populate /hpc-compose/job/allocation/{primary_node,nodes.txt} and export HPC_COMPOSE_PRIMARY_NODE, HPC_COMPOSE_NODE_COUNT, HPC_COMPOSE_NODELIST, and HPC_COMPOSE_NODELIST_FILE.

Use /hpc-compose/job for small shared state inside the allocation, such as ready files, request payloads, logs, metrics, or teardown signals.
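A ready-file handoff through that shared directory can be sketched like this. The demo substitutes a temporary directory for /hpc-compose/job, and the file name "redis.ready" is an illustrative convention, not an hpc-compose API.

```shell
JOB_DIR="$(mktemp -d)"            # inside a real job this would be /hpc-compose/job

# producer side: signal readiness once the service is actually up
touch "$JOB_DIR/redis.ready"

# consumer side: poll with a bounded wait before starting dependent work
for _ in $(seq 1 60); do
  [ -f "$JOB_DIR/redis.ready" ] && break
  sleep 1
done
[ -f "$JOB_DIR/redis.ready" ] && echo "dependency ready"
```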

Warning

Do not put x-slurm.cache_dir under /tmp, /var/tmp, /private/tmp, or /dev/shm. Those paths are not safe for login-node prepare plus compute-node reuse.

Networking inside the allocation

  • Single-node services share the host network on one node.
  • In a multi-node job, helper services stay on the allocation’s primary node by default.
  • The one distributed service spans the full allocation and must use explicit non-localhost coordination.
  • ports, custom Docker networks, and service-name DNS are not part of the model.
  • Use depends_on plus readiness when a dependent service must wait for real availability rather than process start.

Use 127.0.0.1 only when both sides are intentionally on the same node. For multi-node distributed runs, derive rendezvous addresses from the allocation metadata files or environment variables instead of relying on localhost.

If a service binds its TCP port before it is actually ready, prefer HTTP or log-based readiness over plain TCP readiness.
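Deriving a rendezvous endpoint from the allocation metadata can be sketched as follows. The localhost fallback only makes sense for single-node testing, and the port is an arbitrary free port that all ranks must agree on; the torchrun line is illustrative.

```shell
# HPC_COMPOSE_PRIMARY_NODE is exported by hpc-compose in multi-node jobs
MASTER_ADDR="${HPC_COMPOSE_PRIMARY_NODE:-127.0.0.1}"
MASTER_PORT=29500
echo "rendezvous endpoint: ${MASTER_ADDR}:${MASTER_PORT}"
# e.g. torchrun --rdzv_endpoint "${MASTER_ADDR}:${MASTER_PORT}" ... (illustrative)
```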

volumes vs x-enroot.prepare

  • volumes: for fast-changing source code, model directories, input data, and checkpoint paths. Applied at runtime inside the allocation; reads live host content every normal run.
  • x-enroot.prepare.commands: for slower-changing dependencies, tools, and image customization. Applied before submission on the login node; cached until the prepared artifact changes.

Recommended default:

  • keep active source trees in volumes
  • keep slower-changing dependency installation in x-enroot.prepare.commands
  • use prepare.mounts only when the prepare step truly needs host files
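A minimal sketch of that split, assuming per-service x-enroot placement and illustrative paths and packages (check the Spec reference for the exact fields):

```yaml
services:
  trainer:
    image: python:3.11-slim
    volumes:
      - ./src:/workspace/src          # fast-changing code: read live from the host each run
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir numpy   # slow-changing deps: cached until this changes
```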

Warning

If a mounted file is a symlink, the symlink target must also be visible from inside the mounted directory. Otherwise the path can exist on the host but fail inside the container.

Command vocabulary

  • The normal run is hpc-compose submit --watch -f compose.yaml.
  • The debugging flow is validate, inspect, preflight, and prepare run separately when you need more visibility.

Read Runbook for the operational workflow, Examples for starting points, and Spec reference for exact field behavior.

Runbook

This runbook is for adapting hpc-compose to a real workload on a Slurm cluster with Enroot and Pyxis.

Commands below assume hpc-compose is on your PATH. If you are running from a local checkout, replace hpc-compose with target/release/hpc-compose.

All commands accept -f / --file to specify the compose spec path. When omitted, it defaults to compose.yaml in the current directory. (The cache prune --all-unused subcommand requires -f explicitly.)

Read the Execution model page first if you are still orienting on login-node prepare, compute-node runtime, shared cache paths, or localhost networking.

Before you start

Make sure you have:

  • a login node with enroot, srun, and sbatch available,
  • scontrol available when you request x-slurm.nodes > 1,
  • Pyxis support in srun (srun --help should mention --container-image),
  • a shared filesystem path for x-slurm.cache_dir,
  • any required local source trees or local .sqsh images in place,
  • registry credentials available if your cluster or registry requires them.

Command cadence

  • install or build hpc-compose: once per checkout or upgrade
  • init or copy a shipped example: once per new spec
  • validate and inspect: early while adapting a spec
  • submit --watch: the normal run
  • preflight, prepare, render: first-time cluster setup checks or the debugging flow

Normal progression

For a new spec on a real cluster:

  1. Run hpc-compose init --template <name> --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml, or copy the closest shipped example.
  2. Set x-slurm.cache_dir if you need an explicit shared cache path, and adjust any cluster-specific resource settings.
  3. Run hpc-compose validate -f compose.yaml and hpc-compose inspect --verbose -f compose.yaml while you are still adapting the file.
  4. Run hpc-compose submit --watch -f compose.yaml for the normal run.
  5. If that fails, or if you need more visibility, break out preflight, prepare, render, status, stats, or logs separately.

Pick a starting example

  • Dev app (examples/dev-python-app.yaml): mounted source tree plus a small prepare step
  • Redis worker stack (examples/app-redis-worker.yaml): multi-service launch ordering and readiness checks
  • LLM curl workflow (examples/llm-curl-workflow.yaml): one GPU-backed LLM plus a one-shot curl request from a second service
  • LLM curl workflow, home variant (examples/llm-curl-workflow-workdir.yaml): the same request flow, but anchored under $HOME/models for direct use on a login node
  • GPU-backed app (examples/llama-app.yaml): one GPU service plus a dependent application
  • llama.cpp + uv worker (examples/llama-uv-worker.yaml): llama.cpp serving plus a source-mounted Python worker run through uv
  • Minimal batch (examples/minimal-batch.yaml): simplest single-service batch job
  • Multi-node MPI (examples/multi-node-mpi.yaml): one helper on the primary node plus one allocation-wide distributed step
  • Multi-node torchrun (examples/multi-node-torchrun.yaml): allocation-wide GPU training with the primary node as rendezvous
  • Training checkpoints (examples/training-checkpoints.yaml): GPU training with checkpoints to shared storage
  • Training resume (examples/training-resume.yaml): GPU training with a shared resume directory and attempt-aware checkpoints
  • Postgres ETL (examples/postgres-etl.yaml): PostgreSQL plus a Python data processing job
  • vLLM serving (examples/vllm-openai.yaml): vLLM with an in-job Python client
  • vLLM + uv worker (examples/vllm-uv-worker.yaml): vLLM serving with a source-mounted Python worker run through uv
  • MPI hello (examples/mpi-hello.yaml): MPI hello world with Open MPI
  • Multi-stage pipeline (examples/multi-stage-pipeline.yaml): two-stage pipeline with file-based handoff
  • Data preprocessing (examples/fairseq-preprocess.yaml): CPU-heavy NLP preprocessing pipeline

The fastest path is usually to copy the closest example and adapt it instead of starting from scratch.

You can also let hpc-compose scaffold one of these examples directly:

hpc-compose init --template dev-python-app --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml

1. Choose x-slurm.cache_dir early

Set x-slurm.cache_dir to a path that is visible from both the login node and the compute nodes.

x-slurm:
  cache_dir: /shared/$USER/hpc-compose-cache

Rules:

  • Do not use /tmp, /var/tmp, /private/tmp, or /dev/shm.
  • If you leave cache_dir unset, the default is $HOME/.cache/hpc-compose.
  • The default is convenient for small or home-directory workflows, but a shared project or workspace path is usually safer on real clusters.
  • The important constraint is visibility: prepare runs on the login node, but the batch job later reuses those cached artifacts from compute nodes.

2. Adapt the example to your workload

Start with the nearest example and then change:

  • image
  • command / entrypoint
  • volumes
  • environment
  • x-slurm resource settings
  • x-enroot.prepare commands for dependencies or tooling

Recommended pattern:

  • Put fast-changing application code in volumes.
  • Put slower-changing dependency installation in x-enroot.prepare.commands.
  • Add readiness to any service that other services truly depend on.

3. Validate the spec

hpc-compose validate -f compose.yaml

Use validate first when you are changing:

  • field names,
  • depends_on shape,
  • command / entrypoint form,
  • path values,
  • x-slurm / x-enroot blocks.

If validate fails, fix that before doing anything more expensive.

4. Inspect the normalized plan

hpc-compose inspect -f compose.yaml
hpc-compose inspect --verbose -f compose.yaml

Check:

  • service order,
  • allocation geometry and each service’s step geometry,
  • how images were normalized,
  • final host-to-container mount mappings,
  • resolved environment values,
  • where runtime artifacts will live,
  • whether the planner expects a cache hit or miss,
  • whether a prepared image will rebuild on every submit because prepare.mounts are present.

inspect is the quickest way to confirm that the planner understood your spec the way you intended. inspect --verbose is a debugging-oriented view and can print secrets from resolved environment values.

5. Normal run: submit the job and watch it

hpc-compose submit --watch -f compose.yaml

submit does the normal end-to-end flow:

  1. run preflight unless --no-preflight is set,
  2. prepare images unless --skip-prepare is set,
  3. render the script,
  4. call sbatch.

With --watch, submit also:

  1. records the tracked job metadata under .hpc-compose/,
  2. polls scheduler state with squeue / sacct when available,
  3. streams tracked service logs as they appear.

Note

submit treats preflight warnings as non-fatal. If you want warnings to block submission, run preflight --strict separately before submit.

Useful options:

  • --script-out path/to/job.sbatch keeps a copy of the rendered script.
  • When --script-out is omitted, the script is written to <compose-file-dir>/hpc-compose.sbatch.
  • --force-rebuild refreshes imported and prepared artifacts during submit.
  • --skip-prepare reuses existing prepared artifacts.
  • --keep-failed-prep keeps the Enroot rootfs around when a prepare step fails.

For the shipped examples, submit --watch is usually the only command you need in the normal run. Use the other commands when you need more visibility into planning, environment checks, image preparation, tracked job state, or the generated script.

6. Run preflight checks when you need to debug cluster readiness

hpc-compose preflight -f compose.yaml
hpc-compose preflight --verbose -f compose.yaml

preflight checks:

  • required binaries (enroot, srun, sbatch),
  • scontrol when x-slurm.nodes > 1,
  • Pyxis container support in srun,
  • cache directory policy and writability,
  • local mount and image paths,
  • registry credentials,
  • skip-prepare reuse safety when relevant.

If your cluster installs these tools in non-standard locations, pass explicit paths:

hpc-compose preflight -f compose.yaml --enroot-bin /opt/enroot/bin/enroot --srun-bin /usr/local/bin/srun --sbatch-bin /usr/local/bin/sbatch

The same override flags (--enroot-bin, --srun-bin, --sbatch-bin) are available on prepare and submit.

Use strict mode if you want warnings to fail the command:

hpc-compose preflight -f compose.yaml --strict

7. Prepare images on the login node when needed

hpc-compose prepare -f compose.yaml

Use this when you want to:

  • build or refresh prepared images before submission,
  • confirm cache reuse behavior,
  • debug preparation separately from job submission.

Force a refresh of imported and prepared artifacts:

hpc-compose prepare -f compose.yaml --force

8. Render the batch script when you need to inspect it

hpc-compose render -f compose.yaml --output /tmp/job.sbatch

This is useful when:

  • debugging generated srun arguments,
  • checking mounts and environment passing,
  • reviewing the launch order and readiness waits.

9. Read logs and submission output

After a successful submit, hpc-compose prints:

  • the rendered script path,
  • the cache directory,
  • one log path per service,
  • the tracked metadata location when a numeric Slurm job id was returned.

Use the tracked helpers for later inspection:

hpc-compose status -f compose.yaml
hpc-compose stats -f compose.yaml
hpc-compose stats -f compose.yaml --format csv
hpc-compose stats -f compose.yaml --format jsonl
hpc-compose artifacts -f compose.yaml
hpc-compose artifacts -f compose.yaml --bundle checkpoints --tarball
hpc-compose cancel -f compose.yaml
hpc-compose logs -f compose.yaml
hpc-compose logs -f compose.yaml --service app --follow

status also reports the tracked top-level batch log path so early job failures are visible even when a service log was never created. When services.<name>.x-slurm.failure_policy is used, status includes per-service policy state (failure_policy, restart counters, and last exit code) from tracked runtime state.

For multi-node jobs, status also reports tracked placement geometry (placement_mode, nodes, task counts, and expanded nodelist) for each service.

stats now prefers sampler data from ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics when x-slurm.metrics is enabled. In v1 that sampler can collect:

  • GPU snapshots and compute-process rows through nvidia-smi
  • job-step CPU and memory snapshots through sstat

If the sampler is absent, disabled, or only partially available, stats falls back to live sstat. The live fallback works best for running jobs, requires the cluster’s jobacct_gather plugin to be enabled for Slurm-side step metrics, and only shows GPU accounting fields from Slurm when the cluster exposes GPU TRES accounting.

In multi-node v1, GPU sampler collection remains primary-node-only. Slurm step metrics still cover the whole step through sstat, but nvidia-smi fan-in across nodes is intentionally out of scope.

Use --format json, --format csv, or --format jsonl when you want machine-friendly output for dashboards, plotting, or experiment tracking. --format json is the preferred interface for validate, render, prepare, preflight, inspect, status, stats, artifacts, and cache subcommands. --json remains supported as a compatibility alias on older machine-readable commands.

Runtime logs live under:

${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/logs/<service>.log

That same per-job directory is also mounted inside every container at /hpc-compose/job. Use it for small cross-service coordination files when a workflow needs shared ephemeral state.

When metrics sampling is enabled, the job also writes:

${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics/
  meta.json
  gpu.jsonl
  gpu_processes.jsonl
  slurm.jsonl

Collector failures are best-effort: missing nvidia-smi, missing sstat, or unsupported queries do not fail the batch job itself.

When x-slurm.artifacts is enabled, teardown collection writes:

${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/artifacts/
  manifest.json
  payload/...

Use hpc-compose artifacts -f compose.yaml after the job finishes to copy the collected payload into the configured x-slurm.artifacts.export_dir. The export path is resolved relative to the compose file and expands ${SLURM_JOB_ID} from tracked metadata.

If the compose file defines named bundles under x-slurm.artifacts.bundles, hpc-compose artifacts --bundle <name> exports only the selected bundle(s). Named bundles are written under <export_dir>/bundles/<bundle>/, and every export writes provenance JSON under <export_dir>/_hpc-compose/bundles/<bundle>.json. Add --tarball to also create <bundle>.tar.gz archives during export. The bundle name "default" is reserved for the top-level x-slurm.artifacts.paths.

Slurm may also write a top-level batch log such as slurm-<jobid>.out, or to the path configured with x-slurm.output. Check that file first when the job fails before any service log appears.

Service names containing non-alphanumeric characters are encoded in the log filename. For example, a service named my.app produces my_x2e_app.log. Prefer [a-zA-Z0-9_-] in service names for readability.

If you used --script-out, keep that script with the job logs when debugging cluster behavior.

When x-slurm.resume is enabled, hpc-compose also:

  • mounts the shared resume path into every service at /hpc-compose/resume,
  • injects HPC_COMPOSE_RESUME_DIR, HPC_COMPOSE_ATTEMPT, and HPC_COMPOSE_IS_RESUME,
  • writes attempt-specific runtime outputs under .hpc-compose/<jobid>/attempts/<attempt>/,
  • keeps .hpc-compose/<jobid>/{logs,metrics,artifacts,state.json} pointed at the latest attempt for compatibility.

Use the shared resume directory for the canonical checkpoint a restarted run should load next. Treat exported artifacts as retrieval and provenance output after the attempt finishes, not as the primary live resume source.
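Attempt-aware startup logic using the injected variables can be sketched like this. The checkpoint file name and the "1" value for HPC_COMPOSE_IS_RESUME are assumptions for illustration; confirm the exact semantics in the Spec reference.

```shell
RESUME_DIR="${HPC_COMPOSE_RESUME_DIR:-/hpc-compose/resume}"
if [ "${HPC_COMPOSE_IS_RESUME:-0}" = "1" ] && [ -f "$RESUME_DIR/latest.ckpt" ]; then
  echo "resuming attempt ${HPC_COMPOSE_ATTEMPT:-?} from $RESUME_DIR/latest.ckpt"
else
  echo "fresh start"
fi
```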

10. Inspect and prune cache artifacts

List cached artifacts:

hpc-compose cache list

Inspect cache state for the current plan:

hpc-compose cache inspect -f compose.yaml

Inspect a single service:

hpc-compose cache inspect -f compose.yaml --service app

Prune old entries by age (in days):

hpc-compose cache prune --age 14

Prune artifacts not referenced by the current plan:

hpc-compose cache prune --all-unused -f compose.yaml

The two strategies (--age and --all-unused) are mutually exclusive — pick one per invocation.

Use cache inspect when you need to answer questions such as:

  • which artifact is being reused,
  • whether a prepared image came from a cached manifest,
  • whether a service rebuilds on every submit because of prepare mounts.

After upgrading hpc-compose

Cache keys include the tool version, so upgrading hpc-compose invalidates all existing cached artifacts. You will see a full rebuild on the next prepare or submit. To clean up orphaned artifacts after an upgrade:

hpc-compose cache prune --age 0

What changed and what should I run?

  • YAML planning/runtime settings only: hpc-compose validate -f compose.yaml, hpc-compose inspect --verbose -f compose.yaml, then hpc-compose submit --watch -f compose.yaml
  • The base image, x-enroot.prepare.commands, or prepare env: hpc-compose submit --watch --force-rebuild -f compose.yaml for the normal run, or hpc-compose prepare --force -f compose.yaml when debugging prepare separately
  • Only mounted runtime source, such as app code under volumes: usually just hpc-compose submit --watch -f compose.yaml
  • Cache entries you no longer want and this plan does not reference: hpc-compose cache prune --all-unused -f compose.yaml
  • hpc-compose itself: expect cache misses on the next prepare or submit, then optionally prune old entries

Decision guide

When should I use volumes?

Use volumes for source code or other files you edit frequently.

When should I use x-enroot.prepare.commands?

Use prepare commands for slower-changing dependencies, tools, or image customization that you want baked into a cached runtime image.

When should I use --skip-prepare?

Only when the prepared artifact already exists and you want to reuse it. preflight can warn or fail if reuse is unsafe.

When should I use --force-rebuild or prepare --force?

Use them after changing:

  • the base image,
  • prepare commands,
  • prepare environment,
  • tooling or dependencies that should invalidate the cached runtime image.

When should I manually run enroot remove?

Treat manual enroot remove as a rare last resort.

Use it only when Enroot state is clearly broken or inconsistent and hpc-compose prepare --force plus cache pruning did not fix the problem. In the normal rebuild or refresh path, prefer submit --force-rebuild, prepare --force, and cache prune so hpc-compose stays in charge of artifact state.

Why does my service rebuild every time?

If x-enroot.prepare.mounts is non-empty, that service intentionally rebuilds on every prepare / submit.
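For example, a service shaped like this (a sketch consistent with the spec reference, not a tested file) rebuilds on every run because its prepare block mounts a host file:

```yaml
services:
  app:
    image: python:3.11-slim
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir -r /tmp/requirements.txt
        mounts:
          # Non-empty prepare mounts => intentional rebuild on every prepare/submit
          - ./requirements.txt:/tmp/requirements.txt
```

If the rebuilds are unwanted, move the mounted content into prepare commands (for example, install pinned packages directly) so the cached image can be reused.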

Troubleshooting

required binary '...' was not found

Run on a node with the Slurm client tools and Enroot available, or pass the explicit binary path with --enroot-bin, --srun-bin, or --sbatch-bin.

srun does not advertise --container-image

Pyxis support appears unavailable on that node. Move to a supported login node or cluster environment.

Cache directory errors or warnings

  • Errors usually mean the path is not shared or not writable.
  • A warning under $HOME means the path may work on some clusters, but a shared workspace or project path is safer because prepare happens on the login node and runtime happens on compute nodes.

Missing local mount or image paths

Remember that relative paths resolve from the compose file directory, not from the shell’s current working directory.

A mounted file exists on the host but not inside the container

This is often a symlink issue. If you mount a directory such as $HOME/models:/models and model.gguf is a symlink whose target lives outside $HOME/models, the target may not be visible inside the container. Copy the real file into the mounted directory or mount the directory that contains the symlink target.

Warning

The mount itself can succeed while the symlink target is still invisible inside the container. Check the target path, not just the link path.
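The host-side condition can be checked with a short script. This is a sketch using a temporary directory layout to stand in for $HOME/models; it is not part of hpc-compose:

```python
import os
import tempfile

mount = tempfile.mkdtemp()     # stands in for the mounted directory, e.g. $HOME/models
outside = tempfile.mkdtemp()   # stands in for a directory outside the mount

# The real file lives outside the mount; the mount only holds a symlink to it.
target = os.path.join(outside, "model.gguf")
open(target, "w").close()
link = os.path.join(mount, "model.gguf")
os.symlink(target, link)

# On the host the link resolves fine...
assert os.path.exists(link)

# ...but the real target lives outside the mounted directory, so a container
# that bind-mounts only `mount` would see a dangling link.
resolved = os.path.realpath(link)
safe = resolved.startswith(os.path.realpath(mount) + os.sep)
print("safe to mount:", safe)  # prints: safe to mount: False
```

When the check reports unsafe, either copy the real file into the mounted directory or mount the directory that contains the symlink target, as described above.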

Anonymous pull or registry credential warnings

Add the required credentials before relying on private registries or heavily rate-limited public registries.

Services start in the wrong order

Use depends_on with condition: service_healthy when a dependent must wait for a dependency’s readiness probe. Plain list form still means service_started.

When a TCP port opens before the service is fully usable, prefer HTTP or log-based readiness over TCP readiness.
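A minimal shape for that pattern, as a sketch (the image names and health URL are placeholders, not a tested example):

```yaml
services:
  server:
    image: my-server.sqsh                 # placeholder local image
    readiness:
      type: http                          # gate on the app actually answering,
      url: http://127.0.0.1:8080/health   # not just the TCP port opening
      timeout_seconds: 60

  client:
    image: python:3.11-slim
    command: python -m client
    depends_on:
      server:
        condition: service_healthy        # waits for the readiness probe to pass
```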

Preview a submission without running sbatch

Use submit --dry-run to run the full pipeline (preflight, prepare, render) without actually calling sbatch. The rendered script is written to disk so you can inspect it:

hpc-compose submit --dry-run -f compose.yaml

Combine with --skip-prepare for a pure validation-and-render dry run.

Clean up old job directories

Tracked job metadata and logs accumulate in .hpc-compose/. Use clean to remove old entries:

# Remove jobs older than 7 days
hpc-compose clean -f compose.yaml --age 7

# Remove all except the latest tracked job
hpc-compose clean -f compose.yaml --all

Shell completions

Generate completions for your shell and source them:

# bash
hpc-compose completions bash > ~/.local/share/bash-completion/completions/hpc-compose

# zsh
hpc-compose completions zsh > ~/.zfunc/_hpc-compose

# fish
hpc-compose completions fish > ~/.config/fish/completions/hpc-compose.fish

Examples

These examples are the fastest way to understand the intended hpc-compose workflows and adapt them to a real application.

For almost every example, the normal run is:

hpc-compose submit --watch -f examples/<example>.yaml

Use the debugging flow (validate, inspect, preflight, prepare) when you are wiring up the example for the first time or isolating a failure.

If you want one of these files written straight to your working directory, use:

hpc-compose init --template dev-python-app --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml

Example matrix

| Example | What it demonstrates | When to start from it |
|---|---|---|
| app-redis-worker.yaml | Multiple services, depends_on, and TCP readiness checks | You need service startup ordering or a small multi-service stack |
| dev-python-app.yaml | Mounted source code plus x-enroot.prepare.commands for dependencies | You want an iterative development workflow |
| llm-curl-workflow.yaml | End-to-end LLM request flow with a login-node prepare step and a curl client | You want the smallest concrete inference workflow |
| llm-curl-workflow-workdir.yaml | Same LLM workflow, but anchored under $HOME/models for direct use on a login node | You want the lowest-overhead path from a login-node home directory |
| llama-app.yaml | GPU-backed service, mounted model files, dependent app service | You need accelerator resources or a model-serving pattern |
| llama-uv-worker.yaml | llama.cpp serving plus a source-mounted Python worker executed through uv | You want the GGUF server + mounted worker pattern |
| minimal-batch.yaml | Single service, no dependencies, no GPU, no prepare | You want the simplest possible starting point |
| multi-node-mpi.yaml | One primary-node helper plus one allocation-wide distributed CPU step | You want a minimal multi-node pattern without adding orchestration |
| multi-node-torchrun.yaml | Allocation-wide torchrun launch using the primary node as rendezvous | You want a multi-node GPU training starting point |
| training-checkpoints.yaml | GPU training with checkpoints written to shared storage | You need a batch training workflow with artifact collection |
| training-resume.yaml | GPU training with a shared resume directory and attempt-aware checkpoints | You need restart-safe checkpoint semantics across requeues or repeated submissions |
| postgres-etl.yaml | PostgreSQL plus a Python data processing job | You need a database-backed batch pipeline |
| vllm-openai.yaml | vLLM serving with an in-job Python client | You want vLLM-based inference instead of llama.cpp |
| vllm-uv-worker.yaml | vLLM serving plus a source-mounted Python worker executed through uv | You want a common LLM stack with mounted app code |
| mpi-hello.yaml | MPI hello world compiled and run with Open MPI | You need an MPI workload |
| multi-stage-pipeline.yaml | Two-stage pipeline coordinating through the shared job mount | You need file-based stage-to-stage handoff |
| fairseq-preprocess.yaml | CPU-heavy NLP data preprocessing with parallel workers | You need a CPU-bound data preprocessing pipeline |

Which example should I start from?

Companion notes for the more involved examples live alongside the example assets.
Adaptation checklist

  1. Copy the closest example to your own compose.yaml, or run hpc-compose init --template <name> --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml.
  2. Set x-slurm.cache_dir to a path visible from both the login node and the compute nodes.
  3. Replace the example image, command, environment, and volumes with your workload.
  4. Keep active source in volumes and keep slower-changing dependency installation in x-enroot.prepare.commands.
  5. Add readiness to services that must be reachable before dependents continue.
  6. Adjust top-level or per-service x-slurm settings for your cluster.
  7. Run the debugging flow before the first submit when you need to confirm planning, prerequisites, or cache behavior.

Spec reference

This page describes the Compose subset that hpc-compose accepts today. Unknown or unsupported fields are rejected unless this page explicitly says otherwise.

Top-level shape

name: demo
version: "3.9"

x-slurm:
  time: "00:30:00"
  cache_dir: /shared/$USER/hpc-compose-cache

services:
  app:
    image: python:3.11-slim
    command: python -m main

Top-level fields

| Field | Shape | Default | Notes |
|---|---|---|---|
| name | string | omitted | Used as the Slurm job name when x-slurm.job_name is not set. |
| version | string | omitted | Accepted for Compose compatibility. Ignored by the planner. |
| services | mapping | required | Must contain at least one service. |
| x-slurm | mapping | omitted | Top-level Slurm settings and shared runtime defaults. |

x-slurm

These fields live under the top-level x-slurm block.

| Field | Shape | Default | Notes |
|---|---|---|---|
| job_name | string | name when present | Rendered as #SBATCH --job-name. |
| partition | string | omitted | Passed through to #SBATCH --partition. |
| account | string | omitted | Passed through to #SBATCH --account. |
| qos | string | omitted | Passed through to #SBATCH --qos. |
| time | string | omitted | Passed through to #SBATCH --time. |
| nodes | integer | omitted | Slurm allocation node count. Defaults to 1 when omitted. |
| ntasks | integer | omitted | Passed through to #SBATCH --ntasks. |
| ntasks_per_node | integer | omitted | Passed through to #SBATCH --ntasks-per-node. |
| cpus_per_task | integer | omitted | Top-level Slurm CPU request. |
| mem | string | omitted | Passed through to #SBATCH --mem. |
| gres | string | omitted | Passed through to #SBATCH --gres. |
| gpus | integer | omitted | Used only when gres is not set. |
| constraint | string | omitted | Passed through to #SBATCH --constraint. |
| output | string | omitted | Passed through to #SBATCH --output. |
| error | string | omitted | Passed through to #SBATCH --error. |
| chdir | string | omitted | Passed through to #SBATCH --chdir. |
| cache_dir | string | $HOME/.cache/hpc-compose | Must resolve to shared storage visible from the login node and the compute nodes. |
| metrics | mapping | omitted | Enables runtime metrics sampling. |
| artifacts | mapping | omitted | Enables tracked artifact collection and export metadata. |
| resume | mapping | omitted | Enables checkpoint-aware resume semantics with a shared host path mounted into every service. |
| setup | list of strings | omitted | Raw shell lines inserted into the generated batch script before service launches. |
| submit_args | list of strings | omitted | Extra raw Slurm arguments appended as #SBATCH ... lines. |

x-slurm.setup

x-slurm:
  setup:
    - module load enroot
    - source /shared/env.sh
  • Shape: list of strings
  • Default: omitted
  • Notes:
    • Each line is emitted verbatim into the generated bash script.
    • The script runs under set -euo pipefail.
    • Shell quoting and escaping are the user’s responsibility.

x-slurm.submit_args

x-slurm:
  submit_args:
    - "--mail-type=END"
    - "--mail-user=user@example.com"
    - "--reservation=gpu-reservation"
  • Shape: list of strings
  • Default: omitted
  • Notes:
    • Each entry is emitted as #SBATCH {arg}.
    • Entries are not validated against Slurm option syntax.

x-slurm.cache_dir

  • Shape: string
  • Default: $HOME/.cache/hpc-compose
  • Notes:
    • Environment variables are expanded, and relative paths are resolved against the compose file directory.
    • Paths under /tmp, /var/tmp, /private/tmp, and /dev/shm are rejected.
    • The path must be visible from both the login node and the compute nodes.

Multi-node placement rules

  • x-slurm.nodes > 1 reserves a multi-node allocation.
  • Multi-node v1 supports at most one distributed service spanning the full allocation.
  • Helper services remain single-node steps and are pinned to the allocation’s primary node.
  • When a multi-node job has exactly one service, that service defaults to the distributed full-allocation step.
  • Distributed services may use readiness.type: sleep or readiness.type: log, or TCP/HTTP readiness only with an explicit non-local host or URL.
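A minimal spec shape that follows these rules might look like this sketch (service names, images, and commands are placeholders, not a tested example):

```yaml
x-slurm:
  nodes: 4

services:
  monitor:                       # helper: single-node, pinned to the primary node
    image: python:3.11-slim
    command: python -m monitor

  trainer:                       # the one distributed, allocation-wide service
    image: trainer.sqsh          # placeholder local image
    command: bash launch.sh
    x-slurm:
      nodes: 4                   # must equal the top-level allocation node count
    readiness:
      type: log                  # distributed services may use sleep/log readiness,
      pattern: "ready"           # or TCP/HTTP only with an explicit non-local target
```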

x-slurm.metrics

x-slurm:
  metrics:
    interval_seconds: 5
    collectors: [gpu, slurm]
  • Shape: mapping
  • Default: omitted
  • Notes:
    • Omitting the block disables runtime metrics sampling.
    • If the block is present and enabled is omitted, metrics sampling is enabled.
    • interval_seconds defaults to 5 and must be at least 1.
    • collectors defaults to [gpu, slurm].
    • Supported collectors:
      • gpu samples device and process telemetry through nvidia-smi
      • slurm samples job-step CPU and memory data through sstat
    • In multi-node v1, gpu sampling remains primary-node-only; slurm sampling still observes the full distributed step through sstat.
    • Sampler files are written under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics on the host and are also visible inside containers at /hpc-compose/job/metrics.
    • Collector failures are best-effort and do not fail the batch job.

x-slurm.artifacts

x-slurm:
  artifacts:
    collect: always
    export_dir: ./results/${SLURM_JOB_ID}
    paths:
      - /hpc-compose/job/metrics/**
    bundles:
      checkpoints:
        paths:
          - /hpc-compose/job/checkpoints/*.pt
  • Shape: mapping
  • Default: omitted
  • Notes:
    • Omitting the block disables tracked artifact collection.
    • collect defaults to always. Supported values are always, on_success, and on_failure.
    • export_dir is required and is resolved relative to the compose file directory when hpc-compose artifacts runs.
    • ${SLURM_JOB_ID} is preserved in export_dir until hpc-compose artifacts expands it from tracked metadata.
    • paths remains supported as the implicit default bundle.
    • bundles is optional. Bundle names must match [A-Za-z0-9_-]+, and default is reserved for top-level paths.
    • At least one source path must be present in paths or bundles.
    • Every source path must be an absolute container-visible path rooted at /hpc-compose/job.
    • Paths under /hpc-compose/job/artifacts are rejected.
    • Collection happens during batch teardown and is best-effort.
    • Collected payloads and manifest.json are written under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/artifacts/.
    • hpc-compose artifacts --bundle <name> exports only the selected bundle or bundles.
    • hpc-compose artifacts --tarball also writes one <bundle>.tar.gz archive per exported bundle.
    • Export writes per-bundle provenance metadata under <export_dir>/_hpc-compose/bundles/<bundle>.json.

x-slurm.resume

x-slurm:
  resume:
    path: /shared/$USER/runs/my-run
  • Shape: mapping
  • Default: omitted
  • Notes:
    • Omitting the block disables resume semantics.
    • path is required and must be an absolute host path.
    • /hpc-compose/... paths are rejected because path must point at shared host storage, not a container-visible path.
    • /tmp and /var/tmp technically validate, but preflight warns because those paths are not reliable resume storage.
    • When enabled, hpc-compose mounts path into every service at /hpc-compose/resume.
    • Services also receive HPC_COMPOSE_RESUME_DIR, HPC_COMPOSE_ATTEMPT, and HPC_COMPOSE_IS_RESUME.
    • The canonical resume source is the shared path, not exported artifact bundles.
    • Attempt-specific runtime state moves under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/attempts/<attempt>/, and the top-level logs, metrics, artifacts, and state.json paths continue to point at the latest attempt for compatibility.

Allocation metadata inside services

Every service receives:

  • HPC_COMPOSE_PRIMARY_NODE
  • HPC_COMPOSE_NODE_COUNT
  • HPC_COMPOSE_NODELIST
  • HPC_COMPOSE_NODELIST_FILE

The same data is also written under /hpc-compose/job/allocation/primary_node and /hpc-compose/job/allocation/nodes.txt.

gres and gpus

When both gres and gpus are set at the same level, gres takes priority and gpus is ignored.
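For example, in a fragment like this (illustrative values only), the planner uses the gres line and silently ignores gpus:

```yaml
x-slurm:
  gres: gpu:2   # used
  gpus: 4       # ignored because gres is set at the same level
```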

Service fields

| Field | Shape | Default | Notes |
|---|---|---|---|
| image | string | required | Can be a remote image reference or a local .sqsh / .squashfs path. |
| command | string or list of strings | omitted | Shell form or exec form. |
| entrypoint | string or list of strings | omitted | Must use the same form as command when both are present. |
| environment | mapping or list of KEY=VALUE strings | omitted | Both forms normalize to key/value pairs. |
| volumes | list of host_path:container_path strings | omitted | Runtime bind mounts. Host paths resolve against the compose file directory. |
| working_dir | string | omitted | Valid only when the service also has an explicit command or entrypoint. |
| depends_on | list or mapping | omitted | Dependency list with service_started or service_healthy conditions. |
| readiness | mapping | omitted | Post-launch readiness gate. |
| healthcheck | mapping | omitted | Compose-compatible sugar for a subset of readiness. Mutually exclusive with readiness. |
| x-slurm | mapping | omitted | Per-service Slurm overrides. |
| x-enroot | mapping | omitted | Per-service Enroot preparation rules. |

Image rules

Remote images

  • Any image reference without an explicit :// scheme is prefixed with docker://.
  • Explicit schemes are allowed only for docker://, dockerd://, and podman://.
  • Other schemes are rejected.
  • Shell variables in the image string are expanded at plan time.
  • Unset variables expand to empty strings.

Local images

  • Local image paths must point to .sqsh or .squashfs files.
  • Relative paths are resolved against the compose file directory.
  • Paths that look like build contexts are rejected.

command and entrypoint

Both fields accept either:

  • a string, interpreted as shell form
  • a list of strings, interpreted as exec form

Rules:

  • If both fields are present, they must use the same form.
  • Mixed string/array combinations are rejected.
  • If neither field is present, the image default entrypoint and command are used.
  • If working_dir is set, at least one of command or entrypoint must also be set.
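The form-matching rule can be illustrated with a sketch (service and module names are placeholders):

```yaml
services:
  ok:
    image: python:3.11-slim
    entrypoint: ["python"]      # exec form
    command: ["-m", "main"]     # exec form: same form as entrypoint, accepted

  rejected:
    image: python:3.11-slim
    entrypoint: ["python"]      # exec form
    command: -m main            # shell form: mixed with exec form, rejected
```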

environment

Accepted forms:

environment:
  APP_ENV: prod
  LOG_LEVEL: info
environment:
  - APP_ENV=prod
  - LOG_LEVEL=info

Rules:

  • List items must use KEY=VALUE syntax.
  • .env from the compose file directory is loaded automatically when present.
  • Shell environment variables override .env; .env fills only missing variables.
  • environment and x-enroot.prepare.env values support $VAR, ${VAR}, ${VAR:-default}, and ${VAR-default} interpolation.
  • Missing variables without defaults are errors.
  • Use $$ for a literal dollar sign in interpolated fields.
  • String-form shell snippets are still literal. For example, $PATH inside a string-form command is not expanded at plan time.
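The interpolation semantics above can be sketched in a few lines of Python. This is an illustration of the documented rules, not hpc-compose's actual implementation:

```python
import re

def interpolate(value: str, env: dict) -> str:
    """Expand $VAR, ${VAR}, ${VAR:-default}, ${VAR-default}, and $$ per the
    documented rules. Missing variables without a default are errors."""
    pattern = re.compile(
        r"\$\$"                                                # literal dollar sign
        r"|\$\{(?P<name>\w+)(?P<op>:-|-)(?P<default>[^}]*)\}"  # with default
        r"|\$\{(?P<bare>\w+)\}"                                # ${VAR}
        r"|\$(?P<simple>\w+)"                                  # $VAR
    )

    def repl(m):
        if m.group(0) == "$$":
            return "$"
        if m.group("name") is not None:
            name, op, default = m.group("name", "op", "default")
            val = env.get(name)
            if op == ":-":
                return val if val else default   # default when unset or empty
            return val if val is not None else default  # default only when unset
        name = m.group("bare") or m.group("simple")
        if name not in env:
            raise ValueError(f"missing variable without default: {name}")
        return env[name]

    return pattern.sub(repl, value)

env = {"USER": "alice"}
print(interpolate("/shared/$USER/cache", env))  # /shared/alice/cache
print(interpolate("${MODE:-prod}", env))        # prod
print(interpolate("cost: $$5", env))            # cost: $5
```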

volumes

Accepted form:

volumes:
  - ./app:/workspace
  - /shared/data:/data

Rules:

  • Host paths are resolved against the compose file directory.
  • Runtime mounts are passed through srun --container-mounts=....
  • Every service also gets an automatic shared mount at /hpc-compose/job, backed by ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID} on the host.
  • /hpc-compose/job is reserved and cannot be used as an explicit volume destination.

Warning

If a mounted file is a symlink, the symlink target must also be visible from inside the mounted directory. Otherwise the path can exist on the host but fail inside the container.

depends_on

Accepted forms:

depends_on:
  - redis
depends_on:
  redis:
    condition: service_started
depends_on:
  redis:
    condition: service_healthy

Rules:

  • List form means condition: service_started.
  • Map form accepts condition: service_started and condition: service_healthy.
  • service_healthy requires the dependency service to define readiness.
  • service_started waits only for the dependency process to be launched and still alive.
  • service_healthy waits for the dependency readiness check to succeed.

readiness

Supported types:

Sleep

readiness:
  type: sleep
  seconds: 5
  • seconds is required.

TCP

readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30
  • host defaults to 127.0.0.1.
  • timeout_seconds defaults to 60.

Log

readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60
  • timeout_seconds defaults to 60.

HTTP

readiness:
  type: http
  url: http://127.0.0.1:8080/health
  status_code: 200
  timeout_seconds: 30
  • status_code defaults to 200.
  • timeout_seconds defaults to 60.
  • The readiness check polls the URL through curl.

healthcheck

healthcheck is accepted as migration sugar and is normalized into the readiness model.

services:
  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "nc", "-z", "127.0.0.1", "6379"]
      timeout: 30s

Rules:

  • healthcheck and readiness are mutually exclusive.
  • Supported probe forms are a constrained subset:
    • ["CMD", "nc", "-z", HOST, PORT]
    • ["CMD-SHELL", "nc -z HOST PORT"]
    • recognized curl probes against http:// or https:// URLs
    • recognized wget --spider probes against http:// or https:// URLs
  • timeout maps to timeout_seconds.
  • disable: true disables readiness for that service.
  • interval, retries, and start_period are parsed but rejected in v1.
  • HTTP-style healthchecks normalize to readiness.type: http with status_code: 200.
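For example, the TCP healthcheck shown above is equivalent to writing the readiness gate directly, using the same fields described in the readiness section of this page:

```yaml
# Compose-style sugar...
healthcheck:
  test: ["CMD", "nc", "-z", "127.0.0.1", "6379"]
  timeout: 30s

# ...normalizes to:
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30
```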

Service-level x-slurm

These fields live under services.<name>.x-slurm.

| Field | Shape | Default | Notes |
|---|---|---|---|
| nodes | integer | omitted | 1 for a helper step, or the full top-level allocation node count for the one distributed service. |
| ntasks | integer | omitted | Adds --ntasks to that service’s srun. |
| ntasks_per_node | integer | omitted | Adds --ntasks-per-node to that service’s srun. |
| cpus_per_task | integer | omitted | Adds --cpus-per-task to that service’s srun. |
| gpus | integer | omitted | Adds --gpus when gres is not set. |
| gres | string | omitted | Adds --gres to that service’s srun. Takes priority over gpus. |
| extra_srun_args | list of strings | omitted | Appended directly to the service’s srun command. |
| failure_policy | mapping | omitted | Per-service failure handling (fail_job, ignore, restart_on_failure). |

services.<name>.x-slurm.failure_policy

services:
  worker:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5

| Field | Shape | Default | Notes |
|---|---|---|---|
| mode | fail_job, ignore, or restart_on_failure | fail_job | fail_job keeps fail-fast behavior. ignore keeps the job running after non-zero exits. restart_on_failure restarts on non-zero exits only. |
| max_restarts | integer | 3 when mode=restart_on_failure | Required to be at least 1 after defaults are applied. Valid only for restart_on_failure. |
| backoff_seconds | integer | 5 when mode=restart_on_failure | Fixed delay between restart attempts. Required to be at least 1 after defaults are applied. Valid only for restart_on_failure. |

Rules:

  • In a multi-node allocation, at most one service may resolve to distributed placement.
  • Distributed placement requires services.<name>.x-slurm.nodes to equal the top-level allocation node count when it is set explicitly.
  • Helper services in multi-node jobs are pinned to HPC_COMPOSE_PRIMARY_NODE.
  • max_restarts and backoff_seconds are rejected unless mode: restart_on_failure.
  • Restart attempts count relaunches after the initial launch.
  • Restarts trigger only for non-zero exits.
  • Services configured with mode: ignore cannot be used as dependencies in depends_on.

Unknown keys under top-level x-slurm or per-service x-slurm cause hard errors.

x-enroot.prepare

x-enroot.prepare lets a service build a prepared runtime image from its base image before submission.

services:
  app:
    image: python:3.11-slim
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir numpy pandas
        mounts:
          - ./requirements.txt:/tmp/requirements.txt
        env:
          PIP_CACHE_DIR: /tmp/pip-cache
        root: true

| Field | Shape | Default | Notes |
|---|---|---|---|
| commands | list of strings | required when prepare is present | Each command runs via enroot start ... /bin/sh -lc .... |
| mounts | list of host_path:container_path strings | omitted | Visible only during prepare. Relative host paths resolve against the compose file directory. |
| env | mapping or list of KEY=VALUE strings | omitted | Passed only during prepare. Values support the same interpolation rules as environment. |
| root | boolean | true | Controls whether prepare commands run with --root. |

Rules:

  • If x-enroot.prepare is present, commands cannot be empty.
  • If prepare.mounts is non-empty, the service rebuilds on every prepare or submit.
  • Remote base images are imported under cache_dir/base.
  • Prepared images are exported under cache_dir/prepared.
  • Unknown keys under x-enroot or x-enroot.prepare cause hard errors.

Unsupported Compose keys

These keys are rejected with explicit messages:

  • build
  • ports
  • networks
  • network_mode
  • Compose restart (use services.<name>.x-slurm.failure_policy)
  • deploy

Any other unknown key at the service level is also rejected.

Supported Slurm Model

This page makes the hpc-compose Slurm boundary explicit. It is a tool for compiling one Compose-like application into one Slurm allocation with one or more containerized srun steps. It is not a general frontend for the full Slurm command surface.

First-class support

These capabilities are modeled, validated, and intentionally supported by the planner, renderer, and tracked-job workflow.

| Area | Support |
|---|---|
| Allocation model | One Slurm allocation per application |
| Submission flow | validate, inspect, preflight, prepare, render, submit, submit --watch |
| Tracked job workflow | status, stats, logs, cancel, artifacts, clean, cache inspection/pruning |
| Top-level Slurm fields | job_name, partition, account, qos, time, nodes, ntasks, ntasks_per_node, cpus_per_task, mem, gres, gpus, constraint, output, error, chdir |
| Service step fields | nodes, ntasks, ntasks_per_node, cpus_per_task, gres, gpus |
| Multi-node model | Single-node jobs and constrained multi-node runs with at most one distributed service spanning the allocation |
| Runtime orchestration | depends_on, readiness checks, service failure policies, primary-node helper placement |
| Container workflow | Remote images, local .sqsh images, x-enroot.prepare, shared cache handling |
| Job tracking | Scheduler state via squeue/sacct, step stats via sstat, tracked logs, runtime state, metrics, artifacts, resume metadata |

Raw pass-through

These capabilities are usable, but hpc-compose does not model or validate their semantics beyond passing them through to Slurm.

| Mechanism | What it allows |
|---|---|
| x-slurm.submit_args | Raw #SBATCH ... lines for site-specific flags such as mail settings, reservations, or other submit-time options |
| services.<name>.x-slurm.extra_srun_args | Raw srun arguments for site-specific launch flags such as MPI or exclusivity settings |
| Existing reservations | Joining an already-created reservation through raw submit args is supported as pass-through |

Pass-through is appropriate when a site-specific flag is useful but does not justify a first-class schema field. It is not a guarantee that hpc-compose understands the operational consequences of that flag.
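A combined sketch of both pass-through mechanisms (the reservation name and MPI flag are hypothetical site-specific values):

```yaml
x-slurm:
  submit_args:
    - "--mail-type=END"            # rendered as an #SBATCH line, not validated
    - "--reservation=site-window"  # hypothetical reservation name

services:
  solver:
    image: python:3.11-slim
    x-slurm:
      extra_srun_args:
        - "--mpi=pmix"             # appended verbatim to this service's srun
```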

Unsupported or out of scope

These capabilities are intentionally outside the product seam.

| Area | Status |
|---|---|
| Admin-plane Slurm management | Out of scope |
| sacctmgr account administration | Out of scope |
| Reservation creation or lifecycle management | Out of scope |
| Federation / multi-cluster control | Out of scope |
| Generic scontrol mutation | Out of scope |
| Broad cluster inspection tools such as a full sinfo / sprio / sreport frontend | Out of scope |
| Arbitrary multi-node orchestration or partial-node service placement | Not supported in v1 |
| Heterogeneous jobs and job arrays as first-class workflow concepts | Not supported in v1 |
| Compose build, ports, custom networks, restart, deploy | Not supported |

Non-goals

hpc-compose should not grow into a generic Slurm administration layer. In particular, it will not broaden into sacctmgr, reservation management, federation control, or generic scontrol mutation. Those are real Slurm features, but they do not fit the “one application, one allocation, tracked runtime workflow” seam this tool is built around.

Migrating from Docker Compose

This guide helps you convert an existing docker-compose.yaml into an hpc-compose spec for Slurm clusters with Enroot and Pyxis.

At a glance

| Docker Compose feature | hpc-compose equivalent |
|---|---|
| image | image (same syntax, auto-prefixed with docker://) |
| command | command (string or list, same syntax) |
| entrypoint | entrypoint (string or list, same syntax) |
| environment | environment (map or list, same syntax) |
| volumes | volumes (host:container bind mounts, same syntax) |
| depends_on | depends_on (list or map with condition: service_started / service_healthy) |
| working_dir | working_dir (requires explicit command or entrypoint) |
| build | Not supported. Use image + x-enroot.prepare.commands instead. |
| ports | Not supported. Use host networking semantics instead. 127.0.0.1 works only when both sides run on the same node. |
| networks / network_mode | Not supported. There is no Docker-style overlay network or service-name DNS layer. |
| restart | Not supported as a Compose key. Use services.<name>.x-slurm.failure_policy. |
| deploy | Not supported. Use x-slurm for resource allocation. |
| healthcheck | Supported for a constrained TCP/HTTP subset and normalized into readiness; use explicit readiness for anything more complex. |
| Resource limits (cpus, mem_limit) | Use x-slurm.cpus_per_task, x-slurm.mem, x-slurm.gpus |

Side-by-side: web app + Redis

Docker Compose

version: "3.9"
services:
  redis:
    image: redis:7
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: redis
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main

hpc-compose

name: my-app

x-slurm:
  job_name: my-app
  time: "01:00:00"
  mem: 8G
  cpus_per_task: 4
  cache_dir: /shared/$USER/hpc-compose-cache

services:
  redis:
    image: redis:7
    command: redis-server --save "" --appendonly no
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30

  app:
    image: python:3.11-slim
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: 127.0.0.1
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
    x-enroot:
      prepare:
        commands:
          - pip install --no-cache-dir redis fastapi uvicorn

Key changes

  1. build: . → image: python:3.11-slim + x-enroot.prepare.commands for dependencies.
  2. ports → Removed. Services communicate via 127.0.0.1 because they run on the same node.
  3. REDIS_HOST: redis → REDIS_HOST: 127.0.0.1. No DNS service names; use localhost.
  4. healthcheck → readiness with type: tcp.
  5. Added x-slurm block for Slurm resource allocation (time, memory, CPUs).
  6. Added x-slurm.cache_dir for shared image storage.

Key differences

Networking

Docker Compose creates isolated networks where services find each other by name. In hpc-compose, helper services on the same node share the host network directly, and multi-node distributed steps must use explicit rendezvous addresses. Replace service hostnames with 127.0.0.1 only when both sides intentionally stay on one node. For multi-node runs, derive the rendezvous host from /hpc-compose/job/allocation/primary_node or HPC_COMPOSE_PRIMARY_NODE.
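A sketch of the multi-node rendezvous pattern (the image is a placeholder, and the port is an arbitrary choice; the HPC_COMPOSE_* variables expand at runtime inside the container shell, not at plan time):

```yaml
services:
  trainer:
    image: pytorch/pytorch:latest        # placeholder image
    command: >
      torchrun --nnodes=$HPC_COMPOSE_NODE_COUNT
               --rdzv_backend=c10d
               --rdzv_endpoint=$HPC_COMPOSE_PRIMARY_NODE:29500
               train.py
```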

Building images

Docker Compose uses build: to run a Dockerfile. hpc-compose uses x-enroot.prepare.commands instead:

# Docker Compose
app:
  build:
    context: .
    dockerfile: Dockerfile

# hpc-compose
app:
  image: python:3.11-slim
  x-enroot:
    prepare:
      commands:
        - pip install --no-cache-dir -r /tmp/requirements.txt
      mounts:
        - ./requirements.txt:/tmp/requirements.txt

Prefer volumes for fast-changing source code and x-enroot.prepare.commands for slower-changing dependencies.

Health checks vs readiness

Docker Compose uses healthcheck with a test command, interval, timeout, and retries. hpc-compose now accepts a constrained healthcheck subset and normalizes it into readiness:

# TCP: wait for a port to accept connections
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30

# Log: wait for a pattern in service output
readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60

# Sleep: fixed delay
readiness:
  type: sleep
  seconds: 5

Supported healthcheck migration patterns:

  • ["CMD", "nc", "-z", HOST, PORT]
  • ["CMD-SHELL", "nc -z HOST PORT"]
  • recognized curl probes against http:// or https:// URLs
  • recognized wget --spider probes against http:// or https:// URLs

Still unsupported in v1:

  • arbitrary custom command probes
  • interval
  • retries
  • start_period

Resource allocation

Docker Compose uses deploy.resources or top-level cpus/mem_limit. hpc-compose uses Slurm-native resource settings:

x-slurm:
  time: "02:00:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1

services:
  app:
    x-slurm:
      cpus_per_task: 4
      gpus: 1

Restart policies

Docker Compose supports restart: always, on-failure, etc. hpc-compose does not accept the Compose restart: key, but it does support per-service restart behavior through services.<name>.x-slurm.failure_policy.

services:
  app:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5

restart_on_failure retries only on non-zero exits. Use mode: fail_job (default) for fail-fast behavior, or mode: ignore for non-critical sidecars.
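For instance, a non-critical sidecar can be marked ignore so its exit never fails the job; the image name here is illustrative:

```yaml
services:
  metrics:
    image: prom/node-exporter:v1.8.0
    x-slurm:
      failure_policy:
        mode: ignore   # sidecar exit does not fail the allocation
```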

What to do about unsupported features

  • build — Use image + x-enroot.prepare.commands. Mount build context files with x-enroot.prepare.mounts if needed.
  • ports — Not needed. Services share 127.0.0.1 on one node.
  • networks / network_mode — Not needed. All services are on the same host network.
  • restart — Use services.<name>.x-slurm.failure_policy (fail_job, ignore, restart_on_failure).
  • deploy — Use x-slurm for resources.
  • Service DNS names — Use 127.0.0.1 for same-node helpers, or explicit host metadata such as HPC_COMPOSE_PRIMARY_NODE for distributed runs.
  • Named volumes — Use host-path bind mounts in volumes.
  • .env file — Supported. .env in the compose file directory is loaded automatically.

Migration checklist

  1. Remove build: — Replace with image: pointing to a base image. Move dependency installation to x-enroot.prepare.commands.
  2. Remove ports: — Use host-network semantics instead of container port publishing.
  3. Remove networks: / network_mode: — There is no Docker-style overlay network or service-name DNS layer.
  4. Remove Compose restart: — Use services.<name>.x-slurm.failure_policy when you need per-service restart behavior.
  5. Remove deploy: — Use x-slurm for resource allocation.
  6. Replace service hostnames — Change any service-name references (e.g. redis, postgres) to 127.0.0.1 for same-node helpers, or to explicit allocation metadata for distributed runs.
  7. Replace healthcheck: — Convert to readiness: with type: tcp, type: log, or type: sleep.
  8. Add x-slurm: — Set time, mem, cpus_per_task, and optionally gpus, partition, account.
  9. Set cache_dir — Point x-slurm.cache_dir to shared storage visible from login and compute nodes.
  10. Validate — Run hpc-compose validate -f compose.yaml to check the converted spec.
  11. Inspect — Run hpc-compose inspect --verbose -f compose.yaml to confirm the planner understood your intent.
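
Putting the checklist together, a minimal migrated spec might look like the sketch below. The cache path, resource values, and the use of depends_on (assumed to be part of the supported subset, given readiness-gated startup across dependent services) are illustrative:

```yaml
x-slurm:
  time: "01:00:00"
  mem: 16G
  cpus_per_task: 4
  cache_dir: /shared/scratch/hpc-compose-cache   # must be visible from login and compute nodes

services:
  redis:
    image: redis:7
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30
  app:
    image: python:3.11-slim
    depends_on:          # assumed supported; startup is gated on redis readiness
      - redis
    volumes:
      - ./app:/app
    environment:
      REDIS_URL: redis://127.0.0.1:6379
    command: python /app/main.py
```

After converting, validate and inspect the spec as in steps 10 and 11 before submitting.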

Architecture for Contributors

The CLI is intentionally thin. Most behavior lives in the library crate so the binary, integration tests, and generated rustdoc all describe the same pipeline.

Module map

  • spec: parse, interpolate, and validate the supported Compose subset
  • planner: normalize the parsed spec into a deterministic plan
  • preflight: check login-node prerequisites and cluster policy issues
  • prepare: import base images and rebuild prepared runtime artifacts
  • render: generate the final sbatch script and service launch commands
  • job: track submissions, logs, metrics, status, and artifact export
  • cache: persist cache manifests for imported and prepared images
  • init: expose the shipped example templates for hpc-compose init

Execution flow

  1. ComposeSpec::load parses YAML, validates supported keys, interpolates variables, and applies semantic validation.
  2. planner::build_plan resolves paths, command shapes, dependencies, and prepare blocks into a normalized plan.
  3. prepare::build_runtime_plan computes concrete cache artifact locations.
  4. preflight::run checks cluster prerequisites before submission.
  5. prepare::prepare_runtime_plan imports or rebuilds artifacts when needed.
  6. render::render_script emits the batch script consumed by sbatch.
  7. job persists tracked metadata under .hpc-compose/ and powers status, stats, logs, cancel, and artifact export.

Contributor commands

cargo test
cargo test --test cli
cargo doc --no-deps
mdbook build docs

Documentation split

  • Use this mdBook for user-facing workflows, examples, and reference material.
  • Use rustdoc for contributor-facing internals and the library module map.
  • Keep README short and point readers into the book instead of duplicating long-form guidance.