hpc-compose

Compose-style multi-service workflows, compiled into one inspectable Slurm job.

hpc-compose gives research and HPC teams a small YAML authoring model for services, startup order, readiness checks, runtime backends, logs, artifacts, and follow-up commands.

services:
  app:
    image: python:3.12-slim
    command: python train.py

$ hpc-compose plan --show-script -f compose.yaml
spec is valid
service order: app
#SBATCH --job-name=my-app

Use hpc-compose when you want Docker Compose-style authoring on Slurm without adding Kubernetes, a long-running control plane, or custom cluster-side services.

Start with the Support Matrix before planning a real runtime workflow. Linux is the maintained runtime target; macOS is intended for authoring, validation, rendering, and inspection.

Safe First Path

These commands are safe from a laptop, workstation, or login node because new writes a local starter spec and plan is purely static:

hpc-compose new --template minimal-batch --name my-app --output compose.yaml
hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

For real cluster runs, configure a cache path visible from both the Slurm submission host and compute nodes, either in x-slurm.cache_dir, hpc-compose setup --cache-dir, or [defaults.cache] / [profiles.<name>.cache] settings. From a source checkout, you can also inspect the checked-in examples with hpc-compose plan -f examples/minimal-batch.yaml.

Expected output includes:

spec is valid
service order: app
Rendered script:

Run hpc-compose up -f compose.yaml only on a supported Linux Slurm submission host with the runtime backend your spec selects. If it fails, start with hpc-compose debug -f compose.yaml --preflight.

Download the asciinema-style quickstart demo cast if you want the same flow as a terminal recording.

Terms To Know

| Term | Meaning |
| --- | --- |
| spec | The YAML file that describes services, runtime backend, and Slurm settings. |
| allocation | The Slurm job allocation where all planned services run. |
| runtime backend | The mechanism used to launch services: Pyxis/Enroot, Apptainer, Singularity, or host. |
| preflight | Checks that inspect local tools, paths, backend support, and optional cluster profiles before a run. |
| prepare | The login-node image import/customization phase used before compute-node runtime. |
| tracked job | Metadata under .hpc-compose/<job-id>/ that lets status, ps, watch, logs, stats, and artifacts reconnect later. |
| x-slurm | The spec section for Slurm settings and hpc-compose runtime extensions. |

What It Is For

  • model serving plus helper services inside one Slurm allocation
  • data and ETL pipelines with startup ordering or stage-completion dependencies
  • training jobs with checkpoint export, artifact tracking, and resume-aware reruns
  • explicit multi-node launch patterns that still fit inside one allocation

What It Is Not

hpc-compose is not a full Docker Compose runtime and is not a general cluster orchestrator.

Unsupported Compose features include:

  • build:
  • ports
  • networks / network_mode
  • Compose restart as a Docker key
  • deploy
  • dynamic node bin packing

For exact boundaries, read Execution Model, Supported Slurm Model, and Spec Reference.

  1. Quickstart for the shortest safe path.
  2. Examples to choose a starting spec.
  3. Runtime Backends before changing runtime.backend.
  4. Runbook when adapting a real workload on a cluster.
  5. Troubleshooting when the first cluster run fails.

Reference

Support Matrix

This page separates what hpc-compose can build, what CI currently exercises, and what is officially supported for real workflows.

Support levels

| Level | Meaning |
| --- | --- |
| Officially supported | Maintained target for user-facing workflows and issue triage |
| CI-tested | Exercised in the repository’s automated checks today |
| Release-built | Prebuilt archive is published, but that is not a promise of full runtime support |

Officially supported

| Platform | Scope | Notes |
| --- | --- | --- |
| Linux x86_64 | Full CLI and runtime workflows | Requires Slurm client tools plus at least one supported runtime backend: Pyxis/Enroot, Apptainer, Singularity, or host software modules |
| Linux arm64 | Full CLI and runtime workflows | Same cluster requirements as Linux x86_64 |
| macOS x86_64 | Authoring and local non-runtime commands | Suitable for project-local authoring flows such as new, setup, context, plan, validate, inspect, render, and completions; not for Slurm/Enroot runtime commands |
| macOS arm64 | Authoring and local non-runtime commands | Same scope as macOS x86_64 |

CI-tested

| Platform | What is tested today |
| --- | --- |
| Ubuntu 24.04 x86_64 | formatting, clippy, unit/integration tests, docs build, link checks, installer smoke tests, and coverage |
| macOS arm64 | authoring-focused tests, validate/render/schema smoke tests, installer smoke tests, and Homebrew smoke tests |
| macOS x86_64 | authoring-focused tests, validate/render/schema smoke tests, and Homebrew smoke tests |

Current CI validates full runtime-facing behavior on Ubuntu and authoring/distribution behavior on macOS. Other published builds should be treated as lower-confidence until corresponding CI coverage exists.

Release-built

| Platform | Status |
| --- | --- |
| Linux x86_64 | Release archive published |
| Linux arm64 | Release archive published |
| macOS x86_64 | Release archive published |
| macOS arm64 | Release archive published |
| Windows x86_64 | Release archive published, but runtime workflows are not officially supported |

Windows status

Windows archives are published so users can inspect the CLI surface or experiment with non-runtime commands, but Windows is currently release-built only:

  • Slurm plus HPC runtime workflows are not an officially supported Windows target.
  • Issues that are specific to Windows runtime execution may be closed as out of scope until the support policy changes.

Cluster assumptions for full support

For full runtime support on Linux, the target environment should provide:

  • sbatch, srun, and related Slurm client tools on the submission host
  • one supported runtime path:
    • Pyxis container support in srun plus Enroot on the submission host,
    • Apptainer on the submission host and compute nodes,
    • Singularity on the submission host and compute nodes,
    • or module/vendor software available on the host runtime path
  • shared storage for the resolved cache directory

Use Runtime Backends, Runbook, and Execution Model before adapting a real workload to a cluster.

Installation

For normal use, install from a published GitHub Release. Build from source when you are developing the project or need to inspect a local checkout before using it on a cluster.

Install From A Published Release

Pick the release tag you want from the GitHub Releases page and pin it:

RELEASE_TAG=vX.Y.Z
curl -fsSL "https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/${RELEASE_TAG}/install.sh" \
  | env HPC_COMPOSE_VERSION="${RELEASE_TAG}" sh

The installer downloads the matching archive for the current Linux or macOS machine, verifies the published .sha256 sidecar, installs hpc-compose into ~/.local/bin by default, and installs shipped Unix manpages when present.

After installation, make sure the install directory is on your shell PATH and verify the binary:

export PATH="$HOME/.local/bin:$PATH"
command -v hpc-compose
hpc-compose --version
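If `command -v hpc-compose` prints nothing, it can help to confirm whether the install directory is actually an entry of PATH. A small sketch, assuming a POSIX shell; `dir_on_path` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds when the given directory is an exact
# entry of $PATH (no symlink or trailing-slash normalization).
dir_on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, using the default install directory named above:
#   dir_on_path "$HOME/.local/bin" || echo "add ~/.local/bin to PATH"
```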

Useful overrides:

RELEASE_TAG=vX.Y.Z

curl -fsSL "https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/${RELEASE_TAG}/install.sh" \
  | env HPC_COMPOSE_INSTALL_DIR=/usr/local/bin HPC_COMPOSE_VERSION="$RELEASE_TAG" sh

Installer availability does not imply full runtime support. Check the Support Matrix before assuming a platform can run submission, prepare, or watch workflows end to end.

About The main Installer Script

Fetching install.sh from main without HPC_COMPOSE_VERSION does not install unreleased main:

curl -fsSL https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/main/install.sh | sh

That command runs the moving script from main, but the script resolves the latest published GitHub Release and downloads from releases/download/<tag>/.... Use the version-pinned command above for reproducible installs. Use a source checkout when you want unreleased code.

Manual Release Download

Prebuilt archives are published on the release page. Pick the archive that matches your platform.

Example for Linux x86_64:

RELEASE_TAG=vX.Y.Z

curl -L "https://github.com/NicolasSchuler/hpc-compose/releases/download/${RELEASE_TAG}/hpc-compose-${RELEASE_TAG}-x86_64-unknown-linux-musl.tar.gz" -o hpc-compose.tar.gz
tar -xzf hpc-compose.tar.gz
./hpc-compose --help

Linux x86_64 releases use a musl target to avoid common cluster glibc mismatches. Unix release archives also contain share/man/man1/.
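When scripting the manual download, the URL pattern from the Linux x86_64 example can be wrapped in a small function that refuses the vX.Y.Z placeholder. This is a sketch only: `release_asset_url` is a hypothetical helper, and it covers just the x86_64 musl archive name shown above.

```shell
# Hypothetical helper: prints the Linux x86_64 archive URL for a release tag.
# Fails when the tag is empty or still the documentation placeholder.
release_asset_url() {
  tag=$1
  case "$tag" in
    ''|vX.Y.Z)
      echo "set a real release tag instead of the vX.Y.Z placeholder" >&2
      return 1
      ;;
  esac
  printf 'https://github.com/NicolasSchuler/hpc-compose/releases/download/%s/hpc-compose-%s-x86_64-unknown-linux-musl.tar.gz\n' \
    "$tag" "$tag"
}

# Example (RELEASE_TAG must name a published release):
#   curl -L "$(release_asset_url "$RELEASE_TAG")" -o hpc-compose.tar.gz
```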

Windows release archives are zip-only for inspection and checksum parity. The installer script and end-to-end Slurm runtime workflows target Unix-like systems; use Windows primarily through WSL or a remote Linux/macOS authoring environment.

Native Packages

Published Linux releases may include .deb and .rpm assets:

RELEASE_TAG=vX.Y.Z

sudo apt install "./hpc-compose-${RELEASE_TAG}-x86_64-unknown-linux-musl.deb"
sudo dnf install "./hpc-compose-${RELEASE_TAG}-x86_64-unknown-linux-musl.rpm"

Package availability does not change runtime support policy. Linux cluster workflows still need Slurm client tools, the selected runtime backend, and shared storage for the resolved cache directory.

Homebrew On macOS

The repository exposes a same-repo Homebrew tap:

brew install NicolasSchuler/hpc-compose/hpc-compose

The formula is refreshed by release automation when a Homebrew-published release is cut. Check brew info NicolasSchuler/hpc-compose/hpc-compose when you need to confirm the formula version before installing.

macOS support is for authoring and local non-runtime commands such as new, plan, validate, inspect, render, and completions; it is not a supported Slurm runtime target.

Verify A Release

Use GitHub-native verification as the primary trust path for published binaries.

  1. Verify the release:
RELEASE_TAG=vX.Y.Z
gh release verify "$RELEASE_TAG" -R NicolasSchuler/hpc-compose
  2. Verify a downloaded asset:
RELEASE_TAG=vX.Y.Z
ASSET="hpc-compose-${RELEASE_TAG}-x86_64-unknown-linux-musl.tar.gz"

gh release download "$RELEASE_TAG" -R NicolasSchuler/hpc-compose -p "$ASSET"
gh release verify-asset "$RELEASE_TAG" "./$ASSET" -R NicolasSchuler/hpc-compose
  3. Verify the artifact attestation directly:
gh attestation verify "./$ASSET" \
  --repo NicolasSchuler/hpc-compose \
  --signer-workflow NicolasSchuler/hpc-compose/.github/workflows/release.yml

Published releases also ship SHA256SUMS and per-asset .sha256 files. Those checksums are primarily for installer compatibility, mirroring, and corruption checks; attestations are the stronger authenticity signal.
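For a corruption check against a per-asset sidecar, something like the following may be enough. It is a sketch under one stated assumption: that the .sha256 file begins with the hex digest, as in `sha256sum` output. The exact sidecar layout is not specified here, so inspect a real sidecar before relying on this. `verify_sidecar` is a hypothetical helper name.

```shell
# Hypothetical helper: compares a file's SHA-256 digest with the first
# whitespace-separated field of its sidecar (default: <file>.sha256).
# Assumption: the sidecar starts with the hex digest, sha256sum-style.
verify_sidecar() {
  file=$1
  sidecar=${2:-$1.sha256}
  expected=$(awk 'NR == 1 { print $1 }' "$sidecar") || return 2
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{ print $1 }')
  else
    # macOS and some BSDs ship shasum instead of sha256sum.
    actual=$(shasum -a 256 "$file" | awk '{ print $1 }')
  fi
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example:
#   verify_sidecar "./$ASSET" && echo "checksum matches"
```

A matching checksum only shows the download is intact relative to the sidecar. It does not replace the attestation checks above.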

Internal Mirrors And Cluster-Admin Installs

For internal mirrors, preserve release filenames exactly, including:

  • platform archives or native packages
  • SHA256SUMS
  • each per-asset .sha256 sidecar

Then point the installer at the mirrored base URL and pin the matching version:

RELEASE_TAG=vX.Y.Z
curl -fsSL "https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/${RELEASE_TAG}/install.sh" \
  | env HPC_COMPOSE_BASE_URL="https://mirror.example.org/hpc-compose/${RELEASE_TAG}" \
        HPC_COMPOSE_VERSION="$RELEASE_TAG" sh

HPC_COMPOSE_VERSION is required when HPC_COMPOSE_BASE_URL is set so the installer, mirrored assets, and checksum files stay aligned.
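In site wrapper scripts, the same rule can be enforced before the installer is even fetched. This is a caller-side sketch, not the installer's own logic; `check_mirror_env` is a hypothetical helper name.

```shell
# Hypothetical pre-check for wrapper scripts around install.sh.
# Fails when a mirror base URL is set without a pinned version.
check_mirror_env() {
  if [ -n "${HPC_COMPOSE_BASE_URL:-}" ] && [ -z "${HPC_COMPOSE_VERSION:-}" ]; then
    echo "HPC_COMPOSE_VERSION must be set when HPC_COMPOSE_BASE_URL is set" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_mirror_env || exit 1
```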

Build From Source

Use this path for development, unreleased testing, or local inspection:

git clone https://github.com/NicolasSchuler/hpc-compose.git
cd hpc-compose
cargo build --release
./target/release/hpc-compose --help

Before using a local build on a cluster workflow, validate the binary and one example spec:

env CACHE_DIR=/cluster/shared/hpc-compose-cache \
  target/release/hpc-compose validate -f examples/minimal-batch.yaml
env CACHE_DIR=/cluster/shared/hpc-compose-cache \
  target/release/hpc-compose plan --verbose -f examples/minimal-batch.yaml

Local Docs Commands

The repo ships two documentation layers:

  • mdbook for the user manual
  • cargo doc for contributor-facing crate internals

Useful commands:

mdbook build docs
mdbook serve docs
cargo doc --no-deps

Regenerate checked-in manpages from a checkout with:

cargo run --locked --features manpage-bin --bin gen-manpages
cargo test --locked --test release_metadata
man -l man/man1/hpc-compose.1

Quickstart

This is the shortest safe path from an empty shell to a static plan, a first real Slurm run, and one-command failure triage.

If Slurm terms such as sbatch, srun, allocation, job step, Pyxis, or Enroot are unfamiliar, read Slurm And Container Basics before the first real cluster run.

1. Install The CLI

For normal use, install from the latest published GitHub Release and pin the tag you selected:

RELEASE_TAG=vX.Y.Z
curl -fsSL "https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/${RELEASE_TAG}/install.sh" \
  | env HPC_COMPOSE_VERSION="${RELEASE_TAG}" sh

Replace vX.Y.Z with the published release tag shown on the release page.

The installer places hpc-compose in ~/.local/bin by default and verifies the release checksum sidecar before installing. Release verification, manual downloads, package-manager installs, and source-checkout builds are covered in Installation.

If your shell does not find the command immediately, add the default install directory to your PATH:

export PATH="$HOME/.local/bin:$PATH"
hpc-compose --version

2. Learn The Safe Authoring Path First

plan is the safe authoring command. It does not call sbatch, does not import images, and does not write a script file.

Create a starter spec first:

hpc-compose new \
  --template minimal-batch \
  --name my-app \
  --output compose.yaml

If you want a guided learning path instead of a single starter template, run the Spec Metamorphosis tutorial:

hpc-compose evolve --output compose.yaml

Then inspect the static plan:

hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

Expected output includes:

spec is valid
service order: app

This is the right first path on macOS, a laptop, or any machine where you want to evaluate the authoring model before touching a real cluster. The same flow is also available as an asciinema-style demo cast, but the snippets above are the accessible reference output.

The normal workflow to remember is:

hpc-compose plan -f compose.yaml
hpc-compose up -f compose.yaml
hpc-compose debug -f compose.yaml --preflight
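The three commands above can be chained so that a failed plan stops early and a failed up is followed by the preflight debug report. A minimal sketch: `plan_up_debug` and the `HPC_COMPOSE` override variable are hypothetical names introduced for illustration, and only the subcommands and flags shown above are used.

```shell
# Hypothetical override hook: HPC_COMPOSE names the binary to run.
HPC_COMPOSE=${HPC_COMPOSE:-hpc-compose}

# Runs plan, then up; on an up failure, runs debug --preflight.
# Returns 0 only when both plan and up succeed.
plan_up_debug() {
  spec=${1:-compose.yaml}
  # Static check first: stop before touching the cluster if the spec is bad.
  "$HPC_COMPOSE" plan -f "$spec" || return 1
  if "$HPC_COMPOSE" up -f "$spec"; then
    return 0
  fi
  # The run failed: collect the triage report, then report failure.
  "$HPC_COMPOSE" debug -f "$spec" --preflight
  return 1
}

# Example, on a supported Linux Slurm submission host:
#   plan_up_debug compose.yaml
```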

3. Choose A Starting Spec

Use the built-in starter templates when you want a fresh compose.yaml with your application name filled in:

hpc-compose new \
  --template minimal-batch \
  --name my-app \
  --output compose.yaml

Add --cache-dir '<shared-cache-dir>' when you want the generated file to include an explicit x-slurm.cache_dir. Otherwise the plan uses the active settings cache default or $HOME/.cache/hpc-compose.

From a source checkout, you can also inspect a known-good repository example:

hpc-compose plan -f examples/minimal-batch.yaml

The Examples page is the single selection guide for beginner, LLM, training, distributed, and pipeline workflows.

Use Spec Metamorphosis when you want to learn those concepts progressively in one evolving valid spec.

4. Pick And Test A Cache Directory

cache_dir is optional in the spec, but real clusters usually need a site-specific shared path because image preparation happens before the job starts and compute nodes must later see those artifacts.

Ask your cluster documentation or support team for a project scratch, work, or shared filesystem path, then test it:

export CACHE_DIR=/cluster/shared/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"

Persist it in project settings when you want the same value every time:

hpc-compose setup --profile-name dev --cache-dir "$CACHE_DIR" --default-profile dev --non-interactive

Or keep using an environment-backed explicit spec value and persist it next to your copied spec:

printf 'CACHE_DIR=%s\n' "$CACHE_DIR" > .env

Do not use /tmp, /var/tmp, /private/tmp, or /dev/shm for x-slurm.cache_dir. Validation may accept those strings, but preflight reports them as unsafe because prepare happens before runtime and compute nodes must later see the cached artifacts.
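A quick string-level guard for the paths named above might look like this. It is a rough approximation and not how preflight decides: it only matches path prefixes and does not resolve symlinks or inspect the filesystem type. `is_unsafe_cache_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds (returns 0) when the path is one of the
# node-local locations listed above, or sits underneath one of them.
is_unsafe_cache_dir() {
  case "$1" in
    /tmp|/tmp/*|/var/tmp|/var/tmp/*|/private/tmp|/private/tmp/*|/dev/shm|/dev/shm/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example:
#   is_unsafe_cache_dir "$CACHE_DIR" && echo "pick a shared path instead" >&2
```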

5. Before Your First Cluster Run

| Command category | Where to run it | Required tools | Notes |
| --- | --- | --- | --- |
| Authoring: new, plan, validate, inspect, render, config, schema | laptop, workstation, or login node | hpc-compose | plan is the recommended static pre-run check. |
| Prepare: prepare | Linux host with selected runtime backend | Pyxis needs Enroot; Apptainer needs apptainer; Singularity needs singularity; host backend needs no container runtime | Does not call sbatch, but needs runtime tools for image work. |
| Cluster checks: preflight, doctor cluster-report | Linux Slurm login node | Slurm client tools plus selected backend tools | Use preflight --strict when warnings should block launch. |
| Run: up, run | Linux Slurm login node | sbatch, srun, scheduler tools, selected backend tools | up is the normal cluster execution path. |
| Local launch: up --local | Linux host only | Enroot and runtime.backend: pyxis | Single-host only; not a distributed Slurm substitute. |

For Pyxis, srun --help should mention --container-image.
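That check can be scripted by piping the help text through grep. A small sketch; `mentions_container_image` is a hypothetical helper name, and it reads from stdin so it can be tried without Slurm installed.

```shell
# Hypothetical helper: reads help text on stdin and succeeds when it
# mentions the --container-image flag.
mentions_container_image() {
  grep -q -e '--container-image'
}

# Example, on a Slurm login node:
#   srun --help 2>&1 | mentions_container_image \
#     || echo "srun does not advertise --container-image" >&2
```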

6. Submit On A Real Cluster

When you move to a supported Linux submission host, the normal run is:

hpc-compose up -f compose.yaml

up runs preflight, prepares missing artifacts, renders the batch script, submits it through sbatch, then follows scheduler state and tracked logs. On an interactive TTY it opens the full-screen watch UI; otherwise it falls back to line-oriented output. Add --watch-queue when you want line-oriented queue polling until the Slurm job reaches RUNNING before the normal watch view opens; --queue-warn-after <DURATION> controls the one-time long-pending warning. The watch UI holds the final screen on failures by default; use --hold-on-exit never|failure|always to tune that behavior. Use hpc-compose up --detach -f compose.yaml when you want submit-and-return behavior.

Success looks like:

  • the job is submitted or launched
  • a tracked job id is recorded
  • the watch UI or text follower shows scheduler progress
  • status, ps, and logs can reconnect to the tracked run later

7. If The First Cluster Run Fails

| Symptom | Best next command | Why |
| --- | --- | --- |
| Missing sbatch, srun, enroot, apptainer, or singularity | hpc-compose debug -f compose.yaml --preflight | Reruns prerequisite checks and keeps the latest tracked context in one report. |
| srun does not advertise --container-image | hpc-compose doctor cluster-report | Pyxis support is unavailable or not loaded on that node. |
| Job submitted but no service log appeared | hpc-compose debug -f compose.yaml | Shows scheduler state, batch log tail, service log hints, and the next command. |
| Cache path warning or error | hpc-compose debug -f compose.yaml --preflight | Confirms whether x-slurm.cache_dir is shared and writable. |
| Services start in the wrong order | hpc-compose plan --explain --verbose -f compose.yaml | Shows normalized dependencies, readiness gates, and planner hints before running. |

The longer symptom guide is Troubleshooting.

8. Revisit A Tracked Run Later

hpc-compose jobs list
hpc-compose status -f compose.yaml
hpc-compose ps -f compose.yaml
hpc-compose watch -f compose.yaml
hpc-compose stats -f compose.yaml
hpc-compose logs -f compose.yaml --follow

Use jobs list first when you need to rediscover tracked runs under the current repo tree. Use ps for a stable per-service snapshot, watch to reconnect to the live UI, and logs --follow for a text-only follower.

From A Source Checkout

If you are developing from a local checkout instead of an installed binary:

cargo build --release
target/release/hpc-compose validate -f examples/minimal-batch.yaml
target/release/hpc-compose plan -f examples/minimal-batch.yaml
target/release/hpc-compose plan --show-script -f examples/minimal-batch.yaml

Examples

These examples are the fastest way to understand the intended hpc-compose workflows and adapt them to a real application.

There are two starting points:

  • built-in starter templates generated by hpc-compose new
  • repository example files copied directly from examples/

Before launching anything, run the safe authoring path first:

hpc-compose new --template minimal-batch --name my-app --output compose.yaml
hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

If you are reading from a source checkout, you can run the same static checks directly against examples/minimal-batch.yaml.

Some repository examples keep an explicit ${CACHE_DIR:-/cluster/shared/hpc-compose-cache} for portability, while starter examples rely on the settings/builtin cache default. Before running on a real cluster, configure a shared path visible from both the submission host and the compute nodes:

export CACHE_DIR=/cluster/shared/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"

Start Here: The Four Promoted Examples

These four examples are the recommended entry points.

minimal-batch.yaml

  • Demonstrates: one service, no dependencies, no image prepare step
  • Expected prerequisites: any machine for plan; a Linux Slurm login node plus the selected runtime backend for up
  • Cluster run, Linux Slurm login node only: hpc-compose up -f examples/minimal-batch.yaml
  • Success signal: the batch log prints Hello from Slurm!

app-redis-worker.yaml

  • Demonstrates: multi-service startup ordering plus TCP readiness inside one allocation
  • Expected prerequisites: a normal Slurm + Enroot submission host and shared CACHE_DIR
  • Cluster run, Linux Slurm login node only: hpc-compose up -f examples/app-redis-worker.yaml
  • Success signal: worker.log shows a successful Redis PING followed by repeated INCR jobs calls

llm-curl-workflow-workdir.yaml

  • Demonstrates: one GPU-backed LLM service plus one client service in the same job
  • Expected prerequisites: a GGUF model at $HOME/models/model.gguf, a GPU-capable Slurm target, and shared CACHE_DIR
  • Cluster run, Linux Slurm login node only: hpc-compose up -f examples/llm-curl-workflow-workdir.yaml
  • Success signal: curl_client.log contains a JSON response from /v1/chat/completions

training-resume.yaml

  • Demonstrates: checkpoint export, resume-aware reruns, and attempt-aware training state
  • Expected prerequisites: shared storage for x-slurm.resume.path plus shared CACHE_DIR
  • Cluster run, Linux Slurm login node only: hpc-compose up -f examples/training-resume.yaml
  • Success signal: results/<job-id>/ contains exported checkpoints and later attempts resume from the previously saved epoch

Beginner Ladder

For a guided version of the first five concepts, run hpc-compose evolve --output compose.yaml. The progressive-complexity lesson walks through minimal, second-service, readiness, failure-policy, and multi-node-placement as one evolving valid spec.

Use this ordering when you are new to the project:

| Stage | Start here | Why |
| --- | --- | --- |
| Authoring only | minimal-batch.yaml with plan and plan --show-script | Confirms the tool understands a spec without touching Slurm. |
| First cluster run | minimal-batch.yaml on a Linux Slurm login node | Smallest real submission and log-check path. |
| Single-node multi-service | app-redis-worker.yaml | Shows depends_on plus TCP readiness. |
| GPU or LLM serving | llm-curl-workflow-workdir.yaml, llama-app.yaml, or vllm-openai.yaml | Adds accelerator resources and service/client coordination. |
| Durable training | training-checkpoints.yaml or training-resume.yaml | Adds artifacts, checkpoints, and resume semantics. |
| Distributed launch | multi-node-mpi.yaml, multi-node-torchrun.yaml, or framework-specific examples below | Adds allocation-wide or explicitly placed multi-node services. |

Built-In Starter Templates

Use built-in templates when you want hpc-compose to write a fresh compose.yaml with your application name filled in for you.

hpc-compose new --list-templates
hpc-compose new --describe-template minimal-batch
hpc-compose new --template minimal-batch --name my-app --output compose.yaml
hpc-compose new --template minimal-batch --name my-app --cache-dir '<shared-cache-dir>' --output compose.yaml

If the workflow you want is not listed by --list-templates, copy the closest repository example directly from examples/.

Broader Example Matrix

The matrix below covers the broader set of runnable examples beyond the four promoted starts. “Built-in template” means hpc-compose new --template <name> can scaffold it; “repository file” means copy the YAML from examples/ directly.

| Example | Availability | What it demonstrates | When to start from it |
| --- | --- | --- | --- |
| dev-python-app.yaml | Built-in template | Mounted source code plus x-runtime.prepare.commands for dependencies | You want an iterative development workflow |
| dev-python-smoke.yaml | Repository file | Finite smoke-test variant of the source-mounted Python app | You want to test a development spec without a long-running process |
| llm-curl-workflow.yaml | Built-in template | Repo-local variant of the smallest concrete inference workflow | You want the same LLM stack but with models under the repository tree |
| llama-app.yaml | Built-in template | GPU-backed service, mounted model files, dependent app service | You need accelerator resources or a model-serving pattern |
| llama-uv-worker.yaml | Built-in template | llama.cpp serving plus a source-mounted Python worker executed through uv | You want the GGUF server plus mounted worker pattern |
| multi-node-mpi.yaml | Built-in template | First-class MPI launch, generated MPI hostfile, and one primary-node helper | You want a minimal multi-node MPI pattern without extra orchestration |
| mpi-pmix-v4-host-mpi.yaml | Built-in template | Versioned PMIx launch plus host MPI bind/env configuration | Your site requires a host MPI stack inside containers |
| multi-node-partitioned.yaml | Repository file | Disjoint node ranges, fractional node selection, and explicit co-location | You want multiple distributed roles inside one allocation |
| multi-node-torchrun.yaml | Built-in template | Allocation-wide torchrun launch using the primary node as rendezvous | You want a multi-node GPU training starting point |
| multi-node-deepspeed.yaml | Built-in template | DeepSpeed no-SSH launch using generated rendezvous and hostfile env | You want distributed fine-tuning without hand-written rendezvous setup |
| multi-node-accelerate.yaml | Built-in template | Hugging Face Accelerate multi-machine launch | You want an Accelerate-based training or fine-tuning starting point |
| multi-node-horovod.yaml | Built-in template | Horovod rank-per-GPU launch through Slurm MPI | You want Horovod without SSH fanout |
| multi-node-jax.yaml | Built-in template | JAX coordinator/process metadata for jax.distributed.initialize | You want a JAX distributed starting point |
| nccl-tests.yaml | Built-in template | MPI-backed NCCL all-reduce probe | You are debugging GPU fabric, NCCL, UCX, or OFI settings |
| ray-symmetric.yaml | Built-in template | Ray symmetric-run across one Slurm allocation | You want a modern Ray-on-Slurm starting point without an autoscaler |
| ray-head-workers.yaml | Built-in template | Ray head plus worker steps inside one allocation | You need explicit Ray head/worker control for an older or site-specific setup |
| dask-scheduler-workers.yaml | Built-in template | Dask scheduler on the primary node plus allocation workers | You want Dask CLI deployment inside one Slurm allocation |
| spark-standalone.yaml | Built-in template | Spark standalone master, workers, and app submission | You need a conservative Spark standalone pattern without external cluster management |
| flux-nested.yaml | Built-in template | Nested Flux launched through srun flux start | You want Flux scheduling inside an existing Slurm allocation |
| nextflow-bridge.yaml | Built-in template | Nextflow command wrapper inside one hpc-compose allocation | You want hpc-compose tracking around a workflow engine run without parsing Nextflow files |
| snakemake-bridge.yaml | Built-in template | Snakemake command wrapper inside one hpc-compose allocation | You want hpc-compose tracking around a Snakemake run without replacing Snakemake scheduling semantics |
| postgres-etl.yaml | Built-in template | PostgreSQL plus a Python data processing job | You need a database-backed batch pipeline |
| restart-policy.yaml | Built-in template | Per-service restart_on_failure with bounded retries and a rolling-window crash-loop guard | You need transient-failure retries without letting one service spin forever |
| training-checkpoints.yaml | Built-in template | GPU training with checkpoints exported to shared storage | You need durable checkpoint outputs but not automatic resume semantics |
| training-sweep.yaml | Repository file | Embedded sweep parameters with interpolation defaults for dry-run and normal render workflows | You want a small hyperparameter sweep starting point |
| vllm-openai.yaml | Built-in template | vLLM serving with an in-job Python client | You want vLLM-based inference instead of llama.cpp |
| vllm-uv-worker.yaml | Built-in template | vLLM serving plus a source-mounted Python worker executed through uv | You want a common LLM stack with mounted app code |
| mpi-hello.yaml | Built-in template | MPI hello world using service-level x-slurm.mpi | You need a small first-class MPI workload |
| multi-stage-pipeline.yaml | Built-in template | Two-stage pipeline coordinating through the shared job mount | You need file-based stage-to-stage handoff |
| pipeline-dag.yaml | Built-in template | One-shot preprocess -> train -> postprocess DAG using successful-completion dependencies | You need stage completion, not service readiness, to gate downstream work |
| fairseq-preprocess.yaml | Built-in template | CPU-heavy NLP data preprocessing with parallel workers | You need a CPU-bound data preprocessing pipeline |
| canary-right-size.yaml | Repository file | A deliberately over-requested training probe for hpc-compose germinate | You want to practice right-sizing recommendations before changing a real spec |
| rendezvous-model-server.yaml | Repository file | A provider job that registers a model-server endpoint in the shared cache | You want one Slurm allocation to publish a service for later jobs |
| rendezvous-client.yaml | Repository file | A separate client job that resolves HPC_COMPOSE_RDZV_MODEL_SERVER_URL | You want cross-job service discovery through shared storage |

Which Example Should I Start From?

Companion notes for the more involved examples live alongside the example assets.

Development Workflow Recipe

examples/dev-python-app.yaml mounts examples/app/ and runs a long-lived Python process, so it is best for hot reload:

hpc-compose dev -f examples/dev-python-app.yaml
hpc-compose tmux -f examples/dev-python-app.yaml --no-attach

examples/dev-python-smoke.yaml keeps the same mounted-source shape but uses a finite command, so it is suitable for smoke tests:

hpc-compose test --local -f examples/dev-python-smoke.yaml
hpc-compose test --submit --time 00:01:00 -f examples/dev-python-smoke.yaml

Adaptation Checklist

  1. Copy the closest repository example to your own compose.yaml, or run hpc-compose new --template <name> --name my-app --output compose.yaml when a matching built-in template exists.
  2. Configure a cache path visible from both the login node and compute nodes through hpc-compose setup --cache-dir, x-slurm.cache_dir, or [defaults.cache] / [profiles.<name>.cache].
  3. Override CACHE_DIR before running repository examples that use ${CACHE_DIR:-...}, or replace the default cache path in your copied file.
  4. Replace the example image, command, environment, and volumes with your workload.
  5. Keep active source in volumes and keep slower-changing dependency installation in x-runtime.prepare.commands.
  6. Add readiness to services that must be reachable before dependents continue.
  7. Adjust top-level or per-service x-slurm settings for your cluster.
  8. Run hpc-compose plan -f compose.yaml before the first run, and hpc-compose debug -f compose.yaml --preflight if that run fails.
  9. Run cluster up only from a supported Linux Slurm submission host with the selected runtime backend available.

Spec Metamorphosis

hpc-compose evolve is an interactive authoring tutorial. It starts from a minimal valid spec and progressively rewrites the same output file through increasingly realistic HPC workflow features.

The command is safe to run on a laptop or login node:

  • it validates and plans candidate specs,
  • it writes only the selected compose file,
  • it does not prepare images,
  • it does not call sbatch,
  • it does not run preflight.

Canonical Lesson

V1 ships one lesson:

hpc-compose evolve --describe-lesson progressive-complexity

The progressive-complexity path contains five valid snapshots:

Step id | What it teaches | Safe follow-up
minimal | One service and one single-node Slurm allocation | hpc-compose plan -f compose.yaml
second-service | A dependent service and startup ordering | hpc-compose plan -f compose.yaml
readiness | readiness plus depends_on.condition: service_healthy | hpc-compose plan --show-script -f compose.yaml
failure-policy | restart_on_failure with bounded retries and a rolling crash-loop window | hpc-compose inspect -f compose.yaml
multi-node-placement | A two-node allocation with explicit non-overlapping service placement | hpc-compose plan -f compose.yaml

The final step can validate anywhere, but running it requires a Slurm target that can grant a two-node allocation and a runtime backend available on that cluster.

Interactive Flow

Start the tutorial:

hpc-compose evolve --output compose.yaml

At each step, the command prints:

  • a short explanation,
  • the concepts being introduced,
  • a compact diff from the last accepted spec,
  • and the validation summary for the candidate.

Controls:

  • Enter, y, or a accepts the step and writes compose.yaml.
  • s skips the current step.
  • q quits after the last accepted valid spec.
  • ? prints prompt help.

Transcript Example

$ hpc-compose evolve --output compose.yaml
Step 1/5: Minimal batch spec
Accept this step? [Y/a/s/q/?]
wrote /path/to/compose.yaml

Step 2/5: Add a dependent service
Accept this step? [Y/a/s/q/?]
wrote /path/to/compose.yaml

Step 3/5: Gate on readiness
Accept this step? [Y/a/s/q/?]
wrote /path/to/compose.yaml

Inspect the accepted readiness-gated spec:

hpc-compose plan -f compose.yaml

Then continue the tutorial to failure policies and multi-node placement:

Accept this step? [Y/a/s/q/?]

For automation or docs examples, accept through a specific step noninteractively:

hpc-compose evolve --yes --until readiness --format json --output compose.yaml

Non-Goals

  • V1 does not mutate arbitrary existing specs.
  • V1 is not a full-screen TUI.
  • V1 does not submit jobs.

For a fresh single-template scaffold, use hpc-compose new. For choosing among the broader runnable examples, use Examples.

Task Guide

Use this page when you know what you want to do, but not yet which command or example should be your starting point.

First run

  • Read Quickstart.
  • Run hpc-compose evolve --output compose.yaml if you want a guided progression from minimal through multi-node-placement.
  • Run hpc-compose new --list-templates if you want to inspect the built-in starter templates before choosing one.
  • Start from minimal-batch with hpc-compose new --template minimal-batch --name my-app --output compose.yaml.
  • Before running on a cluster, configure a shared cache with hpc-compose setup --cache-dir '<shared-cache-dir>' or explicit x-slurm.cache_dir. If you copy a repository example that uses CACHE_DIR, override it for your cluster before running.
  • Run hpc-compose plan -f compose.yaml before the first real run. Add --show-script when you want to inspect the generated launcher without writing a file.
  • Run hpc-compose up -f compose.yaml only from a supported Linux Slurm submission host.

Remember directory/data/env settings once

  • Run hpc-compose setup to create or update the project-local settings file (.hpc-compose/settings.toml).
  • Use hpc-compose --profile dev up so compose path, env files, env vars, and binary paths come from the selected profile.
  • Run hpc-compose context --format json to inspect resolved paths plus value sources. Interpolation variables are scoped to names referenced by the compose file and sensitive-looking values are redacted unless you add --show-values.
  • Use --settings-file <PATH> when you need an explicit settings file instead of upward discovery.

Migrate from Docker Compose

  • Read Docker Compose Migration.
  • Replace build: with image: plus x-runtime.prepare.commands.
  • Replace service-name networking with 127.0.0.1 or explicit allocation metadata where appropriate.

Single-node multi-service app

Multi-node distributed training

Checkpoint and resume workflows

  • Start from training-checkpoints.yaml when you only need artifact output.
  • Start from training-resume.yaml when the run should resume from shared storage across retries or later submissions.
  • Keep the canonical resume source in x-slurm.resume.path, not in exported artifact bundles.

LLM serving workflows

Debug cluster readiness

  • Run hpc-compose validate -f compose.yaml.
  • Run hpc-compose validate -f compose.yaml --strict-env when default interpolation fallbacks should be treated as failures.
  • Run hpc-compose plan --verbose -f compose.yaml.
  • Run hpc-compose preflight -f compose.yaml.
  • Run hpc-compose debug -f compose.yaml --preflight after a failed tracked run.
  • Read Troubleshooting.

Cache and artifact management

  • Use hpc-compose cache list to inspect imported/prepared artifacts.
  • Use hpc-compose cache inspect -f compose.yaml to see per-service reuse expectations.
  • Use hpc-compose --profile dev cache prune --age 14 when you want age-based cleanup to follow the active context cache dir.
  • Use hpc-compose cache prune --age 7 --cache-dir '<shared-cache-dir>' when you want a direct cache cleanup that does not depend on compose resolution.
  • Use hpc-compose artifacts -f compose.yaml after a run to export tracked payloads.

Find and clean tracked runs

  • Use hpc-compose jobs list to scan the current repo tree for tracked runs.
  • Use hpc-compose ps -f compose.yaml when you want a one-shot per-service runtime table.
  • Use hpc-compose watch -f compose.yaml to reconnect to the live watch UI for the latest tracked job.
  • Use hpc-compose jobs list --disk-usage when you need a quick size estimate before deleting old state.
  • Use hpc-compose clean -f compose.yaml --dry-run --age 7 to preview what a cleanup would remove.
  • Use hpc-compose clean -f compose.yaml --all --format json when automation needs a stable cleanup report for one compose context, including effective latest IDs plus stale-pointer diagnostics.

Automation and scripting with JSON output

  • Prefer --format json for machine-readable output on non-streaming commands such as new, plan, validate, render, prepare, preflight, config, inspect, debug, status, ps, stats, score, artifacts, down, cancel, setup, cache, clean, and context. For up, --format json requires --detach or --dry-run.
  • Include context --format json when automation needs resolved compose path, binaries, referenced interpolation vars, and runtime path roots.
  • Use hpc-compose stats --format jsonl or --format csv when downstream tooling wants row-oriented metrics.
  • Treat --json as a compatibility alias on older machine-readable commands; new automation should prefer --format json. Streaming commands such as logs --follow, watch, and completions keep their native text or script output.

Migrating from Docker Compose

This guide helps you convert an existing docker-compose.yaml into an hpc-compose spec for Slurm clusters using Pyxis/Enroot, Apptainer, Singularity, or host runtimes.

At a glance

Docker Compose feature | hpc-compose equivalent
image | image (same syntax, auto-prefixed with docker://)
command | command (string or list, same syntax)
entrypoint | entrypoint (string or list, same syntax)
environment | environment (map or list, same syntax)
volumes | volumes (host:container bind mounts, same syntax)
depends_on | depends_on (list or map with condition: service_started / service_healthy)
working_dir | working_dir (requires explicit command or entrypoint)
build | Not supported. Use image + x-runtime.prepare.commands instead.
ports | Not supported. Use host networking semantics instead. 127.0.0.1 works only when both sides run on the same node.
networks / network_mode | Not supported. There is no Docker-style overlay network or service-name DNS layer.
restart | Not supported as a Compose key. Use services.<name>.x-slurm.failure_policy.
deploy | Not supported. Use x-slurm for resource allocation.
healthcheck | Supported for a constrained TCP/HTTP subset and normalized into readiness; use explicit readiness for anything more complex.
Resource limits (cpus, mem_limit) | Use x-slurm.cpus_per_task, x-slurm.mem, x-slurm.gpus

Side-by-side: web app + Redis

Docker Compose

version: "3.9"
services:
  redis:
    image: redis:7
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: redis
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main

hpc-compose

version: "1"
name: my-app

x-slurm:
  job_name: my-app
  time: "01:00:00"
  mem: 8G
  cpus_per_task: 4
  cache_dir: /cluster/shared/hpc-compose-cache

services:
  redis:
    image: redis:7
    command: redis-server --save "" --appendonly no
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30

  app:
    image: python:3.11-slim
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: 127.0.0.1
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir redis fastapi uvicorn

Key changes

  1. version: "3.9" → version: "1", or remove the field. hpc-compose uses this as its own spec schema version, not a Docker Compose compatibility version.
  2. build: . → image: python:3.11-slim + x-runtime.prepare.commands for dependencies.
  3. ports → Removed. Services communicate via 127.0.0.1 because they run on the same node.
  4. REDIS_HOST: redis → REDIS_HOST: 127.0.0.1. No DNS service names; use localhost.
  5. healthcheck → readiness with type: tcp.
  6. Added x-slurm block for Slurm resource allocation (time, memory, CPUs).
  7. Configured a shared cache for image storage, either through x-slurm.cache_dir as shown or project settings.

Key differences

Networking

Docker Compose creates isolated networks where services find each other by name. In hpc-compose, helper services on the same node share the host network directly, and multi-node distributed steps must use explicit rendezvous addresses. Replace service hostnames with 127.0.0.1 only when both sides intentionally stay on one node. For multi-node runs, derive the rendezvous host from /hpc-compose/job/allocation/primary_node or HPC_COMPOSE_PRIMARY_NODE.
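As a rough illustration of that advice, a service entrypoint could resolve its rendezvous host with a small shell helper. This is a sketch under assumptions: rendezvous_host is a hypothetical function, the preference for the environment variable over the file is a choice made here, and the file is assumed to hold the hostname on its first line.

```shell
# rendezvous_host is a hypothetical helper, not part of hpc-compose.
# Arg 1: fallback host (default 127.0.0.1, valid only for same-node peers).
# Arg 2: metadata file (default: the allocation path named above).
rendezvous_host() {
  fallback=${1:-127.0.0.1}
  primary_file=${2:-/hpc-compose/job/allocation/primary_node}
  if [ -n "${HPC_COMPOSE_PRIMARY_NODE:-}" ]; then
    printf '%s\n' "$HPC_COMPOSE_PRIMARY_NODE"
  elif [ -s "$primary_file" ]; then
    # Assumption: the hostname is the first line of the file.
    head -n 1 "$primary_file"
  else
    printf '%s\n' "$fallback"
  fi
}

# Example use inside a service command:
#   MASTER_ADDR=$(rendezvous_host)
```

The localhost fallback only makes sense for single-node runs; in a multi-node job you would normally want the helper to fail loudly instead.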

Building images

Docker Compose uses build: to run a Dockerfile. hpc-compose uses x-runtime.prepare.commands instead:

# Docker Compose
app:
  build:
    context: .
    dockerfile: Dockerfile

# hpc-compose
app:
  image: python:3.11-slim
  x-runtime:
    prepare:
      commands:
        - pip install --no-cache-dir -r /tmp/requirements.txt
      mounts:
        - ./requirements.txt:/tmp/requirements.txt

Prefer volumes for fast-changing source code and x-runtime.prepare.commands for slower-changing dependencies. x-enroot.prepare remains accepted as a Pyxis/Enroot compatibility spelling, but new specs should use x-runtime.prepare.

Health checks vs readiness

Docker Compose uses healthcheck with a test command, interval, timeout, and retries. hpc-compose accepts a constrained healthcheck subset and normalizes it into readiness, whose native forms include:

# TCP: wait for a port to accept connections
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30

# Log: wait for a pattern in service output
readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60

# Sleep: fixed delay
readiness:
  type: sleep
  seconds: 5

Supported healthcheck migration patterns:

  • ["CMD", "nc", "-z", HOST, PORT]
  • ["CMD-SHELL", "nc -z HOST PORT"]
  • recognized curl probes against http:// or https:// URLs
  • recognized wget --spider probes against http:// or https:// URLs

Still unsupported in v1:

  • arbitrary custom command probes
  • interval
  • retries
  • start_period

Resource allocation

Docker Compose uses deploy.resources or service-level cpus/mem_limit. hpc-compose uses Slurm-native resource settings:

x-slurm:
  time: "02:00:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1

services:
  app:
    x-slurm:
      cpus_per_task: 4
      gpus: 1

Restart policies

Docker Compose supports restart: always, on-failure, etc. hpc-compose does not accept the Compose restart: key, but it does support per-service restart behavior through services.<name>.x-slurm.failure_policy.

services:
  app:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5
        window_seconds: 60
        max_restarts_in_window: 3

restart_on_failure retries only on non-zero exits. It enforces both a lifetime restart cap and a rolling-window crash-loop cap during one live batch-script execution. If you omit the rolling-window fields, hpc-compose defaults to window_seconds: 60 and max_restarts_in_window: <resolved max_restarts>. Use mode: fail_job (default) for fail-fast behavior, or mode: ignore for non-critical sidecars.

Practical mapping:

  • Compose restart: "no" -> omit failure_policy or use mode: fail_job
  • Compose restart: on-failure[:N] -> use mode: restart_on_failure with max_restarts: N when you want a similar lifetime retry budget
  • Compose restart: always / unless-stopped -> no direct equivalent; hpc-compose intentionally keeps restart handling bounded within one batch job

The rolling-window fields have no direct Docker Compose equivalent. They exist to stop fast crash loops inside one Slurm allocation without giving up a larger lifetime retry budget for transient failures.
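The practical mapping above can be restated as a small lookup. This is only a sketch: compose_restart_to_mode is a hypothetical helper that prints a suggestion, and it does not read or edit any spec.

```shell
# compose_restart_to_mode is a hypothetical helper, not an hpc-compose command.
# It restates the mapping in the list above for one Compose restart value.
compose_restart_to_mode() {
  case "$1" in
    no|"")
      # An omitted restart key behaves like "no" in Docker Compose.
      echo "mode: fail_job" ;;
    on-failure)
      echo "mode: restart_on_failure" ;;
    on-failure:*)
      echo "mode: restart_on_failure, max_restarts: ${1#on-failure:}" ;;
    always|unless-stopped)
      echo "no direct equivalent" ;;
    *)
      echo "unknown restart value: $1"
      return 1 ;;
  esac
}

compose_restart_to_mode on-failure:3
compose_restart_to_mode always
```

The rolling-window fields are left out on purpose, since they have no Compose counterpart to map from.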

What to do about unsupported features

Feature | Alternative
build | Use image + x-runtime.prepare.commands. Mount build context files with x-runtime.prepare.mounts if needed.
ports | Not needed. Services share 127.0.0.1 on one node.
networks / network_mode | Not needed. Same-node services share the host network; there is no overlay network or service-name DNS layer.
restart | Use services.<name>.x-slurm.failure_policy (fail_job, ignore, restart_on_failure).
deploy | Use x-slurm for resources.
Service DNS names | Use 127.0.0.1 for same-node helpers, or explicit host metadata such as HPC_COMPOSE_PRIMARY_NODE for distributed runs.
Named volumes | Use host-path bind mounts in volumes.
.env file | Supported. .env in the compose file directory is loaded automatically.

Migration checklist

  1. Replace Compose version: — Use version: "1" or omit the field; values like "3.9" are rejected by hpc-compose.
  2. Remove build: — Replace with image: pointing to a base image. Move dependency installation to x-runtime.prepare.commands.
  3. Remove ports: — Use host-network semantics instead of container port publishing.
  4. Remove networks: / network_mode: — There is no Docker-style overlay network or service-name DNS layer.
  5. Remove Compose restart: — use services.<name>.x-slurm.failure_policy when you need per-service restart behavior.
  6. Remove deploy: — Use x-slurm for resource allocation.
  7. Replace service hostnames — Change any service-name references (e.g. redis, postgres) to 127.0.0.1 for same-node helpers, or to explicit allocation metadata for distributed runs.
  8. Replace healthcheck: — Convert to readiness: with type: tcp, type: log, or type: sleep.
  9. Add x-slurm: — Set time, mem, cpus_per_task, and optionally gpus, partition, account.
  10. Set cache storage — Point x-slurm.cache_dir or setup --cache-dir to shared storage visible from login and compute nodes.
  11. Validate — Run hpc-compose validate -f compose.yaml to check the converted spec.
  12. Inspect — Run hpc-compose inspect --verbose -f compose.yaml to confirm the planner understood your intent.
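Before running the validation steps, a quick text scan can flag leftover Compose keys from the checklist. This is a sketch only: find_unsupported_keys is a hypothetical helper, it is a plain grep rather than a YAML parser, and hpc-compose validate remains the authoritative check.

```shell
# find_unsupported_keys is a hypothetical pre-check, not an hpc-compose command.
# It prints "line:content" for each key the checklist says to remove and
# returns non-zero when nothing matches. Being a text scan, it can report
# false positives (for example, keys that appear inside multi-line strings).
find_unsupported_keys() {
  grep -n -E '^[[:space:]]*(build|ports|networks|network_mode|restart|deploy):' "$1"
}

# Example:
#   find_unsupported_keys docker-compose.yaml || echo "no leftover keys found"
```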

Slurm And Container Basics

This page is for users who know shell scripts, Python jobs, or Docker images, but are new to Slurm and HPC container runtimes.

It is not a Slurm administration guide. The goal is to explain the vocabulary you will see in generated hpc-compose scripts and in cluster error messages.

The Short Mental Model

compose.yaml
  -> hpc-compose plan/render/up
  -> generated sbatch script
  -> sbatch creates one Slurm allocation
  -> srun launches one or more service steps
  -> Pyxis/Enroot, Apptainer, Singularity, or host software starts the process

The important point is that hpc-compose does not replace Slurm. It writes one inspectable Slurm batch script and uses Slurm to run the planned services inside one allocation.

Slurm Terms In Plain Language

Term | Meaning for hpc-compose users
Login node | The machine where you edit files, run plan, run preflight, and submit jobs. Do not run long compute work here.
Compute node | A worker machine where Slurm runs your job after it starts.
Partition | A named queue or resource pool. Sites often use partitions to separate CPU, GPU, debug, and large jobs.
Job | A submitted unit of work managed by Slurm. hpc-compose up submits one job.
Allocation | The nodes, CPUs, memory, GPUs, and wall time reserved for a job.
Batch script | A shell script submitted with sbatch. It contains #SBATCH directives and normal shell commands.
Job step | A launched process group inside the allocation. hpc-compose launches services as srun steps.
Task | Usually one process or rank. More ntasks means more processes, not more CPU threads per process.
cpus_per_task | CPU threads requested for each task. This is common for threaded Python, OpenMP, or data-loader-heavy jobs.
gres | Slurm’s generic resource request field, commonly used for GPUs.

If you only remember one distinction: sbatch gets the allocation; srun starts work inside it.
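The Task and cpus_per_task rows are easy to mix up, so here is the arithmetic as a small sketch. total_cpu_threads is a hypothetical helper for illustration only; it is neither a Slurm nor an hpc-compose command.

```shell
# Illustrative arithmetic only: CPU threads requested = ntasks * cpus_per_task.
# total_cpu_threads is a hypothetical helper.
total_cpu_threads() {
  ntasks=$1
  cpus_per_task=$2
  echo $((ntasks * cpus_per_task))
}

# 4 tasks (processes/ranks), each with 2 CPU threads.
total_cpu_threads 4 2
# 1 task with 8 CPU threads: the same thread count, but only one process.
total_cpu_threads 1 8
```

Both calls print 8, yet the first request gives you four processes and the second gives you one. Raising cpus_per_task alone never adds processes.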

A Minimal sbatch Script

A traditional Slurm script often looks like this:

#!/usr/bin/env bash
#SBATCH --job-name=hello-slurm
#SBATCH --partition=<partition>
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G

set -euo pipefail

hostname
python -c 'print("hello from a Slurm job")'

Submit it from a Slurm login node:

sbatch hello.sbatch

sbatch returns a job id. The job may wait in the queue before it starts, and Slurm normally writes batch output to a file such as slurm-<job-id>.out unless the script or site policy sets another output path.

Where hpc-compose Fits

The equivalent hpc-compose starting point is a spec:

name: hello-slurm

x-slurm:
  job_name: hello-slurm
  partition: <partition>
  time: "00:10:00"
  cpus_per_task: 2
  mem: 4G

services:
  app:
    image: python:3.11-slim
    command: python -c "import socket; print('hello from', socket.gethostname())"

Preview the generated Slurm script before submitting:

hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

Run it on a supported Slurm login node:

hpc-compose up -f compose.yaml

up runs preflight checks, prepares missing runtime artifacts, renders the batch script, calls sbatch, records tracked job metadata, and follows scheduler/log output.

How YAML Maps To Slurm

In the spec | In Slurm | Why it matters
Top-level x-slurm.partition | #SBATCH --partition | Selects the site queue/resource pool.
Top-level x-slurm.time | #SBATCH --time | Sets the allocation wall-time limit.
Top-level x-slurm.nodes | #SBATCH --nodes | Reserves the allocation node count.
Top-level x-slurm.ntasks | #SBATCH --ntasks | Sets the default process/rank count for the allocation.
Top-level x-slurm.cpus_per_task | #SBATCH --cpus-per-task | Requests CPU threads per task.
Top-level x-slurm.mem | #SBATCH --mem | Requests memory for scheduling and enforcement. It is not disk space.
Top-level x-slurm.gres | #SBATCH --gres | Requests generic resources such as GPUs.
Service x-slurm.ntasks | srun --ntasks | Sets the process/rank count for that service step.
Service x-slurm.extra_srun_args | Raw srun arguments | Escape hatch for site-specific launch options.

Prefer first-class fields from Spec Reference when they exist. Use raw submit_args or extra_srun_args only for site-specific options that hpc-compose does not model directly.
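For the top-level rows in the table above, the naming pattern is regular: underscores in the field name become dashes in the flag. The sketch below restates that pattern. sbatch_directive is a hypothetical helper covering only the listed fields; it is not the hpc-compose renderer, and the real generated script may format directives differently.

```shell
# sbatch_directive is a hypothetical helper that mirrors the mapping table.
# It handles only the top-level fields listed there.
sbatch_directive() {
  field=$1
  value=$2
  case "$field" in
    partition|time|nodes|ntasks|cpus_per_task|mem|gres)
      flag=$(printf '%s' "$field" | tr '_' '-')
      printf '#SBATCH --%s=%s\n' "$flag" "$value" ;;
    *)
      echo "not in the table: $field" >&2
      return 1 ;;
  esac
}

sbatch_directive time 00:10:00
sbatch_directive cpus_per_task 2
```

Use hpc-compose plan --show-script to see the directives that are actually rendered for your spec.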

sbatch vs srun vs hpc-compose up

Command | What it does
sbatch job.sbatch | Submits a batch script and creates a Slurm job when scheduled.
srun ... | Launches a job step. Inside an sbatch allocation, this starts work on allocated resources.
hpc-compose render -f compose.yaml --output job.sbatch | Writes the generated batch script without submitting it.
hpc-compose up -f compose.yaml | Runs the normal end-to-end flow and submits through sbatch.
hpc-compose status, ps, logs, watch | Reconnects to tracked jobs after submission.

When debugging, inspect the generated script:

hpc-compose plan --show-script -f compose.yaml

If a job was submitted but failed before service logs appeared, inspect Slurm state and batch output through:

hpc-compose debug -f compose.yaml

Pyxis And Enroot Basics

Slurm itself is the scheduler. Container support depends on what the cluster installed.

For the default runtime.backend: pyxis path:

  • Pyxis is the Slurm plugin that adds --container-* flags to srun.
  • Enroot is the unprivileged container image/runtime layer used under Pyxis.
  • An imported image is commonly represented as a cacheable SquashFS artifact such as .sqsh.
  • hpc-compose maps service image, command, environment, working directory, and volumes into the generated srun --container-* launch.

Check Pyxis support on the target login node:

srun --help | grep container-image
hpc-compose preflight -f compose.yaml

If srun does not advertise --container-image, choose another backend or ask the site how Pyxis is enabled. Enroot being installed is not the same thing as Slurm supporting Pyxis flags.
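The grep check above can be wrapped so that a setup script branches on the result. This is a minimal sketch: has_pyxis_flag is a hypothetical helper that reads help text on standard input, and it only detects whether the flag is advertised, not whether the site has Pyxis configured correctly.

```shell
# has_pyxis_flag is a hypothetical helper, not part of hpc-compose.
# It reads `srun --help` text on stdin and succeeds when the Pyxis
# --container-image flag is mentioned.
has_pyxis_flag() {
  grep -q -e '--container-image'
}

# Typical use on a login node (requires srun on PATH):
#   if srun --help 2>&1 | has_pyxis_flag; then
#     echo "Pyxis flags available"
#   else
#     echo "no --container-image flag; consider another backend"
#   fi
```

hpc-compose preflight remains the fuller check, since it looks at more than this single flag.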

Other supported runtime paths are covered in Runtime Backends.

Why Shared Storage Matters

hpc-compose prepare can run before the Slurm job starts, but services run later on compute nodes. That means the resolved runtime cache must be visible from both places. You can set it in project settings:

[profiles.dev.cache]
dir = "/cluster/shared/hpc-compose-cache"

Or directly in a spec:

x-slurm:
  cache_dir: /cluster/shared/hpc-compose-cache

Use a project, work, scratch, or workspace path that your site documents as shared. Do not use /tmp, /var/tmp, /private/tmp, or /dev/shm for the resolved cache directory.

The same rule applies to host paths mounted through volumes: the compute node must be able to read the path when the service starts.

Small Checks That Explain A Lot

These commands are useful in tiny smoke tests:

hostname
env | grep '^SLURM_' | sort
python -c 'import socket; print(socket.gethostname())'
cat /etc/os-release

Inside a container, cat /etc/os-release should describe the container image. Outside the container, it describes the host. That simple distinction helps diagnose whether a command is running where you expect.
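One way to make that comparison easier to read is to print only the distribution name. The sketch below assumes the file follows the usual os-release format with a PRETTY_NAME entry; os_pretty_name is a hypothetical helper.

```shell
# os_pretty_name is a hypothetical helper, not part of hpc-compose.
# It prints PRETTY_NAME from an os-release style file
# (default: /etc/os-release), or "unknown" when the key is absent.
os_pretty_name() {
  os_file=${1:-/etc/os-release}
  # Source the file in a subshell so its variables do not leak to the caller.
  ( . "$os_file" && printf '%s\n' "${PRETTY_NAME:-unknown}" )
}

# Run the same line on the host and inside a service command, then compare:
#   echo "running on: $(os_pretty_name)"
```

If both places print the same name, the command is probably not running inside the container image you expected, unless the image and host share a distribution.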

Common Beginner Mistakes

Symptom | Likely misunderstanding | Next step
plan looks fine but up fails immediately | Static validation is not the same as cluster readiness. | Run hpc-compose debug -f compose.yaml --preflight on the login node.
srun does not accept --container-image | Pyxis is not available or not loaded in Slurm. | Read Runtime Backends and use the site-supported backend.
Cache warnings mention local paths | The cache path is not shared between login and compute nodes. | Configure x-slurm.cache_dir or setup --cache-dir with shared storage.
A GPU job waits longer than expected | The request may be larger than available idle resources. | Check site queue policy and start with the smallest useful request.
More CPUs were requested but only one process appears | cpus_per_task adds threads per task; it does not create more tasks. | Use ntasks for more processes/ranks, and make the application use them.
Docker Compose ports or service DNS do not work | This is one Slurm allocation, not a Docker Compose network. | Use host networking and Slurm/hpc-compose allocation metadata instead.

Further Reading

Execution model

This page explains the few runtime rules that matter most when a Compose mental model meets Slurm and HPC runtime backends.

What runs where

Stage | Where it runs | What happens
plan, validate, inspect, preflight | login node or local shell | Parse the spec, resolve paths, preview the runtime plan, and check prerequisites
prepare | login node or local shell with the selected runtime backend | Import base images and build prepared runtime artifacts
up | login node or local shell with Slurm access | Run preflight, prepare missing artifacts, render the batch script, call sbatch, and watch by default
Batch script and services | compute-node allocation | Launch the planned services through srun and the selected runtime backend
status, ps, watch, stats, logs, artifacts | login node or local shell | Read tracked metadata and job outputs after submission

The main consequence is simple: image preparation and validation happen before the job starts, but the containers themselves run later inside the Slurm allocation.

Service failure policies inside one job

hpc-compose does not provide a separate long-running orchestrator. Service failure handling happens inside the rendered batch script for the current allocation.

  • mode: fail_job keeps fail-fast behavior and stops the job on the first non-zero service exit.
  • mode: ignore records the failure but allows the rest of the job to continue.
  • mode: restart_on_failure only reacts to non-zero process exits. It does not restart on successful exits, and it does not use cross-attempt or cross-requeue history.

For restart_on_failure, the batch script enforces two limits during one live execution:

  • a lifetime cap through max_restarts
  • a rolling-window cap through max_restarts_in_window within window_seconds

If a service omits the rolling-window fields, hpc-compose still enables crash-loop protection with window_seconds: 60 and max_restarts_in_window: <resolved max_restarts>.
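To see how the two caps interact, the sketch below models them for a list of failure times. It is a rough model, not the code hpc-compose renders into the batch script. It assumes that the window counts restarts granted less than window_seconds ago and that the lifetime cap is checked first; neither detail is confirmed by the text above. simulate_restarts is a hypothetical name.

```shell
# simulate_restarts is a hypothetical model of the two caps, for intuition only.
# Usage: simulate_restarts MAX_RESTARTS WINDOW_SECONDS MAX_IN_WINDOW T1 T2 ...
# T1 T2 ... are non-decreasing failure times in whole seconds.
simulate_restarts() {
  max_restarts=$1
  window=$2
  max_in_window=$3
  shift 3
  granted=0      # lifetime count of granted restarts
  granted_at=""  # times at which restarts were granted
  for t in "$@"; do
    if [ "$granted" -ge "$max_restarts" ]; then
      echo "t=$t stop: lifetime cap ($granted/$max_restarts)"
      return 1
    fi
    # Assumption: a restart counts toward the window while it is
    # strictly less than `window` seconds old.
    in_window=0
    for g in $granted_at; do
      if [ $((t - g)) -lt "$window" ]; then
        in_window=$((in_window + 1))
      fi
    done
    if [ "$in_window" -ge "$max_in_window" ]; then
      echo "t=$t stop: window cap ($in_window/$max_in_window in ${window}s)"
      return 1
    fi
    granted=$((granted + 1))
    granted_at="$granted_at $t"
    echo "t=$t restart $granted/$max_restarts"
  done
}

# Four quick failures with max_restarts=5, window_seconds=60,
# max_restarts_in_window=3.
simulate_restarts 5 60 3 0 10 20 30 || true
```

In this run the first three failures are restarted and the fourth is blocked by the window cap, even though the lifetime budget of five is not used up. Spacing the same failures more than 60 seconds apart would instead run into the lifetime cap.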

Use status to inspect the tracked policy state after submission. The text view reports:

state service 'worker': failure_policy=restart_on_failure restarts=1/5 window=1/3@60s last_exit=42
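If you only have the text view at hand, the key=value pairs in a line like the one above can be picked apart with plain shell. This sketch assumes the text layout stays exactly as in that sample, which is not guaranteed; for real automation, status --format json is the safer input. policy_field is a hypothetical helper.

```shell
# policy_field is a hypothetical helper, not part of hpc-compose.
# It assumes space-separated key=value tokens as in the sample status line.
policy_field() {
  key=$1
  status_line=$2
  for token in $status_line; do
    case "$token" in
      "$key"=*)
        printf '%s\n' "${token#*=}"
        return 0 ;;
    esac
  done
  return 1
}

line="state service 'worker': failure_policy=restart_on_failure restarts=1/5 window=1/3@60s last_exit=42"
policy_field restarts "$line"
policy_field last_exit "$line"
```

The two calls print 1/5 and 42, the lifetime restart count and the last exit code from the sample line.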

Use logs to inspect the corresponding restart messages from the batch script when you need to distinguish lifetime-cap exhaustion from rolling-window exhaustion.

Use per-service x-slurm.hooks when you want host-side notifications around those policy transitions. on: restart runs before a granted relaunch; on: window_exhausted runs when the rolling-window guard blocks another restart. These hooks are best-effort and do not change the service policy outcome.

Which paths must be shared

  • The resolved cache directory must be visible from both the login node and the compute nodes. It may come from x-slurm.cache_dir, project settings, or the built-in $HOME/.cache/hpc-compose fallback.
  • Relative host paths in volumes, local image paths, and x-runtime.prepare.mounts resolve against the compose file directory.
  • Each submitted job writes tracked state under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID} on the host.
  • That per-job directory is mounted into every container at /hpc-compose/job.
  • Multi-node jobs also populate /hpc-compose/job/allocation/{primary_node,nodes.txt} and export allocation-wide HPC_COMPOSE_NODE... variables plus service-scoped HPC_COMPOSE_SERVICE_NODE... variables.

Use /hpc-compose/job for small shared state inside the allocation, such as ready files, request payloads, logs, metrics, or teardown signals.
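A ready file is the simplest of those patterns. The sketch below polls for a file with a timeout. wait_for_file and the server.ready file name are hypothetical, and for service startup ordering the built-in depends_on plus readiness mechanism is usually the better fit; this is for ad hoc coordination inside a command.

```shell
# wait_for_file is a hypothetical helper, not part of hpc-compose.
# It polls once per second until the file exists or the limit (seconds) passes.
wait_for_file() {
  ready_file=$1
  limit=${2:-30}
  waited=0
  while [ ! -e "$ready_file" ]; do
    if [ "$waited" -ge "$limit" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# JOB_DIR defaults to the in-container mount described above.
JOB_DIR=${JOB_DIR:-/hpc-compose/job}
# Producer side:  touch "$JOB_DIR/server.ready"
# Consumer side:  wait_for_file "$JOB_DIR/server.ready" 60 || exit 1
```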

Enroot runtime paths

The generated batch script sets three Enroot runtime paths scoped per job under the resolved cache directory:

Variable | Value | Purpose
ENROOT_CACHE_PATH | $CACHE_ROOT/runtime/$SLURM_JOB_ID/cache | Enroot image cache for the current job
ENROOT_DATA_PATH | $CACHE_ROOT/runtime/$SLURM_JOB_ID/data | Enroot data directory for the current job
ENROOT_TEMP_PATH | $CACHE_ROOT/runtime/$SLURM_JOB_ID/tmp | Enroot temp directory for the current job

These paths are created at batch startup and are available inside the batch script and to tooling that reads Enroot environment variables. They are not injected into service containers.
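When you need to locate these directories from a login node, for example to check their size after a job, the layout in the table can be expanded by hand. The sketch below does only that. enroot_runtime_paths is a hypothetical helper, and the cache root and job id in the example call are placeholder values.

```shell
# enroot_runtime_paths is a hypothetical helper that expands the layout from
# the table. The generated batch script sets these variables itself.
enroot_runtime_paths() {
  cache_root=$1
  job_id=$2
  base="$cache_root/runtime/$job_id"
  printf 'ENROOT_CACHE_PATH=%s\n' "$base/cache"
  printf 'ENROOT_DATA_PATH=%s\n' "$base/data"
  printf 'ENROOT_TEMP_PATH=%s\n' "$base/tmp"
}

# Placeholder cache root and job id:
enroot_runtime_paths /cluster/shared/hpc-compose-cache 123456
```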

Warning

Do not put the resolved cache directory under /tmp, /var/tmp, /private/tmp, or /dev/shm. Those paths are not safe for login-node prepare plus compute-node reuse.
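A wrapper script can catch the prefixes named in this warning before anything is submitted. The sketch below is a plain string check with hypothetical helper names. It does not resolve symlinks, and an "ok" result only means the path avoids the listed prefixes, not that the path is actually shared across nodes.

```shell
# is_node_local_cache_path and check_cache_dir are hypothetical helpers.
# The first returns 0 when the path sits under a prefix the warning names.
is_node_local_cache_path() {
  case "$1" in
    /tmp|/tmp/*|/var/tmp|/var/tmp/*|/private/tmp|/private/tmp/*|/dev/shm|/dev/shm/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

check_cache_dir() {
  if is_node_local_cache_path "$1"; then
    echo "unsafe: $1"
  else
    echo "ok: $1"
  fi
}

check_cache_dir /tmp/hpc-compose-cache
check_cache_dir /cluster/shared/hpc-compose-cache
```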

Networking inside the allocation

  • Single-node services share the host network on one node.
  • In a multi-node job, helper services stay on the allocation’s primary node by default.
  • A distributed service may span the full allocation, or services may use x-slurm.placement to select explicit allocation node subsets.
  • Partitioned services should use service-scoped metadata such as HPC_COMPOSE_SERVICE_PRIMARY_NODE, HPC_COMPOSE_SERVICE_NODE_COUNT, HPC_COMPOSE_SERVICE_NODELIST, and HPC_COMPOSE_SERVICE_NODELIST_FILE.
  • ports, custom Docker networks, and service-name DNS are not part of the model.
  • Use depends_on plus readiness when a dependent service must wait for real availability rather than process start.
  • Use depends_on with condition: service_completed_successfully when a dependent service should wait for a one-shot stage to exit successfully.

Use 127.0.0.1 only when both sides are intentionally on the same node. For multi-node distributed or partitioned runs, derive rendezvous addresses from allocation or service metadata files and environment variables instead of relying on localhost.

If a service binds its TCP port before it is actually ready, prefer HTTP or log-based readiness over plain TCP readiness.

volumes vs x-runtime.prepare

Mechanism | Use it for | When it is applied | Reuse behavior
volumes | fast-changing source code, model directories, input data, checkpoint paths | at runtime inside the allocation | reads live host content every normal run
x-runtime.prepare.commands | slower-changing dependencies, tools, and image customization | before submission on the login node | cached until the prepared artifact changes

Recommended default:

  • keep active source trees in volumes
  • keep slower-changing dependency installation in x-runtime.prepare.commands
  • use prepare.mounts only when the prepare step truly needs host files

Warning

If a mounted file is a symlink, the symlink target must also be visible from inside the mounted directory. Otherwise the path can exist on the host but fail inside the container.
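A quick host-side check for this situation is to resolve the symlink and see whether the result still lies under the mounted directory. The sketch below assumes realpath(1) is available (GNU coreutils or compatible). symlink_stays_inside is a hypothetical helper, and it is conservative: a target outside the directory can still work if another volume happens to mount it.

```shell
# symlink_stays_inside is a hypothetical helper, not part of hpc-compose.
# Arg 1: host directory that will be mounted. Arg 2: path to check.
# Returns 0 if the resolved path is inside the directory, 1 if it is not,
# and 2 if either path cannot be resolved.
symlink_stays_inside() {
  mount_root=$(realpath "$1") || return 2
  target=$(realpath "$2") || return 2
  case "$target" in
    "$mount_root"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   symlink_stays_inside ./app ./app/config.yaml \
#     || echo "config.yaml points outside ./app"
```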

Command vocabulary

  • The normal run is hpc-compose up -f compose.yaml. See Quickstart for the full end-to-end description.
  • The tracked follow-up tools are status for scheduler/log summaries, ps for a stable per-service snapshot, and watch when you want to reconnect to the live TUI later.
  • The debugging flow runs validate, inspect, preflight, and prepare as separate steps when you need more visibility.

Read Runtime Backends before changing runtime.backend, Runbook for the operational workflow, Examples for starting points, and Spec reference for exact field behavior.

Supported Slurm Model

This page makes the hpc-compose Slurm boundary explicit. It is a tool for compiling one Compose-like application into one Slurm allocation with one or more srun steps. Those steps can use Pyxis/Enroot, Apptainer, Singularity, or host runtime software. It is not a general frontend for the full Slurm command surface.

First-class support

These capabilities are modeled, validated, and intentionally supported by the planner, renderer, and tracked-job workflow.

Area | Support
Allocation model | One Slurm allocation per application
Submission flow | new, plan, validate, config, inspect, preflight, prepare, render, up, when, alloc, run, debug
Tracked job workflow | status, ps, watch, stats, score, logs, down, cancel, artifacts, clean, cache inspection/pruning
Top-level Slurm fields | job_name, partition, account, qos, time, nodes, ntasks, ntasks_per_node, cpus_per_task, mem, gres, gpus, GPU/CPU binding fields, constraint, output, error, chdir
Service step fields | nodes, placement, ntasks, ntasks_per_node, cpus_per_task, gres, gpus, GPU/CPU binding fields, mpi
Multi-node model | Single-node jobs, full-allocation distributed steps, and explicit node-index partitioning within one allocation
Runtime orchestration | depends_on, readiness checks, one-shot completion dependencies, service failure policies, primary-node helper placement, explicit co-location through placement.share_with
Service hooks | Per-service prologue and epilogue lifecycle hooks, plus host-side restart and window_exhausted event hooks
Runtime workflow | Pyxis/Enroot .sqsh, Apptainer/Singularity .sif, host runtime commands, x-runtime.prepare, shared cache handling
Scratch and staging | x-slurm.scratch, stage_in, stage_out, per-service scratch opt-out, raw #BB/#DW burst-buffer directives
Job tracking | Scheduler state via squeue/sacct, step stats via sstat, tracked logs, runtime state, metrics, artifacts, resume metadata
Advisory cluster weather | weather summarizes current node and queue conditions from read-only Slurm probes without reserving resources or changing submission behavior
Conditional submission | when actively monitors typed conditions, then submits one normal hpc-compose allocation
Canary right-sizing | germinate submits one short canary, writes latest-canary.json, and recommends resource settings without rewriting the spec
Hyperparameter sweeps | sweep submit expands one embedded sweep into many independent single-allocation jobs, then sweep status aggregates their tracked state
Cross-job rendezvous | Provider/client discovery through shared-cache JSON records under one cluster-visible cache directory

Raw pass-through

These capabilities are usable, but hpc-compose does not model or validate their semantics beyond passing them through to Slurm.

| Mechanism | What it allows |
| --- | --- |
| x-slurm.submit_args | Raw #SBATCH ... lines for site-specific flags such as mail settings, reservations, or other submit-time options |
| services.<name>.x-slurm.extra_srun_args | Raw srun arguments for site-specific launch flags such as exclusivity settings |
| Existing reservations | Joining an already-created reservation through raw submit args is supported as pass-through |

Pass-through is appropriate when a site-specific flag is useful but does not justify a first-class schema field. hpc-compose rejects line breaks and null bytes in raw #SBATCH entries so one list entry cannot emit multiple directives, but it does not validate the Slurm semantics of those flags.
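
The rejection rule described above can be pictured as a simple character check. This is only an illustrative Python sketch of that idea, not the tool's actual validator; the function name `is_single_directive` is hypothetical.

```python
# Characters that would let one list entry spill into a second script line
# or truncate it. The doc names line breaks and null bytes; treating both
# "\n" and "\r" as line breaks is an assumption of this sketch.
FORBIDDEN = ("\n", "\r", "\0")


def is_single_directive(entry):
    """Return True when a raw submit argument cannot emit extra directives."""
    return not any(ch in entry for ch in FORBIDDEN)
```

Note that a check like this guards only the shape of the entry. Whether Slurm accepts the flag, and what it does, is still up to the site.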

Unsupported or out of scope

These capabilities are intentionally outside the product seam.

| Area | Status |
| --- | --- |
| Admin-plane Slurm management | Out of scope |
| sacctmgr account administration | Out of scope |
| Reservation creation or lifecycle management | Out of scope |
| Federation / multi-cluster control | Out of scope |
| Cross-cluster service discovery | Out of scope; rendezvous is same-cluster shared-storage coordination only |
| Generic scontrol mutation | Out of scope |
| Broad cluster inspection tools such as a full sinfo / sprio / sreport frontend | Out of scope; weather is limited to a compact advisory snapshot |
| Background submit daemons or reservations | Out of scope; when is a foreground advisory monitor and does not reserve resources |
| Dynamic scheduling or bin packing across nodes | Not supported; use explicit x-slurm.placement selectors |
| Heterogeneous jobs and job arrays as first-class workflow concepts | Not supported in v1; sweeps deliberately submit many normal allocations instead of Slurm arrays |
| Compose build, ports, custom networks, restart, deploy | Not supported |

Non-goals

hpc-compose should not grow into a generic Slurm administration layer. In particular, it will not broaden into sacctmgr, reservation management, federation control, or generic scontrol mutation. Those are real Slurm features, but they do not fit the “one application, one allocation, tracked runtime workflow” seam this tool is built around.

Runtime Backends

runtime.backend selects how each service is launched inside the Slurm step. The default is pyxis.

For a beginner explanation of Slurm steps, Pyxis, Enroot, and shared runtime caches, start with Slurm And Container Basics.

runtime:
  backend: pyxis

Backend Summary

| Backend | Launch shape | Required tools | Image/artifact shape | Notes |
| --- | --- | --- | --- | --- |
| pyxis | srun --container-* | Slurm with Pyxis support plus Enroot on the submission host | remote images or local .sqsh / .squashfs | Default path and the only backend supported by local development workflows. |
| apptainer | srun plus apptainer exec/run | apptainer on submission and compute nodes | remote images prepared or reused as .sif; local .sif accepted | Use when the site standardizes on Apptainer instead of Pyxis. |
| singularity | srun plus singularity exec/run | singularity on submission and compute nodes | remote images prepared or reused as .sif; local .sif accepted | Similar to Apptainer for sites that still use Singularity. |
| host | direct srun command | Slurm client tools and host software/modules | no container image | Services must set command or entrypoint; image prepare and container bind mounts are not applied. |

For Pyxis, check support with:

srun --help | grep container-image

For all backends, preflight checks the selected backend tools:

hpc-compose preflight -f compose.yaml

Local Mode

up --local, test --local, dev, and tmux are intentionally narrow:

  • Linux only
  • runtime.backend: pyxis only
  • Pyxis-compatible Enroot tooling on the host
  • single-host specs only
  • no distributed or partitioned placement
  • no service-level MPI
  • no Slurm arrays or scheduler dependencies

Use local mode to inspect and debug a Pyxis/Enroot single-host launch path. dev adds file-change restart requests to the local supervisor, and tmux tails tracked local service logs in panes. Neither command changes the process-supervision model, and local mode is not a replacement for Slurm distributed execution.

Host Runtime Notes

runtime.backend: host runs service commands directly under srun. It is useful for module-based workflows or nested schedulers that already manage their own software environment.

Because there is no container:

  • image is optional
  • service volumes are rejected
  • x-runtime.prepare and x-enroot.prepare are rejected
  • x-slurm.mpi.host_mpi.bind_paths is not meaningful

Use top-level or service-level x-env for host modules, Spack views, and environment variables.

Running Compose-Style Multi-Service Workflows on Slurm

This is the canonical explainer for hpc-compose.

hpc-compose exists because two common approaches leave a gap:

  • plain sbatch scripts give you control, but multi-service coordination, startup ordering, and repeatability stay ad hoc
  • Docker Compose is familiar, but its networking and orchestration assumptions do not map cleanly to one Slurm allocation

hpc-compose takes the narrow path between them: a Compose-like authoring model that still produces one inspectable Slurm job.

The Pain in Current Slurm Workflows

Once a job stops being a single process, the friction climbs quickly:

  • helper services need explicit startup ordering
  • cluster-specific environment setup gets mixed into hand-written shell
  • debugging starts from generated state you never inspected beforehand
  • repeated workflows drift because the real behavior lives across scripts, notes, and local conventions

This is especially common in research ML and HPC-adjacent work where one job may need:

  • a serving process plus a client
  • a database plus a worker
  • a training step plus checkpoint export and resume handling

Why Docker Compose Does Not Fit Slurm Directly

Docker Compose is good at expressing a small multi-service application on one machine. Slurm solves a different problem: scheduling one batch allocation onto shared cluster resources.

That mismatch shows up in exactly the features hpc-compose leaves out:

  • ports
  • custom networks
  • Compose restart
  • deploy
  • broad runtime compatibility with arbitrary Compose features

Those omissions are deliberate. The point is not to emulate all of Compose on a cluster. The point is to keep a familiar authoring shape for the subset that maps cleanly to one Slurm job.

The Narrow Execution Model

hpc-compose keeps the execution model explicit:

compose-like spec
      |
      +--> plan / validate / render on the submission host
      |
      +--> one generated batch script
                |
                v
          one Slurm allocation
                |
                +--> primary-node helper services
                +--> optional allocation-wide distributed service
                +--> optional explicitly partitioned service steps
                +--> shared /hpc-compose/job scratch for coordination

This gives you a few important properties:

  • one inspectable unit of submission
  • one obvious place to look when the job fails
  • one explicit product boundary instead of hidden orchestration behavior

One Real Example

app-redis-worker.yaml is a good example of the intended shape:

  • one Redis service
  • one dependent worker service
  • TCP readiness gating before the worker starts
  • both services living inside the same allocation

That is awkward to hand-roll repeatedly with cluster scripts alone, but it does not justify a full orchestrator. This is the exact middle ground hpc-compose targets.

If you want the smallest possible first run, start with minimal-batch.yaml. If you want the smallest concrete inference flow, start with llm-curl-workflow-workdir.yaml.

Why the Inspectable Path Matters

The authoring flow is designed to answer the practical questions before you launch:

hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

That lets you confirm:

  • whether the spec is valid
  • what service order will run
  • what image and cache behavior the planner inferred
  • what batch script you are actually handing to Slurm

For a Slurm-first tool, that inspectability matters more than feature breadth.

When Not To Use hpc-compose

Do not use hpc-compose when you need:

  • custom container networking
  • broad Docker Compose compatibility
  • a long-running orchestration control plane
  • dynamic cross-node scheduling instead of explicit x-slurm.placement node selectors

If that list rules out your workload, that is not a failure of the tool. It is the intended product boundary.

Runbook

This runbook is the normal real-cluster flow for adapting an hpc-compose spec on a supported Linux Slurm submission host.

If you are new to Slurm, read Slurm And Container Basics first. If you are adapting to HAICORE@KIT, read HAICORE Guide alongside this runbook.

Commands below assume hpc-compose is on your PATH. If you are running from a local checkout, replace hpc-compose with target/release/hpc-compose.

Compose-aware commands accept -f / --file. When omitted, hpc-compose uses the active context compose file from .hpc-compose/settings.toml, then falls back to compose.yaml in the current directory. Global context flags are available everywhere:

  • --profile <NAME> selects a profile from .hpc-compose/settings.toml.
  • --settings-file <PATH> uses an explicit settings file instead of upward auto-discovery.

Read Slurm And Container Basics, Execution Model, Runtime Backends, and Support Matrix before adapting a workflow to a new cluster.

Before You Start

Make sure you have:

  • a Linux submission host with srun and sbatch,
  • the runtime backend selected by runtime.backend,
  • scontrol when x-slurm.nodes > 1,
  • Pyxis support in srun when runtime.backend: pyxis (srun --help should mention --container-image),
  • shared storage for the resolved cache directory,
  • local source trees or local .sqsh / .sif images in place,
  • registry credentials when your cluster or registry requires them.

Backend-specific requirements are listed in Runtime Backends. Cluster profile generation and MPI smoke probes are covered in Cluster Profiles.

Normal Progression

For a new spec on a real cluster:

  1. Choose a starter from Examples, or run hpc-compose new --template <name> --name my-app --output compose.yaml.
  2. Run hpc-compose setup once if you want compose path, env files, env vars, and binary overrides stored in a project-local settings file.
  3. Run hpc-compose context --format json to verify resolved values and sources.
  4. Set or confirm the resolved cache directory, then adjust cluster-specific resource settings.
  5. Run hpc-compose plan -f compose.yaml and hpc-compose plan --verbose -f compose.yaml while adapting the file.
  6. Run hpc-compose up -f compose.yaml for the normal cluster run.
  7. If it fails, start with hpc-compose debug -f compose.yaml --preflight, then use Troubleshooting and break out preflight, prepare, render, status, ps, watch, stats, or logs separately.

For a minimal cluster smoke test from a checkout, set CACHE_DIR to shared storage and run scripts/cluster_smoke.sh. It validates, preflights, and renders by default; set HPC_COMPOSE_SMOKE_SUBMIT=1 only when you intentionally want it to launch the smoke job.

Project-Local Settings

hpc-compose can discover .hpc-compose/settings.toml by walking upward from the current directory. You can also pin a file with --settings-file.

Typical setup flow:

hpc-compose setup
hpc-compose context
hpc-compose --profile dev context --format json

Non-interactive setup is available for scripting:

hpc-compose setup --profile-name dev --compose-file compose.yaml --env-file .env --env-file .env.dev --cache-dir '<shared-cache-dir>' --default-profile dev --non-interactive

Settings file shape:

version = 1
default_profile = "dev"

[defaults]
compose_file = "compose.yaml"
env_files = [".env"]

[defaults.env]
CACHE_DIR = "/cluster/shared/hpc-compose-cache"

[defaults.cache]
dir = "/cluster/shared/hpc-compose-cache"

[profiles.dev]
compose_file = "compose.yaml"
env_files = [".env", ".env.dev"]

[profiles.dev.env]
RESUME_DIR = "/shared/$USER/runs/my-run"
MODEL_DIR = "$HOME/models"

[profiles.dev.cache]
dir = "/cluster/shared/dev-hpc-compose-cache"

[resource_profiles.cpu-small]
time = "00:30:00"
cpus_per_task = 4
mem = "16G"

[resource_profiles.gpu-small]
partition = "gpu"
time = "01:00:00"
gpus = 1
cpus_per_task = 8
mem = "32G"

Resolution precedence is fixed:

  1. CLI flags
  2. selected profile values
  3. shared settings defaults
  4. built-in CLI defaults
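
The precedence list above amounts to "first source that defines the field wins". As a rough Python sketch of that lookup (the function name, the dictionary layout, and the source labels are illustrative assumptions, not the tool's internals):

```python
def resolve(field, cli, profile, defaults, builtin):
    """Return (value, source) for a field, honoring the fixed precedence.

    Each argument after `field` is a plain dict of values from one layer.
    Hypothetical helper for illustration only.
    """
    layers = (
        ("cli", cli),
        ("profile", profile),
        ("defaults", defaults),
        ("builtin", builtin),
    )
    for source, values in layers:
        if values.get(field) is not None:
            return values[field], source
    raise KeyError(field)
```

For example, with no CLI flag given, a profile value of `dev.yaml` for `compose_file` would win over `compose.yaml` in the shared defaults. Returning the source next to the value mirrors the kind of per-field source information that `context` reports.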

Use context whenever you want to inspect effective compose path, binaries, interpolation variables, runtime paths, and per-field sources.

Resource profiles are referenced from YAML with x-slurm.resources: gpu-small. They are Slurm resource defaults, not the same thing as the global --profile setting selector, and explicit x-slurm values in the spec override profile defaults.
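
The override rule for resource profiles can be sketched as a dictionary merge in which explicit spec values are applied last. This Python sketch reuses the `gpu-small` values from the settings file above; the function name `effective_slurm` is hypothetical and the real merge may handle more cases.

```python
# Values copied from the [resource_profiles.gpu-small] example above.
GPU_SMALL = {
    "partition": "gpu",
    "time": "01:00:00",
    "gpus": 1,
    "cpus_per_task": 8,
    "mem": "32G",
}


def effective_slurm(profile, explicit):
    """Merge profile defaults with explicit x-slurm values; explicit wins."""
    merged = dict(profile)
    merged.update(explicit)
    return merged
```

A spec that references `gpu-small` but sets its own `time` would keep the profile's `gpus`, `cpus_per_task`, and `mem` while using the explicit time limit.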

Choose A Starting Example

The maintained selection guide is Examples. It includes:

  • four promoted beginner paths,
  • a novice ladder from authoring to distributed workloads,
  • the full repository example matrix,
  • companion notes for LLM worker examples,
  • an adaptation checklist.

Treat docs/src/examples.md as the single source of truth for example selection. The embedded YAML source appendix is Example Source.

1. Choose A Cache Directory Early

Set the cache default to a path visible from both the login node and compute nodes:

[profiles.dev.cache]
dir = "/cluster/shared/hpc-compose-cache"

Or set x-slurm.cache_dir directly in the spec when the cache path should travel with that file:

x-slurm:
  cache_dir: /cluster/shared/hpc-compose-cache

Quick recipe:

export CACHE_DIR=/cluster/shared/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"

Rules:

  • Do not use /tmp, /var/tmp, /private/tmp, or /dev/shm.
  • If cache_dir is unset in the spec, resolution checks profile cache settings, then defaults cache settings, then $HOME/.cache/hpc-compose.
  • The default may work on some clusters, but a shared project/work/scratch path is safer.
  • Validation can accept unsafe local paths; preflight reports them as policy errors.
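
The rules above combine a fallback chain with a path policy. The Python sketch below illustrates both under stated assumptions: the function names are hypothetical, and the prefix-based policy check is a guess at the spirit of the rule rather than the exact logic preflight uses.

```python
from pathlib import PurePosixPath

# Node-local locations the rules above say to avoid.
UNSAFE_ROOTS = ("/tmp", "/var/tmp", "/private/tmp", "/dev/shm")


def resolve_cache_dir(spec_dir, profile_dir, defaults_dir, home):
    """Pick the first configured cache directory, else the home fallback."""
    for candidate in (spec_dir, profile_dir, defaults_dir):
        if candidate:
            return candidate
    return f"{home}/.cache/hpc-compose"


def is_policy_safe(path):
    """Return False when path is one of the unsafe roots or lies below one."""
    candidate = PurePosixPath(path)
    for root in UNSAFE_ROOTS:
        unsafe = PurePosixPath(root)
        if candidate == unsafe or unsafe in candidate.parents:
            return False
    return True
```

Comparing path components instead of raw string prefixes avoids flagging an unrelated directory such as `/tmpfoo`. A path that passes this check can still be unshared or unwritable, which is why the quick recipe above also tests writability.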

More cache details are in Cache Management.

2. Adapt The Example

Start with the nearest example and then change:

  • image
  • command / entrypoint
  • volumes
  • environment
  • x-slurm resource settings
  • x-runtime.prepare commands for dependencies or tooling

Recommended pattern:

  • Put fast-changing application code in volumes.
  • Put slower-changing dependency installation in x-runtime.prepare.commands.
  • Add readiness only to services that other services truly depend on.

3. Validate The Spec

hpc-compose validate -f compose.yaml
hpc-compose validate -f compose.yaml --strict-env

Use validate first when changing field names, dependency shape, command/entrypoint form, paths, x-slurm, x-runtime, or compatibility x-enroot blocks.

If validate fails, fix that before doing anything more expensive. Use --strict-env when missing interpolation variables should fail instead of consuming ${VAR:-default} or ${VAR-default} fallbacks.
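
To make the difference concrete, here is a small Python sketch of shell-style interpolation with an optional strict mode. It assumes POSIX-like semantics for the two fallback forms and assumes that strict mode only affects unset variables; the real interpolation rules are defined in the Spec reference, and the function name `interpolate` is hypothetical.

```python
import re

# Matches ${NAME}, ${NAME-default}, and ${NAME:-default}.
_VAR = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"(?:(?P<op>:?-)(?P<default>[^}]*))?\}"
)


def interpolate(text, env, strict=False):
    """Expand variables in text from env, optionally refusing fallbacks."""

    def substitute(match):
        name = match.group("name")
        op = match.group("op")
        if name not in env:
            # In strict mode a missing variable fails even if a fallback exists.
            if strict or op is None:
                raise KeyError(f"missing interpolation variable: {name}")
            return match.group("default")
        value = env[name]
        # ":-" also falls back on an empty value; "-" only on an unset one.
        if op == ":-" and value == "":
            return match.group("default")
        return value

    return _VAR.sub(substitute, text)
```

With an empty environment, `${CACHE_DIR:-/shared/cache}` silently becomes the fallback in the default mode and raises in strict mode, which is the kind of surprise a strict check is meant to surface early.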

4. Plan The Run

hpc-compose plan -f compose.yaml
hpc-compose plan --verbose -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

Check:

  • service order,
  • allocation geometry and service step geometry,
  • normalized image references,
  • host-to-container mount mappings,
  • resolved environment values,
  • runtime artifact paths,
  • cache hit/miss expectations.

plan is purely static: it parses, validates, builds the normalized runtime plan, and can print the generated script to stdout, but it does not run preflight, prepare images, call sbatch, or write hpc-compose.sbatch. Add --explain for planner hints about cache paths, missing artifacts, resume/artifact settings, and the next command. plan --verbose can print secrets from resolved environment values.

5. Normal Run: Use up

hpc-compose up -f compose.yaml

up is the preferred end-to-end cluster flow. It runs preflight unless disabled, prepares images unless skipped, renders the script, calls sbatch, records tracked job metadata, polls scheduler state, and streams logs. It also uses a spec-scoped lock under .hpc-compose/locks/ so two concurrent up invocations against the same compose file do not race through prepare/render/submit.

Useful options:

  • --script-out path/to/job.sbatch keeps a copy of the rendered script.
  • --force-rebuild refreshes imported and prepared artifacts.
  • --skip-prepare reuses existing prepared artifacts.
  • --no-preflight skips the preflight phase.
  • --detach submits or launches, records tracking metadata, and returns without watching.
  • --format text|json is accepted with --detach or --dry-run.
  • --watch-queue waits in line-oriented queue output until the Slurm job reaches RUNNING, then opens the normal watch view.
  • --queue-warn-after <DURATION> warns once when --watch-queue stays PENDING longer than the threshold; the default is 10m, and 0 disables the warning.
  • --watch-mode auto|tui|line selects the live output mode; --no-tui is a line-mode alias.
  • --hold-on-exit never|failure|always controls whether the TUI stays open after the job reaches a terminal scheduler state.
  • --resume-diff-only prints resume-sensitive config diffs without launching.
  • --allow-resume-changes confirms intentional resume-coupled config drift.

up --local is Linux + Pyxis-only and single-host. See Runtime Backends.

Array jobs should be submitted with up --detach; use SLURM_ARRAY_TASK_ID in the service command and output patterns such as %A_%a for task-specific logs. Scheduler dependencies declared with x-slurm.after_job or x-slurm.dependency are passed to sbatch --dependency=... at submit time. Arrays and scheduler dependencies are not supported by up --local.

For conditional submission on a busy partition, use when:

hpc-compose when -f compose.yaml --partition gpu8 --free-nodes 4
hpc-compose when -f compose.yaml --after-job 12345
hpc-compose when -f compose.yaml --between 22:00-06:00

when is a foreground monitor. Interrupt it with Ctrl-C to stop waiting before the job is submitted. It runs preflight, image preparation, and script rendering before the wait begins, so submission is immediate once the conditions match; use --skip-prepare only when the required runtime artifacts already exist. --detach applies after submission: it still waits in the foreground for conditions, then returns after tracking metadata is written instead of opening the watch view.

Idle-node checks are advisory, not reservations. Another user can still submit first, and Slurm may queue the job after when calls sbatch. Keep polling gentle on shared login nodes: the default 60s interval is a good starting point, and intervals below 30s should be reserved for short, intentional watches.

For interactive development inside one allocation, use alloc:

hpc-compose alloc -f compose.yaml
hpc-compose run app -- python -m pytest

Inside the allocation shell, run SERVICE -- CMD reuses the active allocation with srun instead of submitting a new sbatch job. alloc exports HPC_COMPOSE_* metadata for the compose file, cache directory, runtime backend, and allocated nodes.

6. Run Preflight When Debugging Cluster Readiness

hpc-compose preflight -f compose.yaml
hpc-compose preflight --verbose -f compose.yaml
hpc-compose preflight -f compose.yaml --strict

preflight checks selected-backend tools, Slurm tools, cache path policy, local mounts/images, registry credentials, cluster profile compatibility, distributed-readiness hazards, metrics collector tools, and resume path safety.

Generate a cluster capability profile on the target login node when you want validation and preflight to catch partition/backend/QOS/GPU/MPI mismatches earlier:

hpc-compose doctor cluster-report

See Cluster Profiles for generated profile details, site policy packs, and MPI smoke probes.

7. Prepare Images Separately When Needed

hpc-compose prepare -f compose.yaml
hpc-compose prepare -f compose.yaml --force

Use this when you want to build or refresh prepared images before submission, confirm cache reuse behavior, or debug preparation separately from job submission.

prepare needs the selected runtime backend tools, but it does not call sbatch.

8. Render The Batch Script

hpc-compose render -f compose.yaml --output /tmp/job.sbatch

This is useful when debugging generated srun arguments, mounts, environment passing, launch order, and readiness waits.

9. Inspect A Tracked Run

hpc-compose jobs list
hpc-compose status -f compose.yaml
hpc-compose status -f compose.yaml --array
hpc-compose ps -f compose.yaml
hpc-compose watch -f compose.yaml
hpc-compose replay -f compose.yaml --speed 10
hpc-compose logs -f compose.yaml --service app --follow
hpc-compose stats -f compose.yaml --format jsonl

For a failed run, a practical investigation path is hpc-compose jobs list, then hpc-compose replay -f compose.yaml --job-id <job-id> to find the failure moment, then debug, logs, or stats for deeper evidence. Use Runtime Observability for tracked state, replay, logs, metrics, and machine-readable output. Use Artifacts and Resume for artifact bundles and resume-aware attempts.

10. Manage Cache And Old State

hpc-compose cache list
hpc-compose cache inspect -f compose.yaml
hpc-compose cache prune --all-unused -f compose.yaml
hpc-compose cache prune --age 7 --cache-dir '<shared-cache-dir>'
hpc-compose clean -f compose.yaml --age 7 --dry-run

Use Cache Management for cache reuse and pruning. Use Troubleshooting before deleting tracked job directories.

What Changed And What Should I Run?

| If you changed… | Typical next step |
| --- | --- |
| YAML planning/runtime settings only | plan --verbose, then up |
| Base image, x-runtime.prepare.commands, or prepare env | up --force-rebuild, or prepare --force when debugging separately |
| Mounted runtime source under volumes | Usually just up |
| Cache entries this plan no longer references | cache prune --all-unused -f compose.yaml |
| hpc-compose itself | Expect cache misses on the next prepare or up, then optionally prune old entries |

Development Workflow

test, dev, and tmux are the v1 local-development layer. They reuse the same prepare, render, local supervisor, runtime state, and tracking paths as up, so a run started by one command remains visible to status, ps, logs, stats, watch, and debug.

Smoke-Test Specs

Use test for finite specs that prove a workflow starts, satisfies readiness gates, and exits cleanly:

hpc-compose test --local -f examples/dev-python-smoke.yaml
hpc-compose test --submit --time 00:01:00 --timeout 180s -f compose.smoke.yaml
hpc-compose test --submit --format json -f compose.smoke.yaml

test requires exactly one execution mode:

  • --local runs the rendered local supervisor on the current host.
  • --submit calls sbatch; it defaults to --time 00:01:00 and --timeout 180s.

A smoke test passes only when every service:

  • appears in tracked runtime state,
  • launched at least once,
  • passed readiness when readiness is configured,
  • completed successfully.

Services with failure_policy.mode: ignore still have to complete successfully for test to pass. That makes smoke tests stricter than production runs by design: ignored sidecars are useful operationally, but they should not silently hide a broken spec test.
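
The pass criteria can be read as a per-service checklist. The Python sketch below encodes that checklist; the `ServiceState` fields and the `smoke_failures` function are hypothetical names chosen for illustration and do not reflect the tracked state format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceState:
    """Illustrative per-service summary; field names are assumptions."""

    tracked: bool
    launches: int
    readiness_configured: bool
    ready: bool
    exit_code: Optional[int]


def smoke_failures(services):
    """Map each failing service name to the first criterion it missed."""
    failures = {}
    for name, state in services.items():
        if not state.tracked:
            failures[name] = "missing from tracked runtime state"
        elif state.launches < 1:
            failures[name] = "never launched"
        elif state.readiness_configured and not state.ready:
            failures[name] = "readiness did not pass"
        elif state.exit_code != 0:
            failures[name] = "did not complete successfully"
    return failures
```

The sketch deliberately takes no failure policy as input: a sidecar that exits non-zero is reported as a failure here even if a production run would ignore it.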

Making Long-Running Specs Finite

Production services often run forever. For smoke tests, create a finite variant of the spec or override the service command in a copied file:

services:
  app:
    image: python:3.11-slim
    working_dir: /workspace
    volumes:
      - ./app:/workspace
    command:
      - python
      - -c
      - "import main; print('smoke ok', flush=True)"

Keep the same image, mounts, environment, dependencies, and readiness where possible. Change only the command or entrypoint needed to prove startup and exit. If a dependent service uses condition: service_healthy, keep the upstream readiness probe real enough to catch wiring mistakes.

Hot Reload

dev is local-only:

hpc-compose dev -f examples/dev-python-app.yaml
hpc-compose dev -f compose.yaml --watch-path ./src --debounce-ms 500

It infers watch roots from host directories mounted through service volumes. File mounts, container-only paths, cache paths, missing paths, and non-directory paths are ignored. --watch-path adds an explicit directory and restarts every service when it changes.
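
The inference rule can be approximated as a filter over `host:container` volume entries. This Python sketch assumes that volume string shape (as in `./app:/workspace`) and that relative host paths resolve from the compose file directory; the function name `infer_watch_roots` is hypothetical and the real implementation may differ.

```python
from pathlib import Path


def infer_watch_roots(volumes, compose_dir, cache_dir=None):
    """Return existing host directories from volume entries, skipping the rest."""
    cache = Path(cache_dir).resolve() if cache_dir else None
    roots = []
    for entry in volumes:
        host, sep, _container = entry.partition(":")
        if not sep:
            continue  # container-only path with no host side
        path = (Path(compose_dir) / host).resolve()
        if not path.is_dir():
            continue  # missing paths, file mounts, and other non-directories
        if cache is not None and (path == cache or cache in path.parents):
            continue  # cache paths are not source trees
        if path not in roots:
            roots.append(path)
    return roots
```

If this kind of filter leaves nothing to watch, passing an explicit directory with `--watch-path` is the way to give `dev` something to monitor.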

File changes write restart requests into the tracked run’s dev control directory. The local supervisor handles those requests as development restarts, so readiness and completion state reset for the affected service without consuming failure_policy.restart_on_failure counters.

By default, Ctrl-C stops the local supervisor. Add --keep-running when you want to leave the tracked local run alive after exiting the watch loop.

Tmux Dashboard

tmux is a log dashboard, not a process supervisor:

hpc-compose tmux -f compose.yaml
hpc-compose tmux -f compose.yaml --job-id local-123
hpc-compose tmux -f compose.yaml --session demo --no-attach

Without --job-id, it launches a new local run. With --job-id, it attaches to an existing tracked local run. Each pane tails one service log with tail -F, and pane titles use service names. Use --no-attach when running from a non-interactive terminal or CI smoke check.

Shared Local Constraints

up --local, test --local, dev, and tmux share the same current constraints:

  • Linux hosts only
  • runtime.backend: pyxis only
  • Pyxis-compatible Enroot tooling on the host
  • single-host specs only
  • no distributed or partitioned placement
  • no service-level MPI
  • no Slurm arrays or scheduler dependencies

Use these commands to author and debug single-host launch behavior. Use test --submit or up on a Slurm login node for real scheduler behavior.

Example Recipe

The source-mounted app in examples/dev-python-app.yaml is intentionally long-running, so it is a good dev target:

hpc-compose dev -f examples/dev-python-app.yaml
hpc-compose tmux -f examples/dev-python-app.yaml --no-attach

The companion examples/dev-python-smoke.yaml keeps the same mounted source pattern but uses a finite command:

hpc-compose test --local -f examples/dev-python-smoke.yaml
hpc-compose test --submit --time 00:01:00 -f examples/dev-python-smoke.yaml

Troubleshooting

Use this page when the safe authoring path worked but the first real cluster run failed.

For background on Slurm allocations, sbatch, srun, Pyxis, and Enroot, see Slurm And Container Basics. For HAICORE-specific storage and runtime checks, see HAICORE Guide.

First Triage

hpc-compose validate -f compose.yaml
hpc-compose validate -f compose.yaml --strict-env
hpc-compose plan --verbose -f compose.yaml
hpc-compose debug -f compose.yaml --preflight

plan --verbose can print resolved environment values and final mount mappings. Treat its output as sensitive when the spec contains secrets. debug is read-only unless --preflight is passed; with --preflight, it reruns prerequisite checks and includes those findings in the triage report.

Common Symptoms

| Symptom | Likely cause | Next step |
| --- | --- | --- |
| required binary '...' was not found | Selected backend or Slurm client tool is not on PATH. | Run debug --preflight; pass --enroot-bin, --apptainer-bin, --singularity-bin, --srun-bin, or --sbatch-bin as needed. |
| srun does not advertise --container-image | Pyxis support is unavailable or not loaded. | Move to a supported login node, load the site module, or choose another backend. |
| Cache directory warning/error | The resolved cache directory is not shared, writable, or policy-safe. | Choose a shared project/work/scratch path through x-slurm.cache_dir or setup --cache-dir, then rerun debug --preflight. |
| Missing local mount or image path | Relative paths are resolved from the compose file directory. | Check paths relative to the copied compose.yaml. |
| Mounted symlink exists on the host but fails in the container | The symlink target is outside the mounted directory. | Copy the real file into the mounted directory or mount the target directory. |
| Anonymous pull or registry warning | Registry credentials are missing or rate limits apply. | Configure credentials before relying on private or rate-limited images. |
| Services start in the wrong order | Dependency condition or readiness is too weak. | Use service_healthy with readiness, or service_completed_successfully for DAG stages. |
| No service logs exist | The batch script failed before launching a service. | Use debug to see scheduler state, the tracked top-level batch log tail, and missing-log hints. |
| dev reports no watchable source directories | Services only mount files, missing paths, cache paths, or container-only paths. | Mount the source as a host directory or pass hpc-compose dev --watch-path ./src -f compose.yaml. |
| Readiness never passes | Probe target, pattern, host, or dependency timing does not match the real service. | Inspect the service log with logs --service <name> and try a finite hpc-compose test --local or short test --submit spec. |
| Smoke test times out | The spec is long-running, readiness blocks forever, or the scheduler job never reaches terminal state. | Make the smoke spec finite, lower service readiness timeouts, and use --format json to inspect the failed phase and service reason. |
| tmux is unavailable or attach fails | tmux is not installed or the shell is non-interactive. | Install tmux, pass --tmux-bin <PATH>, or create the dashboard with --no-attach. |
| Local mode is unsupported | Local workflows require a Linux host with Pyxis-compatible Enroot behavior. | Use authoring commands on non-Linux hosts, then run test --submit or up on a supported Slurm login node. |

Readiness Issues

Use depends_on with condition: service_healthy when a dependent must wait for a dependency’s readiness probe. Plain list form means service_started.

Use condition: service_completed_successfully for one-shot DAG stages where the next service should start only after the previous stage exits with status 0, such as preprocess -> train -> postprocess.
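
A staged chain like this is a dependency graph, and a valid start order is a topological sort of it. The Python sketch below uses the standard library `graphlib` module to show the idea; it illustrates the ordering concept only and says nothing about how the planner computes its service order. The function name `service_order` is hypothetical.

```python
from graphlib import TopologicalSorter


def service_order(depends_on):
    """Return one valid start order for a {service: [dependencies]} mapping.

    Raises graphlib.CycleError when the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

For the chain above, a mapping in which `train` depends on `preprocess` and `postprocess` depends on `train` yields `preprocess`, `train`, `postprocess`. Ordering alone does not wait for an exit status; that is what the dependency condition adds.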

When a TCP port opens before the service is fully usable, prefer HTTP or log-based readiness over TCP readiness.
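
The gap between "port is open" and "service is usable" is easy to reproduce. The Python sketch below defines two generic probes with the standard library; they are illustrations of the two readiness styles, not the probes hpc-compose runs, and the function names are hypothetical.

```python
import http.client
import socket
import urllib.request


def tcp_ready(host, port, timeout=1.0):
    """Return True as soon as a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_ready(url, timeout=1.0):
    """Return True only when the server answers with a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors, and bad replies.
        return False
```

A process that has bound and started listening on its port, but is still loading a model or warming a cache, passes the TCP probe while the HTTP probe keeps failing until the application actually responds.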

For hpc-compose test, readiness failures are terminal smoke-test failures. A service with configured readiness must become healthy and then complete successfully; ignored sidecars are still expected to pass in a smoke spec.

Preview A Run

Use plan for the static preview. It never prepares images, runs preflight, calls sbatch, or writes hpc-compose.sbatch:

hpc-compose plan --show-script -f compose.yaml

Use up --dry-run only when you intentionally want to exercise preflight, prepare, and render without calling sbatch:

hpc-compose up --dry-run -f compose.yaml

Clean Old Tracked Runs

Tracked job metadata and logs accumulate in .hpc-compose/. Preview cleanup before deleting:

hpc-compose jobs list --disk-usage
hpc-compose clean -f compose.yaml --age 7 --dry-run
hpc-compose clean -f compose.yaml --age 7

Cluster Profiles

Cluster profiles let validate and preflight compare a spec against site-specific Slurm, runtime, MPI, storage, and policy hints.

For HAICORE-specific resource, workspace, and container notes, see HAICORE Guide.

Generate a best-effort profile on the target login node:

hpc-compose doctor cluster-report

This writes .hpc-compose/cluster.toml by default. Use --out - to print TOML instead.

For a live advisory snapshot of current conditions, use:

hpc-compose weather

weather reads stable labels and hints from the discovered cluster profile when present, but live node, queue, fairshare, and priority data come from one-shot Slurm probes and are not persisted in .hpc-compose/cluster.toml.

What Gets Discovered

The profile generator uses available local tools and environment hints:

  • sinfo, scontrol, and srun --mpi=list
  • selected runtime binaries
  • shared-path environment hints
  • loaded MPI stack hints from PATH, MPI_HOME, MPI_DIR, I_MPI_ROOT, EBROOTOPENMPI, and EBROOTMPICH
  • editable distributed defaults such as rendezvous port and [distributed.env]

It does not run module avail. Module-only MPI installations can be added manually to the generated mpi_installations list.

Site Policy Packs

Support teams can edit optional sections such as:

  • [site]
  • [[software.modules]]
  • [[filesystems]]
  • [gpu]
  • [network]
  • [containers]
  • [slurm.defaults]
  • [slurm.required]

Policy sections warn and suggest snippets. They do not silently add modules, bind mounts, environment variables, or SBATCH directives to user specs.

MPI Smoke Probe

For MPI services, render a small rank-count probe against the service’s real runtime path:

hpc-compose doctor mpi-smoke -f compose.yaml --service trainer --script-out mpi-smoke.sbatch

Submit it only when you intentionally want to consume a Slurm allocation:

hpc-compose doctor mpi-smoke -f compose.yaml --service trainer --submit

The smoke plan keeps allocation and MPI launch settings but strips application workflow blocks such as setup, scratch staging, resume metadata, artifacts, and burst-buffer directives.

Fabric Smoke Probe

For distributed GPU or fabric-sensitive services, render a broader smoke probe:

hpc-compose doctor fabric-smoke -f compose.yaml --service trainer --checks auto --script-out fabric-smoke.sbatch

--checks auto always includes the MPI rank probe, adds NCCL when the selected service requests GPU resources, and collects UCX, OFI, and InfiniBand diagnostics when the corresponding tools are available. Use an explicit list such as --checks mpi,nccl when a missing tool should fail the probe instead of being reported as skipped.

HAICORE Guide

This page collects hpc-compose configuration notes for HAICORE@KIT. It is a practical starting point, not a replacement for the official NHR@KIT HAICORE documentation.

Before long or expensive runs, re-check current HAICORE policy pages for partitions, quotas, GPU limits, container requirements, and filesystem lifetime rules.

Where Commands Run

HAICORE is accessed through the login host documented by NHR@KIT:

ssh <username>@haicore.scc.kit.edu

Use the login node for editing, Git operations, hpc-compose plan, hpc-compose preflight, image preparation, and Slurm job management. Run compute work through Slurm with hpc-compose up, sbatch, or site-approved interactive Slurm commands.

Do not treat the login node as a place for long Python training, GPU work, data conversion, or large preprocessing jobs. Those belong inside a Slurm allocation.

HAICORE Slurm Settings To Know

The current HAICORE batch-system documentation describes Slurm partitions named normal and advanced. The normal partition is the general starting point; advanced requires special permission and allows larger jobs.

Common settings you will map into hpc-compose:

| HAICORE / Slurm setting | hpc-compose field | Notes |
| --- | --- | --- |
| Partition | x-slurm.partition | Usually start with the site-documented general partition. |
| Account/project | x-slurm.account | Use the account string assigned by the site or project. |
| Wall time | x-slurm.time | Keep smoke tests short; request only what the run needs. |
| Nodes | x-slurm.nodes | normal is documented for single-node jobs; confirm before multi-node runs. |
| Tasks | x-slurm.ntasks, service x-slurm.ntasks | Process/rank count. |
| CPUs per task | x-slurm.cpus_per_task, service x-slurm.cpus_per_task | CPU threads per process/rank. |
| Memory | x-slurm.mem | Scheduler/runtime memory request, not storage. |
| Full GPUs | x-slurm.gres or service x-slurm.gres | HAICORE examples use gpu:full:N style requests. |
| MIG GPUs | x-slurm.gres or service x-slurm.gres | HAICORE documents MIG profiles such as gpu:1g.5gb:1; confirm current names. |
| Constraints | x-slurm.constraint or x-slurm.submit_args | HAICORE documents constraints such as LSDF and BEEOND. |

Example single-node GPU starting point:

name: haicore-smoke

x-slurm:
  job_name: haicore-smoke
  partition: normal
  account: <account>
  time: "00:10:00"
  nodes: 1
  cpus_per_task: 4
  mem: 16G
  gres: gpu:full:1
  cache_dir: <workspace-path>/hpc-compose-cache

services:
  app:
    image: python:3.11-slim
    command: python -c "import os, socket; print(socket.gethostname()); print(os.environ.get('SLURM_JOB_ID'))"

Preview before submitting:

hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml
hpc-compose preflight -f compose.yaml

Workspaces And Storage

HAICORE documents several storage types. For hpc-compose, the most important distinction is between shared storage that persists across jobs and job-local temporary storage.

| Storage | Use with hpc-compose | Avoid using it for |
| --- | --- | --- |
| $HOME | Small configuration, source code, shell setup, credentials handled under site policy. | Large image caches, datasets, checkpoints, or logs from many jobs. |
| Workspace | x-slurm.cache_dir, Enroot data/cache, datasets, model files, run logs, artifacts, checkpoints. | Data that must be backed up elsewhere; workspaces are documented as not backed up and time-limited. |
| $TMPDIR | Fast node-local temporary files created and consumed within one job. | x-slurm.cache_dir or anything needed by login-node prepare and later compute-node runtime. |
| BeeOND | Job-local shared scratch across nodes when explicitly requested. | Long-term cache, persistent checkpoints, or files needed after the job unless copied out. |

Create and locate a workspace with HAICORE’s workspace tools:

ws_allocate <workspace-name> <duration>
ws_find <workspace-name>
ws_list
ws_extend <workspace-name> <duration>

Use the path from ws_find for the cache:

export CACHE_DIR=<workspace-path>/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"

Then set it in your spec:

x-slurm:
  cache_dir: ${CACHE_DIR}

The official HAICORE filesystem page documents workspace lifetime, extension limits, quotas, and backup policy. Treat workspace expiration as an operational risk: for long-running projects, check ws_list regularly and copy durable results to the correct long-term location.

Containers On HAICORE

The official HAICORE container documentation says native Docker and rootless Docker are not supported on the HPC systems. The relevant paths are site-supported HPC runtimes, including Enroot/Pyxis and Apptainer.

For the default hpc-compose backend:

runtime:
  backend: pyxis

Validate Pyxis support on the login node:

srun --help | grep container-image
hpc-compose preflight -f compose.yaml

HAICORE documents Pyxis as the Slurm integration for Enroot and lists container options such as --container-image, --container-name, --container-mounts, --container-mount-home, --container-writable, and --container-remap-root.

The HAICORE docs also list site-required Pyxis mounts for Slurm integration. Because mount paths are site policy and can change, inspect the current HAICORE container page before copying them into a spec. When needed, pass site-specific Pyxis flags through service-level extra_srun_args:

services:
  app:
    image: python:3.11-slim
    command: python -c "print('hello from HAICORE')"
    x-slurm:
      extra_srun_args:
        - "--container-mounts=<site-required-mounts>"

If the cluster recommends Apptainer for your workflow or Pyxis is not available in srun, choose the corresponding backend:

runtime:
  backend: apptainer

See Runtime Backends for the backend behavior and required tools.

Enroot Cache Placement

HAICORE documents Enroot as available by default, with default data paths under the user’s home directory. For repeated container jobs, large images, or quota-sensitive projects, place runtime cache/data under a workspace-backed x-slurm.cache_dir.

hpc-compose sets per-job Enroot runtime paths below the configured cache directory. That keeps image runtime state close to the job and avoids filling $HOME accidentally.

BeeOND And Job-Local Scratch

HAICORE documents BeeOND as a job-local filesystem requested through a Slurm constraint:

x-slurm:
  constraint: BEEOND

Use BeeOND for temporary high-throughput working data inside a job, then copy durable results back to a workspace or other approved persistent location. Do not put x-slurm.cache_dir on BeeOND because the cache must exist before the job and be reusable by later jobs.

Software Modules

HAICORE software is exposed through Lmod environment modules. For host-runtime or MPI workflows, keep module setup explicit in x-slurm.setup:

x-slurm:
  setup:
    - module purge
    - module avail
    - module load <module-name>

Do not leave module avail in production scripts if it produces too much output; it is useful while discovering the environment. Use module list in smoke tests when you need the batch log to record the active software stack.

Suggested First HAICORE Checklist

Run these on the HAICORE login node before the first real job:

ws_find <workspace-name>
sinfo
srun --help | grep container-image
hpc-compose plan --show-script -f compose.yaml
hpc-compose preflight -f compose.yaml
hpc-compose doctor cluster-report --out .hpc-compose/haicore-cluster.toml

Check the rendered script for:

  • the intended #SBATCH --partition,
  • the intended account/project,
  • a short wall time for smoke tests,
  • a workspace-backed cache_dir,
  • expected GPU or MIG request,
  • expected srun --container-* options when using Pyxis.

Submit only once you understand the static plan and the preflight output:

hpc-compose up --detach -f compose.yaml
hpc-compose status -f compose.yaml
hpc-compose logs -f compose.yaml --follow

Common HAICORE Failure Modes

| Symptom | Likely cause | What to check |
| --- | --- | --- |
| Workspace path is missing | Workspace expired or wrong name/path was used. | ws_list and ws_find <workspace-name>. |
| Cache path fails preflight | Path is not shared, writable, or policy-safe. | Move x-slurm.cache_dir to a workspace path. |
| --container-image is unknown | Pyxis is not active in the current Slurm environment. | srun --help \| grep container-image |
| Job is rejected for partition/account | Site policy or project/account mismatch. | HAICORE batch docs, sacctmgr/support guidance, rendered #SBATCH lines. |
| GPU request is rejected | Wrong gres name, too many GPUs, or partition limit. | HAICORE batch docs and a tiny smoke job. |
| Job starts but cannot see data | Data is on node-local storage or an unmounted path. | Use workspace paths or explicit volumes. |
| Workspace fills or expires | Container cache, datasets, checkpoints, or logs accumulated. | ws_list, quota tools, cache cleanup, artifact retention policy. |

Runtime Observability

After a submission, hpc-compose records tracked metadata under:

${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/

That directory lets follow-up commands reconnect without resubmitting.

Common Commands

hpc-compose status -f compose.yaml
hpc-compose ps -f compose.yaml
hpc-compose watch -f compose.yaml
hpc-compose watch -f compose.yaml --hold-on-exit always
hpc-compose replay -f compose.yaml --speed 10
hpc-compose replay -f compose.yaml --no-tui
hpc-compose logs -f compose.yaml --follow
hpc-compose logs -f compose.yaml --grep 'error|oom' --since 30m
hpc-compose stats -f compose.yaml
hpc-compose stats -f compose.yaml --accounting
hpc-compose inspect -f compose.yaml --rightsize
hpc-compose score 12345
hpc-compose germinate -f compose.yaml
hpc-compose sweep status -f compose.yaml
hpc-compose sweep list -f compose.yaml
hpc-compose diff 12345 12346 -f compose.yaml

| Command | Use it for |
| --- | --- |
| status | Scheduler state, batch log path, runtime paths, and failure-policy state. |
| ps | Stable per-service snapshot with readiness, status, restart counters, and log path. |
| watch | Live terminal UI; falls back to line-oriented output on non-interactive terminals. |
| replay | Best-effort DVR for a tracked run, reconstructed from existing runtime artifacts. |
| logs | Text log output, optionally focused, searched, or coarsely time-filtered. |
| stats | Tracked metrics, Slurm step statistics, and optional accounting rollups. |
| inspect --rightsize | Post-run request-versus-usage recommendations for memory, CPUs, GPUs, and walltime. |
| score | 0-100 post-run efficiency score with GPU, memory, compute-time, and kWh components. |
| germinate | One-minute canary submission that writes latest-canary.json and recommends resource settings from fresh metrics. |
| sweep status | Aggregate persisted sweep trials into completed, failed, running, pending, unknown, missing-tracking, and submit-failed counts. |
| sweep list | List prior sweep manifests without querying the scheduler. |
| diff | Compact comparison between two tracked submissions. |

Use --format json on non-streaming commands when automation needs stable fields. stats also supports --format csv and --format jsonl.
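
For automation, a thin wrapper can append --format json and decode the result. This is a minimal sketch: run_json is a hypothetical helper, and because the JSON field names are not listed in this section, the example deliberately does not read any specific field.

```python
import json
import subprocess


def run_json(args, runner=subprocess.run):
    """Run an hpc-compose subcommand with --format json and decode stdout."""
    argv = ["hpc-compose", *args, "--format", "json"]
    # check=True raises CalledProcessError on a non-zero exit status.
    result = runner(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Example call (requires hpc-compose on PATH and a tracked submission):
#   status = run_json(["status", "-f", "compose.yaml"])
```

The runner parameter exists only so the helper can be exercised without a cluster; in normal use the default subprocess.run is sufficient.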

Watch UI

On an interactive terminal, watch and the default up follow mode open a live view with service state on the left and log output on the right. The UI automatically switches to a compact single-column view on narrow or short terminals. It keeps a detailed status view while the job runs and, by default, holds the final screen on failures so the failing service, final scheduler state, and next diagnostic commands stay visible.

Keybindings:

| Key | Action |
| --- | --- |
| j, Down, Tab | Move to the next service. |
| k, Up | Move to the previous service. |
| g / G | Jump to the first or last service. |
| / | Filter services by name; press Enter to apply or Esc to cancel. |
| Space | Pause or resume log following. |
| PgUp / PgDn | Scroll the visible log pane while paused. |
| End | Return to live-follow mode at the newest log lines. |
| a | Toggle between the selected service log and all tracked service logs. |
| ? | Toggle in-UI help. |
| q | Leave the watch view without cancelling the job. |

Use --hold-on-exit never|failure|always on up or watch to control whether the final TUI stays open after a terminal scheduler state. When the view is held, press d, l, or s to print the exact debug, logs, or stats command after leaving the alternate screen.

Use hpc-compose up --watch-queue when you want explicit queue polling before the watch view opens. It prints queue state changes, pending reason, and expected start time when Slurm exposes them; --queue-warn-after <DURATION> controls the one-time long-pending warning.

Use --watch-mode line or --no-tui when you are recording output, using a screen reader, running in CI, or working in a terminal where alternate-screen UIs are inconvenient. Line mode preserves detailed scheduler and log updates without alternate-screen control codes.

Replay

hpc-compose replay reconstructs a best-effort execution timeline after the run. It reuses the watch-style view, but reads only artifacts that already exist under the tracked job directory. This makes it useful for rewinding to the time a service failed, comparing the nearest prior metrics sample, or sharing a deterministic text/JSON summary without querying Slurm again.

hpc-compose replay -f compose.yaml
hpc-compose replay -f compose.yaml --speed 10
hpc-compose replay -f compose.yaml --job-id 12345 --service trainer
hpc-compose replay -f compose.yaml --no-tui
hpc-compose replay -f compose.yaml --format json

Replay controls:

| Key | Action |
| --- | --- |
| Space | Pause or play the replay. |
| + / - | Move between speed presets such as 1x, 10x, and 100x. |
| Left / Right | Seek backward or forward by five seconds. |
| [ / ] | Jump to the previous or next reconstructed event. |
| Home / End | Jump to the first or final replay frame. |
| /, a, PgUp, PgDn, q | Same filter, log-pane, scroll, and quit behavior as watch. |

Replay data sources:

| Source | What replay uses | Fidelity notes |
| --- | --- | --- |
| state.json | Final per-service state, start/finish times, exit code fallback, placement metadata | This file is overwritten during the run, so intermediate readiness and scheduler transitions are not exact. |
| service-exits/*.jsonl | Append-only service exit markers and restart evidence | Multiple exits reconstruct failure/restart sequences, but accepted restart relaunch time is inferred. |
| metrics/*.jsonl | Historical GPU and Slurm sampler rows | Replay shows the latest metrics sample at or before the cursor and never displays future metrics as current. |
| logs/*.log | Service log tails in the replay UI | Service logs do not include guaranteed per-line timestamps, so log panes are contextual tails, not exact log-time scrubbing. |
| Scheduler commands | Not queried during replay | Historical queue state, pending reason changes, and accounting gaps are not reconstructed. |
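
The metrics rule in the table (show the latest sample at or before the cursor, never a future one) is a standard sorted lookup. A minimal sketch, assuming samples are (timestamp, value) pairs already sorted by timestamp; sample_at is a hypothetical name and this is not the tool's own code.

```python
from bisect import bisect_right


def sample_at(samples, cursor):
    """Return the latest (timestamp, value) sample at or before the cursor, else None."""
    timestamps = [ts for ts, _ in samples]   # samples must be sorted by timestamp
    index = bisect_right(timestamps, cursor)
    return samples[index - 1] if index else None


gpu_util = [(0.0, 5), (5.0, 80), (10.0, 95)]
print(sample_at(gpu_util, 7.5))   # (5.0, 80)
print(sample_at(gpu_util, -1.0))  # None
```

A cursor placed before the first sample yields None, so nothing is shown rather than a sample from the future.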

Use --no-tui for a static summary that exits immediately. Use --format json when notebooks, dashboards, or experiment records need the reconstructed events, frame summaries, artifact paths, and fidelity notes.

Logs

Runtime logs live under:

${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/logs/<service>.log

Slurm may also write a top-level batch log, either to a default file such as slurm-<jobid>.out or to the path configured with x-slurm.output. Check the batch log first when a job fails before any service log appears.

Service names containing non-alphanumeric characters are encoded in log filenames. Prefer [a-zA-Z0-9_-] in service names for readability.

Use --grep <pattern> to print only matching raw log lines across selected service logs. Use --since <duration> for coarse time-bounded initial output, for example 30s, 15m, 2h, 1d, or 1h30m. Because service logs do not include line timestamps, --since filters by each log file’s modification time rather than by individual line time. Follow mode still starts from the current end of each selected log and applies --grep to appended lines.
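
To make the --since semantics concrete, the sketch below parses the duration forms listed above and applies a file-level modification-time check. It is an illustration under stated assumptions: parse_duration and modified_within are hypothetical helpers, and the tool's real parser may accept or reject inputs differently.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TOKEN = re.compile(r"(\d+)([smhd])")


def parse_duration(text: str) -> int:
    """Convert strings such as '30s', '15m', or '1h30m' to seconds."""
    pos, total = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:      # reject gaps or unknown characters
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos != len(text) or pos == 0:
        raise ValueError(f"invalid duration: {text!r}")
    return total


def modified_within(mtime: float, now: float, since: str) -> bool:
    """File-level filter: keep a log whose modification time falls inside the window."""
    return now - mtime <= parse_duration(since)


print(parse_duration("1h30m"))  # 5400
```

Because the check is per file, a log that was last modified inside the window is shown in full, including lines written long before the window began.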

Event Hooks

Per-service x-slurm.hooks can run host-side observability scripts when restart_on_failure accepts a restart or when the rolling restart window blocks a crash loop. Hook stdout/stderr is appended to that service’s log, and non-zero hook exits are logged without changing the restart or failure outcome.

Use on: restart for retry notifications and on: window_exhausted for crash-loop alerts. Event hooks receive service identity, exit code, Slurm attempt, and restart-window counters through HPC_COMPOSE_* environment variables; see Spec reference for the full list.

Metrics

When x-slurm.metrics is enabled, sampler files are written under:

${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics/
  meta.json
  gpu.jsonl
  gpu_processes.jsonl
  slurm.jsonl
  diagnostics/

The sampler can collect GPU snapshots through nvidia-smi and job-step CPU/memory snapshots through sstat. Collector failures are best-effort: missing nvidia-smi, missing sstat, or unsupported queries do not fail the batch job itself.

Add --accounting to stats when you need post-run sacct rollups for reporting. The accounting summary includes allocated CPU-hours, total CPU-hours when available, allocated GPU-hours, allocation-based memory byte-seconds, and observed maximum RSS. Memory byte-seconds are labeled as allocation-based because Slurm’s standard accounting fields do not reliably provide true measured memory-seconds across all clusters.
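
As a worked example of what "allocation-based" means, the rollups can be approximated as the allocated quantity multiplied by elapsed wall time. This is a sketch of the arithmetic only; the function names are hypothetical and the tool's exact rounding and field handling are not specified here.

```python
def cpu_hours(allocated_cpus: int, elapsed_seconds: float) -> float:
    """Allocated CPU-hours: CPUs reserved multiplied by elapsed wall time."""
    return allocated_cpus * elapsed_seconds / 3600.0


def gpu_hours(allocated_gpus: int, elapsed_seconds: float) -> float:
    """Allocated GPU-hours: GPUs reserved multiplied by elapsed wall time."""
    return allocated_gpus * elapsed_seconds / 3600.0


def memory_byte_seconds(allocated_bytes: int, elapsed_seconds: float) -> float:
    """Allocation-based: uses the requested memory, not measured usage."""
    return allocated_bytes * elapsed_seconds


# 8 CPUs, 1 GPU, and 16 GiB held for 90 minutes.
elapsed = 90 * 60
print(cpu_hours(8, elapsed))                       # 12.0
print(gpu_hours(1, elapsed))                       # 1.5
print(memory_byte_seconds(16 * 1024**3, elapsed))  # 92771293593600
```

These figures describe what the job reserved. Observed maximum RSS is the separate signal for what the job actually used.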

Use hpc-compose inspect --rightsize -f compose.yaml after a tracked Slurm run to convert those observations into conservative resource suggestions. The assistant requires tracked submission metadata and compares explicit requests such as x-slurm.mem, x-slurm.time, x-slurm.gpus, and service x-slurm.cpus_per_task against sacct, sstat, and nvidia-smi sampler evidence. It only reports suggestions; it does not rewrite the compose file.

Use hpc-compose score <job-id> after a tracked Slurm run when you want a compact efficiency grade. The score reuses sampler history, sacct, sstat, and right-sizing recommendations, then reports GPU utilization, memory utilization, active compute-time versus requested walltime, and a best-effort kWh estimate. Energy uses sampled GPU power when available, otherwise falls back to power limits or configured TDP assumptions through --gpu-tdp-w, --cpu-watts-per-core, and --pue; it does not claim carbon intensity or emissions.

Use hpc-compose germinate -f compose.yaml before a full run when you want a short canary to gather fresh evidence. Canary runs write .hpc-compose/latest-canary.json so normal up metadata remains the latest production submission.

Sweep Manifests

hpc-compose sweep submit stores sweep state under .hpc-compose/sweeps/<sweep-id>/sweep.json and refreshes .hpc-compose/sweeps/latest.json. The manifest records the matrix mode, persisted random seed, trial ids, trial variables, rendered script paths, job ids, per-trial job record paths, submit times, and any submit error.

Each submitted trial also writes a normal job record under .hpc-compose/jobs/<job-id>.json with kind: sweep_trial and a sweep metadata block. Sweep-trial records deliberately do not replace normal latest.json or latest-run.json, so hpc-compose status, watch, and logs continue to target ordinary runs unless you pass an explicit job id.

hpc-compose sweep status -f compose.yaml --format json loads the manifest and queries the same scheduler/tracking snapshot code used for ordinary jobs. It reports per-trial state plus aggregate counts for completed, failed, running, pending, unknown, missing_tracking, and submit_failed. V1 does not parse metric files or infer the best trial; keep metric summaries in your training output or external experiment tracker.

Diffing Runs

Use hpc-compose diff <job-id-1> <job-id-2> to compare two tracked submissions. The compact text view highlights outcome, resource, and config changes; --format json returns the full uncapped diff for notebooks or experiment records. Older tracked jobs without config snapshots still compare outcome metadata and report a note that config comparison is unavailable.

Hyperparameter Sweeps

hpc-compose sweep turns one compose file with an embedded sweep block into many independent tracked Slurm jobs. Each trial is a normal sbatch submission with its own allocation, rendered script, job record, and scheduler state. The sweep manifest ties those jobs together for listing and aggregate status.

Quickstart

Start from a spec that can run with ordinary defaults, then add a top-level sweep block:

name: training-sweep

x-slurm:
  time: "00:20:00"
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

sweep:
  parameters:
    lr: [0.001, 0.01, 0.1]
    batch_size: [32, 64]
  matrix: full

services:
  trainer:
    image: python:3.11-slim
    environment:
      LR: "${lr:-0.001}"
      BATCH_SIZE: "${batch_size:-32}"
    command: ["python", "train.py"]

Preview the expansion first:

hpc-compose sweep submit -f examples/training-sweep.yaml --dry-run

Then submit the trials:

hpc-compose sweep submit -f examples/training-sweep.yaml
hpc-compose sweep status -f examples/training-sweep.yaml
hpc-compose sweep list -f examples/training-sweep.yaml

Matrix Modes

matrix: full expands the full Cartesian product over sorted parameter names, so the example above produces six trials in stable t000, t001, … order.
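
The expansion can be sketched in a few lines of Python. This is an illustration, not the tool's code: expand_full is a hypothetical name, and which parameter varies fastest within the product is an assumption of this sketch.

```python
from itertools import product

parameters = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}


def expand_full(params):
    """Cartesian product over sorted parameter names, labeled t000, t001, ..."""
    names = sorted(params)
    trials = []
    for index, combo in enumerate(product(*(params[n] for n in names))):
        trials.append((f"t{index:03d}", dict(zip(names, combo))))
    return trials


trials = expand_full(parameters)
for label, variables in trials:
    print(label, variables)
```

With three lr values and two batch_size values, this yields 3 x 2 = 6 trials labeled t000 through t005.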

Random sampling selects without replacement:

sweep:
  parameters:
    lr: [0.001, 0.01, 0.1]
    batch_size: [32, 64]
  matrix:
    random: 5
    seed: "paper-table-2"

With a seed, the selected trials are stable across machines. Without a seed, sweep submit derives one from the new sweep id and persists it in the manifest.
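
Seeded sampling without replacement can be illustrated with Python's standard library. The sketch only demonstrates the reproducibility property; hpc-compose uses its own random generator, so the trials it selects for a given seed will not match what this code picks. sample_trials is a hypothetical helper.

```python
import random
from itertools import product


def sample_trials(params, count, seed):
    """Pick `count` distinct combinations from the full matrix using a seeded RNG."""
    names = sorted(params)
    full = [dict(zip(names, combo)) for combo in product(*(params[n] for n in names))]
    if count > len(full):
        raise ValueError("cannot sample more trials than the full matrix contains")
    rng = random.Random(seed)  # a string seed gives a reproducible sequence
    return rng.sample(full, count)


params = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]}
first = sample_trials(params, 5, "paper-table-2")
second = sample_trials(params, 5, "paper-table-2")
print(first == second)  # True: same seed, same selection
```

Sampling without replacement means random: 5 can never return the same combination twice, and a count above the matrix size (six here) cannot be satisfied.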

Interpolation Rules

Sweep parameter names are interpolation variable names. Values may be scalar strings, numbers, or booleans. For each trial, those variables override values from the environment and settings before planning, preparing, and rendering.

Reserved variables are also available:

| Variable | Value |
| --- | --- |
| HPC_COMPOSE_SWEEP_ID | The persisted sweep id. |
| HPC_COMPOSE_SWEEP_TRIAL | The stable trial label such as t000. |
| HPC_COMPOSE_SWEEP_TRIAL_INDEX | Zero-based trial index. |

Normal commands still treat sweep as metadata. If plan, up, or render encounters ${lr} without a default, it fails unless lr is provided in the environment or settings. Use defaults such as ${lr:-0.001} when the base spec should remain runnable, and use sweep submit --dry-run as the validation path for missing sweep-only variables.
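
The override and default behavior described above can be modeled as follows. This is a minimal sketch, not the tool's interpolation engine: interpolate is a hypothetical function, and treating an empty value like an unset one (as a POSIX shell does for :-) is an assumption.

```python
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text, env, trial_vars=None):
    """Expand ${name} and ${name:-default}; trial variables override env values."""
    values = {**env, **(trial_vars or {})}

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = values.get(name)
        if value not in (None, ""):
            return str(value)
        if default is not None:
            return default
        raise KeyError(f"missing variable without default: {name}")

    return _VAR.sub(replace, text)


print(interpolate("${lr:-0.001}", {}))                           # 0.001
print(interpolate("${lr:-0.001}", {"lr": "0.5"}, {"lr": 0.01}))  # 0.01
```

The first call models a normal plan or up on the base spec, where the default keeps the spec runnable. The second models a sweep trial, where the trial value wins over the environment.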

Fanout Guard

By default, submitted sweeps are capped at 100 trials. Larger matrices fail before calling sbatch:

hpc-compose sweep submit -f train.yaml

Raise the explicit ceiling when the fanout is intentional:

hpc-compose sweep submit -f train.yaml --max-trials 500

The guard applies to real submissions. Dry runs can inspect any matrix size.

Status Output

sweep status loads the manifest, queries the tracked state for submitted jobs, and aggregates:

  • completed
  • failed
  • running
  • pending
  • unknown
  • missing_tracking
  • submit_failed

Use JSON for notebooks, dashboards, or CI automation:

hpc-compose sweep submit -f train.yaml --format json
hpc-compose sweep status -f train.yaml --format json
hpc-compose sweep status -f train.yaml --sweep-id sweep-123 --format json
hpc-compose sweep list -f train.yaml --format json

The JSON includes the sweep id, manifest path, matrix mode, persisted seed, trial variables, job ids, record paths, and per-trial status.
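
For dashboards that post-process per-trial states, the aggregation amounts to counting against the fixed state list above. A minimal sketch, assuming the caller has already extracted one state string per trial; the JSON field layout is not specified in this section, so aggregate takes a plain list and is a hypothetical helper.

```python
from collections import Counter

STATES = ("completed", "failed", "running", "pending",
          "unknown", "missing_tracking", "submit_failed")


def aggregate(trial_states):
    """Count trials per state, reporting zero for states that did not occur."""
    counts = Counter(trial_states)
    unexpected = set(counts) - set(STATES)
    if unexpected:
        raise ValueError(f"unexpected states: {sorted(unexpected)}")
    return {state: counts.get(state, 0) for state in STATES}


summary = aggregate(["completed", "completed", "failed", "pending", "submit_failed"])
print(summary["completed"], summary["failed"], summary["running"])  # 2 1 0
```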

Manifest Layout

Sweep state is stored beside normal tracked jobs:

.hpc-compose/
  sweeps/
    latest.json
    <sweep-id>/
      sweep.json
      t000.sbatch
      t001.sbatch
  jobs/
    <job-id>.json

Sweep-trial records have kind: sweep_trial and include sweep metadata. They do not update the normal latest.json or latest-run.json pointers, so status, watch, and logs for ordinary runs keep their existing meaning.

V1 Limitations

  • Sweeps must be embedded in the same compose file. sweep.spec is rejected in v1.
  • Each trial is a separate Slurm allocation. Sweeps are not Slurm arrays.
  • x-slurm.array is rejected during sweep submit.
  • Trials submit sequentially. If a submission fails, later trials are not submitted and the partial manifest is kept.
  • sweep status summarizes scheduler/tracking state only. It does not parse metric files or pick a best trial.

Right-Sizing With Canary Runs

hpc-compose germinate submits a short Slurm canary for an existing compose spec, forces runtime metrics on, waits for the canary to finish, and prints conservative resource recommendations for the original spec.

Canaries are short probes, not benchmark truth. They are useful for catching obvious over-requests such as asking for many GPUs when only one device is touched, or requesting far more memory than the process ever approaches during startup. They are not a substitute for full-run profiling when a workload has long warmup, data-dependent memory, lazy model loading, or late training phases.

Basic Workflow

hpc-compose germinate -f compose.yaml
hpc-compose germinate -f compose.yaml --format json
hpc-compose germinate -f compose.yaml --canary-time 00:01:00 --metrics-interval 5

The canary keeps partition, account, QoS, constraints, cache, runtime backend, and service topology from the original plan. It minimizes CPU, memory, and GPU requests in memory only, writes latest-canary.json, and leaves normal latest.json untouched.

Dry-run the canary script without submitting:

hpc-compose germinate -f compose.yaml --dry-run --script-out canary.sbatch

Output

Text output includes the canary job id, the standard right-sizing observations, and a YAML patch you can apply manually:

x-slurm:
  mem: 16G
services:
  trainer:
    x-slurm:
      cpus_per_task: 4

JSON output includes the same patch plus the full right-sizing report:

hpc-compose germinate -f compose.yaml --format json

Recommendation Rules

  • CPU recommendations use observed CPU demand with conservative headroom and round up.
  • Memory recommendations use the strongest available evidence from sampler rows, sstat, and sacct, then round to Slurm-friendly units.
  • GPU recommendations shrink only when GPU sampler evidence shows fewer active devices.
  • Walltime is observed but not down-sized from a one-minute canary.

Caveats

  • Warmup-heavy jobs can look smaller than steady-state jobs.
  • Data-dependent memory may peak after the canary exits.
  • Lazy model loading can under-report memory and GPU use if no real request hits the model.
  • Distributed training may need full topology even when a canary only exercises startup.
  • Failed, OOM-like, time-limit, malformed-metrics, and missing-metrics cases are reported as diagnostics rather than YAML rewrites.

Start from examples/canary-right-size.yaml when you want a small, explicit spec to practice the workflow.

Cache Management

The resolved cache directory stores imported and prepared runtime artifacts. It comes from explicit x-slurm.cache_dir, then profile/default settings, then $HOME/.cache/hpc-compose. For real cluster runs, it must be visible from both the submission host and compute nodes.

Choose A Cache Path

Use a project scratch, work, or shared filesystem path:

export CACHE_DIR=/cluster/shared/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"

You can record that path in project settings instead of every compose file:

hpc-compose setup --profile-name dev --cache-dir "$CACHE_DIR" --default-profile dev --non-interactive

Do not use /tmp, /var/tmp, /private/tmp, or /dev/shm. Validation may accept those strings, but preflight reports them as unsafe because compute nodes must reuse artifacts prepared before submission.
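
The resolution order and the unsafe-path rule can be summarized in a short sketch. Both helpers are hypothetical and only restate the documented behavior; the real preflight check may inspect more than the path prefix.

```python
from pathlib import PurePosixPath

UNSAFE_ROOTS = ("/tmp", "/var/tmp", "/private/tmp", "/dev/shm")


def resolve_cache_dir(spec_value, settings_value, home):
    """Apply the documented precedence: spec, then settings, then the home default."""
    if spec_value:
        return spec_value
    if settings_value:
        return settings_value
    return str(PurePosixPath(home) / ".cache" / "hpc-compose")


def is_unsafe(path):
    """True when the path is one of the node-local roots or lives below one."""
    candidate = PurePosixPath(path)
    return any(
        candidate == PurePosixPath(root) or PurePosixPath(root) in candidate.parents
        for root in UNSAFE_ROOTS
    )


print(resolve_cache_dir(None, None, "/home/alice"))  # /home/alice/.cache/hpc-compose
print(is_unsafe("/tmp/hpc-compose-cache"))           # True
```

Comparing path components rather than string prefixes keeps a directory such as /tmpfoo from being flagged by mistake.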

Inspect Cache State

hpc-compose cache list
hpc-compose cache inspect -f compose.yaml
hpc-compose cache inspect -f compose.yaml --service app

Use cache inspect to answer:

  • which artifact is being reused
  • whether a prepared image came from a cached manifest
  • whether a service rebuilds on every prepare because prepare mounts are present

Prune Cache Entries

Prune old entries by age:

hpc-compose --profile dev cache prune --age 14

Prune artifacts not referenced by the current plan:

hpc-compose cache prune --all-unused -f compose.yaml

Prune one cache directory directly:

hpc-compose cache prune --age 7 --cache-dir '<shared-cache-dir>'

--age and --all-unused are mutually exclusive.

Rendezvous Records

Cross-job rendezvous uses the same shared cache root:

<cache_dir>/rendezvous/<name>/latest.json

These records are small endpoint descriptors, not runtime images. They are pruned separately:

hpc-compose rendezvous list --cache-dir "$CACHE_DIR"
hpc-compose rendezvous prune --cache-dir "$CACHE_DIR"

Provider cleanup removes latest.json only when the finishing job still owns it, so an older provider cannot erase a newer provider’s record.

After Upgrading

Cache keys include the tool version, so upgrading hpc-compose invalidates existing cached artifacts. Expect a full rebuild on the next prepare or up, then optionally prune old entries:

hpc-compose cache prune --age 0

Cross-Job Rendezvous

hpc-compose rendezvous lets independent Slurm jobs coordinate through the shared cache directory. A provider job registers an address under <cache_dir>/rendezvous/<name>/latest.json; a later client job resolves that record and receives stable HPC_COMPOSE_RDZV_* environment variables.

This is same-cluster shared-storage discovery. It does not create DNS, tunnels, authentication, authorization, or a service mesh. Use it only inside a same-user or trusted shared-project cache boundary.

Provider

name: model-server

x-slurm:
  cache_dir: ${CACHE_DIR}

services:
  model:
    image: python:3.12-slim
    command: python -m http.server 8000
    readiness:
      type: tcp
      port: 8000
    x-slurm:
      rendezvous:
        register:
          name: model-server
          port: 8000
          protocol: http
          path: /
          ttl_seconds: 3600

Provider registration is declarative. If readiness is configured, the rendered script registers after the readiness check succeeds. On cleanup, it removes latest.json only when the current job still owns the latest record.

Client

name: model-client

x-slurm:
  cache_dir: ${CACHE_DIR}
  rendezvous: model-server

services:
  client:
    image: curlimages/curl:8.10.1
    command: curl -fsS "$HPC_COMPOSE_RDZV_MODEL_SERVER_URL"

Clients receive generic variables such as HPC_COMPOSE_RDZV_URL, plus name-scoped variables such as HPC_COMPOSE_RDZV_MODEL_SERVER_URL, HPC_COMPOSE_RDZV_MODEL_SERVER_HOST, and HPC_COMPOSE_RDZV_MODEL_SERVER_PORT.
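The name-scoped variables appear to follow the rendezvous name. The sketch below reproduces the model-server example under one assumption: the name is upper-cased and every character other than A-Z and 0-9 becomes an underscore. That mapping is inferred from the single example above, not from documented behavior, and both function names are hypothetical.

```python
import re


def rendezvous_env_prefix(name: str) -> str:
    """Build the name-scoped variable prefix for a rendezvous name.

    Assumption: upper-case the name and replace anything outside
    A-Z / 0-9 with an underscore (matches the model-server example).
    """
    return "HPC_COMPOSE_RDZV_" + re.sub(r"[^A-Z0-9]", "_", name.upper())


def scoped_variables(name: str) -> list[str]:
    """List the URL, HOST, and PORT variable names for one rendezvous name."""
    prefix = rendezvous_env_prefix(name)
    return [f"{prefix}_{suffix}" for suffix in ("URL", "HOST", "PORT")]


print(scoped_variables("model-server"))
```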

Debugging CLI

hpc-compose rendezvous list --cache-dir "$CACHE_DIR"
hpc-compose rendezvous resolve model-server --cache-dir "$CACHE_DIR"
hpc-compose rendezvous register model-server --host node01 --port 8000 --job-id 12345 --cache-dir "$CACHE_DIR"
hpc-compose rendezvous prune --cache-dir "$CACHE_DIR"

register is mainly for debugging and custom workflows. Normal provider jobs should use services.<name>.x-slurm.rendezvous.register.

TTL and Staleness

Records have a TTL. Resolution ignores expired records, and prune removes expired latest and historical JSON files. If the provider job exits cleanly, cleanup removes the latest pointer only if it still points at that job, so a newer provider is not deregistered by an older job finishing later.
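The TTL and ownership rules can be modeled in a few lines. This is a minimal sketch of the semantics described above, not the tool's implementation; the record fields job_id, registered_at, and ttl_seconds are an assumed shape chosen for illustration, and the real latest.json layout may differ.

```python
from typing import Optional

# Hypothetical record shape: the field names are illustrative only.
Record = dict


def is_expired(record: Record, now: float) -> bool:
    """A record is expired once its TTL has elapsed since registration."""
    return now >= record["registered_at"] + record["ttl_seconds"]


def resolve(latest: Optional[Record], now: float) -> Optional[Record]:
    """Return the latest record unless it is missing or past its TTL."""
    if latest is None or is_expired(latest, now):
        return None
    return latest


def cleanup(latest: Optional[Record], finishing_job_id: str) -> Optional[Record]:
    """Drop the latest record only when the finishing job still owns it."""
    if latest is not None and latest["job_id"] == finishing_job_id:
        return None
    return latest


newer = {"job_id": "12346", "registered_at": 1000.0, "ttl_seconds": 3600}
# An older provider (job 12345) finishing later leaves the newer record alone.
print(cleanup(newer, "12345") is newer)
# The owning job removes its own record on cleanup.
print(cleanup(newer, "12346"))
```

Comparing the owner before deleting is what keeps a late-finishing older provider from deregistering its replacement.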

Requirements

  • x-slurm.cache_dir must point to storage visible from the login node and compute nodes.
  • Provider and client jobs must use the same cache directory.
  • Names are single safe path components: ASCII letters, digits, ., _, and -.
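The naming rule can be checked before a name reaches the cache directory. The following is a sketch of one plausible validator, assuming the allowed character set listed above; rejecting "." and ".." is an extra assumption of this example, since those strings use only allowed characters but are not safe path components.

```python
import re

# Allowed characters from the requirement above: ASCII letters, digits, ., _, -
_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_rendezvous_name(name: str) -> bool:
    """Return True for a single safe path component."""
    # Assumption: "." and ".." are rejected even though they match the set.
    if name in (".", ".."):
        return False
    return _NAME_RE.fullmatch(name) is not None


print(is_valid_rendezvous_name("model-server"))  # True
print(is_valid_rendezvous_name("team/model"))    # False
```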

See examples/rendezvous-model-server.yaml and examples/rendezvous-client.yaml for a runnable pair.

Artifacts And Resume

Artifacts are collected after a run for export and provenance. Resume state is the canonical live state a later attempt should load. Keep those roles separate.

Artifact Export

When x-slurm.artifacts is enabled, teardown collection writes:

${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/artifacts/
  manifest.json
  payload/...

Export collected payloads after the job finishes:

hpc-compose artifacts -f compose.yaml
hpc-compose artifacts -f compose.yaml --bundle checkpoints --tarball

export_dir is resolved relative to the compose file and expands ${SLURM_JOB_ID} from tracked metadata. Named bundles are written under <export_dir>/bundles/<bundle>/, and provenance JSON is written under <export_dir>/_hpc-compose/bundles/<bundle>.json.

The bundle name default is reserved for top-level x-slurm.artifacts.paths.
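To make the export layout concrete, the sketch below computes where one named bundle and its provenance file would land. It assumes export_dir is given as a POSIX-style path and that ${SLURM_JOB_ID} is the only placeholder to expand; the function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def bundle_paths(compose_file: str, export_dir: str, job_id: str, bundle: str):
    """Return (payload_dir, provenance_json) for one named bundle."""
    # Expand the job id placeholder, then resolve relative to the compose file.
    expanded = export_dir.replace("${SLURM_JOB_ID}", job_id)
    root = PurePosixPath(compose_file).parent / expanded
    payload_dir = root / "bundles" / bundle
    provenance = root / "_hpc-compose" / "bundles" / f"{bundle}.json"
    return payload_dir, provenance


payload, provenance = bundle_paths(
    "/proj/compose.yaml", "results/${SLURM_JOB_ID}", "12345", "checkpoints"
)
print(payload)     # /proj/results/12345/bundles/checkpoints
print(provenance)  # /proj/results/12345/_hpc-compose/bundles/checkpoints.json
```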

Resume-Aware Runs

When x-slurm.resume is enabled, hpc-compose:

  • mounts the shared resume path into every service at /hpc-compose/resume
  • injects HPC_COMPOSE_RESUME_DIR, HPC_COMPOSE_ATTEMPT, and HPC_COMPOSE_IS_RESUME
  • writes attempt-specific runtime outputs under .hpc-compose/<jobid>/attempts/<attempt>/
  • keeps .hpc-compose/<jobid>/{logs,metrics,artifacts,state.json} pointed at the latest attempt for compatibility

Use the shared resume directory for the canonical checkpoint a restarted run should load next. Treat exported artifacts as retrieval and provenance output after the attempt finishes, not as the primary live resume source.
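Inside a service, the injected variables can drive the decision to load a checkpoint. This is a minimal sketch under stated assumptions: HPC_COMPOSE_IS_RESUME is treated as true for "1" or "true", HPC_COMPOSE_ATTEMPT is parsed as an integer, and latest.ckpt is a hypothetical file name. The text above names the variables but not their value formats.

```python
import os
from pathlib import Path


def resume_context(env=os.environ):
    """Read the injected resume variables into (resume_dir, attempt, is_resume)."""
    # Falls back to the documented mount point when the variable is absent.
    resume_dir = Path(env.get("HPC_COMPOSE_RESUME_DIR", "/hpc-compose/resume"))
    attempt = int(env.get("HPC_COMPOSE_ATTEMPT", "0"))  # assumed integer
    is_resume = env.get("HPC_COMPOSE_IS_RESUME", "").lower() in ("1", "true")
    return resume_dir, attempt, is_resume


def checkpoint_to_load(env=os.environ):
    """Return the checkpoint path to load, or None for a fresh start."""
    resume_dir, _attempt, is_resume = resume_context(env)
    checkpoint = resume_dir / "latest.ckpt"  # hypothetical file name
    return checkpoint if is_resume and checkpoint.exists() else None


print(checkpoint_to_load({"HPC_COMPOSE_IS_RESUME": "0"}))  # None
```

Reading the checkpoint from the resume directory rather than from exported artifacts keeps the two roles described above separate.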

Useful Commands

hpc-compose up --resume-diff-only -f compose.yaml
hpc-compose up --allow-resume-changes -f compose.yaml
hpc-compose artifacts -f compose.yaml

CLI Reference

This page maps the public hpc-compose CLI by workflow. Use Quickstart for the shortest install-and-run path, Runbook for real-cluster operations, and Spec Reference for YAML field behavior.

Common Flags

| Flag | Use it for | Notes |
| --- | --- | --- |
| --profile <NAME> | Select a profile from the project-local settings file | Applies to every command. |
| --settings-file <PATH> | Use an explicit settings file | Bypasses upward discovery of .hpc-compose/settings.toml. |
| -f, --file <FILE> | Select the compose file on compose-aware commands | When omitted, hpc-compose uses the active context compose file or falls back to compose.yaml. |
| --color auto\|always\|never | | |
| --quiet | Suppress non-essential progress labels | Useful when a wrapper only needs command output and errors. |
| --format json | Machine-readable output | Preferred on non-streaming commands. --json remains available only as a compatibility alias on older machine-readable commands. |

Authoring and Setup

| Command | Use it for | Notes |
| --- | --- | --- |
| new (alias: init) | Generate a starter compose file from a built-in template | Use --list-templates and --describe-template <name> to inspect templates before writing a file. --cache-dir is optional and writes an explicit x-slurm.cache_dir. |
| evolve | Learn spec features through a progressive valid-spec tutorial | Use --list-lessons, --describe-lesson <id>, and --until <step> to inspect or stop at a lesson step. --format json requires --yes. |
| setup | Create or update the project-local settings file | Records compose path, env files, env vars, binary overrides, and an optional profile cache default. |
| context | Print the resolved execution context | Shows the selected profile, binaries, interpolation vars, runtime paths, and value sources. |
| completions | Generate shell completion scripts | Supports Bash, Zsh, Fish, PowerShell, and Elvish through Clap’s completion generator. |

hpc-compose new --list-templates
hpc-compose new --describe-template minimal-batch
hpc-compose new --template minimal-batch --name my-app --output compose.yaml
hpc-compose new --template minimal-batch --name my-app --cache-dir '<shared-cache-dir>' --output compose.yaml
hpc-compose evolve --list-lessons
hpc-compose evolve --describe-lesson progressive-complexity
hpc-compose evolve --output compose.yaml --name my-app
hpc-compose evolve --yes --until readiness --format json
hpc-compose setup
hpc-compose setup --profile-name dev --cache-dir '<shared-cache-dir>' --default-profile dev --non-interactive
hpc-compose context --format json
hpc-compose context --show-values --format json
hpc-compose completions zsh

evolve Options

evolve is authoring-only: it validates and writes candidate specs but does not prepare images, run preflight, or submit jobs. The default lesson is progressive-complexity, with steps minimal, second-service, readiness, failure-policy, and multi-node-placement.

  • --list-lessons prints shipped lessons.
  • --describe-lesson <LESSON> prints lesson steps and concepts.
  • --lesson <LESSON> selects the lesson to run.
  • --until <STEP> stops after a step id such as readiness.
  • --yes accepts steps noninteractively.
  • --format json is available for list/describe and for --yes runs.
  • --force allows overwriting the output file.

Plan and Run

| Command | Use it for | Notes |
| --- | --- | --- |
| plan | Validate and preview the static runtime plan | Recommended before every first run. --show-script prints the generated launcher to stdout without writing a file; --explain adds actionable cache, resume, preflight, and next-command hints. |
| validate | Check YAML shape and field validation | Add --strict-env when interpolation fallbacks should fail. |
| lint | Run stricter opinionated static checks | Flags risky-but-valid specs such as weak dependency readiness, unusual memory/CPU ratios, and ignored services that can write shared paths. Warnings fail by default; add --allow-warnings to make warning-only results successful. |
| config | Show the fully interpolated effective config | Use --format json when you need stable machine-readable snapshots or resume diffs. config --variables reports only interpolation variables referenced by the compose file and redacts sensitive-looking names unless --show-values is passed. |
| schema | Print the checked-in JSON Schema | Use it for editor integration and authoring tools. The same schema is published with the docs site for YAML Language Server and SchemaStore consumption. Rust validation remains the semantic source of truth. |
| inspect | View the normalized runtime plan | --verbose can reveal resolved secrets and final mount mappings. Add --dependencies for a service DAG in text, DOT, or JSON form. |
| preflight | Check host and cluster prerequisites | Use --strict when warnings should block a later run. |
| doctor cluster-report | Generate a best-effort cluster capability profile | Writes .hpc-compose/cluster.toml by default; use --out - to print the TOML profile. |
| doctor mpi-smoke | Render or run a small MPI probe for one service | Reports requested/advertised MPI types, MPI profile metadata, discovered MPI installs, host MPI binds/env, and rendered srun; add --submit to consume a Slurm allocation. |
| doctor fabric-smoke | Render or run MPI/NCCL/UCX/OFI smoke probes for one MPI service | Use --checks auto or a comma-separated list such as mpi,nccl; render-only by default, --submit consumes a Slurm allocation. |
| weather | Show advisory live cluster conditions | One-shot dashboard from sinfo, squeue, optional sshare, and optional sprio; does not reserve resources or change submission behavior. |
| prepare | Import images and build prepared runtime artifacts | Use --force when the base image or prepare inputs changed. |
| render | Write the generated launcher script without submitting | Good for reviewing the final batch script. |
| up | Run the one-command launch/watch/logs workflow | Preferred normal run on a real cluster. Uses a spec-scoped .hpc-compose/locks/*.up.lock to prevent concurrent up races. |
| test | Smoke-test a finite spec end to end | Requires explicit --local or --submit; every service must start, pass configured readiness, and complete successfully. |
| dev | Run local hot-reload mode | Watches bind-mounted source directories and restarts affected services through the local supervisor. |
| tmux | Open a multi-pane local service log dashboard | Tails one tracked local service log per pane; tmux does not own service processes. |
| germinate | Submit a one-minute canary and recommend resource settings | Writes latest-canary.json, keeps normal latest.json untouched, and prints a manual YAML patch. |
| sweep submit | Submit many independent trials from a top-level sweep block | Each trial is a tracked Slurm allocation. Use --dry-run first and --max-trials for intentional fanout above 100. |
| when | Submit after cluster conditions are met | Prepares and renders now, then monitors typed conditions such as idle nodes, prior job completion, or a local time window before calling sbatch. |
| alloc | Open an interactive Slurm allocation for iterative service runs | Uses top-level x-slurm allocation settings, exports HPC_COMPOSE_*, and lets run SERVICE -- CMD reuse the active allocation. |
| run | Launch a one-off command | Service mode uses an existing compose service. Image mode uses --image IMAGE -- CMD and builds an ephemeral one-service plan. |
| shell | Open an interactive Pyxis shell | Thin wrapper around srun --pty --container-image=<image> bash -l. |

hpc-compose plan -f compose.yaml
hpc-compose plan --explain -f compose.yaml
hpc-compose plan --show-script -f compose.yaml
hpc-compose validate -f compose.yaml
hpc-compose lint -f compose.yaml
hpc-compose lint -f compose.yaml --allow-warnings
hpc-compose lint -f compose.yaml --format json
hpc-compose config -f compose.yaml
hpc-compose config -f compose.yaml --variables
hpc-compose schema > hpc-compose.schema.json
hpc-compose inspect --verbose -f compose.yaml
hpc-compose inspect --dependencies -f compose.yaml
hpc-compose inspect --dependencies --dependencies-format dot -f compose.yaml
hpc-compose preflight -f compose.yaml
hpc-compose doctor cluster-report
hpc-compose doctor mpi-smoke -f compose.yaml --service trainer --script-out mpi-smoke.sbatch
hpc-compose doctor mpi-smoke -f compose.yaml --service trainer --submit
hpc-compose doctor fabric-smoke -f compose.yaml --service trainer --checks auto --script-out fabric-smoke.sbatch
hpc-compose doctor fabric-smoke -f compose.yaml --service trainer --checks mpi,nccl --submit
hpc-compose weather
hpc-compose weather --format json
hpc-compose prepare -f compose.yaml
hpc-compose render -f compose.yaml --output job.sbatch
hpc-compose up -f compose.yaml
hpc-compose up --hold-on-exit always -f compose.yaml
hpc-compose up --watch-queue --queue-warn-after 15m -f compose.yaml
hpc-compose up --detach --format json -f compose.yaml
hpc-compose test --local -f compose.yaml
hpc-compose test --submit --time 00:01:00 -f compose.yaml
hpc-compose dev -f examples/dev-python-app.yaml
hpc-compose tmux -f examples/dev-python-app.yaml --no-attach
hpc-compose germinate -f compose.yaml
hpc-compose germinate -f compose.yaml --format json
hpc-compose germinate -f compose.yaml --dry-run --script-out canary.sbatch
hpc-compose sweep submit -f compose.yaml --dry-run
hpc-compose sweep submit -f compose.yaml --max-trials 200
hpc-compose sweep status -f compose.yaml --format json
hpc-compose sweep list -f compose.yaml
hpc-compose when -f compose.yaml --partition gpu8 --free-nodes 4
hpc-compose when -f compose.yaml --after-job 12345
hpc-compose when -f compose.yaml --between 22:00-06:00
hpc-compose when --detach --format json -f compose.yaml --partition gpu8 --free-nodes 4
hpc-compose alloc -f compose.yaml
hpc-compose run app -- python -m smoke_test
hpc-compose run --image docker://python:3.12 --resources cpu-small -- python -V
hpc-compose shell --image docker://ubuntu:24.04

Editor Schema

The checked-in schema is draft-07 JSON Schema and is published with the docs site at /schema/hpc-compose.schema.json. SchemaStore should associate it only with hpc-compose-specific filenames: hpc-compose.yaml, hpc-compose.yml, *.hpc-compose.yaml, and *.hpc-compose.yml. Generic compose.yaml remains a supported input file, but it is intentionally not claimed for zero-config editor association.
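The association rule can be expressed as a small filename matcher. This sketch assumes the four patterns above are matched case-sensitively against the file's base name; the function name is hypothetical, and it illustrates the rule rather than how SchemaStore or any editor implements it.

```python
from fnmatch import fnmatchcase

# Filename patterns listed above for zero-config schema association.
SCHEMA_PATTERNS = (
    "hpc-compose.yaml",
    "hpc-compose.yml",
    "*.hpc-compose.yaml",
    "*.hpc-compose.yml",
)


def claims_schema(filename: str) -> bool:
    """Return True when a base filename matches one of the claimed patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in SCHEMA_PATTERNS)


print(claims_schema("train.hpc-compose.yaml"))  # True
print(claims_schema("compose.yaml"))            # False
```

A generic compose.yaml still works as hpc-compose input; it simply needs an explicit schema mapping in the editor.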

up Options

Useful workflow flags:

  • --local runs a Pyxis/Enroot plan on the current Linux host instead of calling sbatch.
  • --detach submits or launches and returns after tracking metadata is written.
  • --format text|json is accepted with --detach or --dry-run.
  • --watch-queue waits in line-oriented queue output until the Slurm job reaches RUNNING, then opens the normal watch view.
  • --queue-warn-after <DURATION> warns once when --watch-queue stays PENDING longer than the threshold; the default is 10m, and 0 disables the warning.
  • --watch-mode auto|tui|line selects the live output mode; --no-tui is a line-mode alias.
  • --hold-on-exit never|failure|always controls whether the TUI stays open after the job reaches a terminal scheduler state.
  • --allow-resume-changes acknowledges an intentional change to resume-coupled config between tracked runs.
  • --resume-diff-only prints the resume-sensitive config diff without submitting.
  • --script-out <PATH> keeps a copy of the rendered batch script.
  • --force-rebuild refreshes imported and prepared artifacts before launch.
  • --skip-prepare skips image import and prepare reuse checks.
  • --keep-failed-prep leaves the failed Enroot rootfs behind for inspection.
  • Array jobs (x-slurm.array) require --detach because live watch/log fan-out is not array-aware yet.
  • Scheduler dependencies from x-slurm.after_job and x-slurm.dependency are passed as sbatch --dependency=....

germinate Canary Runs

germinate is the conservative right-sizing workflow:

hpc-compose germinate -f compose.yaml
hpc-compose germinate -f compose.yaml --canary-time 00:01:00 --metrics-interval 5
hpc-compose germinate -f compose.yaml --pending-timeout 30m --format json

Useful options:

  • --canary-time <TIME> defaults to 00:01:00.
  • --metrics-interval <SECONDS> defaults to 5 and is forced on in the canary plan.
  • --pending-timeout <DURATION> defaults to 30m.
  • --min-cpus <N>, --min-mem <MEM>, and --min-gpus <N> set canary floors.
  • --dry-run renders the canary script without calling sbatch.
  • --skip-prepare, --force-rebuild, --keep-failed-prep, --no-preflight, and --script-out match the normal preparation flags.

The command rejects x-slurm.array in v1 and never rewrites your compose file automatically. See Right-Sizing With Canary Runs.

sweep Hyperparameter Sweeps

sweep expands the top-level sweep block in a compose file. Each generated trial is rendered and submitted as an independent tracked Slurm job; sweep status and sweep list read the persisted manifest under .hpc-compose/sweeps/.

hpc-compose sweep submit -f train.yaml --dry-run
hpc-compose sweep submit -f train.yaml --max-trials 200
hpc-compose sweep submit -f train.yaml --format json
hpc-compose sweep status -f train.yaml
hpc-compose sweep status -f train.yaml --sweep-id sweep-123 --format json
hpc-compose sweep list -f train.yaml --format json

sweep submit options:

| Option | Use it for |
| --- | --- |
| -f, --file <FILE> | Select the compose file containing the embedded sweep block. |
| --dry-run | Expand and validate all trials without writing manifests, scripts, or job records. |
| --max-trials <N> | Permit real submissions above the default 100-trial fanout guard. |
| --skip-prepare | Reuse existing prepared artifacts and skip image preparation. |
| --force-rebuild | Refresh imported/prepared artifacts for each submitted trial. |
| --no-preflight | Skip preflight checks before trial submission. |
| --format text\|json | |

sweep status options:

| Option | Use it for |
| --- | --- |
| -f, --file <FILE> | Select the compose file whose sweep manifests should be read. |
| --sweep-id <ID> | Inspect a specific sweep instead of .hpc-compose/sweeps/latest.json. |
| --format text\|json | |

sweep list options:

| Option | Use it for |
| --- | --- |
| -f, --file <FILE> | Select the compose file whose sweep directory should be scanned. |
| --format text\|json | |

See Hyperparameter Sweeps for the sweep spec shape, interpolation rules, status categories, and v1 limitations.

when Conditional Submission

when is a foreground monitor for constrained partitions and off-hour workflows. It runs the normal pre-submit work first, then polls until every supplied condition is true:

hpc-compose when -f compose.yaml --partition gpu8 --free-nodes 4
hpc-compose when -f compose.yaml --after-job 12345 --after-job-condition afterok
hpc-compose when -f compose.yaml --between 22:00-06:00

Conditions are ANDed. --free-nodes counts only idle rows from sinfo -h -p <partition> -o "%T|%D" and requires --partition to match x-slurm.partition. --after-job polls squeue first and then sacct; afterok and afternotok fail immediately when the prior job reaches a terminal state that can never satisfy the requested condition. --between uses local login-node wall-clock time and supports wraparound windows such as 22:00-06:00.
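Two of these conditions are easy to model. The sketch below counts idle nodes from sinfo -o "%T|%D" style output and tests a wraparound time window. It is an illustration under assumptions, not the tool's logic: only rows whose state is exactly idle are counted, the window is treated as half-open, and both function names are hypothetical.

```python
from datetime import time


def count_idle_nodes(sinfo_output: str) -> int:
    """Sum node counts from STATE|COUNT rows whose state is exactly idle."""
    total = 0
    for line in sinfo_output.splitlines():
        state, _, count = line.strip().partition("|")
        if state == "idle" and count.isdigit():
            total += int(count)
    return total


def in_window(now: time, start: time, end: time) -> bool:
    """True when now is in [start, end), allowing windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    # Wraparound window such as 22:00-06:00: late evening or early morning.
    return now >= start or now < end


sample = "idle|3\nmixed|2\nallocated|5\nidle|1\n"
print(count_idle_nodes(sample))                          # 4
print(in_window(time(23, 30), time(22, 0), time(6, 0)))  # True
print(in_window(time(12, 0), time(22, 0), time(6, 0)))   # False
```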

Useful options:

  • --poll-interval <DURATION> defaults to 60s; the minimum is 5s.
  • --timeout <DURATION> gives up if conditions are not met; 0s performs one check.
  • --detach returns after submission and tracking metadata are written.
  • --format json is accepted with --detach and returns the condition summaries plus normal submission metadata.
  • --skip-prepare, --force-rebuild, --keep-failed-prep, --no-preflight, and --script-out match the corresponding up preparation flags.

Example JSON automation:

hpc-compose when --detach --format json -f compose.yaml --partition gpu8 --free-nodes 4

V1 has no x-when YAML field. Conditional submission is intentionally a CLI workflow layered over the normal compose spec.

up --local

up --local launches a Pyxis/Enroot plan on the current host instead of calling sbatch. It is useful for local authoring and script inspection, not for distributed Slurm execution.

hpc-compose up --local --dry-run -f compose.yaml

Current constraints:

  • Linux hosts only
  • runtime.backend: pyxis only
  • single-host specs only
  • no distributed or partitioned placement
  • no services.<name>.x-slurm.extra_srun_args
  • no services.<name>.x-slurm.mpi
  • no x-slurm.array
  • no scheduler dependencies from x-slurm.after_job or x-slurm.dependency
  • reservation-related x-slurm.submit_args are ignored
  • x-slurm.error is ignored, and local batch stderr is written into the tracked local batch log

up --local follows the tracked local launch immediately, just like up does for a submitted job. Add --detach when you want to launch and return.

In local mode the batch script also exports HPC_COMPOSE_BACKEND_OVERRIDE=local, HPC_COMPOSE_LOCAL_ENROOT_BIN pointing to the resolved enroot binary, and HPC_COMPOSE_LOCAL_BIN_DIR containing a generated srun shim. These variables are internal to hpc-compose and not intended for direct use in compose specs.

Development Workflow

test, dev, and tmux are intentionally small workflows layered over the same render/prepare/tracking machinery as up. See Development Workflow for the smoke-test guide, hot-reload behavior, and local-mode constraints.

test is for finite smoke specs:

hpc-compose test --local -f compose.yaml
hpc-compose test --submit --time 00:01:00 --timeout 180s -f compose.yaml
hpc-compose test --submit --format json -f compose.yaml

Success means all tracked services appear in runtime state, launched at least once, passed readiness when readiness is configured, and completed successfully. Long-running application specs should use a smoke-test variant of the command or service entrypoint that exits after proving the workflow.

Useful test options:

| Option | Use it for |
| --- | --- |
| --local | Run the finite smoke spec through the local supervisor. |
| --submit | Submit the finite smoke spec to Slurm; required before any scheduler submission happens. |
| --time <TIME> | Override Slurm wall time for --submit; defaults to 00:01:00. |
| --timeout <DURATION> | Stop waiting and best-effort cancel/cleanup after the timeout; defaults to 180s. |
| --format json | Emit phase status, job id, script path, per-service results, and failure reason for automation. |

dev is local-only and watches host directories from service volumes:

hpc-compose dev -f examples/dev-python-app.yaml
hpc-compose dev -f compose.yaml --watch-path ./src --debounce-ms 500

Directory bind mounts are mapped back to affected services. File mounts, missing paths, container-only paths, cache paths, and non-directory paths are ignored. --watch-path adds an explicit directory and restarts all services when it changes. By default, leaving dev stops the local supervisor; use --keep-running when you want the tracked local job to continue.

Useful dev options:

| Option | Use it for |
| --- | --- |
| --watch-path <PATH> | Add an explicit watch root when mounted source directories cannot be inferred. |
| --debounce-ms <N> | Coalesce rapid file changes before requesting a restart. |
| --keep-running | Leave the local supervisor alive when the watch loop exits. |

tmux opens a log dashboard for local runs:

hpc-compose tmux -f compose.yaml
hpc-compose tmux -f compose.yaml --job-id local-123
hpc-compose tmux -f compose.yaml --session demo --no-attach

When --job-id is omitted, tmux launches a new local run first. Each pane runs tail -F against one tracked service log and uses the service name as the pane title.

Useful tmux options:

| Option | Use it for |
| --- | --- |
| --job-id <ID> | Attach the dashboard to an existing tracked local run. |
| --session <NAME> | Choose the tmux session name instead of hpc-compose-<job-id>. |
| --no-attach | Create/update the dashboard without requiring an interactive terminal. |
| --lines <N> | Set the initial tail -n history for each pane. |

run and shell

run has two forms:

hpc-compose run [-f compose.yaml] SERVICE -- CMD [ARGS...]
hpc-compose run --image IMAGE [--resources NAME] [--time T] [--mem M] [--cpus-per-task N] [--gpus N] [--partition P] [--env K=V] [--local] -- CMD [ARGS...]

Service mode reuses the named service’s image, environment, mounts, working directory, and prepare rules, clears depends_on, and submits a fresh tracked run job. When launched inside hpc-compose alloc, service mode detects HPC_COMPOSE_ALLOCATION=1 and SLURM_JOB_ID, prints the active allocation id, runs the one-service launcher inside the allocation with srun, and records the latest run metadata against the allocation job id. Image mode creates an ephemeral one-service plan from CLI flags, then follows the normal render/prepare/submit path. --resources refers to [resource_profiles.<name>] in settings; it is not the global --profile selector.
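The allocation check described above amounts to reading two environment variables. This is a minimal sketch of that detection, assuming both variables must be present and non-empty; the function name is hypothetical and the real command may apply further checks.

```python
import os
from typing import Mapping, Optional


def active_allocation_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Slurm job id when running inside an hpc-compose allocation."""
    if env.get("HPC_COMPOSE_ALLOCATION") == "1" and env.get("SLURM_JOB_ID"):
        return env["SLURM_JOB_ID"]
    return None


print(active_allocation_id({"HPC_COMPOSE_ALLOCATION": "1", "SLURM_JOB_ID": "12345"}))
print(active_allocation_id({"SLURM_JOB_ID": "12345"}))
```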

alloc requests an interactive allocation through salloc:

hpc-compose alloc -f compose.yaml
hpc-compose alloc -f compose.yaml -- bash -lc 'hpc-compose run app -- python -m pytest'

It runs preflight and image preparation by default, accepts the matching up preparation flags (--no-preflight, --skip-prepare, --force-rebuild, and --keep-failed-prep), rejects x-slurm.array, and exports allocation metadata such as HPC_COMPOSE_COMPOSE_FILE, HPC_COMPOSE_CACHE_DIR, HPC_COMPOSE_NODELIST_FILE, and HPC_COMPOSE_PRIMARY_NODE.

shell is intentionally thinner:

hpc-compose shell --image IMAGE [--resources NAME] [--time T] [--mem M] [--cpus-per-task N] [--gpus N] [--partition P] [--env K=V]

It calls srun --pty directly with Pyxis --container-image and defaults to bash -l. It does not render an sbatch script or create tracked job metadata.

Accessible and Automation-Friendly Output

Use plain or structured output when terminal styling, progress labels, or alternate-screen interfaces make automation or assistive tooling harder:

hpc-compose --color never plan -f compose.yaml
hpc-compose --quiet validate -f compose.yaml
hpc-compose watch -f compose.yaml --watch-mode line
hpc-compose replay -f compose.yaml --no-tui
hpc-compose logs -f compose.yaml --service app --follow
hpc-compose logs -f compose.yaml --grep 'error|oom' --since 30m
hpc-compose status -f compose.yaml --format json

context and config --variables intentionally scope interpolation variables to names referenced by the compose file. Values whose names look secret-bearing, such as TOKEN, PASSWORD, SECRET, API_KEY, or PRIVATE_KEY, are shown as <redacted> by default; add --show-values only in trusted local diagnostics.
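The redaction behavior can be approximated as a name-based filter. The sketch below assumes a case-insensitive substring match against the five markers listed above; the actual matching rule is not specified here, so treat the function as illustrative only.

```python
# Markers named in the text above as secret-bearing.
SECRET_MARKERS = ("TOKEN", "PASSWORD", "SECRET", "API_KEY", "PRIVATE_KEY")


def redact(variables: dict, show_values: bool = False) -> dict:
    """Replace values of secret-looking names unless show_values is set."""
    if show_values:
        return dict(variables)
    return {
        name: "<redacted>"
        if any(marker in name.upper() for marker in SECRET_MARKERS)
        else value
        for name, value in variables.items()
    }


print(redact({"HF_TOKEN": "abc123", "CACHE_DIR": "/shared/cache"}))
```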

Tracked Runtime

| Command | Use it for | Notes |
| --- | --- | --- |
| debug | Diagnose the latest tracked run | Shows scheduler state, per-service state, batch and service log tails, missing-log hints, and a recommended next command. Add --preflight to rerun prerequisite checks. |
| status | Summarize scheduler state, the top-level batch log, per-service outcomes, and failure-policy state | Prefer --format json for automation. Add --array to include merged squeue --array and sacct --array task rows. |
| ps | Show a stable per-service runtime snapshot | Useful when you want a point-in-time view instead of the live TUI. |
| watch | Reconnect to the live watch UI | Falls back to line-oriented output on non-interactive terminals. |
| replay | Reanimate a tracked job timeline from existing artifacts | Best-effort DVR view built from final state, service-exit markers, metrics JSONL, and logs. Use --speed, --no-tui, or --format json as needed. |
| logs | Print tracked service logs | Add --follow, --grep <pattern>, or coarse --since <duration> as needed. |
| inspect --rightsize | Suggest conservative resource request reductions after a tracked run | Uses tracked sacct, sstat, and sampler evidence; supports --job-id and --format json. |
| stats | Report tracked runtime metrics, step stats, and optional accounting | Supports --accounting, --format json, --format jsonl, and --format csv. |
| score | Score post-run resource efficiency | Supports positional job ids, --format json, --pue, --gpu-tdp-w, and --cpu-watts-per-core. |
| diff | Compare two tracked job submissions | Compact text by default; use --format json for full detail. |
| artifacts | Export tracked artifact bundles after a run | Use --bundle <name> and --tarball when needed. |
| cancel | Cancel the latest tracked job or an explicit job id | Uses tracked metadata instead of making you retype paths. |
| down | Cancel a tracked job and clean tracked state | Supports --purge-cache when the tracked snapshot names concrete cache artifacts. |
| jobs list | Scan the current repo tree for tracked runs | Start here when you need to rediscover an older run. |
| clean | Remove old tracked job directories for one compose context | Use --dry-run first when you are unsure. |
| rendezvous list | List live shared-cache service records | Defaults to the resolved cache dir; --cache-dir inspects a specific cache. |
| rendezvous resolve NAME | Resolve one provider record | Prints endpoint fields or JSON for automation. |
| rendezvous register NAME | Manually register a provider record | Intended for debugging and custom workflows; declarative specs usually register providers. |
| rendezvous prune | Remove expired provider records | Cleans stale latest and historical rendezvous JSON files. |

hpc-compose debug -f compose.yaml
hpc-compose debug -f compose.yaml --preflight
hpc-compose jobs list
hpc-compose status -f compose.yaml --format json
hpc-compose status -f compose.yaml --array
hpc-compose status -f compose.yaml --job-id 12345_7 --array
hpc-compose ps -f compose.yaml
hpc-compose watch -f compose.yaml --watch-mode line
hpc-compose watch -f compose.yaml --hold-on-exit always
hpc-compose replay -f compose.yaml
hpc-compose replay -f compose.yaml --speed 10
hpc-compose replay -f compose.yaml --job-id 12345 --service app
hpc-compose replay -f compose.yaml --no-tui
hpc-compose replay -f compose.yaml --format json
hpc-compose logs -f compose.yaml --service app --follow
hpc-compose logs -f compose.yaml --grep 'error|oom' --since 30m
hpc-compose inspect -f compose.yaml --rightsize
hpc-compose stats -f compose.yaml --format jsonl
hpc-compose stats -f compose.yaml --accounting --format csv
hpc-compose score 12345
hpc-compose diff 12345 12346 -f compose.yaml
hpc-compose artifacts -f compose.yaml --bundle checkpoints --tarball
hpc-compose down -f compose.yaml
hpc-compose cancel -f compose.yaml
hpc-compose clean -f compose.yaml --age 7 --dry-run
hpc-compose rendezvous list
hpc-compose rendezvous resolve model-server
hpc-compose rendezvous register model-server --host node01 --port 8000 --job-id 12345
hpc-compose rendezvous prune

Cache Maintenance

| Command | Use it for | Notes |
| --- | --- | --- |
| cache list | Inspect cached imported and prepared image artifacts | Works without a compose file. |
| cache inspect | Show cache reuse expectations for the current plan | Supports --service <name> for one service. |
| cache prune | Remove old or unused cache entries | --age and --all-unused are mutually exclusive. |

hpc-compose cache list
hpc-compose cache inspect -f compose.yaml --service app
hpc-compose cache prune --age 7 --cache-dir '<shared-cache-dir>'
hpc-compose cache prune --all-unused -f compose.yaml

Spec reference

This page describes the Compose subset that hpc-compose accepts today. Unknown or unsupported fields are rejected unless this page explicitly says otherwise.

How To Use This Reference

This page is intentionally complete. If you are new, start with Quickstart, Examples, and Runtime Backends, then use the table below to jump into the field group you need.

| Need | Section |
| --- | --- |
| Overall YAML shape | Top-level shape and Top-level fields |
| Shared templates and overrides | extends |
| Runtime backend choice | runtime and Runtime Backends |
| Slurm allocation settings | x-slurm |
| Hyperparameter sweeps | sweep and Hyperparameter Sweeps |
| Service command, image, env, and mounts | Service fields, Image rules, command and entrypoint, environment, volumes |
| Startup ordering | depends_on, readiness, and healthcheck |
| Multi-node placement and MPI | Multi-node placement rules, services.<name>.x-slurm.placement, and services.<name>.x-slurm.mpi |
| Prepared images | x-runtime.prepare and x-enroot.prepare |
| Metrics, artifacts, and resume | x-slurm.metrics, x-slurm.artifacts, and x-slurm.resume |
| Unsupported Compose features | Unsupported Compose keys |

Top-level shape

name: demo
version: "1"

runtime:
  backend: pyxis

x-slurm:
  time: "00:30:00"

services:
  app:
    image: python:3.11-slim
    command: python -m main

Top-level fields

| Field | Shape | Default | Notes |
| --- | --- | --- | --- |
| extends | string | omitted | Top-level authoring-only path to a base spec. The base is resolved before interpolation, validation, planning, and config output. |
| name | string | omitted | Used as the Slurm job name when x-slurm.job_name is not set. |
| version | string "1" or integer 1 | 1 | hpc-compose spec schema version. Omit for v1 or set explicitly to "1"; Docker Compose values such as "3.9" are rejected after migration. |
| runtime | mapping | backend: pyxis | Selects the service runtime backend and GPU passthrough policy. |
| services | mapping | required | Must contain at least one service. |
| steps | mapping | alias for services | Use either services or steps, not both. |
| modules | list of strings | omitted | List-only shorthand for top-level x-env.modules.load; cannot be combined with x-env.modules. |
| x-env | mapping | omitted | Structured host-side module, Spack view, and environment setup shared by all services. |
| x-slurm | mapping | omitted | Top-level Slurm settings and shared runtime defaults. |
| sweep | mapping | omitted | Embedded hyperparameter sweep metadata consumed by hpc-compose sweep submit/status/list. Normal commands treat it as metadata. |

extends

extends is an authoring feature for sharing base specs and service templates without copying large cluster-specific blocks. It is resolved before interpolation, validation, planning, rendering, tracked metadata, and hpc-compose config; the effective config no longer contains any extends keys.

Top-level extends points at a base YAML file:

extends: cluster-base.yaml

x-slurm:
  time: "02:00:00"

services:
  trainer:
    command: python train.py

Service-level extends supports three forms:

services:
  api:
    extends: base-service

  worker:
    extends: service-templates.yaml

  trainer:
    extends:
      file: ml-templates.yaml
      service: gpu-worker

Rules:

  • Top-level extends must be a file path string.
  • A service string that looks like a YAML file path, such as base.yaml, ../base.yml, or a path with a separator, uses the same service name from that file. Other strings refer to a service in the same file.
  • A service mapping can select { file, service }; omit file to select a service from the same file.
  • Extends references are recursive and cycles are rejected.
  • Maps merge recursively. Sequences append base-first. Child scalars replace base scalars.
  • Service volumes merge by container target, so a child mount for /data replaces the base mount for /data while unrelated base mounts are kept.
  • Relative host paths in the final plan still resolve against the leaf compose file passed with -f.
  • There is no delete or unset syntax in this version.
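
The merge rules above can be sketched in a few lines of Python. This is an illustrative model only, not the hpc-compose implementation: the function names are hypothetical, the volume rule is applied to any key named volumes, and placing kept base mounts before child mounts is an assumption.

```python
from copy import deepcopy


def volume_target(entry: str) -> str:
    """Return the container path of a host_path:container_path[:ro|rw] entry."""
    return entry.split(":")[1]


def merge_volumes(base: list, child: list) -> list:
    """Child mounts replace base mounts that share a container target."""
    child_targets = {volume_target(v) for v in child}
    kept = [v for v in base if volume_target(v) not in child_targets]
    return kept + list(child)  # ordering of kept vs. child mounts is assumed


def merge(base, child, key=None):
    """Merge a child value over a base value following the documented rules."""
    if isinstance(base, dict) and isinstance(child, dict):
        merged = deepcopy(base)
        for k, v in child.items():
            merged[k] = merge(base[k], v, k) if k in base else deepcopy(v)
        return merged  # maps merge recursively
    if isinstance(base, list) and isinstance(child, list):
        if key == "volumes":
            return merge_volumes(base, child)
        return list(base) + list(child)  # sequences append base-first
    return deepcopy(child)  # child scalars replace base scalars


base = {
    "x-slurm": {"partition": "gpu", "time": "01:00:00", "setup": ["module load enroot"]},
    "volumes": ["/shared/data:/data", "/shared/ref:/reference:ro"],
}
child = {
    "x-slurm": {"time": "02:00:00", "setup": ["source /shared/env.sh"]},
    "volumes": ["/scratch/data:/data"],
}
print(merge(base, child))
```

In this sketch the child keeps the base partition, overrides time, appends its setup line after the base one, and replaces only the /data mount.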

sweep

sweep defines trial variables for hpc-compose sweep submit. It is a top-level metadata block; every generated trial is still planned, rendered, submitted, and tracked as a normal one-allocation job.

Full Cartesian product:

sweep:
  parameters:
    lr: [0.001, 0.01, 0.1]
    batch_size: [32, 64]
  matrix: full

Random sample without replacement:

sweep:
  parameters:
    lr: [0.001, 0.01, 0.1]
    batch_size: [32, 64]
  matrix:
    random: 5
    seed: "optional-stable-seed"

Rules:

  • parameters must contain at least one key, and every value list must contain at least one scalar.
  • Parameter keys must be valid interpolation variable names: [A-Za-z_][A-Za-z0-9_]*.
  • Parameter keys must not use the reserved HPC_COMPOSE_SWEEP_ prefix.
  • Parameter values may be strings, numbers, or booleans. They are passed to interpolation as strings.
  • matrix: full expands the Cartesian product deterministically over sorted parameter names.
  • matrix.random must be at least 1 and cannot exceed the total number of combinations.
  • matrix.seed is optional. If omitted, sweep submit derives a seed from the new sweep id and persists it.
  • sweep.spec is rejected in v1; embed the sweep in the same compose file.
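
As a rough model of the expansion rules above, the following Python sketch builds the full Cartesian product over sorted parameter names and draws a seeded sample without replacement. The function names are hypothetical, and the way hpc-compose turns a seed into a concrete selection is not documented here, so the sampled trials will not match what sweep submit picks.

```python
import itertools
import random


def expand_full(parameters: dict) -> list:
    """Cartesian product over sorted parameter names; values become strings."""
    names = sorted(parameters)
    combos = itertools.product(*(parameters[name] for name in names))
    return [dict(zip(names, (str(v) for v in combo))) for combo in combos]


def expand_random(parameters: dict, count: int, seed: str) -> list:
    """Seeded sample without replacement (the seeding scheme is an assumption)."""
    trials = expand_full(parameters)
    if not 1 <= count <= len(trials):
        raise ValueError("matrix.random must be between 1 and the number of combinations")
    return random.Random(seed).sample(trials, count)


params = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]}
for index, trial in enumerate(expand_full(params)):
    # Labels follow the documented t000 example; wider formats are assumed.
    print(f"t{index:03d}", trial)
```

With the example parameters, the full matrix yields six trials, and batch_size varies slowest because it sorts before lr.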

For each trial, sweep variables override existing interpolation variables from .env, environment, settings, or --env. These reserved variables are also available:

| Variable | Meaning |
| --- | --- |
| HPC_COMPOSE_SWEEP_ID | Persisted sweep id. |
| HPC_COMPOSE_SWEEP_TRIAL | Trial label such as t000. |
| HPC_COMPOSE_SWEEP_TRIAL_INDEX | Zero-based trial index. |

Normal commands do not expand the sweep matrix. If the runnable spec contains ${lr} with no default, ordinary plan, up, and render still fail unless lr is provided. Use defaults such as ${lr:-0.001} when the base spec should remain runnable, or use hpc-compose sweep submit --dry-run to validate sweep-only variables.

hpc-compose sweep submit rejects x-slurm.array, because every sweep trial is already its own allocation. See Hyperparameter Sweeps for manifests, status aggregation, and examples.

x-env

x-env is structured host-side software setup. It is available at the top level and under services.<name>.

x-env:
  modules:
    - cuda/12.4
    - openmpi/5
  spack:
    view: /shared/spack/views/ml
  env:
    HDF5_USE_FILE_LOCKING: "FALSE"

services:
  app:
    image: python:3.11-slim
    x-env:
      modules:
        purge: false
        load:
          - netcdf/4.9
      env:
        OMP_NUM_THREADS: "8"

Supported forms:

  • modules: [name, ...]
  • modules: { purge: bool, load: [name, ...] }
  • spack: { view: /path/to/view }
  • env: { KEY: VALUE }

Rules:

  • Top-level x-env renders before x-slurm.setup.
  • Service-level x-env renders immediately before that service’s srun.
  • env entries are exported on the host and forwarded into Pyxis containers.
  • Service-level x-env.env overrides top-level x-env.env when the same variable is set.
  • Top-level modules: [...] and service-level modules: [...] are shorthand for the matching x-env.modules.load list. The shorthand is list-only and cannot be combined with x-env.modules at the same scope.
  • spack.view prepends bin, lib, lib64, and Python site-package paths only when those directories exist.
  • Modules and Spack views are host-side setup. Container filesystem visibility still requires explicit volumes, x-slurm.mpi.host_mpi.bind_paths, or other site-specific binds.

Settings-aware command table

Use these commands and global flags when you want the project-local settings file (.hpc-compose/settings.toml) to remember compose path, env files, env vars, and binary overrides.

| Command or flag | Purpose | Notes |
| --- | --- | --- |
| --profile <NAME> | Select the profile from settings | Global flag; applies to every subcommand. |
| --settings-file <PATH> | Use an explicit settings file | Global flag; bypasses upward auto-discovery of .hpc-compose/settings.toml. |
| hpc-compose setup | Create or update the project-local settings file | Interactive by default; supports --non-interactive with --profile-name, --compose-file, --env-file, --env, --binary, --cache-dir, and --default-profile. |
| hpc-compose context | Print fully resolved execution context | Shows selected settings/profile, compose path, binaries, referenced interpolation vars, runtime paths, and value sources; supports --format json. Sensitive-looking interpolation values are redacted unless --show-values is passed. |
| hpc-compose validate --strict-env | Fail when interpolation fell back to defaults | Detects when ${VAR:-...} or ${VAR-...} consumed fallback values because VAR was missing. |
| hpc-compose lint | Run opinionated authoring checks | Builds on validation and planning, then reports stable finding codes for risky dependency, memory, and shared-write patterns. |
| hpc-compose schema | Print the checked-in JSON Schema | Useful for editor integration and authoring tools. Rust validation remains the semantic source of truth. |

x-slurm

These fields live under the top-level x-slurm block.

| Field | Shape | Default | Notes |
| --- | --- | --- | --- |
| resources | string | omitted | Name of a [resource_profiles.<name>] entry in .hpc-compose/settings.toml. Profile values are defaults only; explicit x-slurm fields win. |
| job_name | string | name when present | Rendered as #SBATCH --job-name. |
| partition | string | omitted | Passed through to #SBATCH --partition. |
| account | string | omitted | Passed through to #SBATCH --account. |
| qos | string | omitted | Passed through to #SBATCH --qos. |
| time | string | omitted | Passed through to #SBATCH --time. |
| nodes | positive integer | omitted | Slurm allocation node count. Defaults to 1 when omitted. |
| ntasks | positive integer | omitted | Passed through to #SBATCH --ntasks. |
| ntasks_per_node | positive integer | omitted | Passed through to #SBATCH --ntasks-per-node. |
| cpus_per_task | positive integer | omitted | Top-level Slurm CPU request. |
| mem | string | omitted | Passed through to #SBATCH --mem. |
| gres | string | omitted | Passed through to #SBATCH --gres. |
| gpus | positive integer | omitted | Used only when gres is not set. |
| gpus_per_node | positive integer | omitted | Passed through to #SBATCH --gpus-per-node. |
| gpus_per_task | positive integer | omitted | Passed through to #SBATCH --gpus-per-task. |
| cpus_per_gpu | positive integer | omitted | Passed through to #SBATCH --cpus-per-gpu. |
| mem_per_gpu | string | omitted | Passed through to #SBATCH --mem-per-gpu. |
| gpu_bind | string | omitted | Passed through to #SBATCH --gpu-bind. |
| cpu_bind | string | omitted | Passed through to #SBATCH --cpu-bind. |
| mem_bind | string | omitted | Passed through to #SBATCH --mem-bind. |
| distribution | string | omitted | Passed through to #SBATCH --distribution. |
| hint | string | omitted | Passed through to #SBATCH --hint. |
| constraint | string | omitted | Passed through to #SBATCH --constraint. |
| output | string | omitted | Passed through to #SBATCH --output. |
| error | string | omitted | Passed through to #SBATCH --error. |
| chdir | string | omitted | Passed through to #SBATCH --chdir. |
| array | string | omitted | Slurm array spec such as 0, 1-10, 1-10:2, 0,3,8-12, or 0-99%10. Rendered as #SBATCH --array. |
| after_job | string or mapping | omitted | Scheduler dependency on a prior job id. String shorthand means afterany:<id>; mapping supports { id, condition }. |
| dependency | string | omitted | Currently supports singleton, combined with after_job when both are set. |
| cache_dir | string | settings profile, settings defaults, then $HOME/.cache/hpc-compose | Must resolve to shared storage visible from the login node and the compute nodes. |
| scratch | mapping | omitted | Optional scratch path mounted into services and exposed as HPC_COMPOSE_SCRATCH_DIR. |
| stage_in | list of mappings | omitted | Copy or rsync host paths before services launch. |
| stage_out | list of mappings | omitted | Copy or rsync paths during teardown, optionally by outcome. |
| burst_buffer | mapping | omitted | Raw #BB / #DW directives for site-specific burst-buffer systems. |
| metrics | mapping | omitted | Enables runtime metrics sampling. |
| artifacts | mapping | omitted | Enables tracked artifact collection and export metadata. |
| resume | mapping | omitted | Enables checkpoint-aware resume semantics with a shared host path mounted into every service. |
| notify | mapping | omitted | First-class Slurm email notification settings. |
| setup | list of strings | omitted | Raw shell lines inserted into the generated batch script before service launches. |
| submit_args | list of strings | omitted | Extra raw Slurm arguments appended as #SBATCH ... lines. |
| rendezvous | string, list, or mapping | omitted | Resolve cross-job service records from the shared cache and inject HPC_COMPOSE_RDZV_* env vars. |

Resource profiles

Resource profiles are reusable settings defaults, distinct from the global --profile setting selector. Define them in .hpc-compose/settings.toml:

[resource_profiles.gpu-small]
partition = "gpu"
time = "01:00:00"
gpus = 1
cpus_per_task = 8
mem = "32G"

Reference one from the spec:

x-slurm:
  resources: gpu-small
  mem: 64G

The profile fills only omitted resource fields. In the example above, partition, time, gpus, and cpus_per_task come from the profile, while the explicit mem: 64G wins. Profiles intentionally exclude behavior such as job_name, cache_dir, arrays, dependencies, submit_args, setup hooks, scratch/staging, artifacts, resume, notify, and metrics.

Allowed profile fields are: partition, account, qos, time, nodes, ntasks, ntasks_per_node, cpus_per_task, mem, gres, gpus, gpus_per_node, gpus_per_task, cpus_per_gpu, mem_per_gpu, gpu_bind, cpu_bind, mem_bind, distribution, hint, and constraint.

x-slurm.array

x-slurm:
  array: 0-99%10
  output: logs/%A_%a.out
services:
  worker:
    image: python:3.12-slim
    command: python worker.py

array accepts Slurm list, range, step, and concurrency forms such as 0, 1-10, 1-10:2, 0,3,8-12, and 0-99%10. Values with spaces, null bytes, malformed ranges, negative numbers, zero step, or zero concurrency are rejected.
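
To make the accepted and rejected shapes concrete, here is a small Python checker that mirrors the rules in the paragraph above. It is a sketch rather than the validator hpc-compose ships; the function name is hypothetical, and treating a descending range such as 10-1 as malformed is an assumption.

```python
import re

# One list element: N, N-M, or N-M:STEP (ASCII digits only).
_ELEMENT = re.compile(r"([0-9]+)(?:-([0-9]+)(?::([0-9]+))?)?")


def is_valid_array_spec(spec: str) -> bool:
    """Return True when spec looks like a list/range/step/%limit array spec."""
    if not spec or "\0" in spec or any(ch.isspace() for ch in spec):
        return False
    body, has_limit, limit = spec.partition("%")
    if has_limit and (re.fullmatch(r"[0-9]+", limit) is None or int(limit) == 0):
        return False  # zero or malformed concurrency limit
    for element in body.split(","):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            return False  # negative numbers and malformed ranges land here
        start, end, step = match.groups()
        if end is not None and int(end) < int(start):
            return False  # assumption: descending ranges count as malformed
        if step is not None and int(step) == 0:
            return False  # zero step
    return True


for spec in ["0", "1-10", "1-10:2", "0,3,8-12", "0-99%10", "1 - 10", "-1", "1-10:0", "0-99%0"]:
    print(f"{spec!r}: {is_valid_array_spec(spec)}")
```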

Array jobs currently require hpc-compose up --detach; live watch/log fan-out for per-task array elements is future work. --local rejects array specs. Slurm provides SLURM_ARRAY_JOB_ID, SLURM_ARRAY_TASK_ID, SLURM_ARRAY_TASK_COUNT, SLURM_ARRAY_TASK_MAX, SLURM_ARRAY_TASK_MIN, and SLURM_ARRAY_TASK_STEP; for Pyxis jobs, hpc-compose forwards these names into the container when x-slurm.array is set. Prefer output patterns such as %A_%a so task logs do not overwrite each other.

x-slurm.after_job and x-slurm.dependency

x-slurm:
  after_job:
    id: "12345"
    condition: afterok
  dependency: singleton

after_job: "12345" is shorthand for afterany:12345. Mapping form accepts id plus condition, where condition is afterany, afterok, or afternotok. Job ids must be numeric Slurm ids such as 12345, or array elements such as 12345_7.

dependency: singleton is separate because Slurm’s singleton dependency does not take a job id. When both fields are set, hpc-compose submits one command-line dependency string such as --dependency=afterok:12345,singleton.

Dependencies are passed to sbatch as CLI arguments, not rendered as #SBATCH lines, because dependency job ids are commonly dynamic. --local rejects scheduler dependencies.
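
The way the two fields combine into one sbatch argument can be sketched as follows. The helper name is hypothetical, and defaulting a mapping without condition to afterany is an assumption; only the string shorthand is documented as afterany.

```python
import re

_JOB_ID = re.compile(r"[0-9]+(?:_[0-9]+)?")  # 12345 or array element 12345_7
_CONDITIONS = {"afterany", "afterok", "afternotok"}


def dependency_argument(after_job=None, dependency=None):
    """Build the single --dependency argument, or None when nothing is set."""
    parts = []
    if after_job is not None:
        if isinstance(after_job, str):
            job_id, condition = after_job, "afterany"
        else:
            job_id = after_job["id"]
            condition = after_job.get("condition", "afterany")  # assumed default
        if _JOB_ID.fullmatch(str(job_id)) is None:
            raise ValueError(f"invalid job id: {job_id!r}")
        if condition not in _CONDITIONS:
            raise ValueError(f"unsupported condition: {condition!r}")
        parts.append(f"{condition}:{job_id}")
    if dependency is not None:
        if dependency != "singleton":
            raise ValueError("only singleton is supported")
        parts.append("singleton")
    return f"--dependency={','.join(parts)}" if parts else None


print(dependency_argument({"id": "12345", "condition": "afterok"}, "singleton"))
```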

x-slurm.setup

x-slurm:
  setup:
    - module load enroot
    - source /shared/env.sh

  • Shape: list of strings
  • Default: omitted
  • Notes:
    • Each line is emitted verbatim into the generated bash script.
    • The script runs under set -euo pipefail.
    • Shell quoting and escaping are the user’s responsibility.

x-slurm.submit_args

x-slurm:
  submit_args:
    - "--mail-type=END"
    - "--mail-user=user@example.com"
    - "--reservation=gpu-reservation"

  • Shape: list of strings
  • Default: omitted
  • Notes:
    • Each entry is emitted as #SBATCH {arg}.
    • Entries are rejected if they contain line breaks or null bytes.
    • Entries are not validated against Slurm option syntax.
    • First-class fields reject conflicting raw entries for the same option. Use x-slurm.array, x-slurm.after_job, or x-slurm.dependency instead of raw --array or --dependency.

x-slurm.notify

x-slurm:
  notify:
    email:
      to: user@example.com
      on: [end, fail]

| Field | Shape | Default | Notes |
| --- | --- | --- | --- |
| notify.email | mapping | omitted | Required when notify is present. |
| notify.email.to | string | required | Rendered as #SBATCH --mail-user. |
| notify.email.on | list of events | [end, fail] | Rendered as #SBATCH --mail-type. |

Supported events:

| Event | Slurm mail type |
| --- | --- |
| start | BEGIN |
| end | END |
| fail | FAIL |
| all | ALL |

Rules:

  • When on is omitted or empty, defaults to [end, fail].
  • If all is present, it replaces all other events.
  • Cannot be combined with raw --mail-type or --mail-user in x-slurm.submit_args.
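
The event mapping and defaulting rules above can be modelled with a short Python function. This is a sketch under assumptions: the function name is hypothetical, and the order and exact text of the rendered directives may differ from what hpc-compose emits.

```python
MAIL_TYPES = {"start": "BEGIN", "end": "END", "fail": "FAIL", "all": "ALL"}


def mail_directives(notify: dict) -> list:
    """Translate an x-slurm.notify mapping into #SBATCH mail lines."""
    email = notify["email"]  # notify.email is required when notify is present
    events = email.get("on") or ["end", "fail"]  # omitted or empty -> default
    unknown = [event for event in events if event not in MAIL_TYPES]
    if unknown:
        raise ValueError(f"unsupported events: {unknown}")
    if "all" in events:
        events = ["all"]  # all replaces every other event
    mail_type = ",".join(MAIL_TYPES[event] for event in events)
    return [
        f"#SBATCH --mail-user={email['to']}",
        f"#SBATCH --mail-type={mail_type}",
    ]


print("\n".join(mail_directives({"email": {"to": "user@example.com", "on": ["end", "fail"]}})))
```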

x-slurm.cache_dir

  • Shape: string
  • Default precedence: explicit x-slurm.cache_dir, then [profiles.<name>.cache].dir, then [defaults.cache].dir, then $HOME/.cache/hpc-compose.
  • Notes:
    • Environment variables in the path are expanded, and relative paths are resolved against the compose file directory.
    • Settings cache paths are resolved against the settings base directory.
    • Paths under /tmp, /var/tmp, /private/tmp, and /dev/shm are accepted by parsing and planning, but preflight reports them as unsafe because they are not valid shared-cache locations for login-node prepare plus compute-node reuse.
    • The path must be visible from both the login node and the compute nodes.
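
A minimal Python sketch of the precedence chain and the unsafe-location check described above. The function names and arguments are hypothetical; real resolution also involves path expansion and settings discovery that this model leaves out.

```python
from pathlib import PurePosixPath

UNSAFE_ROOTS = ("/tmp", "/var/tmp", "/private/tmp", "/dev/shm")


def resolve_cache_dir(spec_value, profile_value, defaults_value, home):
    """Pick the first configured value, falling back to $HOME/.cache/hpc-compose."""
    for candidate in (spec_value, profile_value, defaults_value):
        if candidate:
            return candidate
    return f"{home}/.cache/hpc-compose"


def is_unsafe_cache_dir(path: str) -> bool:
    """True when path is one of the node-local roots or lives under one."""
    candidate = PurePosixPath(path)
    roots = [PurePosixPath(root) for root in UNSAFE_ROOTS]
    return any(candidate == root or root in candidate.parents for root in roots)


chosen = resolve_cache_dir(None, "/cluster/shared/dev-hpc-compose-cache",
                           "/cluster/shared/hpc-compose-cache", "/home/alice")
print(chosen, is_unsafe_cache_dir(chosen))
```

Here the profile value wins over the defaults value because the spec leaves cache_dir unset, and the chosen path is not flagged.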

Settings example:

[defaults.cache]
dir = "/cluster/shared/hpc-compose-cache"

[profiles.dev.cache]
dir = "/cluster/shared/dev-hpc-compose-cache"

runtime

runtime:
  backend: apptainer
  gpu: auto

| Field | Shape | Default | Notes |
| --- | --- | --- | --- |
| backend | pyxis, apptainer, singularity, or host | pyxis | Selects the runtime used inside Slurm steps. |
| gpu | auto, none, or nvidia | auto | For Apptainer/Singularity, controls --nv; auto enables it when Slurm GPU resources are requested. |

Backend notes:

  • pyxis uses srun --container-* flags and Enroot .sqsh artifacts.
  • apptainer and singularity build or reuse .sif artifacts and launch them through apptainer exec/run or singularity exec/run inside srun.
  • host runs commands directly under srun; services must set command or entrypoint, and image prepare blocks, service volumes, and x-slurm.mpi.host_mpi.bind_paths are not allowed because no container bind mount is applied.
  • x-enroot.prepare is a Pyxis/Enroot compatibility spelling. Prefer x-runtime.prepare for new specs, especially with Apptainer/Singularity.

x-slurm.scratch, stage_in, stage_out, and burst_buffer

x-slurm:
  scratch:
    scope: shared
    base: /scratch/$USER/jobs
    mount: /scratch
    cleanup: on_success
  stage_in:
    - from: /shared/input
      to: /scratch/input
      mode: rsync
  stage_out:
    - from: /scratch/output
      to: /shared/results/${SLURM_JOB_ID}
      when: always
      mode: copy
  burst_buffer:
    directives:
      - "#BB create_persistent name=data capacity=100G"

  • scratch.base is a host path. scratch.mount is the container-visible mount point.
  • scratch.scope is node_local or shared; cluster profiles can warn when a shared scratch path does not look shared.
  • scratch.cleanup is always, on_success, or never.
  • stage_in runs before services launch; stage_out runs during teardown.
  • mode is rsync or copy; rsync falls back to cp -R when rsync is unavailable.
  • stage_out.when is always, on_success, or on_failure.
  • ${SLURM_JOB_ID} is preserved in scratch and staging paths for runtime expansion.
  • burst_buffer.directives entries are emitted as raw batch-script directives and must start with #BB or #DW .

Multi-node placement rules

  • x-slurm.nodes > 1 reserves a multi-node allocation.
  • Helper services remain single-node steps and are pinned to the allocation’s primary node.
  • When a multi-node job has exactly one service, that service defaults to the distributed full-allocation step.
  • Services may use services.<name>.x-slurm.placement to select explicit allocation node indices.
  • Overlapping explicit placements are rejected unless one side sets allow_overlap: true or uses share_with.
  • A service spanning more than one node may use readiness.type: sleep or readiness.type: log. TCP and HTTP readiness are allowed for such a service only with an explicit non-local host or URL.

x-slurm.metrics

x-slurm:
  metrics:
    interval_seconds: 5
    collectors: [gpu, slurm]

  • Shape: mapping
  • Default: omitted
  • Notes:
    • Omitting the block disables runtime metrics sampling.
    • If the block is present and enabled is omitted, metrics sampling is enabled.
    • interval_seconds defaults to 5 and must be at least 1.
    • collectors defaults to [gpu, slurm].
    • Supported collectors:
      • gpu samples device and process telemetry through nvidia-smi
      • slurm samples job-step CPU and memory data through sstat
    • In multi-node jobs, gpu sampling launches one best-effort sampler task per allocated node and writes node metadata into GPU rows; legacy rows without node remain readable as primary-node samples.
    • Sampler files are written under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/metrics on the host and are also visible inside containers at /hpc-compose/job/metrics.
    • Diagnostics are written under metrics/diagnostics/ when available, including nvidia-smi topo -m, nvidia-smi -q, selected fabric/GPU environment variables, and best-effort ibstat, ibv_devinfo, ucx_info -v, and fi_info output.
    • Collector failures are best-effort and do not fail the batch job.

x-slurm.rendezvous

Client-side cross-job discovery resolves records from <cache_dir>/rendezvous/<name>/latest.json before launching services:

x-slurm:
  cache_dir: /cluster/shared/hpc-compose-cache
  rendezvous: model-server

The mapping form supports multiple names and a timeout:

x-slurm:
  rendezvous:
    discover:
      - model-server
      - tokenizer
    timeout_seconds: 60
    require: true

Resolved records become generic variables such as HPC_COMPOSE_RDZV_URL and name-scoped variables such as HPC_COMPOSE_RDZV_MODEL_SERVER_URL.

x-slurm.artifacts

x-slurm:
  artifacts:
    collect: always
    export_dir: ./results/${SLURM_JOB_ID}
    paths:
      - /hpc-compose/job/metrics/**
    bundles:
      checkpoints:
        paths:
          - /hpc-compose/job/checkpoints/*.pt

  • Shape: mapping
  • Default: omitted
  • Notes:
    • Omitting the block disables tracked artifact collection.
    • collect defaults to always. Supported values are always, on_success, and on_failure.
    • export_dir is required and is resolved relative to the compose file directory when hpc-compose artifacts runs.
    • ${SLURM_JOB_ID} is preserved in export_dir until hpc-compose artifacts expands it from tracked metadata.
    • paths remains supported as the implicit default bundle.
    • bundles is optional. Bundle names must match [A-Za-z0-9_-]+, and default is reserved for top-level paths.
    • At least one source path must be present in paths or bundles.
    • Every source path must be an absolute container-visible path rooted at /hpc-compose/job.
    • Paths under /hpc-compose/job/artifacts are rejected.
    • Collection happens during batch teardown and is best-effort.
    • Collected payloads and manifest.json are written under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/artifacts/.
    • hpc-compose artifacts --bundle <name> exports only the selected bundle or bundles.
    • hpc-compose artifacts --tarball also writes one <bundle>.tar.gz archive per exported bundle.
    • Export writes per-bundle provenance metadata under <export_dir>/_hpc-compose/bundles/<bundle>.json.

x-slurm.resume

x-slurm:
  resume:
    path: /shared/$USER/runs/my-run

  • Shape: mapping
  • Default: omitted
  • Notes:
    • Omitting the block disables resume semantics.
    • path is required and must be an absolute host path.
    • /hpc-compose/... paths are rejected because path must point at shared host storage, not a container-visible path.
    • /tmp and /var/tmp technically validate, but preflight warns because those paths are not reliable resume storage.
    • When enabled, hpc-compose mounts path into every service at /hpc-compose/resume.
    • Services also receive HPC_COMPOSE_RESUME_DIR, HPC_COMPOSE_ATTEMPT, and HPC_COMPOSE_IS_RESUME.
    • The canonical resume source is the shared path, not exported artifact bundles.
    • Attempt-specific runtime state moves under ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID}/attempts/<attempt>/, and the top-level logs, metrics, artifacts, and state.json paths continue to point at the latest attempt for compatibility.

Allocation metadata inside services

Every service receives:

  • HPC_COMPOSE_PRIMARY_NODE
  • HPC_COMPOSE_NODE_COUNT
  • HPC_COMPOSE_NODELIST
  • HPC_COMPOSE_NODELIST_FILE
  • HPC_COMPOSE_SERVICE_PRIMARY_NODE
  • HPC_COMPOSE_SERVICE_NODE_COUNT
  • HPC_COMPOSE_SERVICE_NODELIST
  • HPC_COMPOSE_SERVICE_NODELIST_FILE

The allocation-wide data is also written under /hpc-compose/job/allocation/primary_node and /hpc-compose/job/allocation/nodes.txt. Service-scoped node lists are written under /hpc-compose/job/allocation/service-nodelists/.

Multi-node services also receive distributed launch helpers:

  • HPC_COMPOSE_DIST_MASTER_ADDR
  • HPC_COMPOSE_DIST_MASTER_PORT
  • HPC_COMPOSE_DIST_RDZV_ENDPOINT
  • HPC_COMPOSE_DIST_NNODES
  • HPC_COMPOSE_DIST_NODE_RANK
  • HPC_COMPOSE_DIST_LOCAL_RANK
  • HPC_COMPOSE_DIST_GLOBAL_RANK
  • HPC_COMPOSE_DIST_NPROC_PER_NODE
  • HPC_COMPOSE_DIST_WORLD_SIZE
  • HPC_COMPOSE_DIST_HOSTFILE

HPC_COMPOSE_DIST_NPROC_PER_NODE takes the first available value from this list, in order: a service environment override, GPU requests, ntasks_per_node, then 1. The distributed hostfile is written under /hpc-compose/job/allocation/distributed-hostfiles/. When a discovered .hpc-compose/cluster.toml contains [distributed.env], those profile variables are injected only for multi-node services; explicit service environment values win on name conflicts and are still the durable config source.

Services that configure services.<name>.x-slurm.mpi also receive:

  • HPC_COMPOSE_MPI_TYPE
  • HPC_COMPOSE_MPI_PROFILE when x-slurm.mpi.profile is set
  • HPC_COMPOSE_MPI_IMPLEMENTATION when x-slurm.mpi.implementation is set or implied by x-slurm.mpi.profile
  • HPC_COMPOSE_MPI_HOSTFILE

The MPI hostfile is written under /hpc-compose/job/allocation/mpi-hostfiles/ and contains the service’s effective node list. When ntasks_per_node is known, each host line includes slots=<ntasks_per_node>. For a single-node service with ntasks but no ntasks_per_node, the hostfile uses slots=<ntasks>. Otherwise it emits one node per line without slots.
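
The slot rules for the MPI hostfile can be summarized in a short Python sketch. The function name is hypothetical, and separating the host name from slots= with a single space is an assumption about the exact line layout.

```python
def mpi_hostfile(nodes, ntasks=None, ntasks_per_node=None) -> str:
    """Render hostfile text for a service's effective node list."""
    if ntasks_per_node is not None:
        slots = ntasks_per_node          # known per-node task count
    elif len(nodes) == 1 and ntasks is not None:
        slots = ntasks                   # single-node service with ntasks only
    else:
        slots = None                     # one node per line, no slots
    lines = [node if slots is None else f"{node} slots={slots}" for node in nodes]
    return "\n".join(lines) + "\n"


print(mpi_hostfile(["node01", "node02"], ntasks_per_node=4), end="")
```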

MPI services also forward common PMI, PMIx, and Slurm rank variables into the container through Pyxis --container-env, including PMI_RANK, PMI_SIZE, PMIX_RANK, PMIX_NAMESPACE, SLURM_PROCID, SLURM_LOCALID, SLURM_NODEID, SLURM_NTASKS, and SLURM_TASKS_PER_NODE.

gres and gpus

When both gres and gpus are set at the same level, gres takes priority and gpus is ignored.

Service fields

| Field | Shape | Default | Notes |
| --- | --- | --- | --- |
| extends | string or mapping | omitted | Authoring-only service template reference. See extends. |
| image | string | required unless runtime.backend: host | Can be a remote image reference, a local .sqsh / .squashfs path for Pyxis, or a local .sif path for Apptainer/Singularity. |
| command | string or list of strings | omitted | Shell form or exec form. |
| entrypoint | string or list of strings | omitted | Must use the same form as command when both are present. |
| script | string | omitted | Multi-line shell script sugar for command: ["/bin/sh", "-lc", script]; mutually exclusive with command and entrypoint. |
| environment | mapping or list of KEY=VALUE strings | omitted | Both forms normalize to key/value pairs. |
| modules | list of strings | omitted | List-only shorthand for service x-env.modules.load; cannot be combined with service x-env.modules. |
| volumes | list of host_path:container_path strings | omitted | Runtime bind mounts. Host paths resolve against the compose file directory. |
| working_dir | string | omitted | Valid only when the service also has an explicit command or entrypoint. |
| depends_on | list or mapping | omitted | Dependency list with service_started, service_healthy, or service_completed_successfully conditions. |
| readiness | mapping | omitted | Post-launch readiness gate. |
| healthcheck | mapping | omitted | Compose-compatible sugar for a subset of readiness. Mutually exclusive with readiness. |
| assert | mapping | omitted | Post-run service contract checked during batch cleanup and surfaced in status. |
| x-env | mapping | omitted | Structured host-side module, Spack view, and environment setup for this service. |
| x-slurm | mapping | omitted | Per-service Slurm overrides. |
| x-runtime | mapping | omitted | Backend-neutral image preparation rules. |
| x-enroot | mapping | omitted | Pyxis/Enroot preparation compatibility alias. |

Image rules

Remote images

  • Any image reference without an explicit :// scheme is prefixed with docker://.
  • Explicit schemes are allowed only for docker://, dockerd://, and podman://.
  • Other schemes are rejected.
  • Shell variables in the image string are expanded at plan time.
  • Unset variables expand to empty strings.
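
The scheme rules for remote images reduce to a small normalization step, sketched here in Python. The function name is hypothetical, and the sketch does not cover variable expansion or local image paths.

```python
ALLOWED_SCHEMES = ("docker://", "dockerd://", "podman://")


def normalize_remote_image(reference: str) -> str:
    """Prefix scheme-less references with docker:// and reject unknown schemes."""
    if "://" not in reference:
        return "docker://" + reference
    if reference.startswith(ALLOWED_SCHEMES):
        return reference
    raise ValueError(f"unsupported image scheme: {reference!r}")


print(normalize_remote_image("python:3.12-slim"))
```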

Local images

  • Pyxis local image paths must point to .sqsh or .squashfs files.
  • Apptainer/Singularity local image paths must point to .sif files.
  • Relative paths are resolved against the compose file directory.
  • Paths that look like build contexts are rejected.

command, entrypoint, and script

command and entrypoint each accept either:

  • a string, interpreted as shell form
  • a list of strings, interpreted as exec form

Rules:

  • If both command and entrypoint are present, they must use the same form.
  • Mixed string/array combinations are rejected.
  • If neither field is present, the image default entrypoint and command are used.
  • If working_dir is set, at least one of command or entrypoint must also be set.
  • A multi-line string-form command is automatically normalized to ["/bin/sh", "-lc", command] so YAML block scalars run as one shell script.
  • Single-line string-form command remains shell form.
  • script is a convenience field for multi-line shell snippets and normalizes to command: ["/bin/sh", "-lc", script].
  • script cannot be combined with command or entrypoint.
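
The rules above can be read as one normalization pass over a service mapping. The Python sketch below is a simplified model with a hypothetical function name; it treats a string as multi-line when a newline remains after stripping surrounding whitespace, which is an assumption about how block scalars are classified.

```python
SHELL = ["/bin/sh", "-lc"]


def normalize_launch(service: dict) -> dict:
    """Apply the documented command/entrypoint/script rules to one service."""
    command = service.get("command")
    entrypoint = service.get("entrypoint")
    script = service.get("script")

    if script is not None:
        if command is not None or entrypoint is not None:
            raise ValueError("script cannot be combined with command or entrypoint")
        return {"entrypoint": None, "command": SHELL + [script]}

    if command is not None and entrypoint is not None:
        if isinstance(command, str) != isinstance(entrypoint, str):
            raise ValueError("command and entrypoint must use the same form")

    if service.get("working_dir") is not None and command is None and entrypoint is None:
        raise ValueError("working_dir requires command or entrypoint")

    if isinstance(command, str) and "\n" in command.strip():
        command = SHELL + [command]  # multi-line shell form runs as one script

    # None for both means the image defaults are used.
    return {"entrypoint": entrypoint, "command": command}


print(normalize_launch({"command": "cd /workspace\npython train.py\n"}))
```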

environment

Accepted forms:

environment:
  APP_ENV: prod
  LOG_LEVEL: info

environment:
  - APP_ENV=prod
  - LOG_LEVEL=info

Rules:

  • List items must use KEY=VALUE syntax.
  • .env from the compose file directory is loaded automatically when present.
  • Shell environment variables override .env; .env fills only missing variables.
  • environment, x-runtime.prepare.env, and compatibility x-enroot.prepare.env values support $VAR, ${VAR}, ${VAR:-default}, and ${VAR-default} interpolation.
  • Missing variables without defaults are errors.
  • Use $$ for a literal dollar sign in interpolated fields.
  • String-form shell snippets are still literal. For example, $PATH inside a string-form command is not expanded at plan time.
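
The interpolation forms listed above can be approximated with one regular expression. This Python sketch is not the hpc-compose parser: the function names are hypothetical, and it assumes the usual shell distinction where ${VAR:-default} applies to unset or empty variables while ${VAR-default} applies only to unset ones.

```python
import re

_TOKEN = re.compile(
    r"\$(?:"
    r"(?P<escaped>\$)"                                  # $$ -> literal $
    r"|(?P<named>[A-Za-z_][A-Za-z0-9_]*)"               # $VAR
    r"|\{(?P<braced>[A-Za-z_][A-Za-z0-9_]*)"            # ${VAR...
    r"(?:(?P<op>:?-)(?P<default>[^}]*))?\}"             # ...:-default} or ...-default}
    r")"
)


def resolve_variables(dotenv: dict, shell_env: dict) -> dict:
    """Shell environment overrides .env; .env fills only missing variables."""
    return {**dotenv, **shell_env}


def interpolate(text: str, variables: dict) -> str:
    """Expand $VAR, ${VAR}, ${VAR:-default}, ${VAR-default}, and $$."""
    def replace(match):
        if match.group("escaped"):
            return "$"
        name = match.group("named") or match.group("braced")
        op, value = match.group("op"), variables.get(name)
        if op == ":-" and not value:      # unset or empty (assumed semantics)
            return match.group("default")
        if op == "-" and value is None:   # unset only (assumed semantics)
            return match.group("default")
        if value is None:
            raise KeyError(f"missing variable without default: {name}")
        return value
    return _TOKEN.sub(replace, text)


variables = resolve_variables({"DATA": "/from/dotenv", "RUN": "r1"}, {"DATA": "/from/shell"})
print(interpolate("${DATA}/$RUN/${LR:-0.001} costs $$5", variables))
```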

volumes

Accepted form:

volumes:
  - ./app:/workspace
  - /shared/data:/data
  - /shared/reference:/reference:ro

Rules:

  • Host paths are resolved against the compose file directory.
  • Runtime mounts accept host_path:container_path and host_path:container_path:ro|rw.
  • Pyxis mounts are passed through srun --container-mounts=...; Apptainer/Singularity mounts are passed as --bind.
  • Every service also gets an automatic shared mount at /hpc-compose/job, backed by ${SLURM_SUBMIT_DIR:-$PWD}/.hpc-compose/${SLURM_JOB_ID} on the host.
  • /hpc-compose/job is reserved and cannot be used as an explicit volume destination.

Warning

If a mounted file is a symlink, the symlink target must also be visible from inside the mounted directory. Otherwise the path can exist on the host but fail inside the container.

depends_on

Accepted forms:

depends_on:
  - redis

depends_on:
  redis:
    condition: service_started

depends_on:
  redis:
    condition: service_healthy

Rules:

  • List form means condition: service_started.
  • Map form accepts condition: service_started, condition: service_healthy, and condition: service_completed_successfully.
  • service_healthy requires the dependency service to define readiness.
  • service_started waits only for the dependency process to be launched and still alive.
  • service_healthy waits for the dependency readiness check to succeed.
  • service_completed_successfully waits for the dependency to exit with status 0 before launching the dependent service, which is useful for one-shot DAG stages such as preprocess -> train -> postprocess.

readiness

Supported types:

Sleep

readiness:
  type: sleep
  seconds: 5

  • seconds is required.

TCP

readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30

  • host defaults to 127.0.0.1.
  • timeout_seconds defaults to 60.

Log

readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60

  • timeout_seconds defaults to 60.

HTTP

readiness:
  type: http
  url: http://127.0.0.1:8080/health
  status_code: 200
  timeout_seconds: 30

  • status_code defaults to 200.
  • timeout_seconds defaults to 60.
  • The readiness check polls the URL through curl.

healthcheck

healthcheck is accepted as migration sugar and is normalized into the readiness model.

services:
  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "nc", "-z", "127.0.0.1", "6379"]
      timeout: 30s

Rules:

  • healthcheck and readiness are mutually exclusive.
  • Supported probe forms are a constrained subset:
    • ["CMD", "nc", "-z", HOST, PORT]
    • ["CMD-SHELL", "nc -z HOST PORT"]
    • recognized curl probes against http:// or https:// URLs
    • recognized wget --spider probes against http:// or https:// URLs
  • timeout maps to timeout_seconds.
  • disable: true disables readiness for that service.
  • interval, retries, and start_period are parsed but rejected in v1.
  • HTTP-style healthchecks normalize to readiness.type: http with status_code: 200.

assert

assert defines post-run contracts for a service. Checks run in the rendered script’s cleanup() after services are reaped and before artifact collection or stage-out. Any failed assertion marks the job failed, even when the service uses x-slurm.failure_policy.mode: ignore.

services:
  train:
    image: trainer:latest
    command: python train.py
    assert:
      exit_code: 0
      artifacts_contain: "model/*.pt"
      max_duration_seconds: 7200

| Field | Shape | Notes |
| --- | --- | --- |
| exit_code | integer 0..255 | Expected final service exit code. |
| artifacts_contain | string | Glob that must match at least one path. Relative patterns resolve under /hpc-compose/job; absolute patterns must stay under /hpc-compose/job. |
| max_duration_seconds | positive integer | Maximum wall-clock seconds from first service launch to terminal service exit, including restart time. |

At least one assertion field is required. Assertion results are written into runtime state.json; hpc-compose status --format json includes them under each service’s assertions object.
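
The path rule for artifacts_contain can be illustrated with a small Python sketch. The helper name is hypothetical, and rejecting relative patterns that climb out of the job directory with .. is an assumption of this example; the reference only states where patterns resolve and that absolute patterns must stay under /hpc-compose/job.

```python
import posixpath

JOB_ROOT = "/hpc-compose/job"


def resolve_artifact_pattern(pattern: str) -> str:
    """Resolve an artifacts_contain glob against the job directory."""
    if posixpath.isabs(pattern):
        resolved = posixpath.normpath(pattern)
    else:
        resolved = posixpath.normpath(posixpath.join(JOB_ROOT, pattern))
    # Anything that normalizes to a path outside the job root is refused.
    if resolved != JOB_ROOT and not resolved.startswith(JOB_ROOT + "/"):
        raise ValueError(f"pattern escapes {JOB_ROOT}: {pattern!r}")
    return resolved
```

With this sketch, "model/*.pt" resolves to "/hpc-compose/job/model/*.pt", while "/tmp/*.pt" is refused.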

Service-level x-slurm

These fields live under services.<name>.x-slurm.

| Field | Shape | Default | Notes |
|---|---|---|---|
| nodes | positive integer | omitted | Legacy shorthand: 1 for a helper step, or the full top-level allocation node count for a full-allocation distributed service. Partial multi-node counts require placement.node_count. |
| placement | mapping | omitted | Explicit node-index placement inside the allocation. |
| ntasks | positive integer | omitted | Adds --ntasks to that service’s srun. |
| ntasks_per_node | positive integer | omitted | Adds --ntasks-per-node to that service’s srun. |
| cpus_per_task | positive integer | omitted | Adds --cpus-per-task to that service’s srun. |
| gpus | positive integer | omitted | Adds --gpus when gres is not set. |
| gres | string | omitted | Adds --gres to that service’s srun. Takes priority over gpus. |
| gpus_per_node | positive integer | omitted | Adds --gpus-per-node to that service’s srun. |
| gpus_per_task | positive integer | omitted | Adds --gpus-per-task to that service’s srun. |
| cpus_per_gpu | positive integer | omitted | Adds --cpus-per-gpu to that service’s srun. |
| mem_per_gpu | string | omitted | Adds --mem-per-gpu to that service’s srun. |
| gpu_bind | string | omitted | Adds --gpu-bind to that service’s srun. |
| cpu_bind | string | omitted | Adds --cpu-bind to that service’s srun. |
| mem_bind | string | omitted | Adds --mem-bind to that service’s srun. |
| distribution | string | omitted | Adds --distribution to that service’s srun. |
| hint | string | omitted | Adds --hint to that service’s srun. |
| time_limit | string | omitted | Advisory per-service time limit. Validated against Slurm time formats but not passed to srun. inspect surfaces warnings when the limit exceeds allocation time or conflicts with dependencies. Accepted formats: MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS. |
| extra_srun_args | list of strings | omitted | Appended directly to the service’s srun command. |
| mpi | mapping | omitted | Adds first-class MPI launch metadata and srun --mpi=<type>. |
| failure_policy | mapping | omitted | Per-service failure handling (fail_job, ignore, restart_on_failure). |
| prologue | string or mapping | omitted | Per-service shell hook run before each launch attempt. String shorthand runs on the host. |
| epilogue | string or mapping | omitted | Per-service shell hook run after each service exit attempt. String shorthand runs on the host. |
| hooks | list of mappings | omitted | Host-side event hooks for failure-policy transitions such as accepted restarts and crash-loop window exhaustion. |
| rendezvous | mapping | omitted | Provider registration config for cross-job service discovery. |
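
The six accepted time_limit formats can be read as follows. This Python sketch converts each one to seconds; the function name is hypothetical and the code only illustrates the Slurm time notation, it is not how hpc-compose validates the field.

```python
def time_limit_seconds(value: str) -> int:
    """Convert MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, or D-HH:MM:SS to seconds."""
    day_part, dash, clock = value.partition("-")
    if not dash:
        day_part, clock = "", value
    fields = clock.split(":")
    if not 1 <= len(fields) <= 3 or not all(f.isdecimal() for f in fields):
        raise ValueError(f"not a Slurm time format: {value!r}")
    nums = [int(f) for f in fields]
    if dash:
        if not day_part.isdecimal():
            raise ValueError(f"not a Slurm time format: {value!r}")
        days = int(day_part)
        # With a day prefix the first clock field is hours.
        hours, minutes, seconds = (nums + [0, 0])[:3]
    else:
        days = 0
        if len(nums) == 3:        # HH:MM:SS
            hours, minutes, seconds = nums
        elif len(nums) == 2:      # MM:SS
            hours, (minutes, seconds) = 0, nums
        else:                     # MM
            hours, minutes, seconds = 0, nums[0], 0
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Note that a bare number is minutes, so "30" means 1800 seconds, and "01:30" is one minute thirty seconds rather than one hour thirty minutes.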

services.<name>.x-slurm.rendezvous

Provider-side registration atomically writes a record to the shared cache. When readiness is configured, the record is written only after readiness succeeds:

services:
  model:
    image: python:3.12-slim
    command: python -m http.server 8000
    readiness:
      type: tcp
      port: 8000
    x-slurm:
      rendezvous:
        register:
          name: model-server
          port: 8000
          protocol: http
          path: /
          ttl_seconds: 3600

Names are single safe path components using ASCII letters, digits, ., _, and -. Rendezvous is same-cluster shared-storage coordination only; it does not provide DNS, tunneling, or authentication.
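
A name check along these lines can be sketched in Python. The helper is hypothetical, and treating "." and ".." as unsafe is an assumption of this example based on the "single safe path component" wording.

```python
import re

# ASCII letters, digits, ".", "_", and "-" only.
_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_rendezvous_name(name: str) -> bool:
    """Return True if name is one safe path component."""
    if name in (".", ".."):
        return False  # assumption: directory references are not safe names
    return _NAME.fullmatch(name) is not None
```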

services.<name>.x-slurm.prologue / epilogue

services:
  trainer:
    image: trainer:latest
    command: python train.py
    x-slurm:
      prologue: |
        module load cuda/12.1
        nvidia-smi
      epilogue:
        context: container
        script: |
          tar czf /shared/logs-${SLURM_JOB_ID}.tar.gz /hpc-compose/job/logs
  • Shape: either a block string, or a mapping with script and optional context.
  • context: host (default) or container.
  • Hook scripts are emitted as trusted shell and are not Compose-interpolated, so runtime variables such as ${SLURM_JOB_ID} are preserved.
  • Hooks run once per service launch attempt, including restart_on_failure retries.
  • Host hooks run in the generated batch supervisor on the allocation’s primary execution context. Container hooks wrap the service command inside the container and can use /hpc-compose/job.
  • Hook stdout/stderr is written to the service log.
  • Container hooks require an explicit command or entrypoint; image-default services cannot be wrapped.
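
The two accepted shapes can be reduced to one internal form, as in the Python sketch below. The function name and the has_explicit_command parameter are hypothetical; the sketch only restates the shape, default-context, and container-hook rules listed above.

```python
def normalize_hook(hook, has_explicit_command: bool) -> dict:
    """Turn a prologue/epilogue value into {"script": ..., "context": ...}."""
    if isinstance(hook, str):
        # String shorthand runs on the host.
        normalized = {"script": hook, "context": "host"}
    elif isinstance(hook, dict) and "script" in hook:
        normalized = {
            "script": hook["script"],
            "context": hook.get("context", "host"),
        }
    else:
        raise ValueError("hook must be a string or a mapping with script")
    if normalized["context"] not in ("host", "container"):
        raise ValueError("context must be host or container")
    if normalized["context"] == "container" and not has_explicit_command:
        raise ValueError("container hooks require an explicit command or entrypoint")
    return normalized
```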

services.<name>.x-slurm.hooks

services:
  trainer:
    image: trainer:latest
    command: python train.py
    x-slurm:
      failure_policy:
        mode: restart_on_failure
      hooks:
        - on: restart
          context: host
          script: |
            echo "Service $HPC_COMPOSE_SERVICE_NAME restarted (attempt $HPC_COMPOSE_ATTEMPT)" >> /shared/restart.log
        - on: window_exhausted
          script: |
            curl -X POST "$WEBHOOK_URL" -d '{"alert": "crash loop detected"}'
  • Shape: list of mappings with on, script, and optional context.
  • on: restart or window_exhausted.
  • context: host only. Omitted context defaults to host; container is rejected for event hooks.
  • restart runs after a non-zero exit has passed the lifetime and rolling-window guards, after restart counters are recorded, and before backoff/relaunch.
  • window_exhausted runs only when the rolling-window guard blocks another restart. It does not run for lifetime max_restarts exhaustion.
  • Event hooks are best-effort observability hooks. A non-zero hook exit is logged to the service log and does not change the restart or failure-policy outcome.
  • Event hook scripts are emitted as trusted shell and are not Compose-interpolated.
  • Event hooks receive HPC_COMPOSE_HOOK_PHASE, HPC_COMPOSE_SERVICE_NAME, HPC_COMPOSE_SERVICE_LOG, HPC_COMPOSE_SERVICE_EXIT_CODE, HPC_COMPOSE_ATTEMPT, HPC_COMPOSE_RESTART_COUNT, HPC_COMPOSE_MAX_RESTARTS, HPC_COMPOSE_WINDOW_SECONDS, HPC_COMPOSE_MAX_RESTARTS_IN_WINDOW, and HPC_COMPOSE_RESTART_FAILURES_IN_WINDOW.

services.<name>.x-slurm.placement

services:
  a:
    image: app:a
    x-slurm:
      placement: { node_range: "0-3" }
  b:
    image: app:b
    x-slurm:
      placement: { node_range: "4-7" }
  ps:
    image: app:b
    x-slurm:
      placement: { share_with: b }

Exactly one selector is required:

| Field | Shape | Notes |
|---|---|---|
| node_range | string | Zero-based inclusive allocation indices, for example "0-3" or "0-3,6". |
| node_count | integer | Selects this many eligible nodes starting at start_index, default 0. |
| node_percent | integer 1..100 | Selects ceil(percent * eligible_nodes / 100), minimum one node. |
| share_with | string | Reuses another service’s resolved node set for explicit co-location. |

Optional fields:

  • start_index: applies to node_count and node_percent.
  • exclude: zero-based allocation indices removed from the eligible set and passed to srun --exclude.
  • allow_overlap: permits intentional overlap with another explicit placement.

Node indices are resolved against the Slurm allocation order from scontrol show hostnames "$SLURM_JOB_NODELIST". At runtime, containers receive both allocation-wide metadata (HPC_COMPOSE_NODELIST) and service-scoped metadata (HPC_COMPOSE_SERVICE_NODELIST, HPC_COMPOSE_SERVICE_NODELIST_FILE, HPC_COMPOSE_SERVICE_PRIMARY_NODE, HPC_COMPOSE_SERVICE_NODE_COUNT).
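
To make the selector arithmetic concrete, here is a small Python sketch of node_range expansion and the node_percent formula. Both function names are hypothetical, and the sketch ignores start_index, exclude, and overlap handling.

```python
def parse_node_range(spec: str) -> list[int]:
    """Expand "0-3,6" into zero-based inclusive indices [0, 1, 2, 3, 6]."""
    indices = set()
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        low = int(first)
        high = int(last) if dash else low
        if high < low:
            raise ValueError(f"descending range: {part!r}")
        indices.update(range(low, high + 1))
    return sorted(indices)


def node_percent_count(percent: int, eligible_nodes: int) -> int:
    """ceil(percent * eligible_nodes / 100), with a minimum of one node."""
    if not 1 <= percent <= 100:
        raise ValueError("node_percent must be in 1..100")
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-percent * eligible_nodes // 100))
```

For an 8-node allocation, node_percent: 25 selects 2 nodes and node_percent: 30 selects 3, since the result is rounded up.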

services.<name>.x-slurm.mpi

services:
  trainer:
    image: mpi-image:latest
    command: /usr/local/bin/train
    x-slurm:
      nodes: 2
      ntasks_per_node: 4
      mpi:
        type: pmix_v4
        profile: openmpi
        implementation: openmpi
        launcher: srun
        expected_ranks: 8
        host_mpi:
          bind_paths:
            - /opt/site/openmpi:/opt/site/openmpi:ro
          env:
            MPI_DIR: /opt/site/openmpi
  • Shape: mapping
  • Default: omitted
  • type is an exact srun --mpi=<type> plugin token. Common values include pmix, pmix_v4, pmi2, pmi1, and openmpi; use srun --mpi=list or hpc-compose doctor cluster-report on the target cluster to discover site-specific values.
  • Notes:
    • Rendered as --mpi=<type> on the service’s srun command.
    • profile is optional compatibility metadata used for validation, cluster-profile diagnostics, and doctor mpi-smoke output. Supported values are openmpi, mpich, and intel_mpi.
    • profile does not auto-select or rewrite type; use the exact token that your cluster reports through srun --mpi=list.
    • launcher defaults to srun; v1 rejects other launchers.
    • implementation is optional metadata for diagnostics. Supported values are openmpi, mpich, intel_mpi, mvapich2, cray_mpi, hpe_mpi, and unknown.
    • When both profile and implementation are set, they must describe the same MPI family.
    • expected_ranks, when set, must match the resolved Slurm task geometry.
    • host_mpi.bind_paths uses host_path:container_path[:ro|rw] syntax, is validated like service volumes, and is automatically mounted into the service.
    • host_mpi.env is injected into the service environment after normal service environment entries.
    • Cannot be combined with raw --mpi... entries in extra_srun_args.
    • MPI services receive HPC_COMPOSE_MPI_TYPE and HPC_COMPOSE_MPI_HOSTFILE.
    • MPI services also receive HPC_COMPOSE_MPI_PROFILE when profile is set and HPC_COMPOSE_MPI_IMPLEMENTATION when implementation is set or implied by profile.
    • hpc-compose doctor mpi-smoke -f compose.yaml --service trainer renders a smoke probe for the service; add --submit to run it through Slurm. hpc-compose doctor fabric-smoke -f compose.yaml --service trainer --checks auto extends the same pattern with NCCL, UCX, OFI, and InfiniBand diagnostics when available. Smoke plans keep allocation and MPI launch settings, but strip application workflow blocks such as setup, scratch staging, resume metadata, artifacts, and burst-buffer directives.

Profile-specific compatibility checks are intentionally conservative:

  • profile: openmpi expects a PMIx-capable type such as pmix or pmix_v*, with pmi2 accepted as a fallback.
  • profile: mpich expects pmi2 or a PMIx-capable setup.
  • profile: intel_mpi expects pmi2; preflight and doctor warn when no I_MPI_PMI_LIBRARY or cluster-profile PMI2 library is visible.
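
The three profile rules can be summarized as a lookup, sketched here in Python. The function names are hypothetical, and treating "PMIx-capable" as exactly pmix or a pmix_v prefix is an assumption of this example; the environment checks for intel_mpi are not modeled.

```python
def is_pmix(mpi_type: str) -> bool:
    """Assumption: PMIx-capable means "pmix" or a "pmix_v*" token."""
    return mpi_type == "pmix" or mpi_type.startswith("pmix_v")


def profile_accepts(profile: str, mpi_type: str) -> bool:
    """Return True if the srun --mpi token fits the profile's expectations."""
    if profile in ("openmpi", "mpich"):
        return is_pmix(mpi_type) or mpi_type == "pmi2"
    if profile == "intel_mpi":
        return mpi_type == "pmi2"
    raise ValueError(f"unsupported profile: {profile!r}")
```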

services.<name>.x-slurm.failure_policy

services:
  worker:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5
        window_seconds: 60
        max_restarts_in_window: 3

| Field | Shape | Default | Notes |
|---|---|---|---|
| mode | fail_job, ignore, or restart_on_failure | fail_job | fail_job keeps fail-fast behavior. ignore keeps the job running after non-zero exits. restart_on_failure restarts on non-zero exits only. |
| max_restarts | integer | 3 when mode=restart_on_failure | Required to be at least 1 after defaults are applied. Valid only for restart_on_failure. |
| backoff_seconds | integer | 5 when mode=restart_on_failure | Fixed delay between restart attempts. Required to be at least 1 after defaults are applied. Valid only for restart_on_failure. |
| window_seconds | integer | 60 when mode=restart_on_failure | Rolling window for counting restart-triggering exits. Required to be at least 1 after defaults are applied. Valid only for restart_on_failure. |
| max_restarts_in_window | integer | resolved max_restarts when mode=restart_on_failure | Maximum restart-triggering exits allowed within window_seconds. Required to be at least 1 after defaults are applied. Valid only for restart_on_failure. |

Rules:

  • In a multi-node allocation, implicit helper services are pinned to HPC_COMPOSE_PRIMARY_NODE.
  • Explicit service placements may not overlap unless one side sets placement.allow_overlap: true or uses placement.share_with.
  • max_restarts, backoff_seconds, window_seconds, and max_restarts_in_window are rejected unless mode: restart_on_failure.
  • Restart attempts count relaunches after the initial launch.
  • Restarts trigger only for non-zero exits.
  • restart_on_failure enforces both a lifetime cap (max_restarts) and a rolling-window cap (max_restarts_in_window within window_seconds) during one live batch-script execution.
  • If you omit the rolling-window fields, restart_on_failure still enables default crash-loop protection with window_seconds: 60 and max_restarts_in_window: <resolved max_restarts>.
  • Services configured with mode: ignore cannot be used as dependencies in depends_on.

Examples:

Use the defaults when you only need bounded retries:

services:
  worker:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure

That resolves to:

  • max_restarts: 3
  • backoff_seconds: 5
  • window_seconds: 60
  • max_restarts_in_window: 3
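
The default resolution can be sketched in Python as below. The function name is hypothetical; the defaults and the "rejected unless mode: restart_on_failure" rule are taken from the table and rules above.

```python
RESTART_FIELDS = (
    "max_restarts",
    "backoff_seconds",
    "window_seconds",
    "max_restarts_in_window",
)


def resolve_failure_policy(policy: dict) -> dict:
    """Apply documented defaults and reject restart fields in other modes."""
    mode = policy.get("mode", "fail_job")
    if mode not in ("fail_job", "ignore", "restart_on_failure"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode != "restart_on_failure":
        extra = [key for key in RESTART_FIELDS if key in policy]
        if extra:
            raise ValueError(f"{extra} require mode: restart_on_failure")
        return {"mode": mode}
    resolved = {
        "mode": mode,
        "max_restarts": policy.get("max_restarts", 3),
        "backoff_seconds": policy.get("backoff_seconds", 5),
        "window_seconds": policy.get("window_seconds", 60),
    }
    # The window cap falls back to the resolved lifetime cap.
    resolved["max_restarts_in_window"] = policy.get(
        "max_restarts_in_window", resolved["max_restarts"]
    )
    for key in RESTART_FIELDS:
        if resolved[key] < 1:
            raise ValueError(f"{key} must be at least 1")
    return resolved
```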

Use explicit fields when you need a larger lifetime budget but still want a tighter crash-loop guard:

services:
  worker:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 8
        backoff_seconds: 10
        window_seconds: 60
        max_restarts_in_window: 3

Semantics:

  • The initial launch does not count as a restart.
  • restart_count counts granted relaunches after the initial launch.
  • max_restarts_in_window counts restart-triggering non-zero exits whose timestamps still satisfy now - event < window_seconds.
  • If a non-zero exit would exceed the rolling-window cap, the job fails immediately and that blocked exit is not recorded as a consumed restart.
  • Successful exits do not trigger restarts and do not add entries to the rolling window.
  • The rolling window is attempt-local to one live batch-script execution. It is not hydrated from prior state.json, resume metadata, or Slurm requeue history.
  • x-slurm.hooks can observe accepted restart events and blocked window_exhausted events without changing the policy decision.
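
The interaction between the lifetime cap and the rolling window can be modeled in a few lines of Python. This is a simplified sketch, not the generated batch supervisor: the class and return labels are hypothetical, and checking the lifetime cap before the window cap is an assumption of this example.

```python
from collections import deque


class RestartGuard:
    """Toy model of the restart_on_failure lifetime and rolling-window caps."""

    def __init__(self, max_restarts=3, window_seconds=60,
                 max_restarts_in_window=None):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.max_in_window = (max_restarts if max_restarts_in_window is None
                              else max_restarts_in_window)
        self.restart_count = 0   # granted relaunches after the initial launch
        self.window = deque()    # timestamps of restart-triggering exits

    def on_exit(self, exit_code: int, now: float) -> str:
        if exit_code == 0:
            return "done"        # successful exits never restart
        if self.restart_count >= self.max_restarts:
            return "lifetime_exhausted"
        # Keep only events that still satisfy now - event < window_seconds.
        while self.window and now - self.window[0] >= self.window_seconds:
            self.window.popleft()
        if len(self.window) >= self.max_in_window:
            return "window_exhausted"   # blocked exit is not recorded
        self.window.append(now)
        self.restart_count += 1
        return "restart"
```

With max_restarts=8, window_seconds=60, and max_restarts_in_window=3, failures at 0, 10, and 20 seconds are each granted a restart, and a fourth failure at 30 seconds is blocked by the window. If the fourth failure arrives at 61 seconds instead, the first event has aged out and the restart is granted.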

Tracked state:

  • status --format json includes failure_policy_mode, restart_count, max_restarts, window_seconds, max_restarts_in_window, restart_failures_in_window, and last_exit_code for each tracked service.
  • Text status renders the live rolling-window budget as window=<current>/<max>@<seconds>s.

Unknown keys under top-level x-slurm or per-service x-slurm cause hard errors.

x-runtime.prepare and x-enroot.prepare

x-runtime.prepare lets a service build a prepared runtime image from its base image before submission. x-enroot.prepare remains accepted as a Pyxis-only compatibility spelling.

services:
  app:
    image: python:3.11-slim
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir numpy pandas
        mounts:
          - ./requirements.txt:/tmp/requirements.txt
        env:
          PIP_CACHE_DIR: /tmp/pip-cache
        root: true

| Field | Shape | Default | Notes |
|---|---|---|---|
| commands | list of strings | required when prepare is present | Each command runs through the selected backend’s writable prepare flow. |
| mounts | list of host_path:container_path strings | omitted | Visible only during prepare. Relative host paths resolve against the compose file directory. |
| env | mapping or list of KEY=VALUE strings | omitted | Passed only during prepare. Values support the same interpolation rules as environment. |
| root | boolean | true | Controls whether prepare commands request root/fakeroot behavior where the backend supports it. |

Rules:

  • If x-runtime.prepare or x-enroot.prepare is present, commands cannot be empty.
  • A service may not set both spellings.
  • x-enroot.prepare is rejected when runtime.backend is not pyxis.
  • If prepare.mounts is non-empty, the service rebuilds on every prepare or up.
  • Remote base images are imported under cache_dir/base.
  • Prepared images are exported under cache_dir/prepared.
  • Unknown keys under x-runtime, x-enroot, or prepare cause hard errors.
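
Several of these rules are simple structural checks. The Python sketch below applies them to a parsed service mapping; the function name and the returned rebuild_every_time key are hypothetical, and cache paths and unknown-key detection are not modeled.

```python
def check_prepare(service: dict, backend: str):
    """Validate the prepare block of one service; return None if absent."""
    runtime_prepare = service.get("x-runtime", {}).get("prepare")
    enroot_prepare = service.get("x-enroot", {}).get("prepare")
    if runtime_prepare is not None and enroot_prepare is not None:
        raise ValueError("a service may not set both spellings")
    if enroot_prepare is not None and backend != "pyxis":
        raise ValueError("x-enroot.prepare requires runtime.backend: pyxis")
    prepare = runtime_prepare if runtime_prepare is not None else enroot_prepare
    if prepare is None:
        return None
    if not prepare.get("commands"):
        raise ValueError("prepare.commands cannot be empty")
    # Non-empty mounts force a rebuild on every prepare or up.
    return {"rebuild_every_time": bool(prepare.get("mounts"))}
```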

Unsupported Compose keys

These keys are rejected with explicit messages:

  • build
  • ports
  • networks
  • network_mode
  • Compose restart (use services.<name>.x-slurm.failure_policy)
  • deploy

Any other unknown key at the service level is also rejected.
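
When porting an existing Compose file, a quick pre-check for the explicitly rejected keys can save a validation round trip. The Python sketch below is a hypothetical helper that operates on an already-parsed service mapping; it covers only the six keys listed above, not the general unknown-key rule.

```python
# Keys from the list above; "restart" is the Compose restart policy key.
UNSUPPORTED_KEYS = frozenset(
    {"build", "ports", "networks", "network_mode", "restart", "deploy"}
)


def rejected_compose_keys(service: dict) -> list[str]:
    """Return the explicitly unsupported Compose keys used by a service."""
    return sorted(key for key in service if key in UNSUPPORTED_KEYS)
```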

Migration to Spec v2

This page is reserved for the first breaking hpc-compose spec release. Current hpc-compose builds support spec version 1; use version: "1" or omit the field for v1 specs.

Known v2 migration hint:

  • steps was renamed to services in v2. Rename top-level steps: to services: before validating with a v2-aware hpc-compose build.

Example Source

This appendix embeds the runnable repository example YAML files directly from examples/.

Some repository examples keep an explicit ${CACHE_DIR:-/cluster/shared/hpc-compose-cache} for portability, while starter examples rely on the cache default from settings or the built-in fallback. Before running on a real cluster, configure a shared path visible from both the submission host and the compute nodes:

export CACHE_DIR=/cluster/shared/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"

App Redis Worker

Source: examples/app-redis-worker.yaml

name: redis-demo

x-slurm:
  job_name: redis-demo
  time: "00:15:00"
  mem: 8G
  cpus_per_task: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  redis:
    image: redis:7
    command: redis-server --save "" --appendonly no
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30
    x-slurm:
      cpus_per_task: 1

  worker:
    image: redis:7
    depends_on:
      redis:
        condition: service_healthy
    command:
      - /bin/sh
      - -lc
      - |
        redis-cli -h 127.0.0.1 ping
        while true; do
          redis-cli -h 127.0.0.1 incr jobs
          sleep 2
        done
    x-slurm:
      cpus_per_task: 1

Canary Right Size

Source: examples/canary-right-size.yaml

name: canary-right-size

x-slurm:
  job_name: canary-right-size
  partition: gpu
  time: "04:00:00"
  mem: 64G
  gpus: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}
  metrics:
    enabled: true
    interval_seconds: 10

services:
  trainer:
    image: python:3.12-slim
    command:
      - /bin/sh
      - -lc
      - |
        python - <<'PY'
        import time
        data = bytearray(512 * 1024 * 1024)
        print(f"allocated {len(data)} bytes")
        time.sleep(20)
        PY
    x-slurm:
      cpus_per_task: 8

Dev Python App

Source: examples/dev-python-app.yaml

name: dev-python-app

x-slurm:
  job_name: dev-python-app
  time: "00:30:00"
  mem: 8G
  cpus_per_task: 2

services:
  app:
    image: python:3.11-slim
    working_dir: /workspace
    volumes:
      - ./app:/workspace
    command:
      - python
      - -m
      - main
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir fastapi uvicorn openai

Dev Python Smoke

Source: examples/dev-python-smoke.yaml

name: dev-python-smoke

x-slurm:
  job_name: dev-python-smoke
  time: "00:01:00"
  mem: 2G
  cpus_per_task: 1

services:
  app:
    image: python:3.11-slim
    working_dir: /workspace
    volumes:
      - ./app:/workspace
    command:
      - python
      - -c
      - "import main; print('smoke ok', flush=True)"
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir fastapi uvicorn openai

Fairseq Preprocess

Source: examples/fairseq-preprocess.yaml

name: fairseq-preprocess

x-slurm:
  job_name: fairseq-preprocess
  time: "02:00:00"
  mem: 32G
  cpus_per_task: 8
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  preprocess:
    image: python:3.11-slim
    volumes:
      - /shared/$USER/data/raw:/data/raw
      - /shared/$USER/data/processed:/data/processed
    environment:
      INPUT_DIR: /data/raw
      OUTPUT_DIR: /data/processed
      NUM_WORKERS: "8"
    command:
      - /bin/sh
      - -lc
      - |
        python -c "
        import os, json, hashlib, multiprocessing
        from pathlib import Path
        from concurrent.futures import ProcessPoolExecutor

        input_dir = Path(os.environ['INPUT_DIR'])
        output_dir = Path(os.environ['OUTPUT_DIR'])
        num_workers = int(os.environ['NUM_WORKERS'])
        output_dir.mkdir(parents=True, exist_ok=True)

        files = sorted(input_dir.glob('*.txt'))
        if not files:
            print(f'No .txt files found in {input_dir}')
            exit(1)
        print(f'Found {len(files)} input files')

        def process_file(path):
            text = path.read_text(encoding='utf-8', errors='replace')
            lines = [l.strip() for l in text.splitlines() if l.strip()]
            tokens = []
            for line in lines:
                tokens.extend(line.lower().split())
            out = output_dir / f'{path.stem}.jsonl'
            with open(out, 'w') as f:
                for i, line in enumerate(lines):
                    record = {
                        'id': f'{path.stem}_{i}',
                        'text': line,
                        'tokens': len(line.split()),
                    }
                    f.write(json.dumps(record) + '\n')
            return path.name, len(lines), len(tokens)

        with ProcessPoolExecutor(max_workers=num_workers) as pool:
            results = list(pool.map(process_file, files))

        total_lines = sum(r[1] for r in results)
        total_tokens = sum(r[2] for r in results)
        for name, lines, tokens in results:
            print(f'  {name}: {lines} lines, {tokens} tokens')
        print(f'Total: {total_lines} lines, {total_tokens} tokens across {len(files)} files')

        manifest = {
            'files': len(files),
            'total_lines': total_lines,
            'total_tokens': total_tokens,
        }
        (output_dir / 'manifest.json').write_text(json.dumps(manifest, indent=2))
        print('Preprocessing complete')
        "
    x-slurm:
      cpus_per_task: 8

Llama App

Source: examples/llama-app.yaml

name: llama-stack

x-slurm:
  job_name: llama-stack
  time: "02:00:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  llama:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    volumes:
      - ./models:/models
    command:
      - /bin/sh
      - -lc
      - exec /app/llama-server -m /models/model.gguf --host 0.0.0.0 --port 8080
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 8080
      timeout_seconds: 60
    x-slurm:
      gpus: 1
      cpus_per_task: 4

  app:
    image: python:3.11-slim
    depends_on:
      llama:
        condition: service_healthy
    working_dir: /workspace
    volumes:
      - ./app:/workspace
    environment:
      LLM_BASE_URL: http://127.0.0.1:8080/v1
    command:
      - python
      - -m
      - main
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir openai fastapi uvicorn
    x-slurm:
      cpus_per_task: 2

Llama UV Worker

Source: examples/llama-uv-worker.yaml

name: llama-uv-worker

x-slurm:
  job_name: llama-uv-worker
  time: "01:00:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  llama:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    environment:
      GGUF_MODEL_PATH: /models/model.gguf
    volumes:
      - ./models:/models
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        rm -f /hpc-compose/job/request.done
        /app/llama-server -m "$$GGUF_MODEL_PATH" --host 0.0.0.0 --port 8080 &
        server_pid=$$!
        while [ ! -f /hpc-compose/job/request.done ]; do
          if ! kill -0 "$$server_pid" 2>/dev/null; then
            wait "$$server_pid"
            exit $$?
          fi
          sleep 1
        done
        kill "$$server_pid" 2>/dev/null || true
        wait "$$server_pid" || true
    readiness:
      type: log
      pattern: "main: model loaded"
      timeout_seconds: 300
    x-slurm:
      gpus: 1
      cpus_per_task: 4

  worker:
    image: python:3.11-slim
    working_dir: /workspace
    volumes:
      - ./llama-uv-worker:/workspace
    depends_on:
      llama:
        condition: service_healthy
    environment:
      OPENAI_BASE_URL: http://127.0.0.1:8080/v1
      MODEL_NAME: local-model
      REQUEST_DONE_PATH: /hpc-compose/job/request.done
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        UV_CACHE_DIR=/hpc-compose/job/.uv-cache uv run worker.py
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir uv
    x-slurm:
      cpus_per_task: 2

LLM Curl Workflow

Source: examples/llm-curl-workflow.yaml

name: llm-curl-workflow

x-slurm:
  job_name: llm-curl-workflow
  time: "00:30:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  llm:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    volumes:
      - ./models:/models
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        rm -f /hpc-compose/job/request.done
        /app/llama-server -m /models/model.gguf --host 0.0.0.0 --port 8080 &
        server_pid=$$!
        while [ ! -f /hpc-compose/job/request.done ]; do
          if ! kill -0 "$$server_pid" 2>/dev/null; then
            wait "$$server_pid"
            exit $$?
          fi
          sleep 1
        done
        kill "$$server_pid" 2>/dev/null || true
        wait "$$server_pid" || true
    readiness:
      type: log
      pattern: "main: model loaded"
      timeout_seconds: 300
    x-slurm:
      gpus: 1
      cpus_per_task: 4

  curl_client:
    image: debian:bookworm-slim
    depends_on:
      llm:
        condition: service_healthy
    environment:
      LLM_BASE_URL: http://127.0.0.1:8080
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        cat >/tmp/request.json <<'JSON'
        {
          "model": "local-model",
          "messages": [
            {
              "role": "system",
              "content": "You are a concise assistant."
            },
            {
              "role": "user",
              "content": "Explain what readiness checks do in one sentence."
            }
          ],
          "temperature": 0.2,
          "max_tokens": 64
        }
        JSON
        echo "Sending test request to $$LLM_BASE_URL/v1/chat/completions"
        curl --fail --show-error --silent \
          -H 'Content-Type: application/json' \
          --data @/tmp/request.json \
          "$$LLM_BASE_URL/v1/chat/completions"
        touch /hpc-compose/job/request.done
    x-runtime:
      prepare:
        commands:
          - apt-get update
          - apt-get install -y --no-install-recommends bash ca-certificates curl
          - rm -rf /var/lib/apt/lists/*
    x-slurm:
      cpus_per_task: 1

LLM Curl Workflow Workdir

Source: examples/llm-curl-workflow-workdir.yaml

name: llm-curl-workflow

x-slurm:
  job_name: llm-curl-workflow
  time: "00:30:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1
  # Uncomment if your cluster requires them.
  # partition: gpu
  # account: my-project
  # Set CACHE_DIR to a path visible from the submission host and compute nodes.
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  llm:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    environment:
      MODEL_FILE: model.gguf
    volumes:
      - $HOME/models:/models
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        rm -f /hpc-compose/job/request.done
        /app/llama-server -m /models/$$MODEL_FILE --host 0.0.0.0 --port 8080 &
        server_pid=$$!
        while [ ! -f /hpc-compose/job/request.done ]; do
          if ! kill -0 "$$server_pid" 2>/dev/null; then
            wait "$$server_pid"
            exit $$?
          fi
          sleep 1
        done
        kill "$$server_pid" 2>/dev/null || true
        wait "$$server_pid" || true
    readiness:
      type: log
      pattern: "main: model loaded"
      timeout_seconds: 300
    x-slurm:
      gpus: 1
      cpus_per_task: 4

  curl_client:
    image: debian:bookworm-slim
    depends_on:
      llm:
        condition: service_healthy
    environment:
      LLM_BASE_URL: http://127.0.0.1:8080
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        cat >/tmp/request.json <<'JSON'
        {
          "model": "local-model",
          "messages": [
            {
              "role": "system",
              "content": "You are a concise assistant."
            },
            {
              "role": "user",
              "content": "Explain what readiness checks do in one sentence."
            }
          ],
          "temperature": 0.2,
          "max_tokens": 64
        }
        JSON
        echo "Sending test request to $$LLM_BASE_URL/v1/chat/completions"
        curl --fail --show-error --silent \
          -H 'Content-Type: application/json' \
          --data @/tmp/request.json \
          "$$LLM_BASE_URL/v1/chat/completions"
        touch /hpc-compose/job/request.done
    x-runtime:
      prepare:
        commands:
          - apt-get update
          - apt-get install -y --no-install-recommends bash ca-certificates curl
          - rm -rf /var/lib/apt/lists/*
    x-slurm:
      cpus_per_task: 1

Minimal Batch

Source: examples/minimal-batch.yaml

name: minimal-batch

x-slurm:
  job_name: minimal-batch
  time: "00:10:00"
  mem: 4G
  cpus_per_task: 2

services:
  app:
    image: python:3.11-slim
    command: python -c "print('Hello from Slurm!')"

MPI Hello

Source: examples/mpi-hello.yaml

name: mpi-hello

x-slurm:
  job_name: mpi-hello
  time: "00:15:00"
  mem: 8G
  cpus_per_task: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  mpi:
    image: debian:bookworm-slim
    command:
      - /bin/sh
      - -lc
      - /usr/local/bin/mpi_hello
    x-runtime:
      prepare:
        commands:
          - apt-get update
          - apt-get install -y --no-install-recommends libopenmpi-dev openmpi-bin gcc
          - |
            cat > /tmp/hello.c << 'EOF'
            #include <mpi.h>
            #include <stdio.h>
            int main(int argc, char **argv) {
                MPI_Init(&argc, &argv);
                int rank, size;
                MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                MPI_Comm_size(MPI_COMM_WORLD, &size);
                printf("Hello from rank %d of %d\n", rank, size);
                MPI_Finalize();
                return 0;
            }
            EOF
            mpicc /tmp/hello.c -o /usr/local/bin/mpi_hello
          - rm -rf /var/lib/apt/lists/* /tmp/hello.c
    x-slurm:
      ntasks: 4
      cpus_per_task: 4
      mpi:
        type: pmix
        profile: openmpi
        implementation: openmpi

MPI PMIx v4 Host MPI

Source: examples/mpi-pmix-v4-host-mpi.yaml

name: mpi-pmix-v4-host-mpi

runtime:
  backend: pyxis

x-slurm:
  job_name: mpi-pmix-v4-host-mpi
  time: "00:20:00"
  nodes: 2
  ntasks_per_node: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  mpi:
    image: debian:bookworm-slim
    command:
      - /bin/sh
      - -lc
      - |
        echo "mpi_type=$$HPC_COMPOSE_MPI_TYPE"
        echo "hostfile=$$HPC_COMPOSE_MPI_HOSTFILE"
        cat "$$HPC_COMPOSE_MPI_HOSTFILE"
        /opt/site/openmpi/bin/mpirun --version || true
    x-slurm:
      nodes: 2
      ntasks_per_node: 2
      mpi:
        type: pmix_v4
        profile: openmpi
        implementation: openmpi
        launcher: srun
        expected_ranks: 4
        host_mpi:
          bind_paths:
            - /opt/site/openmpi:/opt/site/openmpi:ro
          env:
            MPI_HOME: /opt/site/openmpi

Multi Node MPI

Source: examples/multi-node-mpi.yaml

name: multi-node-mpi

x-slurm:
  job_name: multi-node-mpi
  time: "00:20:00"
  nodes: 2
  ntasks_per_node: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  bootstrap:
    image: alpine:3.20
    command:
      - /bin/sh
      - -lc
      - |
        echo "primary=$(cat /hpc-compose/job/allocation/primary_node)"
        sleep 30
    readiness:
      type: sleep
      seconds: 1
    x-slurm:
      nodes: 1

  mpi:
    image: python:3.11-slim
    depends_on:
      bootstrap:
        condition: service_healthy
    command:
      - /bin/sh
      - -lc
      - |
        echo "primary=$(cat /hpc-compose/job/allocation/primary_node)"
        echo "nodes=$(tr '\n' ' ' < /hpc-compose/job/allocation/nodes.txt)"
        echo "mpi_hostfile=$$HPC_COMPOSE_MPI_HOSTFILE"
        cat "$$HPC_COMPOSE_MPI_HOSTFILE"
        python - <<'PY'
        import os
        print("mpi placeholder")
        print("node_count", os.environ["HPC_COMPOSE_NODE_COUNT"])
        print("mpi_type", os.environ["HPC_COMPOSE_MPI_TYPE"])
        PY
    readiness:
      type: sleep
      seconds: 2
    x-slurm:
      nodes: 2
      ntasks_per_node: 2
      mpi:
        type: pmix
        profile: openmpi
        implementation: openmpi
        launcher: srun
        expected_ranks: 4

Multi Node Partitioned

Source: examples/multi-node-partitioned.yaml

name: multi-node-partitioned

x-slurm:
  job_name: multi-node-partitioned
  time: "00:20:00"
  nodes: 8
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  service-a:
    image: alpine:3.20
    command:
      - /bin/sh
      - -lc
      - |
        echo "service-a nodes=$$HPC_COMPOSE_SERVICE_NODELIST"
        sleep 30
    readiness:
      type: sleep
      seconds: 1
    x-slurm:
      placement:
        node_range: "0-3"

  service-b:
    image: alpine:3.20
    command:
      - /bin/sh
      - -lc
      - |
        echo "service-b nodes=$$HPC_COMPOSE_SERVICE_NODELIST"
        sleep 30
    readiness:
      type: sleep
      seconds: 1
    x-slurm:
      placement:
        node_range: "4-7"

  parameter-server:
    image: alpine:3.20
    depends_on:
      service-b:
        condition: service_healthy
    command:
      - /bin/sh
      - -lc
      - |
        echo "co-located with service-b on $$HPC_COMPOSE_SERVICE_NODELIST"
        sleep 30
    readiness:
      type: sleep
      seconds: 1
    x-slurm:
      placement:
        share_with: service-b

  monitor:
    image: alpine:3.20
    command:
      - /bin/sh
      - -lc
      - |
        echo "monitor nodes=$$HPC_COMPOSE_SERVICE_NODELIST"
        sleep 30
    x-slurm:
      placement:
        node_percent: 25
        allow_overlap: true
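To make the placement arithmetic in this example concrete, here is a minimal Python sketch of how the 8-node allocation could be divided. It is not hpc-compose's implementation. The helper names are hypothetical, and it assumes that `node_range` is an inclusive range of zero-based node indices and that `node_percent` rounds up to a whole number of nodes.

```python
import math


def parse_node_range(spec: str) -> list[int]:
    """Expand "0-3" or "1" into node indices (inclusive, zero-based: an assumption)."""
    start, _, end = spec.partition("-")
    first = int(start)
    last = int(end) if end else first
    if last < first:
        raise ValueError(f"invalid node range: {spec!r}")
    return list(range(first, last + 1))


def nodes_for_percent(percent: float, total_nodes: int) -> int:
    """Nodes covered by a percentage, rounded up, at least one (an assumption)."""
    return max(1, math.ceil(total_nodes * percent / 100))


total_nodes = 8
service_a = parse_node_range("0-3")
service_b = parse_node_range("4-7")

print(service_a, service_b)                   # [0, 1, 2, 3] [4, 5, 6, 7]
print(set(service_a).isdisjoint(service_b))   # True: the two ranges do not overlap
print(nodes_for_percent(25, total_nodes))     # 2
```

Under these assumptions, `service-a` and `service-b` split the allocation in half, and `monitor` covers two nodes that it may share with other services because `allow_overlap` is set.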

Multi Node Torchrun

Source: examples/multi-node-torchrun.yaml

name: multi-node-torchrun

x-slurm:
  job_name: multi-node-torchrun
  time: "04:00:00"
  nodes: 2
  gpus_per_node: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  trainer:
    image: pytorch/pytorch:2.4.1-cuda12.1-cudnn9-runtime
    command:
      - /bin/sh
      - -lc
      - |
        echo "master=$$HPC_COMPOSE_DIST_MASTER_ADDR"
        echo "nodes=$$HPC_COMPOSE_SERVICE_NODELIST"
        echo "node_rank=$$HPC_COMPOSE_DIST_NODE_RANK"
        torchrun \
          --nnodes="$$HPC_COMPOSE_DIST_NNODES" \
          --nproc-per-node="$$HPC_COMPOSE_DIST_NPROC_PER_NODE" \
          --node-rank="$$HPC_COMPOSE_DIST_NODE_RANK" \
          --rdzv-backend=c10d \
          --rdzv-endpoint="$$HPC_COMPOSE_DIST_RDZV_ENDPOINT" \
          train.py
    readiness:
      type: sleep
      seconds: 5
    x-slurm:
      nodes: 2
      ntasks_per_node: 1
      gpus_per_node: 4
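The example launches `train.py` without showing it. As a minimal sketch of how such a script might begin, the snippet below reads the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` variables that torchrun exports to each worker process. The helper name `read_torchrun_env` is hypothetical, and the single-process defaults are an assumption so the file also runs outside torchrun.

```python
import os


def read_torchrun_env(env=None) -> dict:
    """Collect the rank information torchrun exports to each worker process."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", "0")),              # global rank across all nodes
        "local_rank": int(env.get("LOCAL_RANK", "0")),  # rank within this node
        "world_size": int(env.get("WORLD_SIZE", "1")),  # total number of workers
    }


info = read_torchrun_env()
print(f"rank {info['rank']} of {info['world_size']} (local rank {info['local_rank']})")
```

A training script would typically use `local_rank` to pick its GPU. If `HPC_COMPOSE_DIST_NPROC_PER_NODE` resolves to the four GPUs per node requested here, the two nodes would give a world size of 8.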

Multi Node DeepSpeed

Source: examples/multi-node-deepspeed.yaml

name: multi-node-deepspeed

x-slurm:
  job_name: multi-node-deepspeed
  time: "04:00:00"
  nodes: 2
  gpus_per_node: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  trainer:
    image: pytorch/pytorch:2.4.1-cuda12.1-cudnn9-runtime
    command:
      - /bin/sh
      - -lc
      - |
        echo "master=$$HPC_COMPOSE_DIST_MASTER_ADDR"
        echo "nodes=$$HPC_COMPOSE_SERVICE_NODELIST"
        echo "node_rank=$$HPC_COMPOSE_DIST_NODE_RANK"
        deepspeed \
          --no_ssh \
          --hostfile "$$HPC_COMPOSE_DIST_HOSTFILE" \
          --num_nodes "$$HPC_COMPOSE_DIST_NNODES" \
          --num_gpus "$$HPC_COMPOSE_DIST_NPROC_PER_NODE" \
          --node_rank "$$HPC_COMPOSE_DIST_NODE_RANK" \
          --master_addr "$$HPC_COMPOSE_DIST_MASTER_ADDR" \
          --master_port "$$HPC_COMPOSE_DIST_MASTER_PORT" \
          train.py
    readiness:
      type: sleep
      seconds: 5
    x-slurm:
      nodes: 2
      ntasks_per_node: 1
      gpus_per_node: 4

Multi Node Accelerate

Source: examples/multi-node-accelerate.yaml

name: multi-node-accelerate

x-slurm:
  job_name: multi-node-accelerate
  time: "04:00:00"
  nodes: 2
  gpus_per_node: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  trainer:
    image: pytorch/pytorch:2.4.1-cuda12.1-cudnn9-runtime
    command:
      - /bin/sh
      - -lc
      - |
        echo "master=$$HPC_COMPOSE_DIST_MASTER_ADDR"
        echo "nodes=$$HPC_COMPOSE_SERVICE_NODELIST"
        echo "machine_rank=$$HPC_COMPOSE_DIST_NODE_RANK"
        accelerate launch \
          --multi_gpu \
          --num_machines "$$HPC_COMPOSE_DIST_NNODES" \
          --num_processes "$$HPC_COMPOSE_DIST_WORLD_SIZE" \
          --machine_rank "$$HPC_COMPOSE_DIST_NODE_RANK" \
          --main_process_ip "$$HPC_COMPOSE_DIST_MASTER_ADDR" \
          --main_process_port "$$HPC_COMPOSE_DIST_MASTER_PORT" \
          train.py
    readiness:
      type: sleep
      seconds: 5
    x-slurm:
      nodes: 2
      ntasks_per_node: 1
      gpus_per_node: 4

Multi Node Horovod

Source: examples/multi-node-horovod.yaml

name: multi-node-horovod

x-slurm:
  job_name: multi-node-horovod
  time: "04:00:00"
  nodes: 2
  gpus_per_node: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  trainer:
    image: horovod/horovod:latest
    command:
      - /bin/sh
      - -lc
      - |
        echo "rank=$$SLURM_PROCID local_rank=$$SLURM_LOCALID world=$$SLURM_NTASKS"
        python train_horovod.py
    readiness:
      type: sleep
      seconds: 5
    x-slurm:
      nodes: 2
      ntasks_per_node: 4
      gpus_per_node: 4
      mpi:
        type: pmix
        profile: openmpi
        expected_ranks: 8
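In the MPI examples on this page, `expected_ranks` equals `nodes` times `ntasks_per_node` (here 2 × 4 = 8). The sketch below shows that arithmetic as a small check over a service's `x-slurm` mapping. The function name `ranks_match` is hypothetical, and whether hpc-compose enforces this relationship itself is not stated here.

```python
def ranks_match(x_slurm: dict) -> bool:
    """Return True when expected_ranks is absent or equals nodes * ntasks_per_node."""
    expected = x_slurm.get("mpi", {}).get("expected_ranks")
    if expected is None:
        return True
    # Defaults of 1 for missing fields are an assumption of this sketch.
    return expected == x_slurm.get("nodes", 1) * x_slurm.get("ntasks_per_node", 1)


horovod = {
    "nodes": 2,
    "ntasks_per_node": 4,
    "mpi": {"type": "pmix", "profile": "openmpi", "expected_ranks": 8},
}
print(ranks_match(horovod))  # True
```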

Multi Node JAX

Source: examples/multi-node-jax.yaml

name: multi-node-jax

x-slurm:
  job_name: multi-node-jax
  time: "04:00:00"
  nodes: 2
  gpus_per_node: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  trainer:
    image: jaxai/jax:latest
    command:
      - /bin/sh
      - -lc
      - |
        echo "coordinator=$$HPC_COMPOSE_DIST_RDZV_ENDPOINT"
        echo "process_id=$$HPC_COMPOSE_DIST_NODE_RANK processes=$$HPC_COMPOSE_DIST_NNODES"
        python train_jax.py
    readiness:
      type: sleep
      seconds: 5
    x-slurm:
      nodes: 2
      ntasks_per_node: 1
      gpus_per_node: 4
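The example runs `train_jax.py` without showing it. The sketch below is one way such a script might map the variables echoed above onto `jax.distributed.initialize`. It assumes one JAX process per node, as `ntasks_per_node: 1` suggests, and that `HPC_COMPOSE_DIST_RDZV_ENDPOINT` holds a `host:port` address. The helper name and the sample values are hypothetical.

```python
import os


def jax_distributed_kwargs(env=None) -> dict:
    """Translate the hpc-compose distributed variables into initialize() arguments."""
    env = os.environ if env is None else env
    return {
        "coordinator_address": env["HPC_COMPOSE_DIST_RDZV_ENDPOINT"],
        "num_processes": int(env["HPC_COMPOSE_DIST_NNODES"]),
        "process_id": int(env["HPC_COMPOSE_DIST_NODE_RANK"]),
    }


def main() -> None:
    # Imported lazily so the helper above can be exercised without JAX installed.
    import jax

    jax.distributed.initialize(**jax_distributed_kwargs())
    print(f"process {jax.process_index()} of {jax.process_count()}")


# Hypothetical values, only to show the shape of the result.
sample_env = {
    "HPC_COMPOSE_DIST_RDZV_ENDPOINT": "node001:29500",
    "HPC_COMPOSE_DIST_NNODES": "2",
    "HPC_COMPOSE_DIST_NODE_RANK": "1",
}
print(jax_distributed_kwargs(sample_env))
```

Inside the job, the script would call `main()` so that each node joins the same coordinator before any collective computation starts.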

NCCL Tests

Source: examples/nccl-tests.yaml

name: nccl-tests

x-slurm:
  job_name: nccl-tests
  time: "00:30:00"
  nodes: 2
  gpus_per_node: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  all-reduce:
    image: nvcr.io/nvidia/pytorch:24.08-py3
    command:
      - /bin/sh
      - -lc
      - |
        echo "rank=$$SLURM_PROCID local_rank=$$SLURM_LOCALID world=$$SLURM_NTASKS"
        if command -v all_reduce_perf >/dev/null 2>&1; then
          all_reduce_perf -b 8 -e 4G -f 2 -g 1
        elif [ -x /workspace/nccl-tests/build/all_reduce_perf ]; then
          /workspace/nccl-tests/build/all_reduce_perf -b 8 -e 4G -f 2 -g 1
        else
          echo "all_reduce_perf not found; use an image with nccl-tests installed" >&2
          exit 127
        fi
    readiness:
      type: sleep
      seconds: 2
    x-slurm:
      nodes: 2
      ntasks_per_node: 4
      gpus_per_node: 4
      mpi:
        type: pmix
        profile: openmpi
        expected_ranks: 8

Ray Symmetric

Source: examples/ray-symmetric.yaml

name: ray-symmetric

x-slurm:
  job_name: ray-symmetric
  time: "02:00:00"
  nodes: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  ray:
    image: rayproject/ray:2.49.0-py310
    command:
      - /bin/sh
      - -lc
      - |
        ray symmetric-run \
          --address "$$HPC_COMPOSE_DIST_RDZV_ENDPOINT" \
          --min-nodes "$$HPC_COMPOSE_DIST_NNODES" \
          -- \
          python app.py
    readiness:
      type: sleep
      seconds: 10
    x-slurm:
      nodes: 2
      ntasks_per_node: 1

Rendezvous Client

Source: examples/rendezvous-client.yaml

name: rendezvous-client

x-slurm:
  job_name: model-client
  time: "00:10:00"
  mem: 2G
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}
  rendezvous: model-server

services:
  client:
    image: curlimages/curl:8.10.1
    command:
      - /bin/sh
      - -lc
      - |
        curl -fsS "$${HPC_COMPOSE_RDZV_MODEL_SERVER_URL}"
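The client reads `HPC_COMPOSE_RDZV_MODEL_SERVER_URL`, which appears to be derived from the rendezvous name `model-server`. The sketch below reproduces that mapping in Python. The rule is inferred from this single name pair, so treat the handling of any other characters as an assumption. The function name is hypothetical.

```python
import re


def rendezvous_url_var(name: str) -> str:
    """Build the URL variable name for a rendezvous entry (inferred rule, not an official API)."""
    token = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").upper()
    return f"HPC_COMPOSE_RDZV_{token}_URL"


print(rendezvous_url_var("model-server"))  # HPC_COMPOSE_RDZV_MODEL_SERVER_URL
```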

Rendezvous Model Server

Source: examples/rendezvous-model-server.yaml

name: rendezvous-model-server

x-slurm:
  job_name: model-server
  partition: gpu
  time: "02:00:00"
  mem: 32G
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  model:
    image: python:3.12-slim
    command:
      - /bin/sh
      - -lc
      - |
        python -m http.server 8000
    readiness:
      type: tcp
      port: 8000
      timeout_seconds: 60
    x-slurm:
      rendezvous:
        register:
          name: model-server
          port: 8000
          protocol: http
          path: /
          ttl_seconds: 3600

Ray Head Workers

Source: examples/ray-head-workers.yaml

name: ray-head-workers

x-slurm:
  job_name: ray-head-workers
  time: "02:00:00"
  nodes: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  head:
    image: rayproject/ray:2.49.0-py310
    command:
      - /bin/sh
      - -lc
      - |
        ray start --head --node-ip-address="$$HPC_COMPOSE_SERVICE_PRIMARY_NODE" --port=6379 --block
    readiness:
      type: sleep
      seconds: 10
    x-slurm:
      nodes: 1

  worker:
    image: rayproject/ray:2.49.0-py310
    command:
      - /bin/sh
      - -lc
      - |
        ray start --address="$$HPC_COMPOSE_PRIMARY_NODE:6379" --block
    depends_on:
      head:
        condition: service_healthy
    x-slurm:
      nodes: 1
      placement:
        node_range: "1"

Dask Scheduler Workers

Source: examples/dask-scheduler-workers.yaml

name: dask-scheduler-workers

x-slurm:
  job_name: dask-scheduler-workers
  time: "02:00:00"
  nodes: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  scheduler:
    image: ghcr.io/dask/dask:latest
    command:
      - /bin/sh
      - -lc
      - |
        dask scheduler --host "$$HPC_COMPOSE_SERVICE_PRIMARY_NODE" --port 8786
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 8786
      timeout_seconds: 60
    x-slurm:
      nodes: 1

  workers:
    image: ghcr.io/dask/dask:latest
    command:
      - /bin/sh
      - -lc
      - |
        dask worker "tcp://$$HPC_COMPOSE_PRIMARY_NODE:8786"
    depends_on:
      scheduler:
        condition: service_healthy
    x-slurm:
      nodes: 2
      ntasks_per_node: 1

Spark Standalone

Source: examples/spark-standalone.yaml

name: spark-standalone

x-slurm:
  job_name: spark-standalone
  time: "02:00:00"
  nodes: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  master:
    image: apache/spark:3.5.3
    command:
      - /bin/sh
      - -lc
      - |
        /opt/spark/sbin/start-master.sh --host "$$HPC_COMPOSE_SERVICE_PRIMARY_NODE" --port 7077
        tail -f /opt/spark/logs/*
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 7077
      timeout_seconds: 60
    x-slurm:
      nodes: 1

  workers:
    image: apache/spark:3.5.3
    command:
      - /bin/sh
      - -lc
      - |
        /opt/spark/sbin/start-worker.sh "spark://$$HPC_COMPOSE_PRIMARY_NODE:7077"
        tail -f /opt/spark/logs/*
    depends_on:
      master:
        condition: service_healthy
    x-slurm:
      nodes: 2
      ntasks_per_node: 1

  app:
    image: apache/spark:3.5.3
    command:
      - /bin/sh
      - -lc
      - |
        spark-submit --master "spark://$$HPC_COMPOSE_PRIMARY_NODE:7077" app.py
    depends_on:
      master:
        condition: service_healthy
    x-slurm:
      nodes: 1

Flux Nested

Source: examples/flux-nested.yaml

name: flux-nested

runtime:
  backend: host

x-slurm:
  job_name: flux-nested
  time: "01:00:00"
  nodes: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  flux:
    command:
      - /bin/sh
      - -lc
      - |
        flux start bash -lc 'flux run --label-io -N "$$HPC_COMPOSE_DIST_NNODES" hostname'
    x-slurm:
      nodes: 2
      ntasks_per_node: 1

Nextflow Bridge

Source: examples/nextflow-bridge.yaml

name: nextflow-bridge

runtime:
  backend: host

x-slurm:
  job_name: nextflow-bridge
  time: "02:00:00"
  nodes: 1
  cpus_per_task: 8
  mem: 16G
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}
  artifacts:
    export_dir: ./results/${SLURM_JOB_ID}
    paths:
      - /hpc-compose/job/nextflow-work/**
      - /hpc-compose/job/reports/**
      - /hpc-compose/job/logs/**

services:
  nextflow:
    command:
      - /bin/sh
      - -lc
      - |
        mkdir -p /hpc-compose/job/nextflow-work /hpc-compose/job/reports
        nextflow run "$${NEXTFLOW_PIPELINE:-main.nf}" \
          -work-dir /hpc-compose/job/nextflow-work \
          -with-report /hpc-compose/job/reports/report.html \
          -with-trace /hpc-compose/job/reports/trace.txt \
          $${NEXTFLOW_ARGS:-}
    environment:
      NEXTFLOW_PIPELINE: main.nf
      NEXTFLOW_ARGS: ""
    x-slurm:
      ntasks: 1

Snakemake Bridge

Source: examples/snakemake-bridge.yaml

name: snakemake-bridge

runtime:
  backend: host

x-slurm:
  job_name: snakemake-bridge
  time: "02:00:00"
  nodes: 1
  cpus_per_task: 8
  mem: 16G
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}
  artifacts:
    export_dir: ./results/${SLURM_JOB_ID}
    paths:
      - /hpc-compose/job/snakemake-work/**
      - /hpc-compose/job/reports/**
      - /hpc-compose/job/logs/**

services:
  snakemake:
    command:
      - /bin/sh
      - -lc
      - |
        mkdir -p /hpc-compose/job/snakemake-work /hpc-compose/job/reports
        snakemake \
          --snakefile "$${SNAKEMAKE_FILE:-Snakefile}" \
          --cores "$${SNAKEMAKE_CORES:-$${SLURM_CPUS_PER_TASK:-1}}" \
          --directory "$${SNAKEMAKE_WORKDIR:-.}" \
          --printshellcmds \
          $${SNAKEMAKE_ARGS:-}
    environment:
      SNAKEMAKE_FILE: Snakefile
      SNAKEMAKE_WORKDIR: "."
      SNAKEMAKE_ARGS: ""
    x-slurm:
      ntasks: 1

Multi Stage Pipeline

Source: examples/multi-stage-pipeline.yaml

name: multi-stage-pipeline

x-slurm:
  job_name: multi-stage-pipeline
  time: "00:30:00"
  mem: 8G
  cpus_per_task: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  producer:
    image: python:3.11-slim
    command:
      - /bin/sh
      - -lc
      - |
        python -c "
        import csv, random, os

        output = '/hpc-compose/job/output.csv'
        with open(output, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['id', 'value', 'category'])
            for i in range(1000):
                writer.writerow([i, round(random.gauss(50, 15), 2), random.choice(['A', 'B', 'C'])])

        print(f'Wrote 1000 rows to {output}')
        print('producer complete')
        "
    readiness:
      type: log
      pattern: "producer complete"
      timeout_seconds: 60
    x-slurm:
      cpus_per_task: 1

  consumer:
    image: python:3.11-slim
    depends_on:
      producer:
        condition: service_healthy
    command:
      - /bin/sh
      - -lc
      - |
        python -c "
        import csv, collections

        with open('/hpc-compose/job/output.csv') as f:
            reader = csv.DictReader(f)
            rows = list(reader)

        by_cat = collections.defaultdict(list)
        for row in rows:
            by_cat[row['category']].append(float(row['value']))

        print(f'Read {len(rows)} rows')
        for cat in sorted(by_cat):
            vals = by_cat[cat]
            print(f'  {cat}: count={len(vals)}, mean={sum(vals)/len(vals):.2f}')

        print('consumer complete')
        "
    x-slurm:
      cpus_per_task: 1

Pipeline DAG

Source: examples/pipeline-dag.yaml

name: pipeline-dag

x-slurm:
  job_name: pipeline-dag
  time: "00:20:00"
  mem: 4G
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  preprocess:
    image: alpine:3.20
    command:
      - /bin/sh
      - -lc
      - |
        mkdir -p /hpc-compose/job/pipeline
        printf 'records=3\n' > /hpc-compose/job/pipeline/prepared.txt

  train:
    image: alpine:3.20
    depends_on:
      preprocess:
        condition: service_completed_successfully
    command:
      - /bin/sh
      - -lc
      - |
        cat /hpc-compose/job/pipeline/prepared.txt
        printf 'accuracy=0.91\n' > /hpc-compose/job/pipeline/model.txt

  postprocess:
    image: alpine:3.20
    depends_on:
      train:
        condition: service_completed_successfully
    command:
      - /bin/sh
      - -lc
      - |
        cat /hpc-compose/job/pipeline/model.txt
        printf 'done\n' > /hpc-compose/job/pipeline/report.txt
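The `depends_on` entries in this example form a three-step chain. As an illustration of how a start order follows from those dependencies, the sketch below feeds the same graph to Python's standard `graphlib` module. It only demonstrates the ordering idea and is not how hpc-compose computes its service order.

```python
from graphlib import TopologicalSorter

# Each service maps to the services it depends on, as declared in the spec above.
depends_on = {
    "preprocess": [],
    "train": ["preprocess"],
    "postprocess": ["train"],
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)  # ['preprocess', 'train', 'postprocess']
```

Because every dependency uses `service_completed_successfully`, each stage starts only after the previous one has exited cleanly, so the files under `/hpc-compose/job/pipeline` are complete before the next stage reads them.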

Postgres ETL

Source: examples/postgres-etl.yaml

name: postgres-etl

x-slurm:
  job_name: postgres-etl
  time: "01:00:00"
  mem: 16G
  cpus_per_task: 4
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: etl
      POSTGRES_PASSWORD: etl
      POSTGRES_DB: pipeline
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 5432
      timeout_seconds: 30
    x-slurm:
      cpus_per_task: 2

  etl:
    image: python:3.11-slim
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://etl:etl@127.0.0.1:5432/pipeline
    command:
      - /bin/sh
      - -lc
      - |
        python -c "
        import psycopg2, os

        conn = psycopg2.connect(os.environ['DATABASE_URL'])
        cur = conn.cursor()
        cur.execute('CREATE TABLE IF NOT EXISTS results (id SERIAL, value FLOAT)')
        for i in range(100):
            cur.execute('INSERT INTO results (value) VALUES (%s)', (i * 1.5,))
        conn.commit()
        cur.execute('SELECT count(*), avg(value) FROM results')
        count, avg = cur.fetchone()
        print(f'Inserted {count} rows, average value: {avg:.2f}')
        conn.close()
        "
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir psycopg2-binary
    x-slurm:
      cpus_per_task: 2

Restart Policy

Source: examples/restart-policy.yaml

name: restart-policy

x-slurm:
  job_name: restart-policy
  time: "00:10:00"
  mem: 4G
  cpus_per_task: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  flaky-worker:
    image: python:3.11-slim
    command:
      - /bin/sh
      - -lc
      - |
        python - <<'PY'
        import pathlib
        import sys
        import time

        state_dir = pathlib.Path("/hpc-compose/job/restart-policy")
        counter_path = state_dir / "attempts.txt"

        state_dir.mkdir(parents=True, exist_ok=True)
        attempts = int(counter_path.read_text()) if counter_path.exists() else 0
        attempts += 1
        counter_path.write_text(f"{attempts}\n")

        print(f"attempt {attempts}")
        if attempts <= 2:
            print("simulating transient failure")
            sys.exit(42)

        print("work completed after transient failures")
        time.sleep(1)
        PY
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 5
        backoff_seconds: 2
        window_seconds: 60
        max_restarts_in_window: 3
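The `failure_policy` fields combine a lifetime cap with a rolling-window cap. The sketch below models one plausible reading of them: a restart is allowed only while fewer than `max_restarts` restarts have happened in total and fewer than `max_restarts_in_window` fall within the last `window_seconds`. The class name `RestartBudget` is hypothetical, and the actual semantics are defined by hpc-compose, not by this sketch.

```python
from collections import deque


class RestartBudget:
    """Track restarts against a lifetime cap and a rolling-window cap."""

    def __init__(self, max_restarts: int, window_seconds: float, max_restarts_in_window: int):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.max_restarts_in_window = max_restarts_in_window
        self.total = 0
        self.recent = deque()  # timestamps of restarts still inside the window

    def allow(self, now: float) -> bool:
        """Record and permit a restart at time `now`, or refuse it."""
        # Forget restarts that have aged out of the rolling window.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if self.total >= self.max_restarts:
            return False
        if len(self.recent) >= self.max_restarts_in_window:
            return False
        self.total += 1
        self.recent.append(now)
        return True


budget = RestartBudget(max_restarts=5, window_seconds=60, max_restarts_in_window=3)
# The flaky worker fails on attempts 1 and 2, so it needs two restarts.
print([budget.allow(t) for t in (0.0, 3.0)])  # [True, True]
```

Under this reading, the example's two simulated failures stay within both limits, so the third attempt is the one that completes.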

Training Checkpoints

Source: examples/training-checkpoints.yaml

name: training-checkpoints

x-slurm:
  job_name: training-checkpoints
  time: "04:00:00"
  mem: 64G
  cpus_per_task: 8
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  trainer:
    image: pytorch/pytorch:2.3.1-cuda12.1-cudnn9-runtime
    volumes:
      - /shared/$USER/checkpoints:/checkpoints
    environment:
      CHECKPOINT_DIR: /checkpoints
      NUM_EPOCHS: "10"
    command:
      - /bin/sh
      - -lc
      - |
        python -c "
        import os, torch

        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print(f'Training on {device}')

        ckpt_dir = os.environ['CHECKPOINT_DIR']
        os.makedirs(ckpt_dir, exist_ok=True)

        model = torch.nn.Linear(128, 10).to(device)
        optimizer = torch.optim.Adam(model.parameters())
        data = torch.randn(256, 128, device=device)

        for epoch in range(int(os.environ['NUM_EPOCHS'])):
            out = model(data)
            loss = out.sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            path = os.path.join(ckpt_dir, f'checkpoint_epoch_{epoch}.pt')
            torch.save({'epoch': epoch, 'model': model.state_dict()}, path)
            print(f'Epoch {epoch}: loss={loss.item():.4f}, saved {path}')

        print('Training complete')
        "
    x-slurm:
      gpus: 1
      cpus_per_task: 4

Training Resume

Source: examples/training-resume.yaml

name: training-resume

x-slurm:
  job_name: training-resume
  time: "04:00:00"
  mem: 64G
  cpus_per_task: 8
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}
  resume:
    path: /shared/$USER/runs/training-resume
  artifacts:
    export_dir: ./results/${SLURM_JOB_ID}
    paths:
      - /hpc-compose/job/checkpoints/**

services:
  trainer:
    image: pytorch/pytorch:2.3.1-cuda12.1-cudnn9-runtime
    environment:
      NUM_EPOCHS: "10"
    command:
      - /bin/sh
      - -lc
      - |
        python - <<'PY'
        import json
        import os
        import pathlib
        import time

        resume_dir = pathlib.Path(os.environ["HPC_COMPOSE_RESUME_DIR"])
        attempt = os.environ["HPC_COMPOSE_ATTEMPT"]
        is_resume = os.environ["HPC_COMPOSE_IS_RESUME"] == "1"
        checkpoint_dir = pathlib.Path("/hpc-compose/job/checkpoints")
        latest_state_path = resume_dir / "latest.json"

        resume_dir.mkdir(parents=True, exist_ok=True)
        checkpoint_dir.mkdir(parents=True, exist_ok=True)

        start_epoch = 0
        if latest_state_path.exists():
            state = json.loads(latest_state_path.read_text())
            start_epoch = state["next_epoch"]
            print(f"Resuming run at epoch {start_epoch} (attempt {attempt})")
        else:
            print(f"Starting fresh run (attempt {attempt})")

        for epoch in range(start_epoch, int(os.environ["NUM_EPOCHS"])):
            state = {
                "completed_epoch": epoch,
                "next_epoch": epoch + 1,
                "attempt": int(attempt),
                "is_resume": is_resume,
            }
            latest_state_path.write_text(json.dumps(state, indent=2) + "\n")
            artifact_path = checkpoint_dir / f"checkpoint_epoch_{epoch}.json"
            artifact_path.write_text(json.dumps(state, indent=2) + "\n")
            print(f"Epoch {epoch}: wrote {artifact_path}")
            time.sleep(1)
        PY

Training Sweep

Source: examples/training-sweep.yaml

name: training-sweep

x-slurm:
  job_name: training-sweep
  time: "00:20:00"
  mem: 8G
  cpus_per_task: 2
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

sweep:
  parameters:
    lr: [0.001, 0.01, 0.1]
    batch_size: [32, 64]
  matrix: full

services:
  trainer:
    image: python:3.11-slim
    environment:
      LR: "${lr:-0.001}"
      BATCH_SIZE: "${batch_size:-32}"
      SWEEP_ID: "${HPC_COMPOSE_SWEEP_ID:-manual}"
      TRIAL_ID: "${HPC_COMPOSE_SWEEP_TRIAL:-manual}"
    command:
      - python
      - -c
      - |
        import os
        import random

        lr = float(os.environ["LR"])
        batch_size = int(os.environ["BATCH_SIZE"])
        random.seed(f"{lr}:{batch_size}")
        score = 0.8 + random.random() * 0.05

        print(f"sweep={os.environ['SWEEP_ID']} trial={os.environ['TRIAL_ID']}")
        print(f"lr={lr} batch_size={batch_size} score={score:.4f}")
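Assuming `matrix: full` means the Cartesian product of all parameter lists, the sweep above would expand to six trials. The sketch below enumerates them with `itertools.product`. The trial numbering is hypothetical and need not match the identifiers hpc-compose assigns.

```python
from itertools import product

parameters = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}

names = list(parameters)
# One dict per combination, in the order product() yields them.
trials = [dict(zip(names, values)) for values in product(*parameters.values())]

for index, trial in enumerate(trials):
    print(index, trial)
print(len(trials))  # 6
```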

vLLM OpenAI

Source: examples/vllm-openai.yaml

name: vllm-openai

x-slurm:
  job_name: vllm-openai
  time: "01:00:00"
  mem: 64G
  cpus_per_task: 8
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  vllm:
    image: vllm/vllm-openai:latest
    environment:
      MODEL_NAME: facebook/opt-125m
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        rm -f /hpc-compose/job/request.done
        python -m vllm.entrypoints.openai.api_server \
          --model "$$MODEL_NAME" \
          --host 0.0.0.0 \
          --port 8000 &
        server_pid=$$!
        while [ ! -f /hpc-compose/job/request.done ]; do
          if ! kill -0 "$$server_pid" 2>/dev/null; then
            wait "$$server_pid"
            exit $$?
          fi
          sleep 1
        done
        kill "$$server_pid" 2>/dev/null || true
        wait "$$server_pid" || true
    readiness:
      type: log
      pattern: "Uvicorn running on"
      timeout_seconds: 300
    x-slurm:
      gpus: 1
      cpus_per_task: 4

  client:
    image: python:3.11-slim
    depends_on:
      vllm:
        condition: service_healthy
    environment:
      OPENAI_BASE_URL: http://127.0.0.1:8000/v1
      MODEL_NAME: facebook/opt-125m
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        python -c "
        import openai, os

        client = openai.OpenAI(
            base_url=os.environ['OPENAI_BASE_URL'],
            api_key='unused',
        )
        response = client.chat.completions.create(
            model=os.environ['MODEL_NAME'],
            messages=[
                {'role': 'system', 'content': 'You are a concise assistant.'},
                {'role': 'user', 'content': 'What is HPC in one sentence?'},
            ],
            max_tokens=64,
            temperature=0.2,
        )
        print(response.choices[0].message.content)
        "
        touch /hpc-compose/job/request.done
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir openai
    x-slurm:
      cpus_per_task: 2

vLLM UV Worker

Source: examples/vllm-uv-worker.yaml

name: vllm-uv-worker

x-slurm:
  job_name: vllm-uv-worker
  time: "01:00:00"
  mem: 64G
  cpus_per_task: 8
  gpus: 1
  cache_dir: ${CACHE_DIR:-/cluster/shared/hpc-compose-cache}

services:
  vllm:
    image: vllm/vllm-openai:latest
    environment:
      MODEL_NAME: facebook/opt-125m
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        rm -f /hpc-compose/job/request.done
        python -m vllm.entrypoints.openai.api_server \
          --model "$$MODEL_NAME" \
          --host 0.0.0.0 \
          --port 8000 &
        server_pid=$$!
        while [ ! -f /hpc-compose/job/request.done ]; do
          if ! kill -0 "$$server_pid" 2>/dev/null; then
            wait "$$server_pid"
            exit $$?
          fi
          sleep 1
        done
        kill "$$server_pid" 2>/dev/null || true
        wait "$$server_pid" || true
    readiness:
      type: log
      pattern: "Uvicorn running on"
      timeout_seconds: 300
    x-slurm:
      gpus: 1
      cpus_per_task: 4

  worker:
    image: python:3.11-slim
    working_dir: /workspace
    volumes:
      - ./vllm-uv-worker:/workspace
    depends_on:
      vllm:
        condition: service_healthy
    environment:
      OPENAI_BASE_URL: http://127.0.0.1:8000/v1
      MODEL_NAME: facebook/opt-125m
      REQUEST_DONE_PATH: /hpc-compose/job/request.done
    command:
      - /bin/sh
      - -lc
      - |
        set -eu
        UV_CACHE_DIR=/hpc-compose/job/.uv-cache uv run worker.py
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir uv
    x-slurm:
      cpus_per_task: 2

Codex Skill

This repository ships a Codex skill at skills/hpc-compose/ for agents that need to help users set up, adapt, validate, and troubleshoot hpc-compose workflows.

Use it when a user asks for tasks such as:

  • make my repository work with hpc-compose
  • migrate this Docker Compose or Slurm workflow to hpc-compose
  • prepare this project for HAICORE or another Slurm cluster
  • debug hpc-compose validation, preflight, or run failures

What It Contains

The skill keeps the main trigger and workflow in SKILL.md, then uses progressively loaded references for details:

| Path | Purpose |
| --- | --- |
| skills/hpc-compose/SKILL.md | Trigger description, core workflow, adaptation rules, and output expectations. |
| skills/hpc-compose/references/hpc-compose-workflow.md | hpc-compose command path, Docker Compose migration, backend selection, verification, and troubleshooting. |
| skills/hpc-compose/references/haicore-kit.md | HAICORE/NHR@KIT Slurm, GPU, filesystem, cache, Pyxis/Enroot, and verification guidance. |
| skills/hpc-compose/references/cluster-adaptation.md | General Slurm cluster reconnaissance and portable adaptation guidance. |
| skills/hpc-compose/scripts/hpc_compose_repo_probe.py | Heuristic repository probe for migration clues. |

Using The Skill

Install or copy skills/hpc-compose/ into the Codex skills directory, typically $CODEX_HOME/skills/hpc-compose or ~/.codex/skills/hpc-compose, then start a fresh Codex session so skill discovery can reload.

Example prompt:

Use $hpc-compose to make this repository run with hpc-compose on HAICORE.

For local reconnaissance, run:

python3 skills/hpc-compose/scripts/hpc_compose_repo_probe.py .

The probe is intentionally heuristic. Treat its output as an inventory and hypothesis generator, then verify with repository files, cluster documentation, and hpc-compose static checks.

Agent Expectations

Agents using this skill should:

  • inspect the target repository before proposing a spec
  • check current cluster documentation for site-specific details
  • prefer hpc-compose static checks before real Slurm submissions
  • ask before running commands that submit or cancel jobs or consume allocation quota
  • report observations, hypotheses, recommendations, and open questions when cluster facts remain uncertain

Roadmap

This roadmap is intentionally short. hpc-compose is not trying to become a general-purpose orchestrator.

Authoring Ergonomics

  • make the supported Compose subset easier to discover from examples and docs
  • keep validate, inspect, config, and render as the fast path for authoring confidence
  • improve starter templates and example selection before adding more surface area

Runtime Visibility

  • make tracked jobs easier to reconnect to and reason about
  • keep improving status, ps, watch, stats, and artifact export for real cluster debugging
  • prefer inspectable generated state over hidden orchestration behavior

Cluster Compatibility

  • expand confidence on more Linux cluster environments before broadening scope
  • keep support policy explicit through the support matrix
  • improve docs and examples around shared storage, Pyxis, and Enroot expectations

If your workflow falls outside this roadmap, that is useful feedback. Open an adoption feedback issue with your cluster type, workload type, and main friction point.

Architecture for Contributors

The library crate owns the core staged pipeline. The binary entrypoint delegates to command-family modules under src/commands/, while presentation lives under src/output/. Reusable planning, prepare, render, tracking, cache, context, and template logic stays in the library modules.

Module map

  • spec: parse, interpolate, and validate the supported Compose subset
  • planner: normalize the parsed spec into a deterministic plan
  • lint: run opinionated static checks over validated plans
  • context: resolve .hpc-compose/settings.toml, profiles, env files, interpolation variables, and binary overrides
  • cluster: generate and apply best-effort cluster capability profiles from doctor cluster-report
  • preflight: check login-node prerequisites and cluster policy issues
  • prepare: import base images and rebuild prepared runtime artifacts
  • render: generate the final sbatch script and service launch commands
  • job: track submissions, logs, metrics, replay, status, and artifact export
  • tracked_paths: centralize the .hpc-compose/ layout used by render and job tracking
  • cache: persist cache manifests for imported and prepared images
  • init: expose the shipped example templates for hpc-compose new plus the legacy init alias
  • schema and manpages: expose the checked-in JSON Schema and generated section-1 manpage flow
  • commands/spec: binary-only handlers for plan, validate, lint, render, prepare, preflight, config, and inspect
  • commands/runtime: binary-only handlers for up, debug, run, status, ps, watch, replay, stats, artifacts, logs, down, cancel, and clean
  • commands/cache: binary-only handlers for cache inspection and pruning
  • commands/init: binary-only handlers for new / init, setup, context, and completions
  • watch_ui: terminal UI controller and renderer for up, watch, and replay playback
  • output: binary-only text, JSON, CSV, and JSONL formatting helpers

Execution flow

  1. ComposeSpec::load parses YAML, resolves authoring extends, validates supported keys, interpolates variables, and applies semantic validation.
  2. planner::build_plan resolves paths, command shapes, dependencies, and prepare blocks into a normalized plan.
  3. prepare::build_runtime_plan computes concrete cache artifact locations.
  4. context and optional cluster profiles provide resolved paths, binaries, env, and compatibility warnings.
  5. preflight::run checks cluster prerequisites before submission.
  6. prepare::prepare_runtime_plan imports or rebuilds artifacts when needed.
  7. render::render_script emits the batch script consumed by sbatch.
  8. job persists tracked metadata under .hpc-compose/ and powers status, ps, watch, replay, stats, logs, cancel, and artifact export. job::replay reconstructs a best-effort timeline from existing state, service-exit, metrics, and log artifacts while reusing the watch renderer for playback.
  9. commands/* turns CLI variants into library calls, and output formats the final presentation.
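The staged flow above can be sketched as a chain of functions where each stage consumes the previous stage's output. This is a simplified illustration under stated assumptions, not the crate's API: the struct fields, function signatures, error type, and the /shared/cache path are all hypothetical, and only the stage names echo the list above.

```rust
use std::path::PathBuf;

// Hypothetical stand-ins for the pipeline's data types.
struct Service {
    name: String,
    depends_on: Vec<String>,
}

struct Spec {
    services: Vec<Service>,
}

struct Plan {
    order: Vec<String>,
}

struct RuntimePlan {
    order: Vec<String>,
    cache_dir: PathBuf,
}

// Planning stage: produce a deterministic, dependency-respecting order.
fn build_plan(spec: &Spec) -> Result<Plan, String> {
    let mut pending: Vec<&Service> = spec.services.iter().collect();
    pending.sort_by(|a, b| a.name.cmp(&b.name)); // stable tie-breaking by name
    let mut order: Vec<String> = Vec::new();
    while !pending.is_empty() {
        // Pick the first service whose dependencies are all already placed.
        let ready = pending
            .iter()
            .position(|s| s.depends_on.iter().all(|d| order.contains(d)));
        match ready {
            Some(i) => order.push(pending.remove(i).name.clone()),
            None => return Err("dependency cycle or unknown dependency".to_string()),
        }
    }
    Ok(Plan { order })
}

// Runtime-plan stage: attach concrete cache locations.
fn build_runtime_plan(plan: Plan, cache_dir: PathBuf) -> RuntimePlan {
    RuntimePlan { order: plan.order, cache_dir }
}

// Preflight stage: fail early before anything is submitted.
fn preflight(plan: &RuntimePlan) -> Result<(), String> {
    if plan.cache_dir.as_os_str().is_empty() {
        return Err("cache dir is not configured".to_string());
    }
    Ok(())
}

// Render stage: emit the batch script text.
fn render_script(plan: &RuntimePlan, job_name: &str) -> String {
    let mut script = String::from("#!/bin/bash\n");
    script.push_str(&format!("#SBATCH --job-name={job_name}\n"));
    script.push_str(&format!("# cache dir: {}\n", plan.cache_dir.display()));
    for service in &plan.order {
        script.push_str(&format!("# launch service: {service}\n"));
    }
    script
}

fn run_pipeline(spec: &Spec, cache_dir: PathBuf, job_name: &str) -> Result<String, String> {
    let plan = build_plan(spec)?;
    let runtime = build_runtime_plan(plan, cache_dir);
    preflight(&runtime)?;
    Ok(render_script(&runtime, job_name))
}

fn demo_spec() -> Spec {
    Spec {
        services: vec![
            Service { name: "app".to_string(), depends_on: vec!["db".to_string()] },
            Service { name: "db".to_string(), depends_on: vec![] },
        ],
    }
}

fn main() {
    match run_pipeline(&demo_spec(), PathBuf::from("/shared/cache"), "my-app") {
        Ok(script) => print!("{script}"),
        Err(err) => eprintln!("pipeline failed: {err}"),
    }
}
```

Because every stage returns a value instead of mutating shared state, each intermediate result can be inspected on its own, which matches the static plan and render commands described earlier in the book.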

Tracked Runtime Layout

tracked_paths is the single source of truth for the tracked-job layout shared by render and job.

  • Compose-level metadata lives under .hpc-compose/ next to the compose file.
  • Per-job runtime state lives under ${SLURM_SUBMIT_DIR}/.hpc-compose/<job-id>/.
  • Root-level logs/, metrics/, artifacts/, and state.json are the latest-view paths used by status and export commands.
  • Resume-aware runs still write attempt-specific state under attempts/<attempt>/....
  • The batch script updates root-level latest symlinks so contributor-facing tooling can read the most recent attempt without reconstructing shell logic independently.
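As a rough illustration of the layout rules above, the following sketch builds the per-job paths from a submit directory and a job id. TrackedLayout and its methods are hypothetical names invented for this example, and the real tracked_paths module may expose a different shape. The contents below attempts/<attempt>/ are intentionally left out because the text above does not spell them out.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper mirroring the documented layout; not the real API.
struct TrackedLayout {
    job_root: PathBuf,
}

impl TrackedLayout {
    // Per-job state lives under <submit_dir>/.hpc-compose/<job-id>/.
    fn new(submit_dir: &Path, job_id: &str) -> Self {
        Self {
            job_root: submit_dir.join(".hpc-compose").join(job_id),
        }
    }

    // Root-level latest-view paths read by status and export commands.
    fn latest_state(&self) -> PathBuf {
        self.job_root.join("state.json")
    }

    fn latest_logs(&self) -> PathBuf {
        self.job_root.join("logs")
    }

    fn latest_metrics(&self) -> PathBuf {
        self.job_root.join("metrics")
    }

    fn latest_artifacts(&self) -> PathBuf {
        self.job_root.join("artifacts")
    }

    // Attempt-specific directory for resume-aware runs. The attempt label is
    // taken as a string because its exact format is an assumption here.
    fn attempt_dir(&self, attempt: &str) -> PathBuf {
        self.job_root.join("attempts").join(attempt)
    }
}

fn main() {
    let layout = TrackedLayout::new(Path::new("/work/run"), "12345");
    println!("{}", layout.latest_state().display());
    println!("{}", layout.latest_logs().display());
    println!("{}", layout.latest_metrics().display());
    println!("{}", layout.latest_artifacts().display());
    println!("{}", layout.attempt_dir("1").display());
}
```

Centralizing path construction in one type like this is the property the section describes: render and job tracking both ask the same helper, so the two cannot drift apart on directory names.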

Contributor commands

cargo test
cargo test --test cli_runtime
cargo test --test release_metadata
cargo doc --no-deps
mdbook build docs
cargo run --features manpage-bin --bin gen-manpages -- --check

Documentation split

  • Use this mdBook for user-facing workflows, examples, and reference material.
  • Use rustdoc for contributor-facing internals and the library module map.
  • Keep README short and point readers into the book instead of duplicating long-form guidance.