Quickstart
This is the shortest safe path from an empty shell to a static plan, a first real Slurm run, and one-command failure triage.
If Slurm terms such as sbatch, srun, allocation, job step, Pyxis, or Enroot are unfamiliar, read Slurm And Container Basics before the first real cluster run.
1. Install The CLI
For normal use, install from the latest published GitHub Release and pin the tag you selected:
RELEASE_TAG=vX.Y.Z
curl -fsSL "https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/${RELEASE_TAG}/install.sh" \
| env HPC_COMPOSE_VERSION="${RELEASE_TAG}" sh
Replace vX.Y.Z with the published release tag shown on the release page.
The installer places hpc-compose in ~/.local/bin by default and verifies the release checksum sidecar before installing. Release verification, manual downloads, package-manager installs, and source-checkout builds are covered in Installation.
If your shell does not find the command immediately, add the default install directory to your PATH:
export PATH="$HOME/.local/bin:$PATH"
hpc-compose --version
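If you put that export in a shell startup file, it can add a duplicate PATH entry on every new shell. The following is a minimal sketch of an idempotent variant. `path_contains` is a hypothetical helper name chosen for illustration; it is not part of hpc-compose.

```shell
# Report whether a directory is already an entry in a PATH-style string.
# path_contains is a hypothetical helper, not an hpc-compose command.
path_contains() {
  dir=$1
  path_list=${2:-$PATH}
  # Wrap both sides in colons so only whole entries match.
  case ":${path_list}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prepend the default install directory only when it is missing.
if ! path_contains "$HOME/.local/bin"; then
  export PATH="$HOME/.local/bin:$PATH"
fi
```

The colon wrapping keeps a partial match such as `$HOME/.local/bin-old` from counting as the install directory.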
2. Learn The Safe Authoring Path First
plan is the safe authoring command. It does not call sbatch, does not import images, and does not write a script file.
Create a starter spec first:
hpc-compose new \
--template minimal-batch \
--name my-app \
--output compose.yaml
If you want a guided learning path instead of a single starter template, run the Spec Metamorphosis tutorial:
hpc-compose evolve --output compose.yaml
Then inspect the static plan:
hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml
Expected output includes:
spec is valid
service order: app
This is the right first path on macOS, a laptop, or any machine where you want to evaluate the authoring model before touching a real cluster. The same flow is also available as an asciinema-style demo cast, but the snippets above are the accessible reference output.
The normal workflow to remember is:
hpc-compose plan -f compose.yaml
hpc-compose up -f compose.yaml
hpc-compose debug -f compose.yaml --preflight
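The three commands above can be chained in a small wrapper so that triage runs only when the launch fails. This is a sketch under two assumptions that the text above does not confirm: that each hpc-compose subcommand exits non-zero on failure, and that running debug straight after a failed up is the order you want. `first_run` and the `HPC_COMPOSE` override variable are hypothetical names introduced here.

```shell
# HPC_COMPOSE is a hypothetical override so the binary path can be swapped;
# it defaults to the installed hpc-compose command.
HPC_COMPOSE=${HPC_COMPOSE:-hpc-compose}

# first_run is a hypothetical wrapper, not an hpc-compose feature.
# Assumption: subcommands return a non-zero exit status on failure.
first_run() {
  spec=${1:-compose.yaml}

  # Stop early if the static plan does not pass.
  "$HPC_COMPOSE" plan -f "$spec" || return 1

  # Launch; on failure, collect the triage report and report failure.
  if ! "$HPC_COMPOSE" up -f "$spec"; then
    "$HPC_COMPOSE" debug -f "$spec" --preflight
    return 1
  fi
}
```

Call it as `first_run compose.yaml` from the directory that holds the spec.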
3. Choose A Starting Spec
Use the built-in starter templates when you want a fresh compose.yaml with your application name filled in:
hpc-compose new \
--template minimal-batch \
--name my-app \
--output compose.yaml
Add --cache-dir '<shared-cache-dir>' when you want the generated file to include an explicit x-slurm.cache_dir. Otherwise the plan uses the active settings cache default or $HOME/.cache/hpc-compose.
From a source checkout, you can also inspect a known-good repository example:
hpc-compose plan -f examples/minimal-batch.yaml
The Examples page is the single selection guide for beginner, LLM, training, distributed, and pipeline workflows.
Use Spec Metamorphosis when you want to learn those concepts progressively in one evolving valid spec.
4. Pick And Test A Cache Directory
cache_dir is optional in the spec, but real clusters usually need a site-specific shared path because image preparation happens before the job starts and compute nodes must later see those artifacts.
Ask your cluster documentation or support team for a project scratch, work, or shared filesystem path, then test it:
export CACHE_DIR=/cluster/shared/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"
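`test -w` is silent, so a failure is easy to miss in an interactive shell. The sketch below wraps the same steps in a function that reports which step failed and also tries a real write. `check_cache_dir` and the probe file name are hypothetical, and the check covers only the host you run it on; it does not prove that compute nodes see the same path.

```shell
# check_cache_dir is a hypothetical helper, not an hpc-compose command.
# It creates the directory, checks the write bit, then writes and removes
# a small probe file to confirm the filesystem accepts writes.
check_cache_dir() {
  dir=$1
  mkdir -p "$dir" 2>/dev/null || { echo "cannot create $dir" >&2; return 1; }
  [ -w "$dir" ] || { echo "$dir is not writable" >&2; return 1; }
  probe="$dir/.write-probe.$$"
  : > "$probe" 2>/dev/null || { echo "write to $dir failed" >&2; return 1; }
  rm -f "$probe"
}
```

For example, `check_cache_dir "$CACHE_DIR" && echo "cache dir looks usable from this host"`.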
Persist it in project settings when you want the same value every time:
hpc-compose setup --profile-name dev --cache-dir "$CACHE_DIR" --default-profile dev --non-interactive
Alternatively, keep an explicit, environment-backed value in the spec and persist the variable in a .env file next to your copied spec:
printf 'CACHE_DIR=%s\n' "$CACHE_DIR" > .env
Do not use /tmp, /var/tmp, /private/tmp, or /dev/shm for x-slurm.cache_dir. Validation may accept those strings, but preflight reports them as unsafe because prepare happens before runtime and compute nodes must later see the cached artifacts.
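A quick local guard can catch those paths before you reach preflight. This is a sketch only: `is_unsafe_cache_dir` is a hypothetical helper that matches the four prefixes named above as plain strings. It does not resolve symlinks, and it is not a description of how hpc-compose preflight performs its own check.

```shell
# is_unsafe_cache_dir is a hypothetical helper, not an hpc-compose command.
# Returns 0 when the path is one of the node-local locations listed above
# (or sits below one of them), and 1 otherwise. String matching only.
is_unsafe_cache_dir() {
  case "$1" in
    /tmp|/tmp/*|/var/tmp|/var/tmp/*|/private/tmp|/private/tmp/*|/dev/shm|/dev/shm/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, `is_unsafe_cache_dir "$CACHE_DIR" && echo "pick a shared path instead" >&2`.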
5. Before Your First Cluster Run
| Command category | Where to run it | Required tools | Notes |
|---|---|---|---|
| Authoring: new, plan, validate, inspect, render, config, schema | laptop, workstation, or login node | hpc-compose | plan is the recommended static pre-run check. |
| Prepare: prepare | Linux host with selected runtime backend | Pyxis needs Enroot; Apptainer needs apptainer; Singularity needs singularity; host backend needs no container runtime | Does not call sbatch, but needs runtime tools for image work. |
| Cluster checks: preflight, doctor cluster-report | Linux Slurm login node | Slurm client tools plus selected backend tools | Use preflight --strict when warnings should block launch. |
| Run: up, run | Linux Slurm login node | sbatch, srun, scheduler tools, selected backend tools | up is the normal cluster execution path. |
| Local launch: up --local | Linux host only | Enroot and runtime.backend: pyxis | Single-host only; not a distributed Slurm substitute. |
For Pyxis, srun --help should mention --container-image.
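The Pyxis check can be scripted by searching the help text for the flag. In this sketch, `help_mentions_container_image` is a hypothetical helper name; it reads help text on standard input, so you can also try it against saved output.

```shell
# help_mentions_container_image is a hypothetical helper name.
# It succeeds when the text on stdin contains the literal --container-image.
help_mentions_container_image() {
  # -F: fixed string, -q: quiet, -e: treat the next argument as the pattern
  # even though it starts with dashes.
  grep -qF -e '--container-image'
}

# Usage on a Slurm login node (2>&1 is a precaution in case a site wrapper
# prints help to stderr):
#   srun --help 2>&1 | help_mentions_container_image && echo "Pyxis flag found"
```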
6. Submit On A Real Cluster
When you move to a supported Linux submission host, the normal run is:
hpc-compose up -f compose.yaml
up runs preflight, prepares missing artifacts, renders the batch script, submits it through sbatch, then follows scheduler state and tracked logs. On an interactive TTY it opens the full-screen watch UI; otherwise it falls back to line-oriented output.
Options that adjust this behavior:
- --watch-queue adds line-oriented queue polling until the Slurm job reaches RUNNING, after which the normal watch view opens.
- --queue-warn-after <DURATION> controls the one-time long-pending warning.
- --hold-on-exit never|failure|always tunes whether the watch UI holds the final screen; by default it holds on failures.
- --detach gives submit-and-return behavior, as in hpc-compose up --detach -f compose.yaml.
Success looks like:
- the job is submitted or launched
- a tracked job id is recorded
- the watch UI or text follower shows scheduler progress
- status, ps, and logs can reconnect to the tracked run later
7. If The First Cluster Run Fails
| Symptom | Best next command | Why |
|---|---|---|
| Missing sbatch, srun, enroot, apptainer, or singularity | hpc-compose debug -f compose.yaml --preflight | Reruns prerequisite checks and keeps the latest tracked context in one report. |
| srun does not advertise --container-image | hpc-compose doctor cluster-report | Pyxis support is unavailable or not loaded on that node. |
| Job submitted but no service log appeared | hpc-compose debug -f compose.yaml | Shows scheduler state, batch log tail, service log hints, and the next command. |
| Cache path warning or error | hpc-compose debug -f compose.yaml --preflight | Confirms whether x-slurm.cache_dir is shared and writable. |
| Services start in the wrong order | hpc-compose plan --explain --verbose -f compose.yaml | Shows normalized dependencies, readiness gates, and planner hints before running. |
The longer symptom guide is Troubleshooting.
8. Revisit A Tracked Run Later
hpc-compose jobs list
hpc-compose status -f compose.yaml
hpc-compose ps -f compose.yaml
hpc-compose watch -f compose.yaml
hpc-compose stats -f compose.yaml
hpc-compose logs -f compose.yaml --follow
Use jobs list first when you need to rediscover tracked runs under the current repo tree. Use ps for a stable per-service snapshot, watch to reconnect to the live UI, and logs --follow for a text-only follower.
From A Source Checkout
If you are developing from a local checkout instead of an installed binary:
cargo build --release
target/release/hpc-compose validate -f examples/minimal-batch.yaml
target/release/hpc-compose plan -f examples/minimal-batch.yaml
target/release/hpc-compose plan --show-script -f examples/minimal-batch.yaml