Quickstart

This is the shortest safe path from an empty shell to a static plan, a first real Slurm run, and one-command failure triage.

If Slurm terms such as sbatch, srun, allocation, job step, Pyxis, or Enroot are unfamiliar, read Slurm And Container Basics before the first real cluster run.

1. Install The CLI

For normal use, install from the latest published GitHub Release and pin the tag you selected:

RELEASE_TAG=vX.Y.Z
curl -fsSL "https://raw.githubusercontent.com/NicolasSchuler/hpc-compose/${RELEASE_TAG}/install.sh" \
  | env HPC_COMPOSE_VERSION="${RELEASE_TAG}" sh

Replace vX.Y.Z with the published release tag shown on the release page.
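
A small guard can catch the case where the placeholder was pasted unchanged. The sketch below is only an illustration: `require_release_tag` is a hypothetical helper, not part of hpc-compose or its installer, and it checks nothing beyond "non-empty and not the literal placeholder".

```shell
# Hypothetical helper (not shipped with hpc-compose): refuse to continue
# while the tag is empty or still the literal vX.Y.Z placeholder.
require_release_tag() {
  case "${1:-}" in
    ""|vX.Y.Z)
      echo "set RELEASE_TAG to a published release tag first" >&2
      return 1
      ;;
  esac
  return 0
}
```

One way to use it is to call `require_release_tag "$RELEASE_TAG"` before the curl line and stop if it returns non-zero. It does not confirm that the tag actually exists on the release page.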

The installer places hpc-compose in ~/.local/bin by default and verifies the release checksum sidecar before installing. Release verification, manual downloads, package-manager installs, and source-checkout builds are covered in Installation.

If your shell does not find the command immediately, add the default install directory to your PATH:

export PATH="$HOME/.local/bin:$PATH"
hpc-compose --version
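
If the export line ends up in a shell startup file, it can be worth adding the directory only when it is missing, so repeated sourcing does not grow PATH. The following is a minimal sketch under that assumption; `path_contains` and `ensure_local_bin_on_path` are hypothetical helper names.

```shell
# Hypothetical helper: succeed if directory $1 is an exact entry
# of the colon-separated PATH-style string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prepend the default install directory only when it is not already listed.
ensure_local_bin_on_path() {
  if ! path_contains "$HOME/.local/bin" "$PATH"; then
    PATH="$HOME/.local/bin:$PATH"
    export PATH
  fi
}
```

Calling `ensure_local_bin_on_path` more than once leaves PATH unchanged after the first call.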

2. Learn The Safe Authoring Path First

plan is the safe authoring command. It does not call sbatch, does not import images, and does not write a script file.

Create a starter spec first:

hpc-compose new \
  --template minimal-batch \
  --name my-app \
  --output compose.yaml

If you want a guided learning path instead of a single starter template, run the Spec Metamorphosis tutorial:

hpc-compose evolve --output compose.yaml

Then inspect the static plan:

hpc-compose plan -f compose.yaml
hpc-compose plan --show-script -f compose.yaml

Expected output includes:

spec is valid

service order: app

This is the right first path on macOS, a laptop, or any machine where you want to evaluate the authoring model before touching a real cluster. The same flow is also available as an asciinema-style demo cast, but the snippets above are the accessible reference output.

The normal workflow to remember is:

hpc-compose plan -f compose.yaml
hpc-compose up -f compose.yaml
hpc-compose debug -f compose.yaml --preflight
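
The three commands can be chained so that triage only runs when something fails. This is a sketch, not a documented hpc-compose feature: `plan_up_triage` and the `HPC_COMPOSE` override variable are hypothetical, and the wrapper assumes the CLI exits non-zero when `plan` or `up` fails.

```shell
# Hypothetical wrapper. HPC_COMPOSE may point at another binary or a test stub.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

plan_up_triage() {
  spec="${1:-compose.yaml}"
  # Static check first: stop before touching the scheduler if the plan fails.
  "$HPC_COMPOSE" plan -f "$spec" || return 1
  # Submit; on failure, collect the triage report and keep a failing status.
  # Assumption: a failed `up` exits non-zero.
  if ! "$HPC_COMPOSE" up -f "$spec"; then
    "$HPC_COMPOSE" debug -f "$spec" --preflight
    return 1
  fi
}
```

Running `plan_up_triage compose.yaml` then behaves like the manual sequence, except that `debug` is skipped when `up` succeeds.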

3. Choose A Starting Spec

Use the built-in starter templates when you want a fresh compose.yaml with your application name filled in:

hpc-compose new \
  --template minimal-batch \
  --name my-app \
  --output compose.yaml

Add --cache-dir '<shared-cache-dir>' when you want the generated file to include an explicit x-slurm.cache_dir. Otherwise the plan uses the active settings cache default or $HOME/.cache/hpc-compose.

From a source checkout, you can also inspect a known-good repository example:

hpc-compose plan -f examples/minimal-batch.yaml

The Examples page is the single selection guide for beginner, LLM, training, distributed, and pipeline workflows.

Use Spec Metamorphosis when you want to learn those concepts progressively in one evolving valid spec.

4. Pick And Test A Cache Directory

cache_dir is optional in the spec, but real clusters usually need a site-specific shared path because image preparation happens before the job starts and compute nodes must later see those artifacts.

Ask your cluster documentation or support team for a project scratch, work, or shared filesystem path, then test it:

export CACHE_DIR=/cluster/shared/hpc-compose-cache
mkdir -p "$CACHE_DIR"
test -w "$CACHE_DIR"
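
`test -w` only checks permissions, so an actual write can still fail, for example when a quota is exhausted. A slightly stronger check creates and removes a real file. The sketch below assumes nothing about hpc-compose itself; `probe_cache_dir` is a hypothetical helper, and it only tests the current host, not whether compute nodes can see the path.

```shell
# Hypothetical helper: create the directory if needed, then prove that a
# file can really be created and removed inside it.
probe_cache_dir() {
  dir="$1"
  mkdir -p "$dir" || return 1
  probe="$dir/.write-probe.$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  echo "cannot write to $dir" >&2
  return 1
}
```

For example, `probe_cache_dir "$CACHE_DIR"` returns non-zero when the directory cannot be created or written.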

Persist it in project settings when you want the same value every time:

hpc-compose setup --profile-name dev --cache-dir "$CACHE_DIR" --default-profile dev --non-interactive

Or keep an explicit, environment-backed cache_dir value in the spec and persist the variable in a .env file next to your copied spec:

printf 'CACHE_DIR=%s\n' "$CACHE_DIR" > .env

Do not use /tmp, /var/tmp, /private/tmp, or /dev/shm for x-slurm.cache_dir. These locations are typically local to a single machine. Validation may accept those strings, but preflight reports them as unsafe because prepare happens before runtime and compute nodes must later see the cached artifacts.
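
The rule above can be approximated with a local check before a path is ever written into a spec or settings file. This is a sketch only: `is_unsafe_cache_dir` is a hypothetical helper that does simple prefix matching on the four locations named above, and it is not the logic that hpc-compose preflight actually runs.

```shell
# Hypothetical helper: succeed (status 0) when the path is one of the
# discouraged locations or sits below one of them.
is_unsafe_cache_dir() {
  case "$1" in
    /tmp|/tmp/*|/var/tmp|/var/tmp/*|/private/tmp|/private/tmp/*|/dev/shm|/dev/shm/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A path such as /cluster/shared/hpc-compose-cache passes this check, but only preflight on the cluster can confirm that it is really shared and writable.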

5. Before Your First Cluster Run

Command category	Where to run it	Required tools	Notes
Authoring: `new`, `plan`, `validate`, `inspect`, `render`, `config`, `schema`	laptop, workstation, or login node	`hpc-compose`	`plan` is the recommended static pre-run check.
Prepare: `prepare`	Linux host with selected runtime backend	Pyxis needs Enroot; Apptainer needs `apptainer`; Singularity needs `singularity`; host backend needs no container runtime	Does not call `sbatch`, but needs runtime tools for image work.
Cluster checks: `preflight`, `doctor cluster-report`	Linux Slurm login node	Slurm client tools plus selected backend tools	Use `preflight --strict` when warnings should block launch.
Run: `up`, `run`	Linux Slurm login node	`sbatch`, `srun`, scheduler tools, selected backend tools	`up` is the normal cluster execution path.
Local launch: `up --local`	Linux host only	Enroot and `runtime.backend: pyxis`	Single-host only; not a distributed Slurm substitute.
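
The "Required tools" column can be checked by hand with `command -v` before a first run. The helper below is a hypothetical sketch, not an hpc-compose command; it prints each named tool that is not found on PATH and says nothing about versions or plugins.

```shell
# Hypothetical helper: print every tool name from the arguments
# that cannot be resolved on the current PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

On a Slurm login node, `missing_tools sbatch srun` should print nothing. Add `enroot`, `apptainer`, or `singularity` to the list according to the selected backend.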

For Pyxis, srun --help should mention --container-image.
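
That check can be scripted by searching the help text for the flag. In this sketch, `mentions_container_image` is a hypothetical helper that reads help text on standard input, so it works with any source of that text.

```shell
# Hypothetical helper: succeed if the text on stdin mentions --container-image.
# -e keeps grep from treating the leading dashes as an option.
mentions_container_image() {
  grep -q -e '--container-image'
}
```

On a login node it could be used as `srun --help 2>&1 | mentions_container_image`, which exits non-zero when the flag is not advertised.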

6. Submit On A Real Cluster

When you move to a supported Linux submission host, the normal run is:

hpc-compose up -f compose.yaml

up runs preflight, prepares missing artifacts, renders the batch script, submits it through sbatch, then follows scheduler state and tracked logs. On an interactive TTY it opens the full-screen watch UI; otherwise it falls back to line-oriented output.

Add --watch-queue when you want line-oriented queue polling until the Slurm job reaches RUNNING before the normal watch view opens; --queue-warn-after <DURATION> controls the one-time long-pending warning. The watch UI holds the final screen on failures by default; use --hold-on-exit never|failure|always to tune that behavior. Use hpc-compose up --detach -f compose.yaml when you want submit-and-return behavior.

Success looks like:

the job is submitted or launched
a tracked job id is recorded
the watch UI or text follower shows scheduler progress
status, ps, and logs can reconnect to the tracked run later

7. If The First Cluster Run Fails

Symptom	Best next command	Why
Missing `sbatch`, `srun`, `enroot`, `apptainer`, or `singularity`	`hpc-compose debug -f compose.yaml --preflight`	Reruns prerequisite checks and keeps the latest tracked context in one report.
`srun` does not advertise `--container-image`	`hpc-compose doctor cluster-report`	Pyxis support is unavailable or not loaded on that node.
Job submitted but no service log appeared	`hpc-compose debug -f compose.yaml`	Shows scheduler state, batch log tail, service log hints, and the next command.
Cache path warning or error	`hpc-compose debug -f compose.yaml --preflight`	Confirms whether `x-slurm.cache_dir` is shared and writable.
Services start in the wrong order	`hpc-compose plan --explain --verbose -f compose.yaml`	Shows normalized dependencies, readiness gates, and planner hints before running.

The longer symptom guide is Troubleshooting.

8. Revisit A Tracked Run Later

hpc-compose jobs list
hpc-compose status -f compose.yaml
hpc-compose ps -f compose.yaml
hpc-compose watch -f compose.yaml
hpc-compose stats -f compose.yaml
hpc-compose logs -f compose.yaml --follow

Use jobs list first when you need to rediscover tracked runs under the current repo tree. Use ps for a stable per-service snapshot, watch to reconnect to the live UI, and logs --follow for a text-only follower.

From A Source Checkout

If you are developing from a local checkout instead of an installed binary:

cargo build --release
target/release/hpc-compose validate -f examples/minimal-batch.yaml
target/release/hpc-compose plan -f examples/minimal-batch.yaml
target/release/hpc-compose plan --show-script -f examples/minimal-batch.yaml
