Execution Model

This page explains the few runtime rules that matter most when a Compose mental model meets Slurm and HPC runtime backends.

What runs where

Stage	Where it runs	What happens
`plan`, `validate`, `inspect`, `preflight`	login node or local shell	Parse the spec, resolve paths, preview the runtime plan, and check prerequisites
`prepare`	login node or local shell with the selected runtime backend	Import base images and build prepared runtime artifacts
`up`	login node or local shell with Slurm access	Run preflight, prepare missing artifacts, render the batch script, call `sbatch`, and watch by default
Batch script and services	compute-node allocation	Launch the planned services through `srun` and the selected runtime backend
`status`, `ps`, `watch`, `stats`, `logs`, `artifacts`	login node or local shell	Read tracked metadata and job outputs after submission

The main consequence is simple: image preparation and validation happen before the job starts, but the containers themselves run later inside the Slurm allocation.

Service failure policies inside one job

hpc-compose does not provide a separate long-running orchestrator. Service failure handling happens inside the rendered batch script for the current allocation.

mode: fail_job keeps fail-fast behavior and stops the job on the first non-zero service exit.
mode: ignore records the failure but allows the rest of the job to continue.
mode: restart_on_failure only reacts to non-zero process exits. It does not restart on successful exits, and it does not use cross-attempt or cross-requeue history.

For restart_on_failure, the batch script enforces two limits during one live execution:

a lifetime cap through max_restarts
a rolling-window cap through max_restarts_in_window within window_seconds

If a service omits the rolling-window fields, hpc-compose still enables crash-loop protection with window_seconds: 60 and max_restarts_in_window: <resolved max_restarts>.
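The interaction of the two limits can be illustrated with a small Python model. This is a sketch of the rules described above, not the code in the rendered batch script: the `RestartGuard` class, its return labels, and the choice to check the lifetime cap before the window cap are all assumptions made for illustration, as is the treatment of a restart exactly `window_seconds` old as expired.

```python
from collections import deque

DEFAULT_WINDOW_SECONDS = 60  # documented default when the rolling-window fields are omitted


class RestartGuard:
    """Illustrative model of the two restart limits (hypothetical helper)."""

    def __init__(self, max_restarts, max_restarts_in_window=None, window_seconds=None):
        self.max_restarts = max_restarts
        # Documented defaults: the window cap falls back to the lifetime cap,
        # and the window length falls back to 60 seconds.
        self.max_in_window = (
            max_restarts if max_restarts_in_window is None else max_restarts_in_window
        )
        self.window_seconds = (
            DEFAULT_WINDOW_SECONDS if window_seconds is None else window_seconds
        )
        self.total = 0          # restarts granted during this live execution
        self.recent = deque()   # timestamps of restarts still inside the window

    def decide(self, exit_code, now):
        """Return 'no_restart', 'restart', 'lifetime_exhausted', or 'window_exhausted'."""
        if exit_code == 0:
            return "no_restart"  # successful exits are never restarted
        # Forget restarts that have aged out of the rolling window.
        # Assumption: a restart exactly window_seconds old no longer counts.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if self.total >= self.max_restarts:
            return "lifetime_exhausted"
        if len(self.recent) >= self.max_in_window:
            return "window_exhausted"
        self.total += 1
        self.recent.append(now)
        return "restart"
```

With `max_restarts=5`, `max_restarts_in_window=3`, and `window_seconds=60`, three quick failures are restarted, a fourth inside the same minute is blocked by the window guard, and later failures are restarted again until the lifetime cap of five is reached.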

Use status to inspect the tracked policy state after submission. The text view reports a line such as:

state service 'worker': failure_policy=restart_on_failure restarts=1/5 window=1/3@60s last_exit=42 completed=no
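If you want to consume that line from a script, a regular expression is usually enough. The sketch below is based only on the sample line above; the `parse_state_line` helper and its field names are hypothetical, and lines for other policies or states may use a different layout, in which case the function returns `None`.

```python
import re

# Pattern derived from the single sample line shown above (an assumption,
# not a documented output contract).
STATE_LINE = re.compile(
    r"state service '(?P<service>[^']+)': "
    r"failure_policy=(?P<policy>\S+) "
    r"restarts=(?P<restarts>\d+)/(?P<max_restarts>\d+) "
    r"window=(?P<in_window>\d+)/(?P<max_in_window>\d+)@(?P<window_seconds>\d+)s "
    r"last_exit=(?P<last_exit>\S+) "
    r"completed=(?P<completed>\S+)"
)


def parse_state_line(line):
    """Return the policy fields of one status line as a dict, or None if it does not match."""
    match = STATE_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the counters to integers; last_exit and completed stay as
    # strings because their full range of values is not known here.
    for key in ("restarts", "max_restarts", "in_window", "max_in_window", "window_seconds"):
        fields[key] = int(fields[key])
    return fields
```

For the sample line, the result has `service` set to `worker`, `restarts` at 1 of 5, and a window count of 1 of 3 over 60 seconds.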

Use logs to inspect the corresponding restart messages from the batch script when you need to distinguish lifetime-cap exhaustion from rolling-window exhaustion.

Use per-service x-slurm.hooks when you want host-side notifications around those policy transitions. on: restart runs before a granted relaunch; on: window_exhausted runs when the rolling-window guard blocks another restart. These hooks are best-effort and do not change the service policy outcome.

Which paths must be shared

The resolved cache directory must be visible from both the login node and the compute nodes. It may come from x-slurm.cache_dir, project settings, or the built-in $HOME/.cache/hpc-compose fallback.
Relative host paths in volumes, local image paths, and x-runtime.prepare.mounts resolve against the compose file directory.
Each submitted job writes per-job runtime state under <runtime-root>/<job-id> on the host. <runtime-root> defaults to <submit-dir>/.hpc-compose and can be overridden with x-slurm.runtime_root.
The active job workspace is mounted into containerized services at /hpc-compose/job. For ordinary runs that workspace is <runtime-root>/<job-id>; for resume-aware attempts it is <runtime-root>/<job-id>/attempts/<attempt>, with top-level paths kept as the latest view.
Multi-node jobs also populate /hpc-compose/job/allocation/{primary_node,nodes.txt} and export allocation-wide HPC_COMPOSE_NODE... variables plus service-scoped HPC_COMPOSE_SERVICE_NODE... variables.

Use /hpc-compose/job for small shared state inside the allocation, such as ready files, request payloads, logs, metrics, or teardown signals.
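As one possible pattern for ad hoc coordination, a service can drop a marker file into the job workspace and another service can poll for it. The sketch below assumes a `ready/` subdirectory and the helper names `mark_ready` and `wait_ready`, none of which are defined by hpc-compose; only the `/hpc-compose/job` mount point comes from this page. It does not replace the built-in readiness checks.

```python
import os
import time
from pathlib import Path

JOB_DIR = Path("/hpc-compose/job")  # documented mount point inside services


def mark_ready(name, job_dir=JOB_DIR):
    """Create <job_dir>/ready/<name> via write-then-rename (hypothetical layout)."""
    ready_dir = Path(job_dir) / "ready"
    ready_dir.mkdir(parents=True, exist_ok=True)
    tmp = ready_dir / f".{name}.tmp"
    tmp.write_text("ready\n")
    # Rename so that readers never observe a half-written marker.
    os.replace(tmp, ready_dir / name)
    return ready_dir / name


def wait_ready(name, job_dir=JOB_DIR, timeout=60.0, poll=0.5):
    """Poll for the marker file; return True if it appears before the timeout."""
    target = Path(job_dir) / "ready" / name
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if target.exists():
            return True
        time.sleep(poll)
    return target.exists()
```

A producer would call `mark_ready("db")` once it can serve requests, and a consumer would call `wait_ready("db")` before connecting. On shared filesystems with aggressive attribute caching, visibility of the new file on other nodes can lag, so keep the timeout generous.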

Enroot runtime paths

The generated batch script sets three Enroot runtime paths scoped per job under the resolved cache directory:

Variable	Value	Purpose
`ENROOT_CACHE_PATH`	`$CACHE_ROOT/runtime/$SLURM_JOB_ID/cache`	Enroot image cache for the current job
`ENROOT_DATA_PATH`	`$CACHE_ROOT/runtime/$SLURM_JOB_ID/data`	Enroot data directory for the current job
`ENROOT_TEMP_PATH`	`$CACHE_ROOT/runtime/$SLURM_JOB_ID/tmp`	Enroot temp directory for the current job

These paths are created at batch startup and are available inside the batch script and to tooling that reads Enroot environment variables. They are not injected into service containers.
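For tooling that needs to locate these directories from outside the batch script, the layout in the table can be reproduced directly. The `enroot_runtime_paths` helper below is hypothetical and simply mirrors the table; the cache root and job ID in the usage note are placeholder values.

```python
from pathlib import PurePosixPath


def enroot_runtime_paths(cache_root, job_id):
    """Mirror the per-job Enroot path layout from the table above."""
    base = PurePosixPath(cache_root) / "runtime" / str(job_id)
    return {
        "ENROOT_CACHE_PATH": str(base / "cache"),
        "ENROOT_DATA_PATH": str(base / "data"),
        "ENROOT_TEMP_PATH": str(base / "tmp"),
    }
```

Calling `enroot_runtime_paths("/shared/cache", 12345)` yields paths under `/shared/cache/runtime/12345/`.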

The cache must live on storage shared between login and compute nodes, because prepare runs on the login node while services run on compute nodes. A node-local path such as /tmp fails because each node sees its own filesystem. For the operational list of invalid cache paths and cache configuration, see Cache Management.

Networking inside the allocation

Single-node services share the host network on one node.
In a multi-node job, helper services stay on the allocation’s primary node by default.
A distributed service may span the full allocation, or services may use x-slurm.placement to select explicit allocation node subsets.
Partitioned services should use service-scoped metadata such as HPC_COMPOSE_SERVICE_PRIMARY_NODE, HPC_COMPOSE_SERVICE_NODE_COUNT, HPC_COMPOSE_SERVICE_NODELIST, and HPC_COMPOSE_SERVICE_NODELIST_FILE.
ports, custom Docker networks, and service-name DNS are not part of the model.
Use depends_on plus readiness when a dependent service must wait for real availability rather than process start.
Use depends_on with condition: service_completed_successfully when a dependent service should wait for a one-shot stage to exit successfully.

Use 127.0.0.1 only when both sides are intentionally on the same node. For multi-node distributed or partitioned runs, derive rendezvous addresses from allocation or service metadata files and environment variables instead of relying on localhost.
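A service can pick its rendezvous host from the service-scoped variables rather than hard-coding localhost. The sketch below assumes that `HPC_COMPOSE_SERVICE_PRIMARY_NODE` holds a resolvable hostname and that `HPC_COMPOSE_SERVICE_NODE_COUNT` holds an integer; the `rendezvous_host` helper and the loopback fallback when both are unset are illustrative choices, not documented behavior.

```python
import os


def rendezvous_host(env=None):
    """Choose a rendezvous host from service metadata (hypothetical helper)."""
    env = os.environ if env is None else env
    primary = env.get("HPC_COMPOSE_SERVICE_PRIMARY_NODE")
    # Assumption: the node count is exported as a decimal integer.
    node_count = int(env.get("HPC_COMPOSE_SERVICE_NODE_COUNT", "1"))
    if node_count > 1:
        if not primary:
            raise RuntimeError(
                "multi-node service without HPC_COMPOSE_SERVICE_PRIMARY_NODE; "
                "refusing to fall back to localhost"
            )
        return primary
    # Single-node case: loopback is acceptable when no primary node is exported.
    return primary or "127.0.0.1"
```

Workers would then connect to `rendezvous_host()` on whatever port the application has agreed on. Failing loudly in the multi-node case avoids a silent hang where every rank waits on its own loopback interface.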

If a service binds its TCP port before it is actually ready, prefer HTTP or log-based readiness over plain TCP readiness.

`volumes` vs `x-runtime.prepare`

Mechanism	Use it for	When it is applied	Reuse behavior
`volumes`	fast-changing source code, model directories, input data, checkpoint paths	at runtime inside the allocation	reads live host content every normal run
`x-runtime.prepare.commands`	slower-changing dependencies, tools, and image customization	before submission on the login node	cached until the prepared artifact changes

Recommended default:

keep active source trees in volumes
keep slower-changing dependency installation in x-runtime.prepare.commands
use prepare.mounts only when the prepare step truly needs host files

Warning

If a mounted file is a symlink, the symlink target must also be visible from inside the mounted directory. Otherwise the path can exist on the host but fail inside the container.
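A quick host-side check can flag symlinks whose targets fall outside a directory you plan to mount. The `escaping_symlinks` helper below is a hypothetical sketch and not part of hpc-compose. It compares resolved host paths, so it treats absolute links that point inside the directory as safe, which holds only when the directory is mounted at the same path inside the container.

```python
import os
from pathlib import Path


def escaping_symlinks(mount_root):
    """List symlinks under mount_root whose resolved target lies outside it."""
    root = Path(mount_root).resolve()
    escaped = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them, so each link is checked exactly once.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                continue
            target = path.resolve()
            if target != root and root not in target.parents:
                escaped.append(path)
    return sorted(escaped)
```

Running this against a source tree before submission returns the links that would dangle inside the container, so you can either mount the target directory as well or replace the link with a copy.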

Command vocabulary

The normal run is hpc-compose up -f compose.yaml. See Quickstart for the full end-to-end description.
The tracked follow-up tools are status for scheduler/log summaries, ps for a stable per-service snapshot, and watch when you want to reconnect to the live TUI later.
The debugging flow runs validate, inspect, preflight, and prepare as separate steps when you need more visibility.

Read Runtime Backends before changing runtime.backend, Runbook for the operational workflow, Examples for starting points, and Spec reference for exact field behavior.
