Skip to content

Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Supported Slurm Model

This page makes the hpc-compose Slurm boundary explicit. It is a tool for compiling one Compose-like application into one Slurm allocation with one or more srun steps. Those steps can use Pyxis/Enroot, Apptainer, Singularity, or host runtime software. It is not a general frontend for the full Slurm command surface.

First-class support

These capabilities are modeled, validated, and intentionally supported by the planner, renderer, and tracked-job workflow.

AreaSupport
Allocation modelOne Slurm allocation per application
Submission flownew, plan, validate, config, inspect, preflight, prepare, render, up, when, alloc, run, debug
Tracked job workflowstatus, ps, watch, stats, score, logs, down, cancel, artifacts, clean, cache inspection/pruning
Top-level Slurm fieldsjob_name, partition, account, qos, time, nodes, ntasks, ntasks_per_node, cpus_per_task, mem, gres, gpus, GPU/CPU binding fields, constraint, output, error, chdir
Service step fieldsnodes, placement, ntasks, ntasks_per_node, cpus_per_task, gres, gpus, GPU/CPU binding fields, mpi
Multi-node modelSingle-node jobs, full-allocation distributed steps, and explicit node-index partitioning within one allocation
Runtime orchestrationdepends_on, readiness checks, one-shot completion dependencies, service failure policies, primary-node helper placement, explicit co-location through placement.share_with
Service hooksPer-service prologue and epilogue lifecycle hooks, plus host-side restart and window_exhausted event hooks
Runtime workflowPyxis/Enroot .sqsh, Apptainer/Singularity .sif, host runtime commands, x-runtime.prepare, shared cache handling
Scratch and stagingx-slurm.scratch, stage_in, stage_out, per-service scratch opt-out, raw #BB/#DW burst-buffer directives
Job trackingScheduler state via squeue/sacct, step stats via sstat, tracked logs, runtime state, metrics, artifacts, resume metadata
Advisory cluster weatherweather summarizes current node and queue conditions from read-only Slurm probes without reserving resources or changing submission behavior
Conditional submissionwhen actively monitors typed conditions, then submits one normal hpc-compose allocation
Canary right-sizinggerminate submits one short canary, writes latest-canary.json, and recommends resource settings without rewriting the spec
Hyperparameter sweepssweep submit expands one embedded sweep into many independent single-allocation jobs, then sweep status aggregates their tracked state
Cross-job rendezvousProvider/client discovery through shared-cache JSON records under one cluster-visible cache directory

Raw pass-through

These capabilities are usable, but hpc-compose does not model or validate their semantics beyond passing them through to Slurm.

MechanismWhat it allows
x-slurm.submit_argsRaw #SBATCH ... lines for site-specific flags such as mail settings, reservations, or other submit-time options
services.<name>.x-slurm.extra_srun_argsRaw srun arguments for site-specific launch flags such as exclusivity settings
Existing reservationsJoining an already-created reservation through raw submit args is supported as pass-through

Pass-through is appropriate when a site-specific flag is useful but does not justify a first-class schema field. hpc-compose rejects line breaks and null bytes in raw #SBATCH entries so one list entry cannot emit multiple directives, but it does not validate the Slurm semantics of those flags.

Unsupported or out of scope

These capabilities are intentionally outside the product seam.

AreaStatus
Admin-plane Slurm managementOut of scope
sacctmgr account administrationOut of scope
Reservation creation or lifecycle managementOut of scope
Federation / multi-cluster controlOut of scope
Cross-cluster service discoveryOut of scope; rendezvous is same-cluster shared-storage coordination only
Generic scontrol mutationOut of scope
Broad cluster inspection tools such as a full sinfo / sprio / sreport frontendOut of scope; weather is limited to a compact advisory snapshot
Background submit daemons or reservationsOut of scope; when is a foreground advisory monitor and does not reserve resources
Dynamic scheduling or bin packing across nodesNot supported; use explicit x-slurm.placement selectors
Heterogeneous jobs and job arrays as first-class workflow conceptsNot supported in v1; sweeps deliberately submit many normal allocations instead of Slurm arrays
Compose build, ports, custom networks, restart, deployNot supported

Non-goals

hpc-compose should not grow into a generic Slurm administration layer. In particular, it will not broaden into sacctmgr, reservation management, federation control, or generic scontrol mutation. Those are real Slurm features, but they do not fit the “one application, one allocation, tracked runtime workflow” seam this tool is built around.