Task Guide
Use this page when you know what you want to do, but not yet which command or example should be your starting point.
First run
- Read Quickstart.
- Start from `minimal-batch` with `hpc-compose init --template minimal-batch --name my-app --cache-dir /shared/$USER/hpc-compose-cache --output compose.yaml`.
- Run `hpc-compose submit --watch -f compose.yaml`.
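The two first-run steps can be chained in a small shell function. This is a minimal sketch, assuming `hpc-compose` is on your `PATH`. The function name `first_run` and the `HPC_COMPOSE` override variable are hypothetical helpers introduced here for illustration; they are not part of the tool.

```shell
# Hypothetical override: set HPC_COMPOSE=echo to print the commands
# instead of running them (useful as a dry run).
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# first_run NAME CACHE_DIR
# Scaffold compose.yaml from the minimal-batch template, then submit it
# and watch the job. Stops if the scaffold step fails.
first_run() {
  name="$1"
  cache_dir="$2"
  "$HPC_COMPOSE" init --template minimal-batch --name "$name" \
    --cache-dir "$cache_dir" --output compose.yaml || return 1
  "$HPC_COMPOSE" submit --watch -f compose.yaml
}
```

For example, `first_run my-app "/shared/$USER/hpc-compose-cache"` reproduces the commands listed above.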
Migrate from Docker Compose
- Read Docker Compose Migration.
- Replace `build:` with `image:` plus `x-enroot.prepare.commands`.
- Replace service-name networking with `127.0.0.1` or explicit allocation metadata where appropriate.
Single-node multi-service app
- Start from app-redis-worker.yaml.
- Add `depends_on` and `readiness` only where ordering really matters.
- Use Execution model to confirm which services can rely on localhost.
Multi-node distributed training
- Start from multi-node-torchrun.yaml or multi-node-mpi.yaml.
- Treat helper services as primary-node-only and the distributed job as the single allocation-wide step.
- Use allocation metadata such as `HPC_COMPOSE_PRIMARY_NODE` instead of Docker-style service discovery.
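As an illustration of the last point, a service command can build its rendezvous address from allocation metadata rather than from a service name. This is a minimal sketch, assuming `HPC_COMPOSE_PRIMARY_NODE` is exported into the service environment at run time. The function name `rendezvous_endpoint` and the default port are hypothetical and chosen only for the example.

```shell
# rendezvous_endpoint [PORT]
# Print HOST:PORT using the primary node from allocation metadata.
# Returns non-zero (with a message on stderr) when the variable is unset.
rendezvous_endpoint() {
  port="${1:-29500}"  # illustrative default, not mandated by hpc-compose
  if [ -z "${HPC_COMPOSE_PRIMARY_NODE:-}" ]; then
    echo "HPC_COMPOSE_PRIMARY_NODE is not set" >&2
    return 1
  fi
  printf '%s:%s\n' "$HPC_COMPOSE_PRIMARY_NODE" "$port"
}
```

Failing early when the variable is missing makes it obvious that the command ran outside an allocation, instead of silently falling back to a hostname that does not resolve.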
Checkpoint and resume workflows
- Start from training-checkpoints.yaml when you only need artifact output.
- Start from training-resume.yaml when the run should resume from shared storage across retries or later submissions.
- Keep the canonical resume source in `x-slurm.resume.path`, not in exported artifact bundles.
LLM serving workflows
- Start from llm-curl-workflow.yaml, llm-curl-workflow-workdir.yaml, llama-uv-worker.yaml, or vllm-uv-worker.yaml.
- Use `volumes` for model directories and fast-changing code.
- Use `x-enroot.prepare.commands` for slower-changing dependencies.
Debug cluster readiness
- Run `hpc-compose validate -f compose.yaml`.
- Run `hpc-compose inspect --verbose -f compose.yaml`.
- Run `hpc-compose preflight -f compose.yaml`.
- Read the troubleshooting sections in Runbook.
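The three checks can be run in order so that the first failure stops the sequence. This is a minimal sketch, assuming each subcommand exits non-zero when it finds a problem (an assumption, not something this page states). The function name `check_readiness` and the `HPC_COMPOSE` override variable are hypothetical.

```shell
# Hypothetical override: set HPC_COMPOSE=echo to print the commands
# instead of running them.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# check_readiness [FILE]
# Run validate, inspect, and preflight against FILE (default: compose.yaml),
# returning non-zero as soon as one of them fails.
check_readiness() {
  file="${1:-compose.yaml}"
  "$HPC_COMPOSE" validate -f "$file" || return 1
  "$HPC_COMPOSE" inspect --verbose -f "$file" || return 1
  "$HPC_COMPOSE" preflight -f "$file" || return 1
}
```

Running the cheapest check first keeps the feedback loop short: a spec that fails `validate` never reaches the cluster-facing `preflight` step.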
Cache and artifact management
- Use `hpc-compose cache list` to inspect imported/prepared artifacts.
- Use `hpc-compose cache inspect -f compose.yaml` to see per-service reuse expectations.
- Use `hpc-compose artifacts -f compose.yaml` after a run to export tracked payloads.
Automation and scripting with JSON output
- Prefer `--format json` for machine-readable output on `validate`, `render`, `prepare`, `preflight`, `inspect`, `status`, `stats`, `artifacts`, and `cache` subcommands.
- Use `hpc-compose stats --format jsonl` or `--format csv` when downstream tooling wants row-oriented metrics.
- Treat `--json` as a compatibility alias on older machine-readable commands; new automation should prefer `--format json`.
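For scripted use, the JSON output of several subcommands can be captured to one file each for later processing. This is a minimal sketch: it assumes `--format json` combines with `-f compose.yaml` on these subcommands, and it deliberately makes no assumption about the JSON schema. The function name `collect_json` and the `HPC_COMPOSE` override variable are hypothetical.

```shell
# Hypothetical override: set HPC_COMPOSE=echo to record the commands
# instead of running them.
HPC_COMPOSE="${HPC_COMPOSE:-hpc-compose}"

# collect_json OUT_DIR [FILE]
# Write the machine-readable output of validate, inspect, and preflight
# to OUT_DIR/<subcommand>.json. Stops at the first failing command.
collect_json() {
  out_dir="$1"
  file="${2:-compose.yaml}"
  mkdir -p "$out_dir" || return 1
  for sub in validate inspect preflight; do
    "$HPC_COMPOSE" "$sub" --format json -f "$file" \
      > "$out_dir/$sub.json" || return 1
  done
}
```

Keeping one file per subcommand lets downstream tooling parse each result independently, for example `collect_json reports compose.yaml` followed by a JSON processor of your choice.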