Migrating from Docker Compose
This guide helps you convert an existing docker-compose.yaml into an hpc-compose spec for Slurm clusters using Pyxis/Enroot, Apptainer, Singularity, or host runtimes.
At a glance
| Docker Compose feature | hpc-compose equivalent |
|---|---|
| image | image (same syntax, auto-prefixed with docker://) |
| command | command (string or list, same syntax) |
| entrypoint | entrypoint (string or list, same syntax) |
| environment | environment (map or list, same syntax) |
| volumes | volumes (host:container bind mounts, same syntax) |
| depends_on | depends_on (list or map with condition: service_started / service_healthy) |
| working_dir | working_dir (requires explicit command or entrypoint) |
| build | Not supported. Use image + x-runtime.prepare.commands instead. |
| ports | Not supported. Use host networking semantics instead. 127.0.0.1 works only when both sides run on the same node. |
| networks / network_mode | Not supported. There is no Docker-style overlay network or service-name DNS layer. |
| restart | Not supported as a Compose key. Use services.<name>.x-slurm.failure_policy. |
| deploy | Not supported. Use x-slurm for resource allocation. |
| healthcheck | Supported for a constrained TCP/HTTP subset and normalized into readiness; use explicit readiness for anything more complex. |
| Resource limits (cpus, mem_limit) | Use x-slurm.cpus_per_task, x-slurm.mem, x-slurm.gpus |
Side-by-side: web app + Redis
Docker Compose
version: "3.9"
services:
  redis:
    image: redis:7
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: redis
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
hpc-compose
version: "1"
name: my-app
x-slurm:
  job_name: my-app
  time: "01:00:00"
  mem: 8G
  cpus_per_task: 4
  cache_dir: /cluster/shared/hpc-compose-cache
services:
  redis:
    image: redis:7
    command: redis-server --save "" --appendonly no
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30
  app:
    image: python:3.11-slim
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: 127.0.0.1
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir redis fastapi uvicorn
Key changes
- version: "3.9" → version: "1" or remove the field. hpc-compose uses this as its own spec schema version, not a Docker Compose compatibility version.
- build: . → image: python:3.11-slim + x-runtime.prepare.commands for dependencies.
- ports → Removed. Services communicate via 127.0.0.1 because they run on the same node.
- REDIS_HOST: redis → REDIS_HOST: 127.0.0.1. No DNS service names; use localhost.
- healthcheck → readiness with type: tcp.
- Added x-slurm block for Slurm resource allocation (time, memory, CPUs).
- Configured a shared cache for image storage, either through x-slurm.cache_dir as shown or project settings.
Key differences
Networking
Docker Compose creates isolated networks where services find each other by name. In hpc-compose, helper services on the same node share the host network directly, and multi-node distributed steps must use explicit rendezvous addresses. Replace service hostnames with 127.0.0.1 only when both sides intentionally stay on one node. For multi-node runs, derive the rendezvous host from /hpc-compose/job/allocation/primary_node or HPC_COMPOSE_PRIMARY_NODE.
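Application code that used to connect to a service name has to pick the rendezvous host itself. The following is a minimal Python sketch of that lookup, not part of hpc-compose. It assumes the environment variable takes precedence over the file and that the file holds a single hostname as plain text; neither detail is stated above. The helper name resolve_rendezvous_host is hypothetical.

```python
import os
from pathlib import Path

# The path and variable name come from this guide. The file format
# (one hostname as plain text) is an assumption.
PRIMARY_NODE_FILE = "/hpc-compose/job/allocation/primary_node"
PRIMARY_NODE_ENV = "HPC_COMPOSE_PRIMARY_NODE"


def resolve_rendezvous_host(env=None, node_file=PRIMARY_NODE_FILE, default="127.0.0.1"):
    """Return the host that peers should connect to.

    Lookup order (an assumption): environment variable, then the
    allocation metadata file, then the same-node default.
    """
    env = os.environ if env is None else env
    host = env.get(PRIMARY_NODE_ENV, "").strip()
    if host:
        return host
    try:
        # Take the first non-empty line of the metadata file.
        for line in Path(node_file).read_text().splitlines():
            if line.strip():
                return line.strip()
    except OSError:
        # File missing or unreadable: fall through to the default.
        pass
    return default
```

A service would call resolve_rendezvous_host() once at startup and use the result wherever it previously used a Compose service name. The 127.0.0.1 fallback is only appropriate when both sides intentionally stay on one node.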
Building images
Docker Compose uses build: to run a Dockerfile. hpc-compose uses x-runtime.prepare.commands instead:
# Docker Compose
app:
  build:
    context: .
    dockerfile: Dockerfile

# hpc-compose
app:
  image: python:3.11-slim
  x-runtime:
    prepare:
      commands:
        - pip install --no-cache-dir -r /tmp/requirements.txt
      mounts:
        - ./requirements.txt:/tmp/requirements.txt
Prefer volumes for fast-changing source code and x-runtime.prepare.commands for slower-changing dependencies. x-enroot.prepare remains accepted as a Pyxis/Enroot compatibility spelling, but new specs should use x-runtime.prepare.
Health checks vs readiness
Docker Compose uses healthcheck with a test command, interval, timeout, and retries. hpc-compose accepts a constrained healthcheck subset and normalizes it into readiness. For anything outside that subset, write an explicit readiness block using one of these types:
# TCP: wait for a port to accept connections
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30

# Log: wait for a pattern in service output
readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60

# Sleep: fixed delay
readiness:
  type: sleep
  seconds: 5
Supported healthcheck migration patterns:
- ["CMD", "nc", "-z", HOST, PORT]
- ["CMD-SHELL", "nc -z HOST PORT"]
- recognized curl probes against http:// or https:// URLs
- recognized wget --spider probes against http:// or https:// URLs
Still unsupported in v1:
- arbitrary custom command probes
- interval
- retries
- start_period
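Before migrating, it can help to sort existing healthcheck tests into those that look like the supported patterns and those that need an explicit readiness block. The Python sketch below is a rough approximation of the patterns listed above, not hpc-compose's actual recognizer, which may be stricter or looser. The function name classify_healthcheck and its return labels are hypothetical.

```python
import shlex


def classify_healthcheck(test):
    """Return 'tcp', 'http', or 'unsupported' for a Compose healthcheck test.

    Approximates the documented patterns only; treat the result as a hint.
    """
    if isinstance(test, str):
        # Compose treats a plain string like the CMD-SHELL form.
        argv = shlex.split(test)
    elif test and test[0] == "CMD":
        argv = list(test[1:])
    elif test and test[0] == "CMD-SHELL" and len(test) == 2:
        argv = shlex.split(test[1])
    else:
        return "unsupported"
    if not argv:
        return "unsupported"

    prog = argv[0]
    # nc -z HOST PORT
    if prog == "nc" and len(argv) == 4 and argv[1] == "-z":
        return "tcp"
    # curl or wget --spider against an http(s) URL
    urls = [a for a in argv[1:] if a.startswith(("http://", "https://"))]
    if prog == "curl" and urls:
        return "http"
    if prog == "wget" and "--spider" in argv and urls:
        return "http"
    return "unsupported"
```

Under this approximation the redis-cli ping probe from the side-by-side example is classified as unsupported, which matches its conversion to readiness with type: tcp.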
Resource allocation
Docker Compose uses deploy.resources or top-level cpus/mem_limit. hpc-compose uses Slurm-native resource settings:
x-slurm:
  time: "02:00:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1
services:
  app:
    x-slurm:
      cpus_per_task: 4
      gpus: 1
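When converting many services, the Compose limits can be translated mechanically. The Python sketch below shows one way to do that. It assumes x-slurm.mem accepts Slurm-style sizes with K, M, or G suffixes (as the 8G and 32G values above suggest) and that fractional Compose CPU limits should be rounded up to whole CPUs; both are assumptions, and the helper name slurm_resources is hypothetical.

```python
import math
import re

# Compose byte-unit suffixes mapped to Slurm-style suffixes (assumed mapping).
_UNITS = {"k": "K", "kb": "K", "m": "M", "mb": "M", "g": "G", "gb": "G"}


def slurm_resources(cpus=None, mem_limit=None):
    """Translate Compose cpus / mem_limit into x-slurm-style settings."""
    out = {}
    if cpus is not None:
        # Round fractional CPU limits up to at least one whole CPU.
        out["cpus_per_task"] = max(1, math.ceil(float(cpus)))
    if mem_limit is not None:
        match = re.fullmatch(r"(\d+)\s*([a-zA-Z]*)", str(mem_limit).strip())
        if not match:
            raise ValueError(f"unrecognized mem_limit: {mem_limit!r}")
        number, unit = match.group(1), match.group(2).lower()
        if unit in ("", "b"):
            # Plain byte counts: round up to whole mebibytes.
            out["mem"] = f"{max(1, math.ceil(int(number) / 2**20))}M"
        elif unit in _UNITS:
            out["mem"] = number + _UNITS[unit]
        else:
            raise ValueError(f"unrecognized mem_limit unit: {unit!r}")
    return out
```

For example, slurm_resources(cpus="1.5", mem_limit="512m") yields cpus_per_task 2 and mem 512M. Decimal sizes such as "1.5g" are rejected on purpose so that they get converted by hand.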
Restart policies
Docker Compose supports restart: always, on-failure, etc. hpc-compose does not accept the Compose restart: key, but it does support per-service restart behavior through services.<name>.x-slurm.failure_policy.
services:
  app:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5
        window_seconds: 60
        max_restarts_in_window: 3
restart_on_failure retries only on non-zero exits. It enforces both a lifetime restart cap and a rolling-window crash-loop cap during one live batch-script execution. If you omit the rolling-window fields, hpc-compose defaults to window_seconds: 60 and max_restarts_in_window: <resolved max_restarts>. Use mode: fail_job (default) for fail-fast behavior, or mode: ignore for non-critical sidecars.
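The interaction between the two caps is easier to see in code. The Python sketch below models one plausible reading of the rule described above: a restart is allowed only while both the lifetime count and the count inside the rolling window are under their limits. It is an illustration, not hpc-compose's implementation. The class and method names are hypothetical, the window boundary handling is an assumption, and backoff_seconds is left out.

```python
from collections import deque


class RestartBudget:
    """Toy model of a lifetime restart cap plus a rolling-window cap."""

    def __init__(self, max_restarts, window_seconds=60, max_restarts_in_window=None):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        # Documented default: the window cap falls back to max_restarts.
        self.max_in_window = (
            max_restarts if max_restarts_in_window is None else max_restarts_in_window
        )
        self.total = 0          # restarts granted so far (lifetime)
        self.recent = deque()   # timestamps of restarts inside the window

    def allow_restart(self, now):
        """Report whether a failure at time `now` (seconds) may restart."""
        # Forget restarts that have aged out of the rolling window.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if self.total >= self.max_restarts:
            return False  # lifetime budget exhausted
        if len(self.recent) >= self.max_in_window:
            return False  # crash loop: too many restarts in the window
        self.total += 1
        self.recent.append(now)
        return True
```

With max_restarts=5 and max_restarts_in_window=2, two quick failures are retried, a third within the same 60 seconds is refused, and failures spaced further apart keep drawing on the larger lifetime budget.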
Practical mapping:
- Compose restart: "no" -> omit failure_policy or use mode: fail_job
- Compose restart: on-failure[:N] -> use mode: restart_on_failure with max_restarts: N when you want a similar lifetime retry budget
- Compose restart: always / unless-stopped -> no direct equivalent; hpc-compose intentionally keeps restart handling bounded within one batch job
The rolling-window fields have no direct Docker Compose equivalent. They exist to stop fast crash loops inside one Slurm allocation without giving up a larger lifetime retry budget for transient failures.
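The practical mapping can be written down as a small lookup. The Python sketch below is a hypothetical helper (failure_policy_for is not an hpc-compose function) that turns a Compose restart value into a failure_policy mapping, or None where no direct equivalent exists. Leaving max_restarts unset for a bare on-failure is an assumption: hpc-compose then applies its own bounded default, whereas Compose would retry without limit.

```python
def failure_policy_for(restart):
    """Map a Compose restart value to a failure_policy dict.

    Returns None for policies with no direct hpc-compose equivalent.
    """
    # YAML parsers may turn an unquoted `no` into the boolean False.
    value = str(restart).strip().lower()
    if value in ("no", "false"):
        return {"mode": "fail_job"}
    if value == "on-failure":
        # No retry count given: rely on hpc-compose's own default cap.
        return {"mode": "restart_on_failure"}
    if value.startswith("on-failure:"):
        retries = int(value.split(":", 1)[1])
        return {"mode": "restart_on_failure", "max_restarts": retries}
    if value in ("always", "unless-stopped"):
        return None
    raise ValueError(f"unknown Compose restart value: {restart!r}")
```

A None result is a prompt to decide by hand whether the service is critical (mode: fail_job), a non-critical sidecar (mode: ignore), or worth a bounded retry budget.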
What to do about unsupported features
| Feature | Alternative |
|---|---|
| build | Use image + x-runtime.prepare.commands. Mount build context files with x-runtime.prepare.mounts if needed. |
| ports | Not needed. Services share 127.0.0.1 on one node. |
| networks / network_mode | Not needed. Same-node services share the host network; multi-node runs use explicit rendezvous addresses. |
| restart | Use services.<name>.x-slurm.failure_policy (fail_job, ignore, restart_on_failure). |
| deploy | Use x-slurm for resources. |
| Service DNS names | Use 127.0.0.1 for same-node helpers, or explicit host metadata such as HPC_COMPOSE_PRIMARY_NODE for distributed runs. |
| Named volumes | Use host-path bind mounts in volumes. |
| .env file | Supported. .env in the compose file directory is loaded automatically. |
Migration checklist
- Replace the Compose version: value. Use version: "1" or omit the field; values like "3.9" are rejected by hpc-compose.
- Remove the build: key. Replace it with image: pointing to a base image, and move dependency installation to x-runtime.prepare.commands.
- Remove the ports: key. Use host-network semantics instead of container port publishing.
- Remove the networks: and network_mode: keys. There is no Docker-style overlay network or service-name DNS layer.
- Remove the Compose restart: key. Use services.<name>.x-slurm.failure_policy when you need per-service restart behavior.
- Remove the deploy: key. Use x-slurm for resource allocation.
- Replace service hostnames. Change any service-name references (e.g. redis, postgres) to 127.0.0.1 for same-node helpers, or to explicit allocation metadata for distributed runs.
- Review each healthcheck: block. Probes in the supported TCP/HTTP subset can stay; convert anything else to readiness: with type: tcp, type: log, or type: sleep.
- Add an x-slurm: block. Set time, mem, cpus_per_task, and optionally gpus, partition, account.
- Set cache storage. Point x-slurm.cache_dir or setup --cache-dir to shared storage visible from login and compute nodes.
- Validate. Run hpc-compose validate -f compose.yaml to check the converted spec.
- Inspect. Run hpc-compose inspect --verbose -f compose.yaml to confirm the planner understood your intent.
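The removal steps in the checklist can be pre-screened with a short script before running hpc-compose validate. The Python sketch below works on an already-parsed Compose document (a plain dict, for example the result of loading the file with a YAML library). It is a hypothetical helper that covers only the keys named in this guide, and it complements validate rather than replacing it.

```python
# Hints are paraphrased from this guide. lint_compose is a hypothetical helper.
SERVICE_KEY_HINTS = {
    "build": "use image + x-runtime.prepare.commands",
    "ports": "use host-network semantics",
    "networks": "no overlay network or service-name DNS layer",
    "network_mode": "no overlay network or service-name DNS layer",
    "restart": "use x-slurm.failure_policy",
    "deploy": "use x-slurm for resource allocation",
}
TOP_LEVEL_KEY_HINTS = {
    "networks": "no overlay network or service-name DNS layer",
    "volumes": "named volumes: use host-path bind mounts instead",
}


def lint_compose(spec):
    """Return a list of findings for Compose keys that need migration."""
    findings = []
    version = spec.get("version")
    if version is not None and str(version) != "1":
        findings.append(f'version: {version!r} should be "1" or omitted')
    for key, hint in TOP_LEVEL_KEY_HINTS.items():
        if key in spec:
            findings.append(f"{key}: {hint}")
    for name, service in (spec.get("services") or {}).items():
        for key, hint in SERVICE_KEY_HINTS.items():
            if key in (service or {}):
                findings.append(f"services.{name}.{key}: {hint}")
    return findings
```

Run against the Docker Compose file from the side-by-side example, this reports the version value, the ports key on redis, and the build and ports keys on app. An empty list means none of the listed keys remain; it does not mean the spec is valid.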