Basic Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        architect.loopholelabs.io/managed-containers: '["my-app-container"]'
        architect.loopholelabs.io/scaledown-durations: '{"my-app-container":"30s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: my-app-container
          image: my-app:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
Runtime Classes
runc-architect: automatic hibernation on idle; wakes on network traffic or kubectl exec. Container-scoped checkpoints. Use managed-containers.
runsc-architect: gVisor security isolation. Pod-scoped checkpoints, created explicitly via PersistentCheckpoint CRDs. Use managed-pod. See Examples.
Annotations
managed-containers
architect.loopholelabs.io/managed-containers: '["container-1", "container-2"]'
Which containers Architect manages. Unlisted containers run normally.
scaledown-durations
architect.loopholelabs.io/scaledown-durations: '{"container-1":"30s", "container-2":"60s"}'
Idle time before hibernation. Default: 60s.
initial-scaledown-delays
architect.loopholelabs.io/initial-scaledown-delays: '{"container-1":"90s"}'
Grace period (a Go duration string) during which hibernation is suppressed
after the container's first-ever scale-up. Useful for slow-starting workloads
(e.g. JVMs whose readiness probes take longer than scaledown-durations) so
they aren't hibernated mid-startup. After the window elapses, normal
activity-based scale-down resumes. The window is not re-armed after a
migration or post-scale-down restart: the workload is already past its slow
startup by then. Default: 0 (gate disabled). Values above 24h are clamped to 24h.
network-monitor
architect.loopholelabs.io/network-monitor: '{"container-1":"packets", "container-2":"connections"}'
Enables network-based wake: a container that has been scaled down wakes when
it receives network traffic. An eBPF program in the pod's network namespace
watches the container's declared ports and triggers a scale-up. Without this
annotation, the only way to wake a scaled-down container is kubectl exec.
Modes:
packets: wake on any incoming TCP/UDP packet on a tracked port. Best for sporadic request/response workloads (HTTP APIs, webhook receivers).
connections: TCP only. Wake on connection establishment and stay awake while any TCP connection is open. Best for long-lived connection patterns (databases, message brokers, gRPC servers). Avoid for clients that hold a pooled connection open indefinitely: the container will never scale down.
Activity is tracked per port. Architect monitors traffic only on the
ports the container declares in its ports array. Shadow ports injected by
health-check-proxy and shadow-ports
are added to the container's ports array too (so Kubernetes Services can
target them), but Architect ignores traffic on them when assessing activity —
that's the whole point of the shadow port pattern. The traffic still reaches
the application; it just doesn't keep the container running.
Activity is scoped per container. Sidecars sharing the pod's network
namespace (Istio sidecars, fluentd, etc.) do not keep the managed container
awake. Outbound traffic the container sends from an ephemeral source port
also doesn't count. If your workload only ever sends outbound traffic from
ephemeral ports, use
disable-autoscaledown-containers to
opt out of automatic scale down.
Requires managed-containers.
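The filtering rules above can be modeled as a predicate over the container-side port of a flow. This is an illustrative sketch with hypothetical names (`portFilter`, `countsAsActivity`); it assumes the relevant port is the destination port for inbound traffic and the source port for outbound traffic, and it does not model the eBPF program itself.

```go
package main

import "fmt"

// portFilter is a hypothetical model of which traffic counts as activity
// for one managed container.
type portFilter struct {
	declared map[int]bool // ports in the container's ports array (shadow ports included)
	shadow   map[int]bool // ports injected by health-check-proxy or shadow-ports
}

func toSet(ports []int) map[int]bool {
	m := make(map[int]bool, len(ports))
	for _, p := range ports {
		m[p] = true
	}
	return m
}

func newPortFilter(declared, shadow []int) portFilter {
	return portFilter{declared: toSet(declared), shadow: toSet(shadow)}
}

// countsAsActivity takes the container-side port of a flow: the destination
// port for inbound traffic, the (ephemeral) source port for outbound traffic.
// Only declared, non-shadow ports count. Ports used by sidecars are not in
// the managed container's ports array, so their traffic never counts.
func (f portFilter) countsAsActivity(localPort int) bool {
	return f.declared[localPort] && !f.shadow[localPort]
}

func main() {
	// The app serves on 8080; health-check-proxy added shadow port 9080.
	f := newPortFilter([]int{8080, 9080}, []int{9080})
	fmt.Println(f.countsAsActivity(8080))  // true: real application port
	fmt.Println(f.countsAsActivity(9080))  // false: shadow port
	fmt.Println(f.countsAsActivity(43512)) // false: ephemeral source port of an outbound call
}
```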
health-check-proxy
architect.loopholelabs.io/health-check-proxy: '{"mappings":[{"containerName":"app","appPort":8080,"shadowPort":9080}]}'
Lets kubelet liveness, readiness, and startup probes pass while the container is scaled down, without waking it.
Without this annotation, every probe hits the application port and counts as activity, so a probed container never scales down. Removing the probes isn't an option either — they're the safety net that catches a stuck process.
To use this feature, configure your liveness, readiness, and startup
probes to target the shadowPort instead of the application's real port.
Architect takes care of the rest: it starts a sidecar that handles probes on
the shadow port, forwarding them to the application while the container is
running and answering them itself while the container is scaled down, so
kubelet keeps seeing a healthy response.
Mapping fields:
containerName (required): name of a container in managed-containers.
appPort (required, 1–65535): the application's real probe port.
shadowPort (required, 1–65535): the port to point your probes at.
Duplicate shadowPort values across mappings are dropped with a warning.
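Because a duplicated mapping is only dropped with a warning, it can help to lint the annotation before applying it. The hypothetical `parseMappings` below checks the field rules listed above and, as an assumption about which entry survives, keeps the first mapping for each shadowPort; Architect may resolve duplicates differently. It fails fast on an unmanaged container or an out-of-range port; how Architect itself handles those cases is not covered here. The same JSON shape is used by the shadow-ports annotation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// mapping mirrors one entry of the "mappings" array.
type mapping struct {
	ContainerName string `json:"containerName"`
	AppPort       int    `json:"appPort"`
	ShadowPort    int    `json:"shadowPort"`
}

// parseMappings is a hypothetical pre-flight check for a health-check-proxy
// (or shadow-ports) annotation value.
func parseMappings(value string, managed []string) ([]mapping, error) {
	var doc struct {
		Mappings []mapping `json:"mappings"`
	}
	if err := json.Unmarshal([]byte(value), &doc); err != nil {
		return nil, err
	}
	isManaged := map[string]bool{}
	for _, name := range managed {
		isManaged[name] = true
	}
	seen := map[int]bool{}
	var out []mapping
	for i, m := range doc.Mappings {
		if !isManaged[m.ContainerName] {
			return nil, fmt.Errorf("mappings[%d]: %q is not in managed-containers", i, m.ContainerName)
		}
		if m.AppPort < 1 || m.AppPort > 65535 || m.ShadowPort < 1 || m.ShadowPort > 65535 {
			return nil, fmt.Errorf("mappings[%d]: ports must be in 1-65535", i)
		}
		if seen[m.ShadowPort] {
			// Assumption: the first mapping for a shadowPort wins.
			log.Printf("warning: dropping mappings[%d]: duplicate shadowPort %d", i, m.ShadowPort)
			continue
		}
		seen[m.ShadowPort] = true
		out = append(out, m)
	}
	return out, nil
}

func main() {
	ms, err := parseMappings(
		`{"mappings":[{"containerName":"app","appPort":8080,"shadowPort":9080},`+
			`{"containerName":"app","appPort":8081,"shadowPort":9080}]}`,
		[]string{"app"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ms)) // 1: the second mapping reuses shadowPort 9080
}
```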
The sidecar is not added (and a warning is logged) if managed-containers
or network-monitor is missing. To confirm the sidecar was added, check
that architect-health-check-proxy appears in the pod's containers:
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'
Requires managed-containers and network-monitor. See
Examples → Web API for a worked example, and
Troubleshooting → Health probes wake the container
if probes still wake the container.
shadow-ports
architect.loopholelabs.io/shadow-ports: '{"mappings":[{"containerName":"app","appPort":9090,"shadowPort":29090}]}'
Lets a scraper (Prometheus, an external health check, a debug tool) reach an application port without counting as activity, so regular scrapes don't keep the container awake.
Without this annotation, a Prometheus scrape of /metrics every 15 seconds
looks like continuous traffic to the application port and the container
never scales down.
To use this feature, point your scraper (ServiceMonitor, PodMonitor, or
static Prometheus scrape_configs) at the shadowPort instead of the
application's real port. Traffic still reaches the application on the real
port; the application isn't aware of the redirect.
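The redirect can be pictured as a per-container lookup table. In the sketch below, the hypothetical `route` function rewrites traffic aimed at a shadowPort to its appPort and marks it as non-activity, while direct traffic to the real port is delivered unchanged. It models the behavior described above, assuming the real port is one of the container's declared ports; it is not the DNAT rule Architect installs.

```go
package main

import "fmt"

// shadowMapping is one appPort/shadowPort pair from the annotation.
type shadowMapping struct {
	AppPort, ShadowPort int
}

// delivery describes where inbound traffic ends up.
type delivery struct {
	Port     int  // port the application actually receives the traffic on
	Activity bool // whether it counts toward keeping the container awake
}

// route is a hypothetical model of the shadow-port redirect.
func route(dstPort int, mappings []shadowMapping) delivery {
	for _, m := range mappings {
		if dstPort == m.ShadowPort {
			// Rewritten to the real port, but ignored for activity.
			return delivery{Port: m.AppPort, Activity: false}
		}
	}
	// Not a shadow port: delivered as-is (assumed to be a declared port).
	return delivery{Port: dstPort, Activity: true}
}

func main() {
	ms := []shadowMapping{{AppPort: 9090, ShadowPort: 29090}}
	fmt.Println(route(29090, ms)) // {9090 false}: scraper using the shadow port
	fmt.Println(route(9090, ms))  // {9090 true}: client hitting the real port directly
}
```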
Mapping fields:
containerName (required): name of a container in managed-containers.
appPort (required, 1–65535): the real port the application listens on.
shadowPort (required, 1–65535): the port to point your scraper at.
Duplicate shadowPort values across mappings are dropped with a warning.
The shadow ports are not added (and a warning is logged) if
managed-containers or network-monitor is missing.
If you can't move the scraper to a different port (e.g. it's hard-coded in
existing Prometheus discovery), use
ignore-activity-ports instead — it excludes the
existing application port from activity tracking.
Requires managed-containers and network-monitor. See
Examples → Metrics Scraping for a worked
example, and
Troubleshooting → Scrape traffic wakes the container
if scrapes still wake the container.
ignore-activity-ports
architect.loopholelabs.io/ignore-activity-ports: '{"container-1":[9091, 9100]}'
Marks specific ports on the container's existing port spec as conntrack-bypassed
so traffic to them does not count as activity. Unlike shadow-ports,
there is no DNAT and no new port is injected — the operator is asserting that
the listed ports are already declared on the container and the app already
listens on them. Use this when a metrics scraper hits the real application port
directly and should not keep the workload awake.
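Since Architect takes the listed ports on trust, a quick lint can catch a port that was never declared on the container (for example after a port rename). The hypothetical `undeclaredIgnoredPorts` below compares the annotation against each container's declared ports; the `declared` map stands in for data you would read from the pod spec.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// undeclaredIgnoredPorts is a hypothetical lint for ignore-activity-ports.
// It returns, per container, the listed ports missing from that container's
// declared ports. Containers with no problems are omitted from the result.
func undeclaredIgnoredPorts(value string, declared map[string][]int) (map[string][]int, error) {
	var listed map[string][]int
	if err := json.Unmarshal([]byte(value), &listed); err != nil {
		return nil, err
	}
	missing := map[string][]int{}
	for container, ports := range listed {
		have := map[int]bool{}
		for _, p := range declared[container] {
			have[p] = true
		}
		for _, p := range ports {
			if !have[p] {
				missing[container] = append(missing[container], p)
			}
		}
	}
	return missing, nil
}

func main() {
	declared := map[string][]int{"container-1": {8080, 9091}}
	missing, err := undeclaredIgnoredPorts(`{"container-1":[9091, 9100]}`, declared)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // map[container-1:[9100]]
}
```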
See Troubleshooting → Scrape traffic wakes the container for diagnostics.
postmigration-autoscaleup-containers
architect.loopholelabs.io/postmigration-autoscaleup-containers: '["container-1"]'
Containers that automatically scale up after migration. By default, containers stay hibernated after migration to avoid a thundering herd.
disable-autoscaledown-containers
architect.loopholelabs.io/disable-autoscaledown-containers: '["container-1"]'
Prevents automatic hibernation. Useful for background jobs that should migrate but not hibernate on idle.
scaleup-timeout-containers
architect.loopholelabs.io/scaleup-timeout-containers: '{"container-1": "60s"}'
How long to wait for a checkpoint during startup. Default: 30s.
runc-architect only.
migrate-emptydir-containers
architect.loopholelabs.io/migrate-emptydir-containers: '["container-1"]'
Preserves emptyDir volume data during migration. By default, emptyDir volumes are not migrated.
sparse-files-containers
architect.loopholelabs.io/sparse-files-containers: '{"container-1": ["/var/cache/app.db"]}'
Recreates the listed files as sparse files (same size and mode, contents are
zeros) at the destination instead of copying their bytes through the
upper-layer snapshot. Skips the listed paths on the source so the migration
does not pay the per-byte snapshot cost. Use for workloads that re-scan or rewrite the file
post-restore (caches, generated artefacts, scratch space). Workloads that read the original
contents after migration will see zeros. runc-architect only.
rewrite-listener-addresses-containers
architect.loopholelabs.io/rewrite-listener-addresses-containers: '["container-1"]'
Rewrites listener socket addresses in CRIU checkpoints during migration. When an
application binds to the pod IP (rather than 0.0.0.0), the listener address
becomes invalid on the destination pod. This annotation rewrites those addresses
to INADDR_ANY (0.0.0.0) or in6addr_any (::) so the restore succeeds.
runc-architect only.
rewrite-established-addresses-containers
architect.loopholelabs.io/rewrite-established-addresses-containers: '["container-1"]'
Rewrites the source IP of established TCP connections in CRIU checkpoints during
migration. The source pod's IP no longer exists on the destination pod, which
causes CRIU's socket restore to fail. This annotation rewrites the source
address to the new pod's IP (read from /etc/hosts). Supports both IPv4 and
IPv6. runc-architect only.
managed-pod (gVisor only)
architect.loopholelabs.io/managed-pod: "true"
Used with runsc-architect instead of managed-containers. The entire pod is
managed and checkpointed together.
start-from-persistent-checkpoint
# Same namespace (name only):
architect.loopholelabs.io/start-from-persistent-checkpoint: "persistent-checkpoint-name"
# Cross-namespace (namespace/name):
architect.loopholelabs.io/start-from-persistent-checkpoint: "namespace/persistent-checkpoint-name"
Restore from a PersistentCheckpoint CRD on startup. When only a name is provided (no /),
the PersistentCheckpoint is looked up in the same namespace as the pod. Use
the namespace/name format to reference a checkpoint in a different namespace.
When set, this annotation takes priority over pod-template-hash-based
Checkpoint CRDs: on any failure (not found, empty, download error, registry
storage) the pod starts fresh without falling back to the
runc-architect migration path.
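The two accepted formats map onto a namespace/name pair. The hypothetical `parseCheckpointRef` below applies the rule above (no slash means the pod's own namespace) and, as an extra assumption, rejects empty parts and values with more than one slash; Architect's own validation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// checkpointRef identifies a PersistentCheckpoint object.
type checkpointRef struct {
	Namespace, Name string
}

// parseCheckpointRef is a hypothetical parser for the
// start-from-persistent-checkpoint value: "name" resolves in the pod's own
// namespace, "namespace/name" in the given namespace.
func parseCheckpointRef(value, podNamespace string) (checkpointRef, error) {
	ns, name, found := strings.Cut(value, "/")
	if !found {
		ns, name = podNamespace, value
	}
	if ns == "" || name == "" || strings.Contains(name, "/") {
		return checkpointRef{}, fmt.Errorf("invalid checkpoint reference %q", value)
	}
	return checkpointRef{Namespace: ns, Name: name}, nil
}

func main() {
	for _, v := range []string{"persistent-checkpoint-name", "namespace/persistent-checkpoint-name"} {
		ref, err := parseCheckpointRef(v, "default")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s/%s\n", ref.Namespace, ref.Name)
	}
	// Output:
	// default/persistent-checkpoint-name
	// namespace/persistent-checkpoint-name
}
```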