Configuration

Basic Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        architect.loopholelabs.io/managed-containers: '["my-app-container"]'
        architect.loopholelabs.io/scaledown-durations: '{"my-app-container":"30s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: my-app-container
          image: my-app:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"

Runtime Classes

  • runc-architect: automatic hibernation on idle, wake on network or kubectl exec. Container-scoped checkpoints. Use managed-containers.
  • runsc-architect: gVisor security isolation. Pod-scoped checkpoints created explicitly via PersistentCheckpoint CRDs. Use managed-pod. See Examples.

Annotations

managed-containers

architect.loopholelabs.io/managed-containers: '["container-1", "container-2"]'

Which containers Architect manages. Unlisted containers run normally.

scaledown-durations

architect.loopholelabs.io/scaledown-durations: '{"container-1":"30s", "container-2":"60s"}'

Idle time before hibernation. Default: 60s.

initial-scaledown-delays

architect.loopholelabs.io/initial-scaledown-delays: '{"container-1":"90s"}'

Grace period, given as a Go duration string (for example "90s"), that suppresses hibernation after the container's first-ever scale-up. Useful for slow-starting workloads (e.g. JVMs whose readiness probes take longer than scaledown-durations) so they aren't hibernated mid-startup. After the window elapses, normal activity-based scale-down resumes. The window is not re-armed after a migration or a post-scale-down restart, because the workload is already past its slow startup by then. Default: 0 (gate disabled). Values are clamped to a maximum of 24h.
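
As an illustration, the fragment below pairs a short idle timeout with a longer startup grace period for a slow-starting JVM service. The container name jvm-app and the specific durations are hypothetical; only the annotation keys come from this page.

```yaml
# Pod template fragment (hypothetical container name and durations).
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["jvm-app"]'
    # Hibernate after 30s of inactivity once the grace period has elapsed.
    architect.loopholelabs.io/scaledown-durations: '{"jvm-app":"30s"}'
    # Suppress hibernation for 2 minutes after the first-ever scale-up.
    architect.loopholelabs.io/initial-scaledown-delays: '{"jvm-app":"2m"}'
```

With these values, a JVM that needs about a minute to become ready would not be hibernated during startup, even though its idle timeout is only 30s.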

network-monitor

architect.loopholelabs.io/network-monitor: '{"container-1":"packets", "container-2":"connections"}'

Enables network-based wake: a scaled-down container wakes when it receives network traffic. An eBPF program in the pod's network namespace watches the container's declared ports and triggers a scale-up. Without this annotation, the only way to wake a scaled-down container is kubectl exec.

Modes:

  • packets: wake on any incoming TCP/UDP packet on a tracked port. Best for sporadic request/response workloads (HTTP APIs, webhook receivers).
  • connections: TCP only. Wake on connection establishment and stay awake while any TCP connection is open. Best for long-lived connection patterns (databases, message brokers, gRPC servers). Avoid this mode when clients hold a pooled connection open indefinitely: the container will never scale down.

Activity is tracked per port. Architect monitors traffic only on the ports the container declares in its ports array. Shadow ports injected by health-check-proxy and shadow-ports are also added to the container's ports array (so Kubernetes Services can target them), but Architect ignores traffic on them when assessing activity, which is the purpose of the shadow port pattern. That traffic still reaches the application; it just doesn't keep the container running.

Activity is scoped per container. Sidecars sharing the pod's network namespace (Istio sidecars, fluentd, etc.) do not keep the managed container awake. Outbound traffic that the container sends from an ephemeral source port doesn't count either. If your workload only ever sends outbound traffic from ephemeral ports, use disable-autoscaledown-containers to opt out of automatic scale-down.

Requires managed-containers.
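
A minimal sketch of the two modes side by side, assuming a pod with one HTTP API container and one broker container. The container names, images, and port numbers are hypothetical. The ports arrays matter because, as described above, only declared ports are monitored.

```yaml
# Pod template fragment (hypothetical names, images, and ports).
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["api", "broker"]'
    architect.loopholelabs.io/network-monitor: '{"api":"packets", "broker":"connections"}'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: api
      image: example/api:latest
      ports:
        - containerPort: 8080   # packets mode: any incoming TCP/UDP packet wakes "api"
    - name: broker
      image: example/broker:latest
      ports:
        - containerPort: 5672   # connections mode: "broker" stays awake while a TCP connection is open
```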

health-check-proxy

architect.loopholelabs.io/health-check-proxy: '{"mappings":[{"containerName":"app","appPort":8080,"shadowPort":9080}]}'

Lets kubelet liveness, readiness, and startup probes pass while the container is scaled down, without waking it.

Without this annotation, every probe hits the application port and counts as activity, so a probed container never scales down. Removing the probes isn't an option either — they're the safety net that catches a stuck process.

To use this feature, configure your liveness, readiness, and startup probes to target the shadowPort instead of the application's real port. Architect takes care of the rest: it starts a sidecar that handles probes on the shadow port, forwarding them to the application while the container is running and answering them itself while the container is scaled down, so kubelet keeps seeing a healthy response.

Mapping fields:

  • containerName (required): name of a container in managed-containers.
  • appPort (required, 1–65535): the application's real probe port.
  • shadowPort (required, 1–65535): the port to point your probes at.

Duplicate shadowPort values across mappings are dropped with a warning. The sidecar is not added (and a warning is logged) if managed-containers or network-monitor is missing. To confirm the sidecar was added, check that architect-health-check-proxy appears in the pod's containers:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'

Requires managed-containers and network-monitor. See Examples → Web API for a worked example, and Troubleshooting → Health probes wake the container if probes still wake the container.
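
The probe change described above might look like the following sketch. It reuses the mapping from the annotation example (appPort 8080, shadowPort 9080). The /healthz path, the probe periods, and the use of HTTP probes are assumptions for illustration.

```yaml
# Pod template fragment. The probe path and timings are hypothetical.
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["app"]'
    architect.loopholelabs.io/network-monitor: '{"app":"packets"}'
    architect.loopholelabs.io/health-check-proxy: '{"mappings":[{"containerName":"app","appPort":8080,"shadowPort":9080}]}'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: app
      image: my-app:latest
      ports:
        - containerPort: 8080      # real application port, tracked for activity
      readinessProbe:
        httpGet:
          path: /healthz
          port: 9080               # shadowPort, not the application port
        periodSeconds: 10
      livenessProbe:
        httpGet:
          path: /healthz
          port: 9080               # shadowPort, not the application port
        periodSeconds: 10
```

Client traffic continues to use port 8080; only the probes move to 9080.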

shadow-ports

architect.loopholelabs.io/shadow-ports: '{"mappings":[{"containerName":"app","appPort":9090,"shadowPort":29090}]}'

Lets a scraper (Prometheus, an external health check, a debug tool) reach an application port without counting as activity, so regular scrapes don't keep the container awake.

Without this annotation, a 15-second Prometheus scrape on /metrics looks like continuous traffic to the application port and the container never scales down.

To use this feature, point your scraper (ServiceMonitor, PodMonitor, or static Prometheus scrape_configs) at the shadowPort instead of the application's real port. Traffic still reaches the application on the real port; the application isn't aware of the redirect.

Mapping fields:

  • containerName (required): name of a container in managed-containers.
  • appPort (required, 1–65535): the real port the application listens on.
  • shadowPort (required, 1–65535): the port to point your scraper at.

Duplicate shadowPort values across mappings are dropped with a warning. The shadow ports are not added (and a warning is logged) if managed-containers or network-monitor is missing.

If you can't move the scraper to a different port (e.g. it's hard-coded in existing Prometheus discovery), use ignore-activity-ports instead — it excludes the existing application port from activity tracking.

Requires managed-containers and network-monitor. See Examples → Metrics Scraping for a worked example, and Troubleshooting → Scrape traffic wakes the container if scrapes still wake the container.
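
A sketch of the scraper side, assuming a static Prometheus scrape configuration and a Service that exposes the shadow port. The Service DNS name, job name, and metrics path are hypothetical. The two YAML documents below belong to different files: the first is the pod template fragment, the second is part of the Prometheus configuration.

```yaml
# 1) Pod template fragment: map real port 9090 to shadow port 29090.
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["app"]'
    architect.loopholelabs.io/network-monitor: '{"app":"packets"}'
    architect.loopholelabs.io/shadow-ports: '{"mappings":[{"containerName":"app","appPort":9090,"shadowPort":29090}]}'
---
# 2) Prometheus configuration fragment (hypothetical target address).
scrape_configs:
  - job_name: my-app
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - my-app.default.svc.cluster.local:29090   # shadowPort, not 9090
```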

ignore-activity-ports

architect.loopholelabs.io/ignore-activity-ports: '{"container-1":[9091, 9100]}'

Marks specific ports in the container's existing port spec as conntrack-bypassed so that traffic to them does not count as activity. Unlike shadow-ports, there is no DNAT and no new port is injected: by setting this annotation you assert that the listed ports are already declared on the container and that the application already listens on them. Use this when a metrics scraper hits the real application port directly and should not keep the workload awake.

See Troubleshooting → Scrape traffic wakes the container for diagnostics.
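
For instance, a container that serves application traffic on one port and metrics on another could be annotated as in this sketch. The port numbers are hypothetical, and including network-monitor here is an assumption based on the activity tracking described under that annotation.

```yaml
# Pod template fragment (hypothetical ports).
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["container-1"]'
    architect.loopholelabs.io/network-monitor: '{"container-1":"packets"}'
    architect.loopholelabs.io/ignore-activity-ports: '{"container-1":[9091]}'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: container-1
      image: my-app:latest
      ports:
        - containerPort: 8080   # application traffic, counts as activity
        - containerPort: 9091   # metrics, already declared and excluded from activity
```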

postmigration-autoscaleup-containers

architect.loopholelabs.io/postmigration-autoscaleup-containers: '["container-1"]'

Containers that automatically scale up after migration. By default, containers stay hibernated after migration to avoid a thundering herd.

disable-autoscaledown-containers

architect.loopholelabs.io/disable-autoscaledown-containers: '["container-1"]'

Prevents automatic hibernation. Useful for background jobs that should migrate but not hibernate on idle.
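
A short sketch for a background worker that should remain eligible for migration but never hibernate on idle. The container name worker is hypothetical.

```yaml
# Pod template fragment (hypothetical container name).
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["worker"]'
    # Keep "worker" running even when no inbound activity is observed.
    architect.loopholelabs.io/disable-autoscaledown-containers: '["worker"]'
```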

scaleup-timeout-containers

architect.loopholelabs.io/scaleup-timeout-containers: '{"container-1": "60s"}'

How long to wait for a checkpoint during startup. Default: 30s. runc-architect only.

migrate-emptydir-containers

architect.loopholelabs.io/migrate-emptydir-containers: '["container-1"]'

Preserves emptyDir volume data during migration. By default, emptyDir volumes are not migrated.
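
As a sketch, a container with a scratch emptyDir volume could opt in like this. The volume name, mount path, and the choice of runc-architect are assumptions for illustration.

```yaml
# Pod template fragment (hypothetical volume name and mount path).
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["container-1"]'
    architect.loopholelabs.io/migrate-emptydir-containers: '["container-1"]'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: container-1
      image: my-app:latest
      volumeMounts:
        - name: scratch
          mountPath: /var/scratch
  volumes:
    - name: scratch
      emptyDir: {}   # contents are carried over only because of the annotation above
```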

sparse-files-containers

architect.loopholelabs.io/sparse-files-containers: '{"container-1": ["/var/cache/app.db"]}'

Recreates the listed files as sparse files (same size and mode, contents are zeros) at the destination instead of copying their bytes through the upper-layer snapshot. Skips the listed paths on the source so the migration does not pay the per-byte snapshot cost. Use for workloads that re-scan or rewrite the file post-restore (caches, generated artefacts, scratch space). Workloads that read the original contents after migration will see zeros. runc-architect only.

rewrite-listener-addresses-containers

architect.loopholelabs.io/rewrite-listener-addresses-containers: '["container-1"]'

Rewrites listener socket addresses in CRIU checkpoints during migration. When an application binds to the pod IP (rather than 0.0.0.0), the listener address becomes invalid on the destination pod. This annotation rewrites those addresses to INADDR_ANY (0.0.0.0) or in6addr_any (::) so the restore succeeds. runc-architect only.

rewrite-established-addresses-containers

architect.loopholelabs.io/rewrite-established-addresses-containers: '["container-1"]'

Rewrites the source IP of established TCP connections in CRIU checkpoints during migration. The source pod's IP no longer exists on the destination pod, which causes CRIU's socket restore to fail. This annotation rewrites the source address to the new pod's IP (read from /etc/hosts). Supports both IPv4 and IPv6. runc-architect only.
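
The two rewrite annotations can be set together. The sketch below assumes an application that binds its listener to the pod IP and also holds established TCP connections at migration time. Whether a given workload needs both is something to verify for that workload.

```yaml
# Pod template fragment for a runc-architect workload.
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["container-1"]'
    # Listener bound to the pod IP: rewrite it to 0.0.0.0 or :: on restore.
    architect.loopholelabs.io/rewrite-listener-addresses-containers: '["container-1"]'
    # Established connections: rewrite the source address to the new pod's IP.
    architect.loopholelabs.io/rewrite-established-addresses-containers: '["container-1"]'
spec:
  runtimeClassName: runc-architect
```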

managed-pod (gVisor only)

architect.loopholelabs.io/managed-pod: "true"

Used with runsc-architect instead of managed-containers. The entire pod is managed and checkpointed together.
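
A minimal pod sketch for the gVisor runtime class. The pod name, container name, and image are hypothetical. Note that managed-containers is not set.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
  annotations:
    architect.loopholelabs.io/managed-pod: "true"
spec:
  runtimeClassName: runsc-architect
  containers:
    - name: app
      image: my-app:latest
```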

start-from-persistent-checkpoint

# Same namespace (name only):
architect.loopholelabs.io/start-from-persistent-checkpoint: "persistent-checkpoint-name"
# Cross-namespace (namespace/name):
architect.loopholelabs.io/start-from-persistent-checkpoint: "namespace/persistent-checkpoint-name"

Restore from a PersistentCheckpoint CRD on startup. When only a name is provided (no /), the PersistentCheckpoint is looked up in the same namespace as the pod. Use the namespace/name format to reference a checkpoint in a different namespace.

When set, this annotation takes priority over pod-template-hash-based Checkpoint CRDs. On any failure (checkpoint not found, empty checkpoint, download error, registry storage error), the pod starts fresh; it does not fall back to the runc-architect migration path.
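
A sketch of a pod that restores from a checkpoint in another namespace. The namespace golden-images, the checkpoint name warm-app, and the pairing with runsc-architect and managed-pod are assumptions for illustration. The PersistentCheckpoint resource itself is not shown because its schema is not covered on this page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restored-app
  namespace: default
  annotations:
    architect.loopholelabs.io/managed-pod: "true"
    # namespace/name form: look up "warm-app" in the "golden-images" namespace.
    architect.loopholelabs.io/start-from-persistent-checkpoint: "golden-images/warm-app"
spec:
  runtimeClassName: runsc-architect
  containers:
    - name: app
      image: my-app:latest
```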