Troubleshooting | Architect Docs

Pod Not Hibernating

# Check idle timeout
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/scaledown-durations}'

# Verify container is managed
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/managed-containers}'

# Check status label
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container-name>}'

# Review daemon logs
kubectl logs -n architect -l app.kubernetes.io/name=architectd | grep <pod-name>

The Architect Console also shows per-pod events, timings, and detailed debugging info.
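The checks above can be bundled into one helper when you triage several pods. This is a minimal sketch: hibernation_report is a hypothetical function name, not something Architect ships, and it only reads the annotations and label already shown above.

```shell
# hibernation_report is a hypothetical helper (not part of Architect).
# Usage: hibernation_report <pod-name> <container-name>
hibernation_report() (
  pod="$1"
  container="$2"

  # Idle timeout annotation.
  durations="$(kubectl get pod "$pod" \
    -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/scaledown-durations}')"

  # Containers Architect manages in this pod.
  managed="$(kubectl get pod "$pod" \
    -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/managed-containers}')"

  # Per-container status label; the container name is part of the label key.
  status="$(kubectl get pod "$pod" \
    -o jsonpath="{.metadata.labels.status\.architect\.loopholelabs\.io/${container}}")"

  # An empty value means the annotation or label is missing on the pod.
  printf 'scaledown-durations: %s\n' "${durations:-<unset>}"
  printf 'managed-containers: %s\n' "${managed:-<unset>}"
  printf 'status (%s): %s\n' "$container" "${status:-<unset>}"
)
```

Any line that prints <unset> points at the annotation or label to investigate first.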

Pod Not Waking

# Test wake via exec
kubectl exec -it <pod-name> -- /bin/sh -c "echo test"

# Test wake via network
kubectl port-forward <pod-name> <port>:<port>
curl localhost:<port>

# Check events
kubectl describe pod <pod-name>

# Verify daemon is running on the pod's node
kubectl get pod <pod-name> -o wide
kubectl get pods -n architect -o wide | grep <node-name>
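The last two commands can be combined so the node name does not have to be copied by hand. A sketch under two assumptions: daemon_on_pod_node is a hypothetical helper name, and the daemon pods carry the app.kubernetes.io/name=architectd label used for the log commands on this page.

```shell
# daemon_on_pod_node is a hypothetical helper (not part of Architect).
# Usage: daemon_on_pod_node <pod-name>
daemon_on_pod_node() (
  pod="$1"

  # Node the pod is scheduled on (empty while the pod is still Pending).
  node="$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')"
  if [ -z "$node" ]; then
    echo "pod $pod is not scheduled on a node yet" >&2
    return 1
  fi

  # List only the daemon pods running on that node.
  kubectl get pods -n architect \
    -l app.kubernetes.io/name=architectd \
    --field-selector "spec.nodeName=$node" \
    -o wide
)
```

An empty result means no daemon pod with that label is running on the pod's node.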

Scale Down and Wake

Health probes wake the container

If a managed container with health-check-proxy configured still wakes whenever kubelet probes it:

# Confirm the sidecar was added
kubectl get pod <pod-name> \
  -o jsonpath='{.spec.containers[*].name}'

# Confirm probe ports target the shadow port, not the app port
kubectl get pod <pod-name> \
  -o jsonpath='{.spec.containers[?(@.name=="<container>")].livenessProbe}'

# Check the admission controller didn't skip the sidecar
kubectl logs -n architect -l app=architect-admission-controller \
  | grep -i 'health check proxy'

The first command lists every container in the pod; you should see architect-health-check-proxy alongside your application container, e.g.:

my-app architect-health-check-proxy

Checklist:

The probe's port field on each managed container must reference the shadowPort, not the appPort. Probes that still target the application port bypass the sidecar entirely.
Both managed-containers and network-monitor annotations must be present. If either is missing, the admission controller logs a warning and skips sidecar injection.
The sidecar (architect-health-check-proxy) must be present in spec.containers. If it isn't, check admission controller logs.
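The first and last checklist items can be scripted roughly as follows. Both function names are hypothetical. The sketch only reports what is on the pod spec; comparing the printed probe ports against your configured shadowPort is still a manual step.

```shell
# Hypothetical helpers (not part of Architect).

# Succeeds if the health check proxy sidecar is in spec.containers.
# Usage: has_health_check_proxy <pod-name>
has_health_check_proxy() (
  pod="$1"
  names="$(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}')"
  for name in $names; do
    if [ "$name" = "architect-health-check-proxy" ]; then
      echo "sidecar present"
      return 0
    fi
  done
  echo "sidecar missing; check the admission controller logs" >&2
  return 1
)

# Prints every probe defined on a container so the ports can be compared
# with the shadowPort. A probe that is not defined prints as an empty value.
# Usage: show_probes <pod-name> <container-name>
show_probes() (
  pod="$1"
  container="$2"
  for probe in livenessProbe readinessProbe startupProbe; do
    printf '%s: ' "$probe"
    kubectl get pod "$pod" \
      -o jsonpath="{.spec.containers[?(@.name==\"${container}\")].${probe}}"
    printf '\n'
  done
)
```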

Scrape traffic wakes the container

If a Prometheus scrape (or other external poller) wakes a managed container that has shadow-ports configured:

# Confirm the shadow port is on the container spec
kubectl get pod <pod-name> \
  -o jsonpath='{.spec.containers[?(@.name=="<container>")].ports}'

# Check the admission controller didn't skip the shadow ports
kubectl logs -n architect -l app=architect-admission-controller \
  | grep -i 'shadow ports'

The first command lists the container's ports; the shadow port appears with a shadow- name prefix, e.g.:

[{"containerPort":9090} {"containerPort":29090,"name":"shadow-29090","protocol":"TCP"}]

Checklist:

The scraper must target the shadowPort, not the appPort. Verify your ServiceMonitor, PodMonitor, or scrape_configs references the shadow port (named shadow-<port> on the container spec).
Both managed-containers and network-monitor annotations must be present. If either is missing, the admission controller logs a warning and skips shadow port injection.
If you can't move the scraper to a new port, swap shadow-ports for ignore-activity-ports so the existing app port is exempted from activity tracking.
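To see only the injected shadow ports, the port names can be filtered on the shadow- prefix. A sketch, assuming the naming shown in the example output above; shadow_port_names is a hypothetical helper name.

```shell
# shadow_port_names is a hypothetical helper (not part of Architect).
# Prints one shadow port name per line; exits non-zero if there are none.
# Usage: shadow_port_names <pod-name> <container-name>
shadow_port_names() (
  pod="$1"
  container="$2"
  kubectl get pod "$pod" \
    -o jsonpath="{.spec.containers[?(@.name==\"${container}\")].ports[*].name}" \
    | tr ' ' '\n' \
    | grep '^shadow-'
)
```

A non-zero exit status means no named shadow port was found on the container, which matches the case where the admission controller skipped injection.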

Sidecar fails to inject

If health-check-proxy is set but no sidecar appears on the pod:

kubectl logs -n architect -l app=architect-admission-controller \
  | grep -iE 'health check proxy|shadow ports'

Checklist:

The annotation value must be valid JSON. Invalid JSON is logged and the feature is skipped.
managed-containers must list the container referenced in each mapping.
network-monitor must be set on the pod.
All ports must be in the 1–65535 range; mappings outside the range are dropped with a warning.
shadowPort values must be unique across mappings. Only the first mapping for a given shadow port is used; later mappings that reuse it are dropped with a warning.
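The port rules in this checklist can be checked locally before you apply a manifest. The sketch below is illustrative only: check_port_mappings is a hypothetical helper, and the appPort:shadowPort argument format is made up for this example. It is not the annotation's JSON syntax, and the messages are not the admission controller's log lines.

```shell
# check_port_mappings is a hypothetical pre-flight helper (not part of Architect).
# Each argument is an "appPort:shadowPort" pair, a format invented for this sketch.
# Usage: check_port_mappings 8080:28080 9090:29090
check_port_mappings() (
  seen=" "   # space-delimited list of shadow ports accepted so far
  rc=0
  for pair in "$@"; do
    case "$pair" in
      *:*) ;;
      *) echo "expected appPort:shadowPort: $pair"; rc=1; continue ;;
    esac
    app="${pair%%:*}"
    shadow="${pair##*:}"
    for port in "$app" "$shadow"; do
      case "$port" in
        '' | *[!0-9]*)
          echo "not a number: $pair"; rc=1; continue 2 ;;
        ??????*)
          # Six or more digits is always above 65535.
          echo "out of range (1-65535): $pair"; rc=1; continue 2 ;;
      esac
      if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "out of range (1-65535): $pair"; rc=1; continue 2
      fi
    done
    # The first mapping for a shadow port wins; later ones are reported.
    case "$seen" in
      *" $shadow "*) echo "duplicate shadowPort $shadow: $pair"; rc=1 ;;
      *) seen="$seen$shadow " ;;
    esac
  done
  return "$rc"
)
```

The function prints one line per rejected pair and returns non-zero if any pair was rejected.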

High Wake Times

If wake times exceed 50ms:

Check CPU and memory availability on the node; resource contention slows restore
Large memory footprints produce larger checkpoints, which can take longer to restore
Check daemon logs or the Architect Console for per-pod restore timings:

kubectl logs -n architect -l app.kubernetes.io/name=architectd --tail=500 \
  | grep -E "checkpoint|restore|error"
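One way to spot contention on the pod's node is to read the standard Kubernetes node conditions (MemoryPressure, DiskPressure, PIDPressure, Ready). This is a sketch with a hypothetical helper name, node_conditions. It reports what kubelet has flagged, not live utilisation.

```shell
# node_conditions is a hypothetical helper (not part of Architect).
# Prints TYPE=STATUS for every condition on the node running the pod.
# Usage: node_conditions <pod-name>
node_conditions() (
  pod="$1"
  node="$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')"
  if [ -z "$node" ]; then
    echo "pod $pod is not scheduled on a node yet" >&2
    return 1
  fi
  kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
)
```

A pressure condition reporting True indicates the node is short on that resource. If metrics-server is installed in the cluster, kubectl top node <node-name> shows current CPU and memory usage as well.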

Checkpoint Failures

GPU workloads are not supported yet
Checkpoints use 50–200 MB per pod; check node disk space:

kubectl get nodes \
  -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage

Verify runtimeClassName is set and the node has the architect.loopholelabs.io/node=true label
Check the Architect Console for checkpoint error details
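The runtimeClassName and node label checks can be run together for a single pod. A minimal sketch: checkpoint_prereqs is a hypothetical helper name, and it only inspects the fields named in the list above.

```shell
# checkpoint_prereqs is a hypothetical helper (not part of Architect).
# Succeeds only if runtimeClassName is set and the pod's node is labelled
# architect.loopholelabs.io/node=true.
# Usage: checkpoint_prereqs <pod-name>
checkpoint_prereqs() (
  pod="$1"
  runtime_class="$(kubectl get pod "$pod" -o jsonpath='{.spec.runtimeClassName}')"
  node="$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')"

  printf 'runtimeClassName: %s\n' "${runtime_class:-<unset>}"
  printf 'node: %s\n' "${node:-<unscheduled>}"
  [ -n "$node" ] || return 1

  # Value of the Architect node label on the pod's node.
  label="$(kubectl get node "$node" \
    -o jsonpath='{.metadata.labels.architect\.loopholelabs\.io/node}')"
  printf 'architect.loopholelabs.io/node: %s\n' "${label:-<unset>}"

  [ -n "$runtime_class" ] && [ "$label" = "true" ]
)
```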

Runtime Class Errors After Uninstall

Pods that still reference the runc-architect or runsc-architect runtime class will error once Architect is uninstalled. Remove runtimeClassName from the affected workloads. See Uninstalling.
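To find what still needs updating, list the pods whose runtimeClassName is one of the two Architect runtime classes. A sketch with a hypothetical helper name, pods_using_architect_runtime. It reports pods, so trace each one back to the workload that owns it and edit the pod template there.

```shell
# pods_using_architect_runtime is a hypothetical helper (not part of Architect).
# Prints "<namespace>/<pod>\t<runtimeClassName>" for every matching pod.
pods_using_architect_runtime() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.runtimeClassName}{"\n"}{end}' \
    | awk -F '\t' '$2 == "runc-architect" || $2 == "runsc-architect" { print $1 "\t" $2 }'
}
```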