Troubleshooting

Pod Not Hibernating

# Check idle timeout
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/scaledown-durations}'

# Verify container is managed
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/managed-containers}'

# Check status label
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container-name>}'

# Review daemon logs
kubectl logs -n architect -l app=architectd | grep <pod-name>

The Architect Console also shows per-pod events, timings, and detailed debugging info.

Pod Not Waking

# Test wake via exec
kubectl exec -it <pod-name> -- /bin/sh -c "echo test"

# Test wake via network
kubectl port-forward <pod-name> <port>:<port>
curl localhost:<port>

# Check events
kubectl describe pod <pod-name>

# Verify daemon is running on the pod's node
kubectl get pod <pod-name> -o wide
kubectl get pods -n architect -o wide | grep <node-name>
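The two-step node check above can be combined into a single sequence; this is a sketch using standard kubectl flags (the `architect` namespace and `app=architectd` label follow the commands earlier in this page):

```shell
# Look up the node the pod is scheduled on
NODE=$(kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}')

# Confirm an architectd pod is Running on that same node
kubectl get pods -n architect --field-selector spec.nodeName=$NODE
```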

High Wake Times

If wake times exceed 50ms:

  • Check CPU and memory availability on the node; resource contention slows restore
  • Pods with large memory footprints produce larger checkpoints, which take longer to restore
  • Check daemon logs or the Architect Console for per-pod restore timings:
kubectl logs -n architect -l app=architectd --tail=500 \
  | grep -E "checkpoint|restore|error"
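To check for resource contention on the node, the following standard kubectl commands can help (the first assumes metrics-server is installed in the cluster):

```shell
# Current CPU/memory usage per node (requires metrics-server)
kubectl top node <node-name>

# Requested vs. allocatable resources on the node (no metrics-server needed)
kubectl describe node <node-name> | grep -A 8 "Allocated resources"
```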

Checkpoint Failures

  • GPU workloads are not supported yet
  • Checkpoints use 50-200MB per pod; check node disk space:
kubectl get nodes \
  -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage
  • Verify runtimeClassName is set and the node has the architect.loopholelabs.io/node=true label
  • Check the Architect Console for checkpoint error details
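The runtime class and node label checks above can be run directly with kubectl:

```shell
# Confirm the pod's runtime class is set (prints nothing if unset)
kubectl get pod <pod-name> -o jsonpath='{.spec.runtimeClassName}'

# List nodes carrying the Architect node label
kubectl get nodes -l architect.loopholelabs.io/node=true
```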

Runtime Class Errors After Uninstall

Pods that still reference the runc-architect or runsc-architect runtime classes will fail to start after uninstall. Remove runtimeClassName from the affected workloads. See Uninstalling.
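To find affected pods and strip the field, a sketch using kubectl's JSONPath filtering and a JSON patch (`<deployment-name>` is a placeholder for your workload):

```shell
# List pods still referencing the Architect runtime classes
kubectl get pods -A -o jsonpath='{range .items[?(@.spec.runtimeClassName=="runc-architect")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
kubectl get pods -A -o jsonpath='{range .items[?(@.spec.runtimeClassName=="runsc-architect")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'

# Remove runtimeClassName from a Deployment's pod template
kubectl patch deployment <deployment-name> --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/runtimeClassName"}]'
```

Patching the controller (rather than the pod) ensures replacement pods are created without the stale runtime class.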