Best Practices

Node Configuration

  • Label only intended nodes: The architect.loopholelabs.io/node=true label controls where architectd runs, so apply it only to nodes that should host Architect workloads.
  • Use critical-node labels for control components: The architect-admission-controller and architect-control-plane require architect.loopholelabs.io/critical-node=true. Apply this label to stable nodes that are unlikely to be drained or preempted.
  • Use tolerations and node selectors: The Helm chart exposes architectdNodeSelector, architectControlPlaneNodeSelector, and architectAdmissionControllerNodeSelector (plus matching toleration options) to control placement. See Installation → Helm Chart Options.
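
The labeling scheme above can be applied with kubectl label. As a minimal sketch, the script below builds one command per node and prints it for review instead of running it. The label keys and values come from the bullets above; the node names and the label_commands helper are hypothetical and exist only for illustration.

```python
import shlex

# Label keys and values as given in the Node Configuration bullets.
WORKLOAD_LABEL = "architect.loopholelabs.io/node=true"
CRITICAL_LABEL = "architect.loopholelabs.io/critical-node=true"


def label_commands(workload_nodes, critical_nodes):
    """Return one `kubectl label node` command per node, as argument lists."""
    commands = []
    for node in workload_nodes:
        commands.append(["kubectl", "label", "node", node, WORKLOAD_LABEL])
    for node in critical_nodes:
        commands.append(["kubectl", "label", "node", node, CRITICAL_LABEL])
    return commands


if __name__ == "__main__":
    # Hypothetical node names; substitute names from `kubectl get nodes`.
    plan = label_commands(
        workload_nodes=["worker-a", "worker-b"],
        critical_nodes=["stable-a"],
    )
    for argv in plan:
        # Printed for review only; nothing is applied to the cluster here.
        print(shlex.join(argv))
```

Printing the plan first keeps the set of labeled nodes explicit and reviewable, which fits the advice to label only the nodes you intend to use.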

Application Suitability

Well-suited applications:

  • Stateless web services and APIs
  • Microservices with intermittent traffic
  • Development and staging environments
  • Services with predictable traffic patterns

See FAQ → Compatibility for tested languages and frameworks.

Not yet supported:

  • GPU workloads (CUDA state preservation is under development)

Configuration Guidelines

  • Start with the default timeout: The default idle timeout is 60s. Lower it gradually per-container via the scaledown-durations annotation once you've validated behavior.
  • Test in staging first: Always validate hibernation behavior in non-production environments before rolling out. See Testing Your Application.
  • Enable network-monitor for web traffic: Use the network-monitor annotation with packets or connections mode so containers wake on incoming requests instead of only on kubectl exec.
  • Use health-check-proxy for probed services: If your workload has liveness, readiness, or startup probes, enable health-check-proxy. Without it the probes themselves prevent sleep (they hit the application port and count as activity), and removing them loses the safety net that catches a stuck process. See Configuration → health-check-proxy.
  • Use shadow-ports for scraped metrics: If Prometheus (or any other external scraper) hits the workload on a regular interval, the scrapes count as activity and keep the workload awake, just as probes do. Enable shadow-ports to redirect scrape traffic to a port that doesn't count as activity. Use ignore-activity-ports if you can't move the scraper to a different port. See Configuration → shadow-ports.

Capacity Planning

Hibernated pods consume zero CPU and memory while staying scheduled. This means you can run more replicas for availability without a proportional increase in cost: idle replicas hibernate automatically and wake in under 50 ms when traffic arrives.
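
As a rough worked example of that claim, the sketch below estimates the average CPU and memory used by a replica set when idle replicas hibernate. The replica count, per-replica resources, and idle fraction are hypothetical inputs, and average_footprint is an illustrative helper, not part of Architect. The model assumes a hibernated replica uses exactly zero CPU and memory, as stated above, and ignores wake latency and any node-level overhead.

```python
def average_footprint(replicas, cpu_cores, memory_mib, idle_fraction):
    """Average CPU and memory used by a replica set when idle replicas hibernate.

    idle_fraction is the share of time each replica spends hibernated (0.0 to 1.0).
    Assumption: a hibernated replica uses zero CPU and memory.
    """
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    awake_fraction = 1.0 - idle_fraction
    return {
        "cpu_cores": replicas * cpu_cores * awake_fraction,
        "memory_mib": replicas * memory_mib * awake_fraction,
    }


if __name__ == "__main__":
    # Hypothetical workload: 10 replicas, 0.5 cores and 512 MiB each.
    always_on = average_footprint(10, 0.5, 512, idle_fraction=0.0)
    hibernating = average_footprint(10, 0.5, 512, idle_fraction=0.75)
    print(always_on)    # {'cpu_cores': 5.0, 'memory_mib': 5120.0}
    print(hibernating)  # {'cpu_cores': 1.25, 'memory_mib': 1280.0}
```

This estimates average usage only. It says nothing about peak demand when many replicas are awake at the same time, so validate node sizing against your own traffic.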