why¶
klue why diagnoses why a Kubernetes resource is unhealthy. It discovers
resources at runtime (including CRDs), fetches related cluster state, and runs
diagnostic rules to explain root causes.
Syntax¶
Resource tokens¶
The <resource> argument accepts:
- Kind — for example
pod,deployment,certificate - Plural resource name — for example
pods,deployments,certificates - Aliases — kubectl-style short names where defined (for example
deploy)
klue resolves the token against the built-in catalog plus CRDs discovered on the cluster.
Custom resources (CRDs)
CRDs such as cert-manager Certificate objects work out of the box. When a
token is served by multiple API groups or versions, add --api-version:
If the token is ambiguous without --api-version, klue prints the candidate
apiVersions so you can re-run with the correct one.
Namespace behavior¶
Use -n / --namespace to set the namespace (default: default).
| Scope | Behavior |
|---|---|
| Namespaced resources | Target is looked up in the given namespace |
| Cluster-scoped resources | Namespace flag is ignored for the target lookup1 |
Diagnosis pipeline¶
sequenceDiagram
participant User
participant klue
participant API as Kubernetes API
User->>klue: why pod my-pod -n default
klue->>API: Discover resources
klue->>API: Fetch namespace snapshot
klue->>klue: Build resource graph
klue->>API: Fetch container logs (unhealthy related pods)
klue->>klue: Run diagnostic rules
klue-->>User: Diagnosis (text, JSON, or Markdown)
- Connect — load kubeconfig / in-cluster credentials (see Kubernetes access).
- Discover — enumerate served API resources (built-ins + CRDs).
- Resolve — map
<resource>to a single API resource descriptor. - Fetch — list objects in parallel to build a namespace snapshot and
resource graph. By default, klue prefetches the target first and limits CRD
lists to related kinds; use
--full-snapshotto force legacy full-snapshot behavior and--fetch-crds=all|related|noneto tune CRD listing scope. - Collect logs (conditional second pass) — for unhealthy pods related to the target, fetch trailing container logs only when first-pass findings suggest logs are likely to add signal (enabled by default; see Flags).
- Diagnose — run selected rules against the graph, event index, and log excerpts.
- Render — output text (default) or JSON (
-o json).
Container logs¶
By default, klue fetches the last 100 lines from containers on unhealthy pods
related to the diagnosis target (for example pods in CrashLoopBackOff owned by
a Deployment). Candidate containers are selected from status and warning-event
signals (for example crash loops, probe failures, and create/config/image-pull
wait states when warning events corroborate them). klue first runs diagnosis
without logs, then fetches logs only when the first pass indicates they are
likely to improve the diagnosis. Log excerpts are matched against common failure
patterns (panic, OOM, connection refused, DNS failures, missing config,
permission denied), and attached as evidence on findings.
Event and log correlation¶
klue correlates warning events and logs to improve diagnostic confidence while staying deterministic:
- Pod rules can attach both warning event evidence and log evidence to the same finding.
- Warning-event messages are parsed into structured signal categories (image pull, scheduling, probe, mount, provisioning) that rules use for focused matching and explanations.
pod/image-pull,pod/probe-failure,pod/pending,pod/mount-failure, andpvc/provisioner-stuckuse those structured signals.- Confidence/ranking prefer corroborated findings when severity is equal.
- Generic warning findings are suppressed when the same event evidence is already represented by a stronger typed finding.
- If logs cannot be fetched (RBAC/network constraints), diagnosis still uses status and warning events.
Debug metadata¶
Use --debug to include pipeline observability details in output:
- selected rules
- log candidates and selection reasons
- log fetch counts/errors
- correlation and duplicate-suppression counters
| Flag | Default | Description |
|---|---|---|
--no-fetch-logs |
false |
Skip container log collection |
--log-tail-lines |
100 |
Trailing lines to fetch per container |
Log collection requires pods/log get permission on the candidate pods.
When log access is denied, diagnosis continues using status and events only.
Log excerpts appear in text and JSON output. Treat output as potentially sensitive when sharing diagnoses outside your team.
Disable log fetching in automation with:
Output formats¶
Examples¶
klue why pod web-7fdc4f4d74-jj6hb -n default
klue why pod web-abc -n default --max-depth 2 --event-window 30m
klue why deployment api -n prod --disable-rule builtin/warning-events
klue why certificate my-cert -n cert-manager -o json
klue why certificate my-cert --api-version cert-manager.io/v1 -n cert-manager
Related¶
-
Examples of cluster-scoped kinds include
nodeandpersistentvolume. Related namespaced objects are still fetched from the specified namespace where relevant. ↩