Debugging kube-dns and using FQDNs

Last week one of our clients started getting a lot of application errors after migrating their main service to Google Kubernetes Engine. They quickly found that kube-dns was logging a lot of errors and consuming a suspicious amount of CPU.

It was a relatively small GKE cluster (around 16 nodes at peak). On GKE, kube-dns is deployed automatically and its manifests are synchronized from the master nodes, so you cannot simply change the kube-dns deployment.

If you take a look at

kubectl edit cm -n kube-system kube-dns-autoscaler

you see

apiVersion: v1
data:
  linear: '{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}'
kind: ConfigMap

This sets the number of kube-dns replicas to 2 on our cluster (one replica per 16 nodes, with preventSinglePointFailure guaranteeing at least two for HA).

Changing "nodesPerReplica":16 to "nodesPerReplica":4 made kube-dns scale to 4 replicas, and we got rid of a lot of the errors.
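If you prefer a one-liner, the same change can be applied with kubectl patch (a sketch, assuming the default linear autoscaler config shown above):

kubectl patch cm -n kube-system kube-dns-autoscaler --patch '{"data":{"linear":"{\"coresPerReplica\":256,\"nodesPerReplica\":4,\"preventSinglePointFailure\":true}"}}'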

We weren't monitoring kube-dns, and by default there are no metrics or verbose logs for it (hopefully this changes in newer Kubernetes versions on GKE with CoreDNS).
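A quick way to at least watch kube-dns resource usage is kubectl top (assuming metrics are available in the cluster and the kube-dns pods carry the usual k8s-app=kube-dns label):

kubectl top pod -n kube-system -l k8s-app=kube-dns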

When we had time to do so, we tried the following

kubectl exec -it -n kube-system kube-dns-788979dc8f-9qmrz sh
# apk add --update tcpdump
# timeout -t 60 -- tcpdump -lvi any "udp port 53" | tee /tmp/tcpdumps
# grep -E 'A\?' /tmp/tcpdumps | sed -e 's/^.*A? //' -e 's/ .*//' | sort | uniq -c | sort -nr | awk '{printf "%s %s\n", $2, $1}'

This gives us a sorted list of the most requested DNS names in the last minute on ONE kube-dns replica.

app-redis-cache-01.c.example-project-name.internal. 1688

app-elk-01.c.example-project-name.internal. 1430

app-redis-cache-01.cluster.local. 1148

app-redis-cache-01.svc.cluster.local. 1140

app-redis-cache-01.prod-namespace.svc.cluster.local. 1118

app-elk-01.svc.cluster.local. 984

app-elk-01.cluster.local. 982

app-elk-01.prod-namespace.svc.cluster.local. 922

www.googleapis.com.google.internal. 68

oauth2.googleapis.com. 50

and others…

The most interesting one is app-redis-cache-01. It's the app's cache stored in Redis, which was recently, for some reason, moved from GKE to a GCE instance group. The app's configuration references the Redis cache as "app-redis-cache-01", which is a short, non-fully-qualified DNS name.

Given the configuration in /etc/resolv.conf (ndots:5; for more info see https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html), the resolver tried app-redis-cache-01.prod-namespace.svc.cluster.local., app-redis-cache-01.svc.cluster.local., app-redis-cache-01.cluster.local. and only then app-redis-cache-01.c.example-project-name.internal., which finally resolved successfully.
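For illustration, a pod's /etc/resolv.conf on GKE looks roughly like this (the nameserver IP is a placeholder, names follow our setup):

nameserver 10.0.0.10
search prod-namespace.svc.cluster.local svc.cluster.local cluster.local c.example-project-name.internal google.internal
options ndots:5

Because app-redis-cache-01 contains fewer than 5 dots, every search domain is tried before the name is used as-is, which is exactly the pattern of queries visible in the tcpdump output above.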

The app was querying ‘hey, where’s my redis cache’ 350 times a second.

The simple fix was to change the app's config to use the FQDN app-redis-cache-01.c.example-project-name.internal. (notice the trailing dot) instead of app-redis-cache-01, and to do the same for ELK.
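As a sketch, if the app gets the cache address from an environment variable in its Deployment (the variable name here is just illustrative), the change looks like this:

env:
- name: REDIS_HOST
  value: "app-redis-cache-01.c.example-project-name.internal."

The trailing dot marks the name as fully qualified, so the resolver skips the search list entirely and sends a single query.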

This decreased the load on kube-dns significantly and reduced the DNS resolution delay, making requests to the cache faster. It is still a hotfix, though; we need to think more about how we use DNS and how it affects our app's performance.

We'll probably end up playing a bit with ndots settings, hostAliases and dnsPolicy, and look at caching DNS responses in our app.

https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
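For example, both ndots and static entries can be tuned per pod in the pod spec; a minimal sketch (the IP is a placeholder), assuming we'd accept maintaining the address ourselves:

spec:
  dnsPolicy: ClusterFirst
  dnsConfig:
    options:
    - name: ndots
      value: "1"
  hostAliases:
  - ip: "10.128.0.42"
    hostnames:
    - "app-redis-cache-01"

With ndots:1, a name containing at least one dot is tried as-is first, and hostAliases writes the entry straight into the pod's /etc/hosts, so no DNS query is needed at all.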