Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels: None
- Story Points: 1.4
- Epic Link:
- Team: SQuaRE
- Urgent?: No
Description
The summit EFD is deployed on k3s on the efd-temp-k3s.cp.lsst.org server.
During the summit power-up, one issue was that the IP address of the server changed, so we had to recreate the Route 53 records for the different services.
The other problem was that the DNS servers changed but the k3s container was still using the old values.
In the CoreDNS logs we were seeing entries like:
2020-10-20T19:15:20.179Z [ERROR] plugin/errors: 2 cp-helm-charts-cp-kafka-1.cp-helm-charts-cp-kafka-headless.cp-helm-charts.lsst.org. A: unreachable backend: read udp 10.42.0.248:60041->139.229.162.22:53: i/o timeout
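The "unreachable backend" in that error is the upstream nameserver CoreDNS forwards to for names outside the cluster. By default, k3s deploys CoreDNS with a Corefile along these lines (an approximation from the stock k3s/Kubernetes defaults; the exact contents vary by release). The `forward . /etc/resolv.conf` line is what ties CoreDNS to the nameservers the node had when the file was read:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # Forward everything not matched above to the upstream nameservers
    # listed in the node's resolv.conf at the time it was captured.
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}
```

So if that captured resolv.conf holds stale nameservers, every external lookup times out exactly as in the log above.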
For some of the pods, for example argocd-server, we had:
Warning Unhealthy 13m (x2 over 13m) kubelet Readiness probe failed: Get http://10.42.0.5:8082/healthz: dial tcp 10.42.0.5:8082: connect: connection refused
Warning Unhealthy 13m (x6 over 14m) kubelet Liveness probe failed: Get http://10.42.0.5:8082/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
With Heinrich Reinking's help we noticed that CoreDNS was still using the old DNS addresses.
That's probably because the k3s container, started right after the power-up, copied the `/etc/resolv.conf` file from the host before Puppet configured the new values.
/etc # cat resolv.conf
# File managed by puppet
search cp.lsst.org lsst.org
nameserver 139.229.162.22
nameserver 139.229.162.87
nameserver 208.67.222.222
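A mismatch like this can be detected mechanically by comparing the nameserver entries the container captured against the host's current file. The sketch below uses temporary files with illustrative values standing in for the real host and container copies (the paths and the decision to restart are assumptions, not part of the ticket):

```shell
#!/bin/sh
# Sketch: flag a container whose captured resolv.conf no longer matches the host's.
# The two sample files below stand in for the host copy and the container copy.
host_conf=$(mktemp); container_conf=$(mktemp)
printf 'nameserver 139.229.160.53\nnameserver 139.229.160.54\n' > "$host_conf"
printf 'nameserver 139.229.162.22\nnameserver 139.229.162.87\n' > "$container_conf"

# Compare only the nameserver entries, ignoring comments and search domains.
host_ns=$(awk '/^nameserver/ {print $2}' "$host_conf")
container_ns=$(awk '/^nameserver/ {print $2}' "$container_conf")
if [ "$host_ns" != "$container_ns" ]; then
    echo "stale DNS config detected; restart k3s so it re-copies /etc/resolv.conf"
fi
rm -f "$host_conf" "$container_conf"
```

In practice the container side would be read from inside the k3s container rather than from a sample file.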
After restarting k3s again, it picked up the right values:
/ # cat /etc/resolv.conf
# File managed by puppet
search cp.lsst.org lsst.org
nameserver 139.229.160.53
nameserver 139.229.160.54
nameserver 208.67.222.222
Attachments
Issue Links
- relates to DM-27187 Upgrade versions of InfluxDB, Chronograf and Kapacitor in all EFD envs (Done)