Data Management / DM-27257

EFD support during summit power up

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The summit EFD is deployed on k3s on the efd-temp-k3s.cp.lsst.org server.

      During the summit power up, one issue was that the server's IP address changed, so we had to recreate the Route 53 records for the different services.

      The other problem was that the DNS servers changed, but the k3s container was still using the old addresses.

      The CoreDNS logs showed errors such as:

      2020-10-20T19:15:20.179Z [ERROR] plugin/errors: 2 cp-helm-charts-cp-kafka-1.cp-helm-charts-cp-kafka-headless.cp-helm-charts.lsst.org. A: unreachable backend: read udp 10.42.0.248:60041->139.229.162.22:53: i/o timeout
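
To check quickly which upstream server CoreDNS is timing out against, the error line can be parsed. The following is a minimal sketch, not part of the original troubleshooting: the helper name `upstream_from_error` and the regular expression are hypothetical, written only against the log format shown above, and the "old" nameserver set is taken from the stale `resolv.conf` in this ticket.

```python
import re

# Hypothetical pattern, written against the CoreDNS error line quoted in this
# ticket: "read udp <pod ip>:<port>-><upstream ip>:<port>: i/o timeout".
UPSTREAM_RE = re.compile(
    r"read udp (?P<src>[\d.]+):\d+->(?P<upstream>[\d.]+):(?P<port>\d+): i/o timeout"
)


def upstream_from_error(line):
    """Return the upstream DNS server IP from a CoreDNS timeout line, or None."""
    match = UPSTREAM_RE.search(line)
    return match.group("upstream") if match else None


# Log line copied from the ticket.
LOG_LINE = (
    "2020-10-20T19:15:20.179Z [ERROR] plugin/errors: 2 "
    "cp-helm-charts-cp-kafka-1.cp-helm-charts-cp-kafka-headless."
    "cp-helm-charts.lsst.org. A: unreachable backend: "
    "read udp 10.42.0.248:60041->139.229.162.22:53: i/o timeout"
)

# Nameservers from the stale resolv.conf shown in this ticket.
OLD_NAMESERVERS = {"139.229.162.22", "139.229.162.87"}

upstream = upstream_from_error(LOG_LINE)
print(upstream, upstream in OLD_NAMESERVERS)  # 139.229.162.22 True
```

If the extracted address belongs to the old nameserver set, the timeouts point at a stale resolver configuration rather than at the Kafka service itself.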
      

      Some pods, for example argocd-server, reported failing readiness and liveness probes:

      Warning  Unhealthy  13m (x2 over 13m)       kubelet            Readiness probe failed: Get http://10.42.0.5:8082/healthz: dial tcp 10.42.0.5:8082: connect: connection refused
      Warning  Unhealthy  13m (x6 over 14m)       kubelet            Liveness probe failed: Get http://10.42.0.5:8082/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
      

      With Heinrich Reinking's help we noticed that the k3s container was still using the old DNS addresses.

      This probably happened because the k3s container, started right after the power up, copied the `/etc/resolv.conf` file from the host before Puppet had configured the new values. The container's copy still listed the old nameservers:

      /etc # cat resolv.conf 
      # File managed by puppet
      search cp.lsst.org lsst.org
      nameserver 139.229.162.22
      nameserver 139.229.162.87
      nameserver 208.67.222.222
      

      After restarting k3s, the container picked up the correct values:

      / # cat /etc/resolv.conf 
      # File managed by puppet
      search cp.lsst.org lsst.org
      nameserver 139.229.160.53
      nameserver 139.229.160.54
      nameserver 208.67.222.222
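
The comparison above can be automated so that a stale copy is detected without reading the files by eye. This is a minimal sketch under stated assumptions: the function names `nameservers` and `stale_nameservers` are hypothetical, and the host's `resolv.conf` is assumed to match the values the container showed after the restart. In practice the two texts would be read from the host and from inside the k3s container.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines ("#" or ";").
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def stale_nameservers(container_text, host_text):
    """Return nameservers the container uses that the host no longer lists."""
    current = set(nameservers(host_text))
    return [s for s in nameservers(container_text) if s not in current]


# Container copy before the restart, as shown in this ticket.
CONTAINER_BEFORE = """\
# File managed by puppet
search cp.lsst.org lsst.org
nameserver 139.229.162.22
nameserver 139.229.162.87
nameserver 208.67.222.222
"""

# Assumed host file: the values the container picked up after the restart.
HOST_AFTER = """\
# File managed by puppet
search cp.lsst.org lsst.org
nameserver 139.229.160.53
nameserver 139.229.160.54
nameserver 208.67.222.222
"""

print(stale_nameservers(CONTAINER_BEFORE, HOST_AFTER))
# ['139.229.162.22', '139.229.162.87']
```

An empty result means the container and the host agree; a non-empty one suggests that k3s needs a restart to pick up the host's current file.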
      

      People

      Assignee: Angelo Fausti (afausti)
      Reporter: Angelo Fausti (afausti)