Data Management / DM-20485

expose kubernetes node name/etc. in jenkins build logs

    Details

      Description

      At present, when jenkins build failures are suspected to be caused by a specific kubernetes node, the process for determining the node name is tedious. It requires finding the name of the jenkins agent(s) the build was scheduled on, then manually describing the running k8s pod(s) to see which node(s) those pod(s) are currently scheduled on. If the pod(s) have been killed/restarted since the suspect build, this information is essentially lost to the SQRE team (though it could perhaps be reverse engineered from kubelet logs).
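
      For illustration, the manual lookup is roughly the following (a sketch; the namespace and pod name are placeholders, not the actual deployment's):

      # 1. identify the agent pod backing the jenkins agent, then
      # 2. ask k8s which node that pod is scheduled on
      $ kubectl -n jenkins get pods -o wide                          # NODE column lists the k8s node
      $ kubectl -n jenkins describe pod <agent-pod> | grep '^Node:'  # or inspect a single pod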

        Attachments

          Activity

          jhoblitt Joshua Hoblitt created issue -
          jhoblitt Joshua Hoblitt made changes -
          Field Original Value New Value
          Epic Link DM-18634 [ 247027 ]
          jhoblitt Joshua Hoblitt made changes -
          Link This issue is triggered by IHS-2363 [ IHS-2363 ]
          jhoblitt Joshua Hoblitt made changes -
          Story Points 2.5
          jhoblitt Joshua Hoblitt made changes -
          Attachment screenshot-1.png [ 39101 ]
          jhoblitt Joshua Hoblitt added a comment -

          After some research, I discovered that the "downwardapi" could provide access to the k8s node name along with resource limits information for all of the containers in the pod.

          https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#downwardapivolumefile-v1-core
          https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
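
          As a rough sketch of the wiring (following the docs above; the pod, container, and image names here are illustrative, not the actual deployment):

          apiVersion: v1
          kind: Pod
          metadata:
            name: agent-example
          spec:
            containers:
              - name: swarm
                image: example/jenkins-swarm-agent   # illustrative image
                env:
                  # downward API: expose scheduling info as env vars
                  - name: K8S_NODE_NAME
                    valueFrom:
                      fieldRef:
                        fieldPath: spec.nodeName
                  - name: K8S_POD_IP
                    valueFrom:
                      fieldRef:
                        fieldPath: status.podIP
                  - name: K8S_POD_NAMESPACE
                    valueFrom:
                      fieldRef:
                        fieldPath: metadata.namespace
                  # resource limits of a sibling container, scaled to Gi
                  - name: K8S_DIND_LIMITS_MEMORY_GI
                    valueFrom:
                      resourceFieldRef:
                        containerName: dind
                        resource: limits.memory
                        divisor: 1Gi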

          The following env vars were added to the deployment:

          / $ printenv | grep K8S | sort
          K8S_DIND_LIMITS_CPU=8
          K8S_DIND_LIMITS_MEMORY_GI=64
          K8S_DIND_REQUESTS_CPU=8
          K8S_DIND_REQUESTS_MEMORY_GI=64
          K8S_DOCKER_GC_LIMITS_CPU_M=500
          K8S_DOCKER_GC_LIMITS_MEMORY_MI=512
          K8S_DOCKER_GC_REQUESTS_CPU_M=500
          K8S_DOCKER_GC_REQUESTS_MEMORY_MI=512
          K8S_NODE_NAME=lsst-kub017
          K8S_POD_IP=10.41.0.28
          K8S_POD_NAMESPACE=jenkins-jhoblitt-curly
          K8S_SWARM_LIMITS_CPU=1
          K8S_SWARM_LIMITS_MEMORY_GI=2
          K8S_SWARM_REQUESTS_CPU=1
          K8S_SWARM_REQUESTS_MEMORY_GI=2
          

          However, the terraform kubernetes provider did not have support for the resourceFieldSelector divisor key. I added support to my working fork of the provider and opened an upstream PR:

          https://github.com/terraform-providers/terraform-provider-kubernetes/pull/538
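
          With that change applied, the divisor should be expressible in HCL roughly as follows (a sketch against the provider's env block; the divisor attribute name is assumed from the upstream PR, and the container name is illustrative):

          env {
            name = "K8S_DIND_LIMITS_MEMORY_GI"

            value_from {
              resource_field_ref {
                container_name = "dind"
                resource       = "limits.memory"
                # divisor scales the raw byte value down to Gi,
                # so the env var reads e.g. "64" rather than bytes
                divisor        = "1Gi"
              }
            }
          }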

          In order to get this information into the console log, it needs to be printed from inside of a jenkins pipeline node block, so that the env vars from the swarm container are visible, but outside of a docker block, as the env vars aren't present in the dind container, nor would they be present inside of a container. This was accomplished by adding a wrapper around the node step, named util.nodeWrap(), and all existing pipelines were updated to use this wrapper method where appropriate.
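
          A minimal sketch of what such a wrapper could look like in a shared library (the real util.nodeWrap() implementation likely differs; the file location and signature here are assumptions):

          // vars/util.groovy -- hypothetical shared-library sketch
          def nodeWrap(String label, Closure body) {
            node(label) {
              // printed inside node {} but outside any docker {} block, so the
              // downward-API env vars from the swarm container are visible
              echo "k8s node: ${env.K8S_NODE_NAME}"
              echo "k8s pod ip/namespace: ${env.K8S_POD_IP} / ${env.K8S_POD_NAMESPACE}"
              body()
            }
          }

          Pipelines would then call util.nodeWrap('some-label') { ... } in place of node('some-label') { ... }.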

          The console output of most jobs, when running on an agent on top of k8s, should include output similar to the attached screenshot-1.png.

          jhoblitt Joshua Hoblitt made changes -
          Resolution Done [ 10000 ]
          Status To Do [ 10001 ] Done [ 10002 ]

            People

            • Assignee:
              Unassigned
            • Reporter:
              jhoblitt Joshua Hoblitt
            • Watchers:
              Adam Thornton, Joshua Hoblitt
            • Votes:
              0
