  1. Data Management
  2. DM-15651

Exclude non-qserv pods from qserv nodes in PDAC

Details

    • Story
    • Status: Done
    • Resolution: Done

    Description

      Try the k8s node taint/toleration feature to keep k8s from scheduling non-qserv pods on the qserv master and db nodes.

        Activity

          cbanek Christine Banek added a comment - edited

          To set the restriction, run this command:

          kubectl taint nodes lsst-qserv-db01 lsst-qserv-db02 lsst-qserv-db03 lsst-qserv-db04 lsst-qserv-db05 lsst-qserv-db06 lsst-qserv-db07 lsst-qserv-db08 lsst-qserv-db09 lsst-qserv-db10 lsst-qserv-db11 lsst-qserv-db12 lsst-qserv-db13 lsst-qserv-db14 lsst-qserv-db15 lsst-qserv-db16 lsst-qserv-db17 lsst-qserv-db18 lsst-qserv-db19 lsst-qserv-db20 lsst-qserv-db21 lsst-qserv-db22 lsst-qserv-db23 lsst-qserv-db24 lsst-qserv-db25 lsst-qserv-db26 lsst-qserv-db27 lsst-qserv-db28 lsst-qserv-db29 lsst-qserv-db30 lsst-qserv-master01 dedicated=qserv:NoSchedule
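A minimal sketch for building the node list instead of typing it out, assuming the nodes really are named lsst-qserv-db01 through lsst-qserv-db30 plus lsst-qserv-master01, as in the command above. The helper name qserv_nodes is hypothetical, and the command is only echoed so it can be reviewed first:

```shell
#!/bin/sh
# Print the 30 db node names followed by the master, space-separated on one line.
# (qserv_nodes is a hypothetical helper, not part of any existing tooling.)
qserv_nodes() {
    for i in $(seq -w 1 30); do
        printf 'lsst-qserv-db%s ' "$i"
    done
    printf 'lsst-qserv-master01\n'
}

# Echo the taint command for review; remove the leading "echo" to run it.
# The command substitution is left unquoted on purpose so each name is its own argument.
echo kubectl taint nodes $(qserv_nodes) dedicated=qserv:NoSchedule
```

The same helper can feed the removal command by swapping the last argument for dedicated-.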

          What this does is keep pods from being scheduled on those nodes unless you say your pod is dedicated to qserv.  You do this by adding this to the pod YAML:

          tolerations:
            - key: "dedicated"
              operator: "Equal"
              value: "qserv"
              effect: "NoSchedule"

          This will allow that pod to be scheduled on either a db or master node.
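For context, here is a sketch of where the tolerations block sits in a complete pod manifest. The pod name and image are hypothetical placeholders, not the actual qserv deployment; only the tolerations entries come from the comment above. The script just writes the manifest to a temporary file:

```shell
#!/bin/sh
# Write a hypothetical pod manifest to a temp file.
# The pod name and image are placeholders; only the tolerations are from the ticket.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: qserv-example              # hypothetical name
spec:
  containers:
    - name: main
      image: example/qserv:latest  # hypothetical image
  tolerations:                     # sits directly under spec, next to containers
    - key: "dedicated"
      operator: "Equal"
      value: "qserv"
      effect: "NoSchedule"
EOF
echo "manifest written to $manifest"
# To submit it to the cluster: kubectl apply -f "$manifest"
```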

          To remove the restriction on one node:

          kubectl taint nodes lsst-qserv-db01 dedicated-

          To do it for all of them, put in the whole list from above:

          kubectl taint nodes lsst-qserv-db01 lsst-qserv-db02 lsst-qserv-db03 lsst-qserv-db04 lsst-qserv-db05 lsst-qserv-db06 lsst-qserv-db07 lsst-qserv-db08 lsst-qserv-db09 lsst-qserv-db10 lsst-qserv-db11 lsst-qserv-db12 lsst-qserv-db13 lsst-qserv-db14 lsst-qserv-db15 lsst-qserv-db16 lsst-qserv-db17 lsst-qserv-db18 lsst-qserv-db19 lsst-qserv-db20 lsst-qserv-db21 lsst-qserv-db22 lsst-qserv-db23 lsst-qserv-db24 lsst-qserv-db25 lsst-qserv-db26 lsst-qserv-db27 lsst-qserv-db28 lsst-qserv-db29 lsst-qserv-db30 lsst-qserv-master01 dedicated-
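One way to check the result after setting or removing the taint is to list the taint keys on each node. This is a sketch: show_taints is a hypothetical helper, and it assumes kubectl is already configured for the cluster:

```shell
#!/bin/sh
# List every node with the keys of any taints currently set on it.
# (show_taints is a hypothetical helper name.)
show_taints() {
    kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Only query the cluster when kubectl is actually installed.
if command -v kubectl >/dev/null 2>&1; then
    show_taints || echo "could not query the cluster"
else
    echo "kubectl not found; skipping taint check"
fi
```

Tainted nodes should show dedicated in the TAINTS column, and nodes with no taints should show an empty value.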

          cbanek Christine Banek added a comment

          Also here's a good link for how to do these things: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/

          cbanek Christine Banek added a comment

          Okay, that work is done, but now I'm trying to redeploy the dax services so they get scheduled elsewhere, to make sure that works.  This has turned up some stuck Kubernetes issues that are taking serious time.

          cbanek Christine Banek added a comment

          By restarting kubelet on the head node, we were able to get deployments working again, but this took a while to figure out.  +2 SP.

          People

            cbanek Christine Banek
            fritzm Fritz Mueller
            Christine Banek, Fritz Mueller, Vaikunth Thukral