  1. Data Management
  2. DM-15651

Exclude non-qserv pods from qserv nodes in PDAC


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Try the k8s node taint/toleration feature to keep k8s from scheduling non-qserv pods on the qserv master and db nodes.

        Attachments

          Activity

          cbanek Christine Banek added a comment - - edited

          To set the restriction, run this command:

          kubectl taint nodes lsst-qserv-db01 lsst-qserv-db02 lsst-qserv-db03 lsst-qserv-db04 lsst-qserv-db05 lsst-qserv-db06 lsst-qserv-db07 lsst-qserv-db08 lsst-qserv-db09 lsst-qserv-db10 lsst-qserv-db11 lsst-qserv-db12 lsst-qserv-db13 lsst-qserv-db14 lsst-qserv-db15 lsst-qserv-db16 lsst-qserv-db17 lsst-qserv-db18 lsst-qserv-db19 lsst-qserv-db20 lsst-qserv-db21 lsst-qserv-db22 lsst-qserv-db23 lsst-qserv-db24 lsst-qserv-db25 lsst-qserv-db26 lsst-qserv-db27 lsst-qserv-db28 lsst-qserv-db29 lsst-qserv-db30 lsst-qserv-master01 dedicated=qserv:NoSchedule
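
Since the node names follow a regular pattern, the list in the command above could also be generated rather than typed out. The following is a minimal sketch, assuming the nodes really are numbered db01 through db30 plus master01 as shown; the function names `qserv_nodes` and `taint_cmd` are made up for illustration, and the command is only printed (dry run), not executed.

```shell
#!/bin/sh
# Print the 30 db node names plus the master node, space separated.
# Assumes the naming pattern seen in the command above.
qserv_nodes() {
    i=1
    while [ "$i" -le 30 ]; do
        printf 'lsst-qserv-db%02d ' "$i"
        i=$((i + 1))
    done
    printf 'lsst-qserv-master01\n'
}

# Print the taint command for all nodes. This is a dry run:
# remove the leading "echo" to actually apply the taint.
taint_cmd() {
    echo kubectl taint nodes $(qserv_nodes) dedicated=qserv:NoSchedule
}

taint_cmd
```

Printing the command first makes it easy to check the node list before anything is changed on the cluster.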

          The taint keeps Kubernetes from scheduling pods on those nodes unless a pod declares that it tolerates dedicated=qserv.  To declare that, add this to the pod yaml:

           

          tolerations:
            - key: "dedicated"
              operator: "Equal"
              value: "qserv"
              effect: "NoSchedule"

           

          This will allow that pod to be scheduled on either a db or master node.
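
To show where the tolerations block sits in a complete manifest, here is a minimal sketch that prints a whole pod spec. The pod name `qserv-example`, the container name, and the image `example/qserv:latest` are hypothetical placeholders, not the real qserv deployment; only the tolerations section comes from this ticket.

```shell
#!/bin/sh
# Print a pod manifest that tolerates the dedicated=qserv:NoSchedule taint.
# The pod name, container name, and image are hypothetical placeholders.
qserv_pod_manifest() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: qserv-example
spec:
  containers:
    - name: main
      image: example/qserv:latest
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "qserv"
      effect: "NoSchedule"
EOF
}

qserv_pod_manifest
```

The output can be piped to `kubectl apply -f -` to create the pod. Note that a toleration only permits scheduling on the tainted nodes; it does not by itself steer the pod there, so a pod that must land on a qserv node would also need something like a node selector.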

           

          To remove the restriction on one node:

          kubectl taint nodes lsst-qserv-db01 dedicated-

          To do it for all of them, put in the whole list from above:

          kubectl taint nodes lsst-qserv-db01 lsst-qserv-db02 lsst-qserv-db03 lsst-qserv-db04 lsst-qserv-db05 lsst-qserv-db06 lsst-qserv-db07 lsst-qserv-db08 lsst-qserv-db09 lsst-qserv-db10 lsst-qserv-db11 lsst-qserv-db12 lsst-qserv-db13 lsst-qserv-db14 lsst-qserv-db15 lsst-qserv-db16 lsst-qserv-db17 lsst-qserv-db18 lsst-qserv-db19 lsst-qserv-db20 lsst-qserv-db21 lsst-qserv-db22 lsst-qserv-db23 lsst-qserv-db24 lsst-qserv-db25 lsst-qserv-db26 lsst-qserv-db27 lsst-qserv-db28 lsst-qserv-db29 lsst-qserv-db30 lsst-qserv-master01 dedicated-
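
As an alternative to one long command, the removal can be issued one node at a time, which makes it easier to see which node a failure belongs to. This is a minimal sketch; the helper name `untaint_cmds` is made up for illustration, and the commands are only printed (dry run), not executed.

```shell
#!/bin/sh
# Print one "remove taint" command per node given as an argument.
# Dry run: the commands are printed, not executed.
untaint_cmds() {
    for node in "$@"; do
        echo "kubectl taint nodes $node dedicated-"
    done
}

# Example: the first three db nodes and the master.
untaint_cmds lsst-qserv-db01 lsst-qserv-db02 lsst-qserv-db03 lsst-qserv-master01
```

Piping the output to `sh` would run the commands in order.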

          cbanek Christine Banek added a comment -

          Also here's a good link for how to do these things: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/

          cbanek Christine Banek added a comment -

          Okay, that work is done, but now I'm trying to redeploy the dax services so that they get scheduled elsewhere, to make sure that works.  This has turned up some stuck Kubernetes issues that are taking some serious time.

          cbanek Christine Banek added a comment -

          By restarting kubelet on the head node, we were able to get deployments to work again, but this took a bit of time to figure out.  +2 SP.


            People

            Assignee:
            cbanek Christine Banek
            Reporter:
            fritzm Fritz Mueller
            Watchers:
            Christine Banek, Fritz Mueller, Vaikunth Thukral
