Fix Version/s: None
Epic Name: Upgrade to Interim Kubernetes
Upgrade interim Kubernetes service for development use. Per conversations with the SLAC and SQuaRE teams, this will involve investigating Kubernetes versions with desired capabilities and installing requested system software and services on PDAC nodes. Ongoing administration is not covered in this epic.
||2||Andrew Loftus [X] (Inactive)||Done|
||2||Andrew Loftus [X] (Inactive)||Done|
||3||Andrew Loftus [X] (Inactive)||Done|
||6||Andrew Loftus [X] (Inactive)||Done|
||3||Andrew Loftus [X] (Inactive)||Done|
||4||Andrew Loftus [X] (Inactive)||Done|
|Field||Original Value||New Value|
|Cycle||Spring 2018 [ 10806 ]|
|Priority||Undefined [ 10000 ]||Major [ 3 ]|
|Watchers||Jacob Rundall, Joel Plutchak [ Jacob Rundall, Joel Plutchak ]||Andrew Loftus, Bill Glick, Jacob Rundall, Joel Plutchak, Steve Pietrowicz [ Andrew Loftus, Bill Glick, Jacob Rundall, Joel Plutchak, Steve Pietrowicz ]|
|Assignee||Steve Pietrowicz [ spietrowicz ]|
|Description||Upgrade interim Kubernetes service for development use||Upgrade interim Kubernetes service for development use. Per conversations with the SLAC and SQuaRE teams, this will involve investigating Kubernetes versions with desired capabilities and installing requested system software and services on PDAC nodes. Ongoing administration is not covered in this epic.|
|Remote Link||This issue links to "Page (Confluence)" [ 16103 ]|
Can you paste in here the Kubernetes and Docker software versions that are configured in those modules?
Kubernetes version is 1.8.5-0
Docker is currently not version controlled so it will get updated to the latest every monthly maintenance.
Please let me know what version you want, if you need Docker to be locked to a specific version.
Kubernetes is tested against specific versions of Docker.
"Continuous integration builds use Docker versions 1.11.2, 1.12.6, 1.13.1, and 17.03.2. These versions were validated on Kubernetes 1.8." - from https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#external-dependencies
So it would be interesting to know whether the Docker version you are installing is one of those.
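For reference, one way to check which Docker version the nodes currently have (a sketch only; on RHEL/CentOS the package may be named `docker` or `docker-ce` depending on which repo is configured):

```shell
# Version of the installed Docker package (name varies by repo: docker vs docker-ce)
rpm -q docker

# Server version as reported by a running Docker daemon
docker version --format '{{.Server.Version}}'

# Versions currently available from the configured yum repos
yum list available docker --showduplicates
```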
I am working on getting yum-versionlock in place (in the puppet infrastructure) so that Docker and Kubernetes versions will be guaranteed to remain at the specified version until explicitly changed.
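For the record, the yum side of this typically looks something like the following (a sketch; the exact package names and the 1.12.6/1.9.2 versions are taken from this ticket's discussion, not a final decision):

```shell
# Install the versionlock plugin (RHEL/CentOS)
yum install -y yum-plugin-versionlock

# Pin Docker and the Kubernetes packages at the agreed versions
# (versions shown are examples from this ticket)
yum versionlock add docker-1.12.6* kubelet-1.9.2* kubeadm-1.9.2* kubectl-1.9.2*

# Show what is currently locked
yum versionlock list
```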
Can you please post the current versions that you prefer for Kubernetes and for Docker?
I've asked Fritz Mueller to weigh in on this, as I suspect his application is more sensitive to this than ours.
Yes, we'll want to have this version locked as soon as we have a "matched set" that seems to be working for us. My experience with Docker so far is that it evolves so rapidly that there can be a lot of tail-chasing if you let it float free (we'll want to be keeping up, but taking updates with intention).
I know that Steve Pietrowicz was doing quite a bit of work in December to identify a constellation of versions that would meet our needs (thanks, Steve!) He should probably comment here on the results of his testing, and we could start with that?
I've been testing a couple of things. First, the install of Kubernetes (k8s) itself. I have a set of scripts set up so I can deploy on a new node pretty quickly (meaning, it does the yum updates, installs the necessary repos, and installs the kubernetes packages). The second part of this is to see how the install works in various environments. The two I've been testing against are Openstack (on Nebula) and a set of VMs we have hosted outside of that cluster. The Openstack installs have been pretty straightforward. I was able to test various versions of k8s (and Docker) with various overlays, and was able to get them to work. The VMs were a bit more difficult, because of a few different factors. I was able to resolve those today.
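For context, a minimal version of such an install script might look like the following (a sketch only; the repo is the standard upstream Kubernetes yum repo for el7, and the versions shown are the ones under test in this ticket):

```shell
#!/bin/bash
# Sketch: kubeadm-style node install on CentOS/RHEL 7.
# Versions (Docker 1.12.6, Kubernetes 1.9.2) are the combination tested here.
set -euo pipefail

# Add the upstream Kubernetes yum repo
cat >/etc/yum.repos.d/kubernetes.repo <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

# Update the node and install the pinned packages
yum update -y
yum install -y docker-1.12.6 kubelet-1.9.2 kubeadm-1.9.2 kubectl-1.9.2

# Make sure both daemons come up on boot and now
systemctl enable --now docker kubelet
```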
The main issues there were 1) how k8s deals with pre-existing firewall rules and 2) what it expects from DNS configurations. I won't go into the specifics here (I'm writing this up in greater detail tomorrow), but it wasn't very obvious from the k8s logs what was happening. Since this setup also involved Weave (because of the multicast requirements), that needed to be tested as well. I was able to get multicast send/receive to work from all six nodes I tested against, making sure that they executed from pods on those specific nodes. After this all worked, we wiped everything, set the correct firewall rules in place, and I re-installed everything and re-ran all the multicast tests. This confirmed that what we were testing all worked.
The VM work was done with Docker 1.12.6, Kubernetes 1.9.2, and Weave 2.1.3. I had been planning on doing this same test with 1.8.5-0 starting tomorrow, on both OpenStack and on the VMs I have been using. At that point, you can decide what you'd like to use.
As Fritz said, everything changes pretty often, so it would be good to see if we can settle on something and stick with it for a while, unless a real blocker comes up. The main thing I ran across from the system standpoint is that even with log messages from k8s, it can be pretty opaque as to what is really going on. Many of the errors I was seeing have multiple causes and resolutions, and in the end, none of the solutions I found on the web matched what was happening to us.
We believe this testing will help make the install for Fritz a bit easier (especially now that we have a better handle on the firewall and DNS stuff), and will help us longer term with the upcoming k8s install we're doing on the hardware that's being set up.
A more detailed write-up of what was going on during that debug is forthcoming.
I tested against 1.8.5, on both OpenStack and on the VMs I've been using. I ran tests using the Weave network overlay and multicast tools. Everything worked fine.
|Status||To Do [ 10001 ]||In Progress [ 3 ]|
Documentation on what was done is available here: https://dmtn-071.lsst.io
Main thing to watch out for will be firewall issues that might prevent a service from reaching its destination.
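For anyone setting this up, the ports that kubeadm documents as needing to be reachable are roughly the following (a firewalld sketch; adjust to whatever firewall tooling actually manages the PDAC nodes):

```shell
# Control-plane node: API server, etcd, kubelet, scheduler/controller-manager
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250-10252/tcp

# Worker nodes: kubelet API and the NodePort service range
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp

firewall-cmd --reload
```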
Fritz Mueller, we can start with whichever version of Kubernetes/Docker you are comfortable with. I've done Docker 1.12.6 with both Kubernetes 1.8.5 and 1.9.2.
Awesome, THANK YOU Steve Pietrowicz! Let's go ahead and plan to get started with 1.9.2 as soon as the WISE validation wraps up on the PDAC (which I understand is real soon now).
ok! I sent a note to Bill Glick [X] and Andrew Loftus [X] to let them know.
Gregory Dubois-Felsmann has informed me that the WISE validation will be wrapping up by close-of-business on Wednesday, 2/14. So a rollout at any opportunity soon after that would be great – thanks very much!
Hi Fritz Mueller, I heard there was inquiry when this change will be applied. Please respond in this ticket when you are ready to have the changes applied. This is a routine puppet change that can be rolled out as soon as you are ready.
|Assignee||Steve Pietrowicz [ spietrowicz ]||Andrew Loftus [ aloftus ]|
Hi folks – please proceed any time on or after Thur, 2/15. Thanks much!
Thanks, I will comment here when it's done (hopefully Thur, but the planned maintenance takes precedence).
Puppet changes are rolled out. Kubernetes is enforced at version 1.9.3-0.
I'll run some tests on the elast nodes I've been using to see if 1.9.3 works properly. Some versions in the past (notably, 1.7.1) were broken. I'll post an update here when I'm done testing.
Kubernetes 1.9.3 installed here late this afternoon. That all works. I ran the multicast test with the Weave 1.7 overlay and that worked fine too. Installed the dashboard, and that responded as well. I say "responded" because I did that via wget, and not the browser, since I'm on a VPN here, and wasn't able to test it fully. The VPN is something to keep in mind for the iptables rules for those systems. I don't know how they're set since I don't have access to them, but we don't want that exposed on the open internet. Doing the dashboard via a VPN would be a better choice, I think.
I'm seeing an issue here with 1.9.3 as both the control plane and client, and also with a 1.9.2 client against a 1.9.3 control plane. I'm trying to track this down, and am retesting a 1.9.2 control plane with a 1.9.2 client to be sure. I'll update here.
This turned out to be a race condition between Kubernetes setting its firewall rules and Puppet putting additional rules into place. An additional rule for port 6443 had to be introduced. This was tested under all configs listed above and works fine.
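For reference, the extra rule in question would be along these lines (a sketch; the actual rule is managed through puppet):

```shell
# Allow inbound connections to the Kubernetes API server (kubeadm's default port)
iptables -A INPUT -p tcp --dport 6443 -j ACCEPT
```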
Hello Andrew Loftus [X], we will need to add the line:
...to the top section of /etc/systemd/system/kubelet.service.d/10-kubeadm.conf on the pdac nodes (assuming this file is under puppet control).
Additionally, we will need the kubelet systemd service enabled and started on all the nodes (assuming systemd services are under puppet control).
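Outside of puppet, the equivalent manual steps on each node would be roughly (a sketch):

```shell
# Pick up the edited systemd drop-in, then enable and start kubelet
systemctl daemon-reload
systemctl enable kubelet
systemctl start kubelet
```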
|Labels||Environment_and_Tools pdac||Environment_and_Tools FY18a pdac|
Kubernetes installation has been up and running. Initial configuration changes in place and stable.
|Resolution||Done [ 10000 ]|
|Status||In Progress [ 3 ]||Done [ 10002 ]|
Initial kubernetes puppet modules are ready to roll out on qserv nodes. Awaiting word from Fritz Mueller to "go ahead".