I've been testing a couple of things. First, the install of Kubernetes (k8s) itself. I have a set of scripts that let me deploy on a new node pretty quickly (they run the yum updates, install the necessary repos, and install the Kubernetes packages). The second part is seeing how the install behaves in various environments. The two I've been testing against are OpenStack (on Nebula) and a set of VMs we have hosted outside of that cluster. The OpenStack installs have been pretty straightforward: I was able to test various versions of k8s (and Docker) with various overlays, and got them all to work. The VMs were more difficult, because of a few different factors, but I was able to resolve those today.
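For reference, here is a minimal sketch of what those node-prep scripts do. The repo definition is the standard upstream Kubernetes yum repo; treat the exact package names, and the idea that this matches our scripts line-for-line, as assumptions:

    #!/bin/bash
    # Sketch of the per-node prep; not the exact script we run.
    set -e

    # Bring the node up to date.
    yum -y update

    # Add the upstream Kubernetes yum repo.
    cat > /etc/yum.repos.d/kubernetes.repo <<'EOF'
    [kubernetes]
    name=Kubernetes
    baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    EOF

    # Install Docker plus the Kubernetes packages, then start the services.
    yum -y install docker kubelet kubeadm kubectl
    systemctl enable docker kubelet
    systemctl start docker kubelet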
The main issues there were 1) how k8s deals with pre-existing firewall rules and 2) what it expects from DNS configuration. I won't go into the specifics here (I'm writing this up in greater detail tomorrow), but it wasn't very obvious from the k8s logs what was happening. Since this also involved using Weave (because of the multicast requirements), that needed testing too. I was able to get multicast send/receive to work across all six nodes I tested against, making sure the tests executed from pods on those specific nodes. After this all worked, we wiped everything, put the correct firewall rules in place, reinstalled everything, and re-ran all the multicast tests. That confirmed that everything we were testing works.
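For the firewall piece, the heart of it was making sure the ports k8s and Weave need are open before anything else starts. A hedged sketch using firewalld (the exact list depends on node role and versions; Weave Net's data-path ports are 6783/tcp and 6783-6784/udp):

    # Master node (sketch; trim for worker-only nodes):
    firewall-cmd --permanent --add-port=6443/tcp         # API server
    firewall-cmd --permanent --add-port=2379-2380/tcp    # etcd
    firewall-cmd --permanent --add-port=10250-10252/tcp  # kubelet, controller, scheduler
    # Weave overlay, on every node:
    firewall-cmd --permanent --add-port=6783/tcp
    firewall-cmd --permanent --add-port=6783-6784/udp
    firewall-cmd --reload

The multicast checks were along these lines: an iperf receiver in a pod on one node joins a multicast group, and a sender in a pod on another node transmits to it (the group address and TTL here are illustrative, not our actual test parameters):

    # In a pod on node A: join the group and listen.
    iperf -s -u -B 239.1.1.1 -i 1
    # In a pod on node B: send to the group with TTL > 1 so it crosses the overlay.
    iperf -c 239.1.1.1 -u -T 32 -t 3 -i 1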
The VM work was done with Docker 1.12.6, Kubernetes 1.9.2, and Weave 2.1.3. I plan to run the same tests with Kubernetes 1.8.5-0 starting tomorrow, on both OpenStack and the VMs I have been using. At that point, you can decide which you'd like to use.
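If it helps with reproducing either matrix, yum can pin the exact builds rather than pulling latest; something like the following, where the version strings without release suffixes are assumptions you would confirm with `yum list --showduplicates`:

    yum -y install docker-1.12.6 kubelet-1.9.2 kubeadm-1.9.2 kubectl-1.9.2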
As Fritz said, everything here changes pretty often, so it would be good to settle on something and stick with it for a while, unless a real blocker comes up. The main thing I ran across from the system standpoint is that even with the log messages k8s produces, it can be pretty opaque as to what is really going on. Many of the errors I saw have multiple possible causes and resolutions, and in the end, none of the solutions I found on the web matched what was happening to us.
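For what it's worth, the places that ended up being more informative than the surface-level errors were the standard ones (nothing site-specific here; the pod names are placeholders):

    journalctl -u kubelet -f                        # kubelet's own log, usually more detail than events
    kubectl get pods -n kube-system -o wide         # where the system pods actually landed
    kubectl describe pod <pod-name> -n kube-system  # scheduling, image-pull, and CNI events
    kubectl logs <weave-pod> -n kube-system weave   # Weave's view of its peer connections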
We believe this testing will make the install a bit easier for Fritz (especially now that we have a better handle on the firewall and DNS issues), and will help us longer term with the upcoming k8s install on the hardware that's being set up.
A more detailed write-up of that debugging work is forthcoming.
Initial Kubernetes Puppet modules are ready to roll out on the qserv nodes; awaiting a go-ahead from Fritz Mueller.