Maintenance starts Nov 17 at 2:30pm MST
0. Announce maintenance window on #summit-announce and #com-square-support one hour in advance
1. Stop producers at the Summit
2. Run Summit backups at NCSA
export KUBECONFIG=$HOME/.kube/config-summit
# Keep the port-forward to InfluxDB alive across disconnects:
while true; do kubectl port-forward service/influxdb -n influxdb 8088:8088; echo "Restarting..."; done
backup-chronograf.sh summit
backup-kapacitor.sh summit
Back up the current InfluxDB shard at the Summit (shard 718, starting 2021-11-15T00:00:00Z):
influxd backup -portable -database efd -host 127.0.0.1:8088 -start 2021-11-15T00:00:00Z -end 2021-11-18T00:00:00Z summit-efd-2021-11-17.influx
3. Pause connectors at LDF (e.g. via the Kafka Connect REST API, as sketched below)
- replicator
- InfluxDB Sink
- S3 Sink
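A minimal sketch, assuming the connectors run in Kafka Connect and its REST API is reachable; the host and connector names below are assumptions, so check the actual deployment:
# Pause each connector via the Kafka Connect REST API (names are hypothetical):
for connector in replicator influxdb-sink s3-sink; do
  curl -X PUT http://<connect-host>:8083/connectors/${connector}/pause
done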
4. Summit EFD server upgrades
4.1 Stop the running k3s EFD container
4.2 Upgrade Docker to 20.10.10
I had to stop the Puppet agent and remove the yum locks on docker-ce* before Docker could be upgraded, for example:
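This is a sketch assuming the locks were set with the yum versionlock plugin; adapt to how they were actually applied.
# Allow manual changes while the upgrade runs:
sudo puppet agent --disable "docker upgrade"
# Drop the version locks on the Docker packages:
sudo yum versionlock delete 'docker-ce*'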
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum upgrade docker-ce docker-ce-cli containerd.io
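If the upgrade did not already restart the daemon, restart it and confirm the new version:
sudo systemctl restart docker
docker version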
4.3 Upgrade kubectl to 1.22.3
Download the latest stable release (v1.22.3 at the time of this maintenance):
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
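The command above only downloads the binary; the standard install step from the Kubernetes docs puts it on the PATH:
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client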
4.4 Install latest k3d
curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
Verify the installed versions with k3d version:
k3d version v5.1.0
k3s version v1.21.5-k3s2 (default)
4.5 Create EFD cluster
export HOST_PATH=/data
export CONTAINER_PATH=/var/lib/rancher/k3s/storage/
# Note: with k3d v5 the k3s argument may require a node filter, e.g. --k3s-arg "--disable=traefik@server:0".
sudo /usr/local/bin/k3d cluster create efd --network host --no-lb -v ${HOST_PATH}:${CONTAINER_PATH} --k3s-arg "--disable=traefik"
# Export the cluster kubeconfig and point kubectl at it:
sudo /usr/local/bin/k3d kubeconfig get efd > k3s.yaml
export KUBECONFIG=$(pwd)/k3s.yaml
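To confirm the new cluster is up and the kubeconfig works:
kubectl get nodes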
4.6 Install latest Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
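Wait for the Argo CD server to come up before proceeding:
kubectl -n argocd rollout status deployment argocd-server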
4.7 Install latest Argo CD client
curl -sSL -o bin/argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
chmod +x bin/argocd
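Make sure bin/ is on the PATH (or invoke the binary by its full path) and check the client version:
export PATH=$(pwd)/bin:$PATH
argocd version --client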
5. EFD deployment
5.1 Create EFD parent app
# Retrieve the initial admin password:
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
# Port-forward the Argo CD API server and log in as admin:
kubectl port-forward svc/argocd-server -n argocd 8080:443
argocd login --insecure localhost:8080
# Create the EFD parent app for the Summit environment and sync it:
argocd app create efd --dest-namespace argocd --dest-server https://kubernetes.default.svc --repo https://github.com/lsst-sqre/argocd-efd.git --path apps/efd --helm-set env=summit
argocd app sync efd
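To check the parent app and the child apps it created:
argocd app list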
5.2 Sync vault-secrets-operator
argocd app sync vault-secrets-operator
# Create the secret that gives the operator access to Vault:
export VAULT_TOKEN=<read vault token for the summit>
export VAULT_TOKEN_LEASE_DURATION=86400
kubectl create secret generic vault-secrets-operator --from-literal=VAULT_TOKEN=$VAULT_TOKEN --from-literal=VAULT_TOKEN_LEASE_DURATION=$VAULT_TOKEN_LEASE_DURATION --namespace vault-secrets-operator
5.3 Sync nginx-ingress
argocd app sync nginx-ingress
5.4 Sync remaining apps (see the sketch below)
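A minimal sketch, assuming the remaining child apps are named after the EFD services (the names below are assumptions; use argocd app list to find the actual ones):
for app in influxdb chronograf kapacitor; do
  argocd app sync ${app}
done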
5.5 Argo CD TLS
# Create tls-certs for argocd
cat << EOF | kubectl apply -f -
apiVersion: ricoberger.de/v1alpha1
kind: VaultSecret
metadata:
  name: tls-certs
  namespace: argocd
spec:
  path: secret/k8s_operator/summit-lsp.lsst.codes/efd/tls-certs
  type: Opaque
EOF
6. Migrate EFD volumes to new deployment
6.1 OLD volumes
kubectl get pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
pvc-0ce8b1a0-3363-11eb-be16-d09466627ab2   8Gi        RWO            Retain           Bound    kapacitor/kapacitor-kapacitor       local-path              351d
pvc-4f67178f-3362-11eb-be16-d09466627ab2   15Gi       RWO            Retain           Bound    influxdb/influxdb-data-influxdb-0   local-path              351d
pvc-f0bdff10-3362-11eb-be16-d09466627ab2   8Gi        RWO            Retain           Bound    chronograf/chronograf-chronograf    local-path              351d
6.2 NEW volumes
Run kubectl get pv again to list the volumes created by the new deployment; their names are used as <new pv> in the steps below.
6.3 Change reclaim policy of new volumes to Retain
kubectl patch pv <new pv> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
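To verify the patch took effect:
kubectl get pv <new pv> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'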
6.4 Chronograf data
sudo mv /data/pvc-f0bdff10-3362-11eb-be16-d09466627ab2_chronograf_chronograf-chronograf/* /data/<new pv>/
Restart the Chronograf pod (see the restart sketch after step 6.6).
6.5 Kapacitor
sudo mv /data/pvc-0ce8b1a0-3363-11eb-be16-d09466627ab2_kapacitor_kapacitor-kapacitor/* /data/<new pv>/
Restart the Kapacitor pod.
6.6 InfluxDB
sudo mv /data/pvc-4f67178f-3362-11eb-be16-d09466627ab2_influxdb_influxdb-data-influxdb-0/* /data/<new pv>/
Restart the InfluxDB pod.
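A restart can be done by deleting the pod so its controller recreates it; the label selectors below are assumptions, so check the actual labels and pod names with kubectl get pods:
kubectl delete pod -n chronograf -l app=chronograf
kubectl delete pod -n kapacitor -l app=kapacitor
kubectl delete pod -n influxdb influxdb-0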
7. Resume producers at the Summit
8. Resume connectors at LDF
Note for Cristián Silva: my understanding is that the EFD will continue running on efd-temp-k3s.cp.lsst.org at the Summit until we get more resources for the Andes cluster to migrate it over; see DM-29576. In that case, I would like to upgrade k3s on efd-temp-k3s.cp.lsst.org, since the version running there is quite old. I have installed k3s manually in the past, but perhaps we should automate that part? Should I talk to Heinrich or Josh about helping with this? Any thoughts?