  1. Data Management
  2. DM-24688

Deploy the EFD on the Kueyen cluster at the base

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      We need another instance of the EFD at the base to record data from ComCam while the Summit is shut down.

      An EFD at the base was in the plans anyway and will serve as an intermediate step in replicating data from the Summit to LDF, and as a natural backup of the Summit EFD.

      SQR-034 needs to be updated to reflect retention policies for the Base EFD, replication configuration, etc.

            Activity

            afausti Angelo Fausti added a comment - - edited

            Deployment configuration for the base EFD was added to the argocd-efd repository and applied to the Kueyen cluster. The experience with Rancher was pretty slick. I was able to get a kubeconfig from the UI and discover the available Storage Classes; it also provides useful information about the cluster health (https://rancher.ls.lsst.org/). Thanks to Joshua Hoblitt for putting this together. I think it complements the Argo CD UI very well (https://kueyen.lsst.codes/argo-cd/applications).

            Secrets for this deployment were added to Vault, and service names were created in Route 53. Authentication for Chronograf and Control Center is currently done using GitHub OAuth.

            SQR-034 was updated with the URLs for the services:

            Chronograf: https://chronograf-base-efd.lsst.codes

            InfluxDB HTTP API: https://influxdb-base-efd.lsst.codes

            Kafka Control Center: https://control-center-base-efd.lsst.codes

            Kafka Schema Registry: https://schema-registry-base-efd.lsst.codes

            Kafka Broker: cp-helm-charts-cp-kafka-headless.cp-helm-charts:9092
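
A minimal sketch of how the InfluxDB HTTP API endpoint listed above could be queried with curl, assuming the standard InfluxDB 1.x `/query` endpoint and the `efd` database name used in the restore command. The `INFLUXDB_USER`, `INFLUXDB_PASSWORD`, and `RUN` variables are hypothetical; the ticket does not describe how the API is authenticated. By default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Base URL from the ticket; the database name matches the restore command.
INFLUXDB_URL="https://influxdb-base-efd.lsst.codes"
DB="efd"
QUERY="SHOW MEASUREMENTS LIMIT 5"

# Hypothetical credential variables; the ticket does not describe API auth.
INFLUXDB_USER="${INFLUXDB_USER:-}"
INFLUXDB_PASSWORD="${INFLUXDB_PASSWORD:-}"

# -G makes curl send the url-encoded parameters as a GET query string.
query_cmd=(curl -sS -G "$INFLUXDB_URL/query"
           --data-urlencode "db=$DB"
           --data-urlencode "q=$QUERY")
if [ -n "$INFLUXDB_USER" ]; then
  query_cmd+=(-u "$INFLUXDB_USER:$INFLUXDB_PASSWORD")
fi

# Print the command by default; set RUN=1 to actually contact the server.
if [ "${RUN:-0}" = "1" ]; then
  "${query_cmd[@]}"
else
  printf '%q ' "${query_cmd[@]}"
  echo
fi
```

Building the command as an array keeps the query string intact as a single argument, so it does not need extra quoting when it is finally executed.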

            afausti Angelo Fausti added a comment -

            Summit EFD dump restored (116G in 2h30min)

            bash-4.4# influxd restore -portable -db efd ./summit-efd-2020-03-19.influx
            2020/05/01 01:27:20 Restoring shard 87 live from backup 20200319T161138Z.s87.tar.gz
            2020/05/01 01:27:40 Restoring shard 127 live from backup 20200319T161138Z.s127.tar.gz
            2020/05/01 01:46:42 Restoring shard 147 live from backup 20200319T161138Z.s147.tar.gz
            2020/05/01 01:48:19 Restoring shard 57 live from backup 20200319T161138Z.s57.tar.gz
            2020/05/01 01:53:43 Restoring shard 67 live from backup 20200319T161138Z.s67.tar.gz
            2020/05/01 02:03:56 Restoring shard 107 live from backup 20200319T161138Z.s107.tar.gz
            2020/05/01 02:11:30 Restoring shard 49 live from backup 20200319T161138Z.s49.tar.gz
            2020/05/01 02:13:16 Restoring shard 21 live from backup 20200319T161138Z.s21.tar.gz
            2020/05/01 02:15:03 Restoring shard 13 live from backup 20200319T161138Z.s13.tar.gz
            2020/05/01 02:15:22 Restoring shard 77 live from backup 20200319T161138Z.s77.tar.gz
            2020/05/01 02:21:45 Restoring shard 97 live from backup 20200319T161138Z.s97.tar.gz
            2020/05/01 02:24:29 Restoring shard 50 live from backup 20200319T161138Z.s50.tar.gz
            2020/05/01 02:33:16 Restoring shard 5 live from backup 20200319T161138Z.s5.tar.gz
            2020/05/01 02:33:34 Restoring shard 117 live from backup 20200319T161138Z.s117.tar.gz
            2020/05/01 03:03:18 Restoring shard 138 live from backup 20200319T161138Z.s138.tar.gz
            

            Commands used:

            tar cpvf - summit-efd-2020-03-19.influx/ | split -d -b 1G - efd
            

            [afausti@bastion1 summit-efd]$ export KUBECONFIG=~/.kube/config-kueyen.yaml
            [afausti@bastion1 summit-efd]$ for file in $(ls efd*); do kubectl -n influxdb cp $file influxdb-0:/tmp/; done
            bash-4.4# cd /tmp
            bash-4.4# cat efd* | tar xpvf -
            bash-4.4# influxd restore -portable -db efd ./summit-efd-2020-03-19.influx
            
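
The split, copy, and reassemble steps above can be rehearsed on a small sample with a checksum check before unpacking, which may help catch a truncated or missing chunk. This is only a sketch: the scratch directory, the sample file, the 100K chunk size, and the local `cp` standing in for `kubectl cp` are all assumptions made so the example runs anywhere with GNU coreutils.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch area and sample data standing in for the real dump.
work=$(mktemp -d)
mkdir -p "$work/src/sample-dump.influx" "$work/remote" "$work/dst"
head -c 300000 /dev/urandom > "$work/src/sample-dump.influx/shard.tar.gz"

# Split the tar stream into numbered chunks (100K here; the ticket used 1G).
(cd "$work/src" && tar cpf - sample-dump.influx/ | split -d -b 100K - "$work/chunk")

# Checksum of the concatenated chunks on the sending side.
sent_sum=$(cat "$work"/chunk* | sha256sum | cut -d' ' -f1)

# Stand-in for the transfer; in the ticket this was a kubectl cp per chunk.
cp "$work"/chunk* "$work/remote/"

# On the receiving side: verify the reassembled stream, then unpack it.
recv_sum=$(cat "$work"/remote/chunk* | sha256sum | cut -d' ' -f1)
[ "$sent_sum" = "$recv_sum" ] || { echo "checksum mismatch" >&2; exit 1; }
(cd "$work/dst" && cat "$work"/remote/chunk* | tar xpf -)

# Confirm the unpacked file is byte-identical to the original.
cmp "$work/src/sample-dump.influx/shard.tar.gz" \
    "$work/dst/sample-dump.influx/shard.tar.gz"
echo "round trip OK"
```

The numeric suffixes produced by `split -d` sort correctly under shell globbing, which is what lets `cat chunk*` rebuild the stream in order.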

            afausti Angelo Fausti added a comment -

            As explained in IT-2212, I had some issues copying the dump to the InfluxDB pod. There must be a better way to restore the dump, such as using the influxd restore -host option.
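
One way the -host idea might look in practice is sketched below. It has not been tested against this deployment. It assumes the InfluxDB RPC port is 8088 (the InfluxDB 1.x default) and that it is reachable from the machine holding the dump, for example through `kubectl -n influxdb port-forward influxdb-0 8088:8088`. The `RESTORE_HOST`, `DUMP_DIR`, and `DRY_RUN` variables are hypothetical, and the script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions (not from the ticket): the pod's RPC port is 8088 and is
# reachable on localhost, e.g. via kubectl port-forward.
RESTORE_HOST="${RESTORE_HOST:-127.0.0.1:8088}"
DUMP_DIR="${DUMP_DIR:-./summit-efd-2020-03-19.influx}"
DRY_RUN="${DRY_RUN:-1}"

# Build the restore command as an array so paths with spaces stay intact.
restore_cmd=(influxd restore -portable -host "$RESTORE_HOST" -db efd "$DUMP_DIR")

if [ "$DRY_RUN" = "1" ]; then
  # Print the command instead of running it.
  printf '%s\n' "${restore_cmd[*]}"
else
  "${restore_cmd[@]}"
fi
```

If this works, the dump would stay on the bastion host and only the restore traffic would cross into the pod, avoiding the chunked copy altogether.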

            afausti Angelo Fausti added a comment -

            Information for this deployment was sent to INRIA and documented at https://sqr-034.lsst.io/#base-efd

              People

              • Assignee:
                afausti Angelo Fausti
              • Reporter:
                afausti Angelo Fausti
              • Watchers:
                Angelo Fausti
              • Votes:
                0
