  1. Data Management
  2. DM-15522

Google Cloud Proof of Concept

    Details

    • Epic Name:
      Google Cloud Proof of Concept
    • Team:
      System Management

      Description

      Following discussions at the LSST Project and Community Workshop 2018 in Tucson, AZ, this "super-epic" serves as the planning hub and landing page for all work related to potential proofs of concept for cloud deployment of the Data Management System (https://dmtn-078.lsst.io).

      I intend this epic to be the go-to place for all LSST and Google personnel to add their thoughts, questions, concerns, ideas and management planning in the comment section. Any actual work planning will go into topic-specific epics, which will be added to the "relates-to" section of this epic as that effort develops and is assigned.

      Logistics:

      • The main points of contact for Google and LSST respectively are J Ross Thomson and Vaikunth Thukral
      • First official meeting is set for Tuesday Sep 4th at 10am PST at ls.st/hhb (zoom)
      • The plan is to have bi-weekly meetings for all interested parties in the same timeslot – every other Tue at 10am PST
      • Work planning for Google will be added as stories within the "relates-to" epics and assigned to Ross, who will bring in relevant personnel from Google's Professional Services Organization (PSO) depending on the type of work
      • Work planning for LSST personnel, when required, will go under their respective team's epics/WBS. For example, since the DAX team is at the three-month mid-cycle point, we will be creating a "Google PoC Support" epic
      • All relevant documentation should be linked in the comments in this epic

            Activity

            vaikunth Vaikunth Thukral added a comment -

            As the first step, a Jira account for Ross (https://jira.lsstcorp.org/browse/IHS-1273) has been requested. 

            In addition, cloud organization accounts for managing the billing for cloud services are being resolved by the SQuaRE team.

            vaikunth Vaikunth Thukral added a comment - - edited

            General work planning structure for the Qserv part will follow the points listed out in the PoC documentation as follows:

            1. Deploy Qserv on Google Cloud with local-attached disks.
            2. Load KPM30, WISE, and HSC datasets into Qserv.
            3. Execute queries from DMTR-52 and DMTR-16 against datasets.
            4. Adjust deployment to use network-attached disks.
            5. Re-execute queries against datasets.
            6. Select one or more Google database services (Cloud SQL, Datastore, Bigtable, etc.) that best match the data and queries; load data into the service.
            7. Re-execute queries (if possible) against database service.
            8. Analyze the potential for solving any query supportability issues (e.g. spherical geometry, near-neighbor queries, simultaneous users).
            9. Analyze the potential for "next-to-database" processing (DMTN-086) with data in Google database service.
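As an illustration of why item 8 calls out spherical geometry and near-neighbor queries, here is a minimal sketch of the computation such a query has to express. It is not Qserv code: the function names and the `(id, ra, dec)` tuple layout are hypothetical, and the brute-force pair search is only meant to show the logic that a general-purpose database service would need to support or emulate.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions.

    Inputs are right ascension and declination in degrees. The haversine
    form is used because it stays accurate for very small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    ddec = dec2 - dec1
    a = (math.sin(ddec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(dra / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def near_neighbor_pairs(objects, radius_deg):
    """Return all pairs of object ids closer than radius_deg.

    `objects` is a list of (id, ra_deg, dec_deg) tuples (hypothetical
    layout). This is a brute-force O(n^2) scan for illustration only.
    """
    pairs = []
    for i, (id_a, ra_a, dec_a) in enumerate(objects):
        for id_b, ra_b, dec_b in objects[i + 1:]:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                pairs.append((id_a, id_b))
    return pairs
```

Note that a naive box cut on right ascension differences breaks down near the poles and across the 0/360 degree wrap, which is one reason this kind of query is a supportability question for a generic service.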

            General work plan for Prompt Processing PoC:

            1. Perform high-bandwidth transmission of CCD images from machines at NCSA (e.g. in the Level 1 Test Stand) to Google Cloud storage.
            2. Stand up an instance of the Prompt Products Database. This is currently anticipated to run in Oracle; we could attempt to substitute a different technology here.
            3. Load master calibration images and templates into Google Cloud storage.
            4. Deploy an instance of the Alert Distribution and Filtering service to Google Cloud.
            5. Deploy a set of simulated alert clients both within and outside Google Cloud.
            6. Execute Alert Production in batch mode on Google Cloud machines, sending alerts to the distribution service.
            7. Investigate the possibility of sharding the Prompt Products Database.
            8. Analyze the applicability of Google workflow/orchestration services (Cloud Composer, Cloud Dataflow) for executing Alert Production.
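For item 7, one generic way to shard a database is to route each row to a shard by hashing a key. The sketch below assumes, purely for illustration, that rows are keyed by an integer object id. The real partitioning key and scheme for the Prompt Products Database are not specified here, and a spatial scheme might turn out to be more appropriate.

```python
import hashlib
from collections import defaultdict


def shard_for(object_id, num_shards):
    """Map an object id to a shard index in [0, num_shards).

    A cryptographic hash is used only to get a stable, well-mixed value
    that does not depend on the Python process (unlike built-in hash()).
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(str(object_id).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(object_ids, num_shards):
    """Group object ids by the shard they would be routed to."""
    shards = defaultdict(list)
    for object_id in object_ids:
        shards[shard_for(object_id, num_shards)].append(object_id)
    return dict(shards)
```

A trade-off worth keeping in mind: hash sharding spreads load evenly but scatters spatially adjacent objects across shards, so region-based lookups would have to touch every shard.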

            General work plan for LSST Science Platform (LSP) PoC:

            1. Determine an appropriate configuration to allow JupyterHub at NCSA (and Chile?) to create pods both locally and remotely in Google Cloud.
            2. Determine how authentication and authorization can be synchronized between NCSA and Google Cloud.
            3. Determine how user files and databases can be shared between NCSA and Google Cloud. VOSpace and/or WebDAV are potential mechanisms here.
            4. Determine whether LSST data products need to be permanently resident in Google Cloud or if Web APIs are sufficient.

             

            swinbank John Swinbank added a comment -

            It'd be helpful to describe what — if any! — work is being carried out here that is not related to Qserv. The “component” field in the ticket, and the text in DMTN-078, refer to Alert Production, for example — is there a proof of concept going ahead there? Is this epic supposed to cover it? 

            vaikunth Vaikunth Thukral added a comment - - edited

            John Swinbank, the proposed work described in DMTN-078 does cover those topics, and it is planned as part of the proof of concept if time and resources permit. I intend to cover them (as a non-expert) within this super-epic once the Qserv starting point is under way and the path gets clearer, following the plan laid out in DMTN-078.

            For planning's sake I can create concurrent "bucket epics" for these aspects of the PoC, but I hesitate to do so yet since I lack the knowledge to do it accurately. If this is OK with you (and others), I would like to create epics for that work when we get closer to doing it, since the general plan is already laid out in the documentation. In the meantime I'll add more details here and update the language to reflect this.

            vaikunth Vaikunth Thukral added a comment -

            Met with John Swinbank on Oct 16th about the Alert Production aspect of the PoC. We decided to revisit it in mid-November; in the meantime, the plan is for Andy Salnikov to work on PPDB testing in the cloud.


              People

              • Assignee:
                Unassigned
                Reporter:
                vaikunth Vaikunth Thukral
                Watchers:
                Colin Slater, Fritz Mueller, Jacob Rundall, Jim Ballentine, John Swinbank, J Ross Thomson, Kian-Tat Lim, Leanne Guy, Margaret Gelman, Matthew Thomas Long, Michelle Butler, Steve Pietrowicz, Vaikunth Thukral, Wil O'Mullane
              • Votes:
                0
                Watchers:
                14
