DM-15059

Investigate and report on the impact on the DM design of keeping lossy-compressed PVIs on disk

    Details

    • Type: Improvement
    • Status: In Review
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Design Documents
    • Labels:
      None
    • Team:
      Data Facility

      Description

      The Lossy Compression Working Group's work (DM-11819, DMTN-068) suggests that it may be realistic and useful to maintain an on-disk archive of at least one data release's PVIs for all visits, together with the AP-derived PVIs for more recent data that's not been through a data release yet.

      We now need a cost and design impact study for doing this, in order to be ready to process DM- and Project-level change requests to add this to the baseline.

      This should be considered together with DM-11880, the request to clarify whether we have already added resources for the storage of all raw image data on disk to the project baseline.
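
      As a rough illustration of the quantity such a study has to estimate, the retention policy above can be written as a simple sum. This is a sketch under stated assumptions, not the project's sizing model: the function name, its parameters, and the example numbers are hypothetical placeholders, not LSST figures.

```python
def pvi_disk_bytes(release_visits: int,
                   unreleased_visits: int,
                   lossless_pvi_bytes_per_visit: float,
                   compression_ratio: float) -> float:
    """Disk needed for lossy PVIs of one data release plus recent AP-derived PVIs.

    All inputs are caller-supplied assumptions. compression_ratio is the
    lossless size divided by the lossy size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    lossy_bytes_per_visit = lossless_pvi_bytes_per_visit / compression_ratio
    # One release's visits plus visits not yet through a data release.
    return (release_visits + unreleased_visits) * lossy_bytes_per_visit


# Placeholder inputs chosen only to make the arithmetic easy to follow.
total = pvi_disk_bytes(release_visits=1000, unreleased_visits=200,
                       lossless_pvi_bytes_per_visit=100.0,
                       compression_ratio=4.0)
print(total)  # 30000.0
```

      A real estimate would substitute the visit counts, image sizes, and compression ratios from the sizing model and DMTN-068.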

            Activity

            gpdf Gregory Dubois-Felsmann added a comment -

            Marked as "critical" because deciding on this is a blocker for adding image access performance requirements to LSE-61 (the DM System Requirements document).

            mgelman2 Margaret Gelman added a comment -

            I plan to fold this data retention policy change into the impending refactored sizing model.

            mbutler Michelle Butler [X] (Inactive) added a comment -

            Here is a spreadsheet for what it takes to keep the PVIs for the current release on disk, and also to keep the PVIs of the current "year", which is not in a release yet but will be.

            https://docs.google.com/spreadsheets/d/1M-5xwaZNeIs21Z5UN3T9w2S-EvLvdCbqDxe8XLMtkqQ/edit?usp=sharing

            mbutler Michelle Butler [X] (Inactive) added a comment -

            Added a spreadsheet showing how much it takes to keep one year of PVIs for a release on disk, while also keeping the current "being built" release on disk.

            FYI 

            M. 

             

            gpdf Gregory Dubois-Felsmann added a comment -

            I stumbled across this still-open ticket, for which I am confusingly both assignee and reviewer. Michelle Butler [X], Kian-Tat Lim, what if anything still needs to be done to review and/or implement this?

            gpdf Gregory Dubois-Felsmann added a comment -

            To summarize the architectural considerations as I understand them at this point:

            • We would define a new Butler dataset type for lossy-compressed PVIs, perhaps pvi-lossy. Much of the necessary support for triggering compression as a "formatter option" already exists in Butler Gen3 (BG3).
            • Conversion of PVIs from their lossless form to their lossy form (i.e., from calexp to pvi-lossy) could be performed as a straightforward 1:1 PipelineTask (which does nothing but copy its input to its output).  This task could be appended to the pipeline that produces the PVIs themselves, in production, or it could be carried out as a separate "afterburner" pipeline.
            • Implementation of the "limited cache lifetime" for (native) PVIs will require the addition of metadata to the Butler Gen3 Registry to ensure that it is clear that the PVI was originally created, was expired from cache, and can be recreated from provenance.
            • The definition of virtual-data-product-recreation services is beyond the scope of RFC-325, but also needs to be done in order to produce a satisfactory overall design.
            • Both lossy-compressed PVIs and "native" PVIs should appear - as separate entries - in the ObsCore and CAOM2 image metadata tables.  
            • DataLink should be used to make clear to API clients the connections between these data products and the availability of PVI-recreation services.
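
            The cache-lifetime bookkeeping described in the bullets above can be sketched in plain Python. This is only an illustrative model, not Butler Gen3 Registry code: the names PviState, PviRecord, expire, and resolve are hypothetical, and only the dataset type names calexp and pvi-lossy come from the discussion above.

```python
from dataclasses import dataclass
from enum import Enum


class PviState(Enum):
    """Hypothetical lifecycle states for a native (lossless) PVI."""
    ON_DISK = "on_disk"   # created and still in the cache
    EXPIRED = "expired"   # created, later removed from the cache


@dataclass
class PviRecord:
    """Hypothetical per-visit bookkeeping; not a real Registry schema."""
    visit: int
    native_state: PviState = PviState.ON_DISK
    has_lossy_copy: bool = False   # a pvi-lossy dataset exists on disk
    has_provenance: bool = True    # enough provenance to re-run processing

    def expire(self) -> None:
        """Mark the native PVI as expired from cache while keeping the record."""
        self.native_state = PviState.EXPIRED

    def resolve(self, want_lossless: bool) -> str:
        """Return how a request for this visit's PVI could be satisfied."""
        if self.native_state is PviState.ON_DISK:
            return "calexp"
        if not want_lossless and self.has_lossy_copy:
            return "pvi-lossy"
        if self.has_provenance:
            return "recreate"
        raise LookupError(f"visit {self.visit}: PVI not retrievable")


record = PviRecord(visit=1234, has_lossy_copy=True)
record.expire()
print(record.resolve(want_lossless=False))  # pvi-lossy
print(record.resolve(want_lossless=True))   # recreate
```

            The point of keeping the record after expiry is that a client can still discover that the native PVI once existed and, in this sketch, be directed either to the lossy copy or to a recreation service.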
            gpdf Gregory Dubois-Felsmann added a comment -

            Discussed at today's SST meeting. RFC-911 filed.


              People

              Assignee:
              gpdf Gregory Dubois-Felsmann
              Reporter:
              gpdf Gregory Dubois-Felsmann
              Reviewers:
              Gregory Dubois-Felsmann
              Watchers:
              Fritz Mueller, Gregory Dubois-Felsmann, Kian-Tat Lim, Margaret Gelman, Michelle Butler [X] (Inactive), Michelle Gower, Tim Jenness, Wil O'Mullane
