  1. Data Management
  2. DM-11819

Lossy Compression Working Group

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Team:
      Data Facility

      Description

      Charter:
      This issue establishes a DM working group to evaluate and recommend options for lossy compression algorithms that compress LSST images in a way that continues to satisfy LSST science use cases (with dark energy being the primary one). All major types of images shall be considered as candidates for compression, including raw data, Processed Visit Images, co-adds, and templates.

      The working group is to:

      • define criteria for "science-usable" lossy-compressed processed images across all LSST image types
      • collect compression algorithm candidates (preferring those already available in "off-the-shelf" tools/libraries)
      • evaluate their compression ratios (at "science-usable" quality)
      • evaluate constraints on processing that usage of compression may impose (e.g., avoidance of repeated re-compressions)
      • quantify the savings from application of lossy compression, in the context of the LSST Sizing Model (LDM-144)
      • make recommendations on which image types to lossy-compress, the algorithms to apply, and the description of processing constraints these would impose.
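
As a rough illustration of the "evaluate their compression ratios" and "define criteria" items above, the following is a minimal sketch, not the working group's method. It assumes one commonly used lossy approach (quantizing pixel values to a fraction of the noise level before lossless compression), synthetic Gaussian sky noise in place of real LSST images, and zlib standing in for an actual image compressor. The names make_noise_image, compressed_size, evaluate, and the parameter q are hypothetical.

```python
import random
import struct
import zlib


def make_noise_image(n_pixels, sky=1000.0, sigma=10.0, seed=0):
    """Synthetic stand-in for image pixels: flat sky plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(sky, sigma) for _ in range(n_pixels)]


def compressed_size(values, fmt):
    """Pack values as little-endian 32-bit numbers and return the zlib size in bytes."""
    raw = struct.pack(f"<{len(values)}{fmt}", *values)
    return len(zlib.compress(raw, 9))


def evaluate(pixels, sigma, q):
    """Compare lossless compression with quantize-then-compress.

    q is an assumed tuning knob: the quantization step is sigma / q, so a
    larger q keeps more of the noise and compresses less.
    """
    step = sigma / q
    levels = [round(p / step) for p in pixels]   # lossy step: integer levels
    restored = [lv * step for lv in levels]      # what a reader would get back
    raw_bytes = 4 * len(pixels)                  # 32 bits per pixel before compression
    return {
        "step": step,
        "lossless_ratio": raw_bytes / compressed_size(pixels, "f"),
        "lossy_ratio": raw_bytes / compressed_size(levels, "i"),
        "max_abs_error": max(abs(a - b) for a, b in zip(pixels, restored)),
    }


if __name__ == "__main__":
    result = evaluate(make_noise_image(20000), sigma=10.0, q=4)
    for key, value in result.items():
        print(f"{key}: {value:.3f}")
```

In this sketch the per-pixel error is bounded by half the quantization step, which is one simple candidate for a "science-usable" criterion. A real evaluation would instead compare science measurements (for example photometry or shapes) made on original and compressed images.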

      Constraints and prior art:
      This group was formed in response to RFC-325, which recognized that the user experience will be unacceptably impacted by the long latency required to access LSST data from tape media. Unfortunately, preliminary analysis indicated that retaining all processed images on disk would be too costly, and therefore not feasible, unless lossy compression is applied. The same analysis indicated that storing all raw data on disk (without lossy compression) is feasible.

      The LSST has traditionally avoided lossy compression for any of its image data products (including the large co-added images as well as templates retained for each data release). Anecdotal experience from DES and other surveys indicates that lossy compression can be applied without loss of scientific fidelity. If this is the case, the reduced disk space needs may enable us to retain more data on low-latency media than we otherwise would (rather than regenerate it or pull it from tape). This group has been convened to study the problem and report on the results.

      The working group should rely as much as possible on prior art found in the literature, and prefer applying off-the-shelf solutions to developing custom LSST-specific compression tools.

      Deliverables:
      The deliverable of this group will be a technical report recommending a scientifically acceptable lossy compression strategy, with a quantification of its impact on the sizing model (see the list under “Charter” for details).

      Deadlines:
      This group shall complete its work by October 31, 2017, with fortnightly status updates to the LSST DM Subsystem Scientist and Manager.

      Membership (tentative, except for the chair):

      If you (named above) cannot participate, let Robert Gruendl know. If you're interested in participating in this WG, please notify Robert Gruendl as well. For any questions about the WG, post a comment here.

            Activity

            zivezic Zeljko Ivezic added a comment -

            If the difference between A and B measurements is not negligible, we should do A.
            If the difference between A and B measurements is negligible, we can do A or B.
            Therefore, we should do A (which is what I thought we were considering doing - I think
            I read that in some doc).
            OK, there may be some scientifically interesting gray zone between "negligible"
            and "not negligible" but I cannot quickly think of any.

            Another quick point from today's PST: it is possible that we'll (we'd ?) switch from
            the observing strategy with 2x15 sec exposures per visit to 1x30 sec, and thus
            have only one half of the original image data volume.

            mjuric Mario Juric added a comment -

            There's a counter-argument to be made: if there's no difference in inferred results between compressed/non-compressed images (which is the goal), then scenario B is preferred as it makes the catalogs and images consistent.

            There's value in consistency; otherwise we may end up having to educate users (via the helpdesk) as to why (for example) the aperture mag they measure in a Jupyter notebook is different from the one quoted in the catalog.

            womullan Wil O'Mullane added a comment -

            I consider https://github.com/lsst/LDM-582/blob/master/LDM-582.pdf
            to be accepted and official. Tim Jenness, it should go in DocuShare; I already removed the draft from it.

            tjenness Tim Jenness added a comment -

            Wil O'Mullane can you say this on the RFC please?

            gpdf Gregory Dubois-Felsmann added a comment -

            The report from the Working Group is available at DMTN-068.


              People

              • Assignee:
                gruendl Robert Gruendl
                Reporter:
                mjuric Mario Juric
                Watchers:
                Ben Emmons [X] (Inactive), Gregory Dubois-Felsmann, John Parejko, John Swinbank, Leanne Guy, Mario Juric, Paul Price, Pim Schellart [X] (Inactive), Robert Gruendl, Tim Jenness, Wil O'Mullane, Zeljko Ivezic
              • Votes:
                0
