Status: In Review
Fix Version/s: None
Component/s: Design Documents
The Lossy Compression Working Group's work (DM-11819, DMTN-068) suggests that it may be realistic and useful to maintain an on-disk archive of at least one data release's PVIs for all visits, together with the AP-derived PVIs for more recent data that's not been through a data release yet.
We now need a cost and design impact study for doing this, in order to be ready to process DM- and Project-level change requests to add this to the baseline.
This should be considered together with DM-11880, the request to clarify whether we have already added resources for the storage of all raw image data on disk to the project baseline.
I plan to fold this data retention policy change into the impending refactored sizing model.
Here is a spreadsheet for keeping the PVIs for the current release on disk, and also keeping the PVIs of the current "year", which are not in a release yet but will be.
Added a spreadsheet for how much is needed to keep 1 year of PVIs for a release on disk, while also keeping the current "being built" release on disk.
I stumbled across this still-open ticket, for which I am confusingly both assignee and reviewer. Michelle Butler [X], Kian-Tat Lim, what if anything still needs to be done to review and/or implement this?
To summarize the architectural considerations as I understand them at this point:
- We would define a new Butler dataset type for lossy-compressed PVIs, perhaps pvi-lossy. Much of the necessary support for triggering compression as a "formatter option" already exists in Butler Gen3.
- Conversion of PVIs from their lossless form to their lossy form (i.e., from calexp to pvi-lossy) could be performed as a straightforward 1:1 PipelineTask (which does nothing but copy its input to its output). This task could be appended to the pipeline that produces the PVIs themselves, in production, or it could be carried out as a separate "afterburner" pipeline.
- Implementation of the "limited cache lifetime" for (native) PVIs will require the addition of metadata to the Butler Gen3 Registry to ensure that it is clear that the PVI was originally created, was expired from cache, and can be recreated from provenance.
- The definition of virtual-data-product-recreation services is beyond the scope of RFC-325, but also needs to be done in order to produce a satisfactory overall design.
- Both lossy-compressed PVIs and "native" PVIs should appear - as separate entries - in the ObsCore and CAOM2 image metadata tables.
- DataLink should be used to make clear to API clients the connections between these data products and the availability of PVI-recreation services.
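To make the "limited cache lifetime" item above more concrete, here is a minimal sketch of the kind of per-dataset metadata it seems to call for: a record that the PVI was created, whether it has been expired from cache, and whether provenance exists to recreate it. This is plain Python for illustration only. The names PviRecord, CacheState, expire_if_older_than, and recreatable are hypothetical, and none of this reflects the actual Butler Gen3 Registry schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class CacheState(Enum):
    """Hypothetical cache states for a native PVI."""
    ON_DISK = "on_disk"   # file is present in the datastore
    EXPIRED = "expired"   # file was removed from cache


@dataclass
class PviRecord:
    """Hypothetical registry entry for one native PVI (not a real Butler schema)."""
    visit: int
    detector: int
    created: datetime
    provenance: str  # assumed: an identifier of the run that produced the PVI
    state: CacheState = CacheState.ON_DISK
    expired_at: Optional[datetime] = None

    def expire_if_older_than(self, lifetime: timedelta, now: datetime) -> bool:
        """Mark the record expired once its cache lifetime has passed.

        Returns True only if this call changed the state.
        """
        if self.state is CacheState.ON_DISK and now - self.created >= lifetime:
            self.state = CacheState.EXPIRED
            self.expired_at = now
            return True
        return False

    @property
    def recreatable(self) -> bool:
        """An expired PVI can be regenerated only if provenance was kept."""
        return self.state is CacheState.EXPIRED and bool(self.provenance)
```

A record built as PviRecord(visit=1, detector=42, created=..., provenance="run-A") stays ON_DISK until expire_if_older_than is called with a "now" at least one lifetime after creation. After that it still exists as a record, which is the point: a client can tell "expired but recreatable" apart from "never existed".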
Discussed at today's SST meeting. RFC-911 filed.
Marked as "critical" because deciding on this is a blocker for adding image access performance requirements to LSE-61 (the DM System Requirements document).