Status: To Do
Fix Version/s: None
Team: Data Access and Database
Some of our datasets, particularly exposures, have many subcomponents, such as WCSs, PSFs, and even bounding boxes. So far, we've implemented this by considering these to be "pieces" of an exposure to be extracted without loading the full exposure. The current pipeline flow is starting to demonstrate some of the shortcomings of this approach, however:
- The WCS of an exposure can also be logically considered part of other datasets, such as the source table derived from that exposure. (The same is true of the PSF, photometric calibration, background, etc., but I'll use WCS as my example here.)
- Ubercal creates a new WCS to be associated with each exposure, and while these may be stored separately on disk, we'd like to be able to create a new dataset that combines the calexp pixel data with the ubercal WCS, etc. (See also Dataset Flavors and Versioning, below.)
- Some complex serialized objects, such as CoaddPsf, themselves contain many other objects: in this case, the Psfs and Wcss of the exposures that went into a coadd. Since the CoaddPsf of a neighboring patch will contain many of the same Psfs and Wcss, it's highly desirable to store each of these only once within a data repo and, moreover, to be able to use an existing in-memory object instead of re-reading it from disk and creating a duplicate.
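To make the last point concrete, here is a minimal sketch of reusing an in-memory object instead of re-reading it. Everything in it is an assumption for illustration: `ComponentCache`, `FakePsf`, and the tuple-style dataset IDs are hypothetical names and are not part of any existing butler interface.

```python
import weakref


class FakePsf:
    """Hypothetical stand-in for a Psf that would be read from disk."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id


class ComponentCache:
    """Return one shared in-memory object per dataset ID (hypothetical).

    Weak references are used so that an object is dropped from the cache
    once nothing else holds on to it.
    """

    def __init__(self, loader):
        self._loader = loader  # callable: dataset_id -> object
        self._objects = weakref.WeakValueDictionary()
        self.n_reads = 0  # counts how often the loader was actually called

    def get(self, dataset_id):
        obj = self._objects.get(dataset_id)
        if obj is None:
            # Not in memory: read it once and remember it.
            obj = self._loader(dataset_id)
            self.n_reads += 1
            self._objects[dataset_id] = obj
        return obj


cache = ComponentCache(FakePsf)

# Two neighboring patches whose CoaddPsfs both include the Psf of visit 101.
patch_a = [cache.get(("psf", 101)), cache.get(("psf", 102))]
patch_b = [cache.get(("psf", 101)), cache.get(("psf", 103))]
```

In this sketch the Psf for visit 101 is read once and both patches hold the same object, so four requests cost three reads. A real implementation would also need to decide how cache keys relate to data IDs and repositories, which this sketch does not address.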
Possible implementations include:
- links embedded in FITS files
- associations recorded in the registry
- some other symlink-like system
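As a rough illustration of the registry option only, the sketch below records which component datasets make up a composite, so that a calexp with an ubercal WCS can be defined without copying pixel data. The `Registry` class, its methods, and all dataset identifiers are hypothetical and chosen for the example; they do not describe an existing schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical registry mapping composite datasets to component IDs."""

    associations: dict = field(default_factory=dict)

    def associate(self, composite_id, component, dataset_id):
        """Record that `component` of `composite_id` lives at `dataset_id`."""
        self.associations.setdefault(composite_id, {})[component] = dataset_id

    def derive(self, new_id, parent_id, **overrides):
        """Record a new composite that reuses the parent's components,
        except for those explicitly overridden."""
        components = dict(self.associations[parent_id])
        components.update(overrides)
        self.associations[new_id] = components
        return components


registry = Registry()
# Hypothetical identifiers for the components of one calexp.
registry.associate("calexp/v1", "image", "calexp/v1/image")
registry.associate("calexp/v1", "wcs", "calexp/v1/wcs")
registry.associate("calexp/v1", "psf", "calexp/v1/psf")

# A new dataset: the calexp pixels and Psf, but with the ubercal WCS.
ubercal = registry.derive("calexp/v1+ubercal", "calexp/v1", wcs="ubercal/v1/wcs")
```

Here the derived dataset shares the image and Psf entries with its parent and differs only in the WCS, and the parent's own associations are left unchanged. Links embedded in FITS files or a symlink-like scheme would store the same mapping, just in a different place.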