Many kinds of essentially user-curated data accompany the obs packages: e.g. defect maps, linearity, crosstalk coefficients, camera geometry, brighter-fatter kernels, and potentially nominal master calibration frames. Currently, we store the canonical version of these data in the obs package itself, but that creates several problems because the obs packages also hold information such as config and code overrides.
Our (Simon Krughoff and Tim Jenness) proposal is:
1) Break out these data into a separate package per obs package.
2) Specify standardized file formats (these should be human comprehensible).
3) Provide utilities to transform the standardized files into a form accessible by the (gen2 and gen3) butler.
This RFC is for approval of the general proposal for dealing with data of this type. We present the following implementation as an example of how, concretely, we would implement the above proposal for a specific type of data. The current gen2 system has FITS concepts baked into it, so we recognize that certain types of data that do not serialize well as FITS files, e.g. camera geometry, may need special handling in the gen2 case. However, we expect that the general approach of storing the source of primacy in time-stamped files in a special obs data repository should still apply.
In addition to the outlined process, we are also making the following more concrete proposals as part of this RFC:
- Adoption of the yaml camera as implemented by obs_lsst as the standard source of primacy for camera geometry descriptions
- Adoption of the file format proposed below for representing defects in the obs data packages
- RFCs for additional standardized file formats will be handled on a case by case basis
- Per-sensor data are split out by sensor name as defined by the camera geometry description.
- Data files are named after the calibration date. This should be an ISO 8601 compliant string. We suggest second resolution, as we certainly need resolution finer than a day for some products, but any ISO 8601 compliant string should work.
We have used defect masks as a starting point. As a concrete example, we have used the HSC defect masks in `obs_subaru`.
To facilitate this work, Tim Jenness has implemented a container class for defects: lsst.meas.algorithms.Defects. This class behaves like a list, but has useful factory methods for constructing lists of defects from various inputs including our proposed standardized file format.
We have broken the defects out of obs_subaru into a new package called obs_subaru_data. The proposed format is a directory structure of the style: <instrument>/<data_set>/<sensor_name>/<calib_date>.<extension>. For defects, you will see hsc/defects/0_00/20130131T000000.dat.
The proposal as stated above is:
1) That the per-sensor data are split out by sensor name as defined by the camera geometry description.
2) That the data files are named after the calibration date. This should be an ISO 8601 compliant string. We suggest second resolution, as we certainly need resolution finer than a day for some products, but any ISO 8601 compliant string should work.
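The layout and naming rules above can be sketched in a few lines of Python. This is only an illustration and not part of the proposal: the names `calib_path`, `parse_calib_path` and `CALIB_DATE_FORMAT` are hypothetical, and the sketch assumes second-resolution timestamps in the compact form used by the HSC example.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumption: compact ISO 8601 timestamp at second resolution,
# matching the example file name 20130131T000000.dat.
CALIB_DATE_FORMAT = "%Y%m%dT%H%M%S"


def calib_path(instrument, data_set, sensor_name, calib_date, extension="dat"):
    """Build <instrument>/<data_set>/<sensor_name>/<calib_date>.<extension>."""
    stamp = calib_date.strftime(CALIB_DATE_FORMAT)
    return PurePosixPath(instrument, data_set, sensor_name, f"{stamp}.{extension}")


def parse_calib_path(path):
    """Recover (instrument, data_set, sensor_name, calib_date) from a path."""
    path = PurePosixPath(path)
    # The last three directory components identify the data; the file stem is the date.
    instrument, data_set, sensor_name = path.parts[-4:-1]
    calib_date = datetime.strptime(path.stem, CALIB_DATE_FORMAT)
    return instrument, data_set, sensor_name, calib_date


path = calib_path("hsc", "defects", "0_00", datetime(2013, 1, 31))
print(path)  # hsc/defects/0_00/20130131T000000.dat
```

Because the timestamp has a fixed width and runs from most to least significant field, sorting the file names within a sensor directory also sorts them chronologically.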
The proposed file format is intended to be human editable and simple to understand. Specifically for defects, we propose a plain text file with four integer columns:
column 0: x index of the pixel in the lower left of the defect
column 1: y index of the pixel in the lower left of the defect
column 2: x extent of the defect (single pixel defects have extent 1)
column 3: y extent of the defect (single pixel defects have extent 1)
Pixel coordinates are zero indexed and always represented in IMAGE coordinates.
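As a minimal sketch of how such a file could be read, the following parser turns the four columns into simple box records. It is not the `lsst.meas.algorithms.Defects` implementation; `DefectBox` and `parse_defects` are hypothetical names, and the sketch assumes whitespace-separated columns, with blank lines and `#` comments ignored (the proposal does not specify either).

```python
from typing import NamedTuple


class DefectBox(NamedTuple):
    """One defect: lower-left pixel (x0, y0) plus extent, zero-indexed."""
    x0: int
    y0: int
    width: int
    height: int

    def contains(self, x, y):
        # A single-pixel defect has extent 1, so the upper bound is exclusive.
        return (self.x0 <= x < self.x0 + self.width
                and self.y0 <= y < self.y0 + self.height)


def parse_defects(text):
    """Parse four-column defect text into a list of DefectBox records."""
    boxes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumption: '#' starts a comment; the proposal does not say so.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
        x0, y0, width, height = (int(f) for f in fields)
        if width < 1 or height < 1:
            raise ValueError(f"line {lineno}: extents must be at least 1")
        boxes.append(DefectBox(x0, y0, width, height))
    return boxes


sample = "# x0 y0 width height\n10 20 1 1\n0 100 2048 3\n"
boxes = parse_defects(sample)
```

In this sample, the first row is a single bad pixel at (10, 20) and the second is a three-row band starting at row 100 and spanning 2048 columns.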
Finally, the implementation of the ingestion task for defects is on a ticket branch of pipe_tasks. The task reads the standard format, writes temporary files, and ingests them using the standard IngestCalibsTask.