Details
- Type: RFC
- Status: Implemented
- Resolution: Done
- Component/s: DM
- Labels: None
Description
Currently, when tasks write output files such as calibration products, the output headers are constructed without anything explicitly coming from the input files. For example, the header for a processed bias file looks something like:
OBSTYPE = 'bias '
HIERARCH CALIB_CREATION_DATE = '2016-03-28'
HIERARCH CALIB_CREATION_TIME = '16:16:33 EDT'
HIERARCH CALIB_CREATION_ROOT = '/tigress/HSC/HSC/rerun/dm-5124/calib'
HIERARCH CALIB_INPUT_0 = '(904542,)'
HIERARCH CALIB_INPUT_1 = '(904544,)'
HIERARCH CALIB_INPUT_2 = '(904546,)'
HIERARCH CALIB_INPUT_3 = '(904548,)'
HIERARCH CALIB_INPUT_4 = '(904550,)'
HIERARCH CALIB_INPUT_5 = '(904552,)'
HIERARCH CALIB_INPUT_6 = '(904554,)'
HIERARCH CALIB_INPUT_7 = '(904556,)'
HIERARCH CALIB_INPUT_8 = '(904558,)'
HIERARCH CALIB_INPUT_9 = '(904560,)'
HIERARCH CALIB_INPUT_10 = '(904562,)'
HIERARCH CALIB_INPUT_11 = '(904564,)'
HIERARCH CALIB_INPUT_12 = '(904566,)'
CALIB_ID= 'filter=NONE calibDate=2013-11-03 ccd=50'
HIERARCH MD5_IMAGE = '6185bb72f20de7e81c45e6e6591eb6ad'
(eliding the WCS headers and the mandatory headers).
This header contains the provenance information via the input visit numbers and encodes the critical information about how the product was formed in the CALIB_ID header. For a butler user this is sufficient to work out what type of data was used (for the gen 2 butler the instrument is fixed for the entire repository) and presumably what configuration was used.
From a metadata translation perspective I do not have sufficient information in this header to work out what the data are. I can't even tell which telescope it comes from. If this file is copied out of the butler repository it's now very difficult to know anything about it.
In this RFC I propose that if we are writing headers into FITS files we should write a full set of usable headers and not require that people use provenance to work out basic information.
Specifically:
- The output file should be seeded with a header created from all the input files, dropping any header whose value differs between inputs.
- We allow specific headers (via configuration) from the earliest file and latest file being processed to be added to the output headers.
The second point is to allow time-dependent headers to appear in the output files to give the reader of the header an idea of how conditions changed or the range of time covered by the output product. For example, we plan to write DATE-OBS and DATE-END headers to LSST files. Propagating the oldest DATE-OBS and the newest DATE-END will give an idea of how much time elapsed between the first and last observation. This is not to be confused with the validity date.
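The two rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the DM-16292 prototype: the function name `merge_common_headers`, its `first`/`last` parameters, the use of plain dictionaries in place of FITS headers, and the `SEQNUM` key and all header values are assumptions made for the example. It also assumes every input carries an ISO 8601 `DATE-OBS` so that inputs can be ordered in time.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence


def merge_common_headers(
    headers: Sequence[Mapping[str, Any]],
    first: Iterable[str] = (),
    last: Iterable[str] = (),
    date_key: str = "DATE-OBS",
) -> Dict[str, Any]:
    """Merge input headers, keeping only keys whose values agree in every input.

    Keys named in ``first`` are then copied from the earliest input and keys
    named in ``last`` from the latest input (hypothetical configuration).
    """
    if not headers:
        return {}

    # Order inputs by observation time; ISO 8601 strings sort chronologically.
    # Assumes every header contains ``date_key``.
    ordered = sorted(headers, key=lambda h: h[date_key])

    # Rule 1: keep a key only if every input has it with the same value.
    merged: Dict[str, Any] = {}
    for key, value in ordered[0].items():
        if all(key in h and h[key] == value for h in ordered[1:]):
            merged[key] = value

    # Rule 2: propagate selected time-dependent keys from the ends of the range.
    for key in first:
        if key in ordered[0]:
            merged[key] = ordered[0][key]
    for key in last:
        if key in ordered[-1]:
            merged[key] = ordered[-1][key]
    return merged


# Illustrative inputs only; the values are invented for this example.
inputs = [
    {"TELESCOP": "LSST_AuxTel", "OBSTYPE": "bias", "SEQNUM": 30,
     "DATE-OBS": "2018-09-21T05:41:00.000", "DATE-END": "2018-09-21T05:41:01.000"},
    {"TELESCOP": "LSST_AuxTel", "OBSTYPE": "bias", "SEQNUM": 28,
     "DATE-OBS": "2018-09-21T05:39:00.000", "DATE-END": "2018-09-21T05:39:01.000"},
    {"TELESCOP": "LSST_AuxTel", "OBSTYPE": "bias", "SEQNUM": 29,
     "DATE-OBS": "2018-09-21T05:40:00.000", "DATE-END": "2018-09-21T05:40:01.000"},
]

merged = merge_common_headers(inputs, first=["DATE-OBS"], last=["DATE-END"])
print(merged)
```

In this sketch `TELESCOP` and `OBSTYPE` survive because all inputs agree, `SEQNUM` is dropped because it differs, and `DATE-OBS` and `DATE-END` are taken from the earliest and latest inputs respectively.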
I have a prototype for this working in DM-16292 for calibrations. With these changes an AuxTel reduced bias now looks like:
HEADVER = 2 / Version number of header
INSTRUME= 'LSST_ATISS' / Instrument
TELESCOP= 'LSST_AuxTel' / Telescope
SEQFILE = 'ats_20180511.seq' / Sequencer file name
CCD_MANU= 'ITL ' / CCD Manufacturer
CCD_TYPE= '3800C ' / CCD Model Number
CCD_SERN= '20304 ' / Manufacturers' CCD Serial Number
LSST_NUM= 'ITL-3800C-098' / LSST Assigned CCD Number
DETSIZE = '[1:4072,1:4000]' / IRAF detector size
EXPTIME = 0. / Exposure Time in Seconds
TELCODE = 'AT ' / The "code" for AuxTel
CONTRLLR= 'C ' / The controller (e.g. O for OCS, C for CCS)
DAYOBS = '20180920' / The observation day as defined in the image nam
MJD-OBS = 58382.2355372338 / Modified Julian Date of image acquisition
DATE-OBS= '2018-09-21T05:39:10.417' / Date of the observation (image acquisition
DARKTIME= 0.
ROTTYPE = 'UNKNOWN ' / Type of rotation angle
OBSTYPE = 'bias '
HIERARCH CALIB_CREATION_DATE = '2019-02-15'
HIERARCH CALIB_CREATION_TIME = '15:13:07 MST'
DATE-AVG= '2018-09-21T00:00:00.00'
HIERARCH CALIB_INPUT_0 = '(2018092000028,)'
HIERARCH CALIB_INPUT_1 = '(2018092000029,)'
HIERARCH CALIB_INPUT_2 = '(2018092000030,)'
HIERARCH CALIB_INPUT_3 = '(2018092000031,)'
HIERARCH CALIB_INPUT_4 = '(2018092000032,)'
HIERARCH CALIB_INPUT_5 = '(2018092000033,)'
HIERARCH CALIB_INPUT_6 = '(2018092000034,)'
HIERARCH CALIB_INPUT_7 = '(2018092000035,)'
HIERARCH CALIB_INPUT_8 = '(2018092000036,)'
CALIB_ID= 'detector=0 detectorName=S00 filter=NONE calibDate=2018-09-21'
HIERARCH MD5_IMAGE = 'eda3f77dfa7ea894a0f796e24615a42a'
MD5_MASK= 'fb394efc9596d6865b4dc311839da4f9'
HIERARCH MD5_VARIANCE = '8df0c2327f53ee09cb391743817c3f94'
HIERARCH EXPINFO_V = 0
AR_HDU = 5 / HDU (1-indexed) containing the archive used to
FILTER = '_unknown_'
FLUXMAG0= 0.
HIERARCH FLUXMAG0ERR = 0.
This header is immediately understandable as being from an AuxTel observation.
As an aside, calibrations in pipe_drivers ostensibly report the "average" date in the headers, but it is stored in DATE-OBS rather than DATE-AVG, and it seems to be the average of full days rather than the true average of the observation times.
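To make the distinction concrete, the sketch below contrasts a mean over full timestamps with a mean over timestamps truncated to midnight. It is only an illustration of the difference: the helper `mean_datetime` and the timestamps are invented for the example, and truncation to midnight is an assumed reading of "average of full days", not a description of what pipe_drivers actually does.

```python
from datetime import datetime, timedelta
from typing import Sequence


def mean_datetime(times: Sequence[datetime]) -> datetime:
    """Return the arithmetic mean of a non-empty sequence of datetimes."""
    reference = times[0]
    # Average the offsets from a reference time, since datetimes cannot be summed.
    total = sum((t - reference for t in times), timedelta(0))
    return reference + total / len(times)


# Illustrative timestamps, not taken from real data.
stamps = [
    datetime(2018, 9, 21, 5, 0, 0),
    datetime(2018, 9, 21, 5, 10, 0),
    datetime(2018, 9, 21, 5, 20, 0),
]

# Mean of the full timestamps.
true_average = mean_datetime(stamps)

# Mean after truncating each timestamp to the start of its day.
day_average = mean_datetime(
    [t.replace(hour=0, minute=0, second=0, microsecond=0) for t in stamps]
)

print(true_average.isoformat())
print(day_average.isoformat())
```

With these inputs the day-truncated mean lands on midnight, which loses the time-of-day information that a true average would carry.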
While I have no problem with Tim adding any headers that he feels like to the calibration products, I do object to the idea that these files can now be copied out of the LSST system (in particular the calibration registry) with impunity. That would be an export, and at that point we would add any required metadata (e.g. the validity range, which is not and cannot be tracked by Tim's proposed new keywords).
To put it another way, if we add these keys we should add an explicit rule that no pipeline code may rely on any field in these headers.