Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-22797

Various FITS header fixes

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw, pipe_tasks
    • Labels:
    • Team:
      External

      Description

      The HSC team requests some changes to the output FITS headers:

      • FLUXMAG0 missing from the calexp header
      • Please do not erase original FITS header keywords in raw data for calexp (and deepCoadd_calexp). For example, FRAME_ID, EXP_ID, DATA-TYP, DET-ID, OBJECT, FILTER01, AZIMUTH, ALTITUDE, PROP-ID, DATE-OBS, MJD, etc.
      • In the deepCoadd_*Warp, DATE-OBS, MJD values are wrong.
      • Please do not omit the FITS keyword CD1_1, CD2_2, CD1_2, CD2_1 even when its value = 0.

        Attachments

          Issue Links

            Activity

            price Paul Price created issue -
            Hide
            tjenness Tim Jenness added a comment -

            I completely agree that input headers should propagate to the output if there is one input file going to one output file.

            Deep coadds should merge headers as we do for calibrations: dropping things that are different. I generally prefer DATE-OBS and DATE-END to correspond to earliest DATE-OBS and oldest DATE-END. What value do HSC want for DATE-OBS of a coadd?

            CD keywords default to 0 by definition in the FITS standard. I can't quickly see an option in AST FitsChan to force them to always appear (CDMatrix option controls PC vs CD output). We might need a new attribute.

            Show
            tjenness Tim Jenness added a comment - I completely agree that input headers should propagate to the output if there is one input file going to one output file. Deep coadds should merge headers as we do for calibrations: dropping things that are different. I generally prefer DATE-OBS and DATE-END to correspond to earliest DATE-OBS and oldest DATE-END. What value do HSC want for DATE-OBS of a coadd? CD keywords default to 0 by definition in the FITS standard. I can't quickly see an option in AST FitsChan to force them to always appear (CDMatrix option controls PC vs CD output). We might need a new attribute.
            Hide
            price Paul Price added a comment -

            Masayuki Tanaka, Hisanori Furusawa: There are MAGZERO_RMS and MAGZERO_NOBJ keywords in the calexp headers, and it looks like MAGZERO should be there too, except for a bug. Is MAGZERO sufficient, or do you really need FLUXMAG0?

            Show
            price Paul Price added a comment - Masayuki Tanaka , Hisanori Furusawa : There are MAGZERO_RMS and MAGZERO_NOBJ keywords in the calexp headers, and it looks like MAGZERO should be there too, except for a bug. Is MAGZERO sufficient, or do you really need FLUXMAG0 ?
            Hide
            price Paul Price added a comment -

            Regarding writing CD keywords even when zero, I had a quick poke around AST, and I see in starlink_ast/fitschan.c:

                           if( val != 0.0 ) {
                               SetValue( this, FormatKey( "CD", i + 1, j + 1, s, status ), &val,
                                         AST__FLOAT, "Transformation matrix element", status );
                           }
            

            I therefore think that this "feature" of not writing zero-valued CD keywords is baked into AST on purpose. I think that changing this to the desired behaviour would therefore require some low-level changes, and I'm not sure it would be worth it.

            Show
            price Paul Price added a comment - Regarding writing CD keywords even when zero, I had a quick poke around AST, and I see in starlink_ast/fitschan.c : if( val != 0.0 ) { SetValue( this, FormatKey( "CD", i + 1, j + 1, s, status ), &val, AST__FLOAT, "Transformation matrix element", status ); } I therefore think that this "feature" of not writing zero-valued CD keywords is baked into AST on purpose. I think that changing this to the desired behaviour would therefore require some low-level changes, and I'm not sure it would be worth it.
            Hide
            masayuki.tanaka Masayuki Tanaka added a comment -

            We need either MAGZERO or FLUXMAG0, not both.  Given that there is already MAGZERO_RMS, it is probably better to have MAGZERO.  As for the CD matrix, it seems warps and coadds do not have these zero-valued CD components (there may be more).  I don't think it does any harm to explicitly fill these components with zero, but it is certainly not a high priority.  If it is difficult for you to fix, then I think that is fine.

            Show
            masayuki.tanaka Masayuki Tanaka added a comment - We need either MAGZERO or FLUXMAG0, not both.  Given that there is already MAGZERO_RMS, it is probably better to have MAGZERO.  As for the CD matrix, it seems warps and coadds do not have these zero-valued CD components (there may be more).  I don't think it does any harm to explicitly fill these components with zero, but it is certainly not a high priority.  If it is difficult for you to fix, then I think that is fine.
            Hide
            tjenness Tim Jenness added a comment -

            The FITS standard says:

            ...unspecified CDi_j keywords shall default to zero.

            although I can't see anything that requires them to be left out the header if they are 0.0. I can imagine that most people don't want them there if you have a 4D image with stokes vectors and the like.

            Show
            tjenness Tim Jenness added a comment - The FITS standard says: ...unspecified CDi_j keywords shall default to zero. although I can't see anything that requires them to be left out the header if they are 0.0. I can imagine that most people don't want them there if you have a 4D image with stokes vectors and the like.
            Hide
            price Paul Price added a comment -

            Aha! The bad DATE-OBS and MJD-OBS in the warp headers is coming from the SkyWcs.

            Show
            price Paul Price added a comment - Aha! The bad DATE-OBS and MJD-OBS in the warp headers is coming from the SkyWcs .
            Hide
            price Paul Price added a comment -

            I have filed two issues on Starlink AST to modify its behaviour as desired here:

            Show
            price Paul Price added a comment - I have filed two issues on Starlink AST to modify its behaviour as desired here: Don't write DATE-OBS, MJD-OBS Write CDi_j even when zero
            Hide
            price Paul Price added a comment -

            I discovered that we are causing AST to write DATE-OBS and MJD-OBS in the headers.
            For CDi_j, I was convinced that post-processing the headers produced by AST is the way to go (though there seems to be disagreement over whether LSST wants this).
            I've made changes to afw to effect these.

            Show
            price Paul Price added a comment - I discovered that we are causing AST to write DATE-OBS and MJD-OBS in the headers. For CDi_j , I was convinced that post-processing the headers produced by AST is the way to go (though there seems to be disagreement over whether LSST wants this). I've made changes to afw to effect these.
            Hide
            price Paul Price added a comment -

            The FITS header keywords being erased seem to be stripped by astro_metadata_translator, but it's not putting them back. I'm digging...

            Show
            price Paul Price added a comment - The FITS header keywords being erased seem to be stripped by astro_metadata_translator, but it's not putting them back. I'm digging...
            Hide
            tjenness Tim Jenness added a comment -

            obs_base insists that headers are stripped when a VisitInfo is constructed.

            Show
            tjenness Tim Jenness added a comment - obs_base insists that headers are stripped when a VisitInfo is constructed.
            Hide
            price Paul Price added a comment -

            I understand the stripping policy, but not everyone lives in our python+afw+butler world, and someone may want to get the exposure ID using wcstools. It looks like the values are written to new keywords out of the VisitInfo (see here). So the values aren't missing, it's just that the header keywords have changed (and it's possible the formats or units change too). Is that acceptable, Masayuki Tanaka and Hisanori Furusawa?

            Show
            price Paul Price added a comment - I understand the stripping policy, but not everyone lives in our python+afw+butler world, and someone may want to get the exposure ID using wcstools. It looks like the values are written to new keywords out of the VisitInfo (see here ). So the values aren't missing, it's just that the header keywords have changed (and it's possible the formats or units change too). Is that acceptable, Masayuki Tanaka and Hisanori Furusawa ?
            Hide
            tjenness Tim Jenness added a comment -

            I'm strongly in favor of the input header propagating to the output and being augmented by processing. I wouldn't want headers removed and then different ones added. If it's one file in and one file out I would want tools that parse headers to work without change.

            I don't think we should strip at all. For calibrations we build the output header from the unmodified input header. RFC-576 discusses this.

            Show
            tjenness Tim Jenness added a comment - I'm strongly in favor of the input header propagating to the output and being augmented by processing. I wouldn't want headers removed and then different ones added. If it's one file in and one file out I would want tools that parse headers to work without change. I don't think we should strip at all. For calibrations we build the output header from the unmodified input header. RFC-576 discusses this.
            Hide
            furusawa.hisanori Hisanori Furusawa added a comment -

            I'm not quite sure if I perfectly understand the history of this discussion. But, I thought we want 1) to keep all the header keywords listed in the origin raw data as they are, and 2) to have important keywords determined by processing like flux calibration in the human-friendly FITS header as well as dedicated objects. For 1), I would support @tjenness 's idea to propagate the original header to the output. If manipulating stripping stuff is difficult, is it worth considering to add another HDU just to list all the original header? I'd prefer to have this in the primary HDU for easy access, but in any HDU would be way more helpful than missing.

            Show
            furusawa.hisanori Hisanori Furusawa added a comment - I'm not quite sure if I perfectly understand the history of this discussion. But, I thought we want 1) to keep all the header keywords listed in the origin raw data as they are, and 2) to have important keywords determined by processing like flux calibration in the human-friendly FITS header as well as dedicated objects. For 1), I would support @tjenness 's idea to propagate the original header to the output. If manipulating stripping stuff is difficult, is it worth considering to add another HDU just to list all the original header? I'd prefer to have this in the primary HDU for easy access, but in any HDU would be way more helpful than missing.
            Hide
            price Paul Price added a comment -

            I'd be interested in Robert Lupton and Jim Bosch's opinions on how to proceed.

            Show
            price Paul Price added a comment - I'd be interested in Robert Lupton and Jim Bosch 's opinions on how to proceed.
            tjenness Tim Jenness made changes -
            Field Original Value New Value
            Link This issue relates to RFC-576 [ RFC-576 ]
            Hide
            rhl Robert Lupton added a comment -

            I do not want to change the policy of stripping keywords that are potentially inconsistent with the processed values (e.g. input WCS cards).  As John Swinbank discussed in RFC-576, we are mixing up keeping metadata with exporting files; Hisanori Furusawa is asking that we export the original metadata along with processing metadata (e.g. FLUXMAG0).

            As we are not (currently?) using an explicit export step, the best solution may be to put the requested fields in an extra HDU as proposed.

            To give my responses to the specific questions:

            • FLUXMAG0 missing from the calexp header
              • We should export it
            • Please do not erase original FITS header keywords in raw data for calexp (and deepCoadd_calexp). For example, FRAME_ID, EXP_ID, DATA-TYP, DET-ID, OBJECT, FILTER01, AZIMUTH, ALTITUDE, PROP-ID, DATE-OBS, MJD, etc.
              • I don't understand why things like PROP-ID are being stripped, as we don't use them internally.
              • We should export the other information
            • In the deepCoadd_*Warp, DATE-OBS, MJD values are wrong.
              • This is presumably a bug.  Is there a ticket for it?
            • Please do not omit the FITS keyword CD1_1, CD2_2, CD1_2, CD2_1 even when its value = 0.
              • I commented on the GitHub issue:

                While the standard is clear that you can omit CD elements, I think it's much clearer if we include them, especially if some (admittedly broken) readers rely on this. Missing CD values are slightly magic; according to the A&A papers:
                "if one or more  CDi_j cards are present then all unspecified  CDi_j default to zero. If no  CDi_j cards are present then the header is assumed to be in  PCi_j form whether or not any  PCi_j cards are present"

                so I support writing the {{0}}s

            Show
            rhl Robert Lupton added a comment - I do not want to change the policy of stripping keywords that are potentially inconsistent with the processed values (e.g. input WCS cards).  As John Swinbank discussed in RFC-576 , we are mixing up keeping metadata with exporting files; Hisanori Furusawa is asking that we export the original metadata along with processing metadata (e.g. FLUXMAG0 ). As we are not (currently?) using an explicit export step, the best solution may be to put the requested fields in an extra HDU as proposed. To give my responses to the specific questions: FLUXMAG0 missing from the calexp header We should export it Please do not erase original FITS header keywords in raw data for calexp (and deepCoadd_calexp ). For example, FRAME_ID, EXP_ID, DATA-TYP, DET-ID, OBJECT, FILTER01, AZIMUTH, ALTITUDE, PROP-ID, DATE-OBS, MJD, etc. I don't understand why things like PROP-ID are being stripped, as we don't use them internally. We should export the other information In the deepCoadd_*Warp , DATE-OBS, MJD values are wrong. This is presumably a bug.  Is there a ticket for it? Please do not omit the FITS keyword CD1_1, CD2_2, CD1_2, CD2_1 even when its value = 0. I commented on the GitHub issue: While the standard is clear that you can omit CD elements, I think it's much clearer if we include them, especially if some (admittedly broken) readers rely on this. Missing CD values are slightly magic; according to the A&A papers: "if one or more CDi_j cards are present then all unspecified CDi_j default to zero. If no CDi_j cards are present then the header is assumed to be in PCi_j form whether or not any PCi_j cards are present" so I support writing the {{0}}s
            Hide
            tjenness Tim Jenness added a comment -

            Stripping WCS cards is orthogonal to stripping general headers like INSTRUME and EXPTIME. We absolutely should be stripping all the WCS cards if we are going to write an updated WCS to the file. In RFC-576 we agreed that we would put all input headers in the output files (for coadds dropping any that differ) and I'd rather we didn't reopen that discussion. Robert Lupton I assume you are solely objecting to the case where VisitInfo has a value and not the headers that VisitInfo has not interest in?

            I understand that there is a tension between propagating instrument headers to the output files and using LSST standardized headers in the output files. If instrument headers appear then all the observatory tooling that knows about raw data headers can still work on processed headers. Conversely, if everything is standardized in LSST headers all processed files regardless of instrument can be examined in a unified way.

            Note that headers are stripped based on ObservationInfo which is a superset of VisitInfo content, so when headers are stripped as part of VisitInfo construction you have no way to reconstruct some of the information.

            I would much prefer a scheme where the raw header is always propagated to the output files (dropping things that differ) and at the end of the header there is a standardized LSST block of cards detailing the LSST view of the header and any additional information calculated by processing. Yes, you end up with some information present in two places but I don't see that as a disaster. Raw headers propagated to the output will not be updated by processing and users will know that. This way we don't lose anything.

            Regarding CD matrix, the only motivation I can see for adding missing CD entries is to support FITS readers that aren't following the FITS standard. If we have to do it we have to do it but I would rather the reader code conformed to the standard. I'm not going to try to block the addition of them though.

            Show
            tjenness Tim Jenness added a comment - Stripping WCS cards is orthogonal to stripping general headers like INSTRUME and EXPTIME. We absolutely should be stripping all the WCS cards if we are going to write an updated WCS to the file. In RFC-576 we agreed that we would put all input headers in the output files (for coadds dropping any that differ) and I'd rather we didn't reopen that discussion. Robert Lupton I assume you are solely objecting to the case where VisitInfo has a value and not the headers that VisitInfo has not interest in? I understand that there is a tension between propagating instrument headers to the output files and using LSST standardized headers in the output files. If instrument headers appear then all the observatory tooling that knows about raw data headers can still work on processed headers. Conversely, if everything is standardized in LSST headers all processed files regardless of instrument can be examined in a unified way. Note that headers are stripped based on ObservationInfo which is a superset of VisitInfo content, so when headers are stripped as part of VisitInfo construction you have no way to reconstruct some of the information. I would much prefer a scheme where the raw header is always propagated to the output files (dropping things that differ) and at the end of the header there is a standardized LSST block of cards detailing the LSST view of the header and any additional information calculated by processing. Yes, you end up with some information present in two places but I don't see that as a disaster. Raw headers propagated to the output will not be updated by processing and users will know that. This way we don't lose anything. Regarding CD matrix, the only motivation I can see for adding missing CD entries is to support FITS readers that aren't following the FITS standard. If we have to do it we have to do it but I would rather the reader code conformed to the standard. I'm not going to try to block the addition of them though.
            Hide
            rhl Robert Lupton added a comment -

            I don't think that there's any good solution;  it turns out that some of the header-parsing code recognises PROP-ID and uses it to set an internal field, but that we are not reinstating it.

            In the short-to-medium run I am OK with Tim's proposal that we not strip any header keywords except those related to WCS, and he's OK with writing the 0s for the empty CD cards!  Is that enough to make progress?

            Show
            rhl Robert Lupton added a comment - I don't think that there's any good solution;  it turns out that some of the header-parsing code recognises PROP-ID and uses it to set an internal field, but that we are not reinstating it. In the short-to-medium run I am OK with Tim's proposal that we not strip any header keywords except those related to WCS, and he's OK with writing the 0s for the empty CD cards!  Is that enough to make progress?
            Hide
            price Paul Price added a comment -

            I like that solution. I'll implement it on a separate ticket.

            Show
            price Paul Price added a comment - I like that solution. I'll implement it on a separate ticket.
            Hide
            furusawa.hisanori Hisanori Furusawa added a comment -

            Thank you for the discussion. The solution sounds reasonable to me for now, too.

            Show
            furusawa.hisanori Hisanori Furusawa added a comment - Thank you for the discussion. The solution sounds reasonable to me for now, too.
            Hide
            Parejkoj John Parejko added a comment -

            I'd like to get some clarification for this point: "FLUXMAG0 missing from the calexp header"

            PhotoCalib does not use FLUXMAG0 and is potentially spatially varying, so the concept of an "all image zero point" may be invalid for a given image. PhotoCalib is also defined as a multiplicative factor going from counts to nanojansky, not a magnitude zero point. We do not write FLUXMAG0 to any of our headers now because of these reasons. Is the question about propagating a value from the raw header? That seems like it would fall under the same "this can cause problems" issue as the WCS keys.

            Show
            Parejkoj John Parejko added a comment - I'd like to get some clarification for this point: "FLUXMAG0 missing from the calexp header" PhotoCalib does not use FLUXMAG0 and is potentially spatially varying, so the concept of an "all image zero point" may be invalid for a given image. PhotoCalib is also defined as a multiplicative factor going from counts to nanojansky, not a magnitude zero point. We do not write FLUXMAG0 to any of our headers now because of these reasons. Is the question about propagating a value from the raw header? That seems like it would fall under the same "this can cause problems" issue as the WCS keys.
            Hide
            furusawa.hisanori Hisanori Furusawa added a comment -

            I believe the request is to have an averaged (representative) reference flux scale or magnitude zeropoint per image, even if the flux scale spatially varies. This is actually necessary for users working off the LSST stack to assess crude magnitudes of detected sources. I believe the keyword name does not have to be FLUXMAG0 if it is too obsoleted in the current LSST stack.

            Show
            furusawa.hisanori Hisanori Furusawa added a comment - I believe the request is to have an averaged (representative) reference flux scale or magnitude zeropoint per image, even if the flux scale spatially varies. This is actually necessary for users working off the LSST stack to assess crude magnitudes of detected sources. I believe the keyword name does not have to be FLUXMAG0 if it is too obsoleted in the current LSST stack.
            Hide
            price Paul Price added a comment -

            Propagating the raw headers will be done in DM-23062.

            Show
            price Paul Price added a comment - Propagating the raw headers will be done in DM-23062 .
            price Paul Price made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            Hide
            rhl Robert Lupton added a comment -

            Furusawa-san's request seems reasonable. An average zero-point for a flat nu-F_nu source or something like that

            Show
            rhl Robert Lupton added a comment - Furusawa-san's request seems reasonable. An average zero-point for a flat nu-F_nu source or something like that
            Hide
            price Paul Price added a comment -

            With the proposed changes, CalibrateTask will write a MAGZERO header keyword with its CCD zeropoint, in addition to the MAGZERO_RMS and MAGZERO_NOBJ header keywords it's already writing.

            Show
            price Paul Price added a comment - With the proposed changes, CalibrateTask will write a MAGZERO header keyword with its CCD zeropoint, in addition to the MAGZERO_RMS and MAGZERO_NOBJ header keywords it's already writing.
            Hide
            price Paul Price added a comment -

            Yusra AlSayyad, would you please review these changes (afw and pipe_tasks)?

            Show
            price Paul Price added a comment - Yusra AlSayyad , would you please review these changes ( afw and pipe_tasks )?
            price Paul Price made changes -
            Reviewers Yusra AlSayyad [ yusra ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            Hide
            yusra Yusra AlSayyad added a comment -

            Looks good

            Show
            yusra Yusra AlSayyad added a comment - Looks good
            yusra Yusra AlSayyad made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            Hide
            price Paul Price added a comment -

            Thanks Yusra.

            Merged to master.

            Show
            price Paul Price added a comment - Thanks Yusra. Merged to master.
            price Paul Price made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]
            Hide
            price Paul Price added a comment -
            Show
            price Paul Price added a comment - ( Jenkins is green .)
            price Paul Price made changes -
            Component/s afw [ 10714 ]

              People

              • Assignee:
                price Paul Price
                Reporter:
                price Paul Price
                Reviewers:
                Yusra AlSayyad
                Watchers:
                Hisanori Furusawa, John Parejko, Masayuki Tanaka, Paul Price, Robert Lupton, Tim Jenness, Yusra AlSayyad
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel