Data Management / DM-12672

Missing filter information in ap_pipe repo


    Details

    • Type: Bug
    • Status: Won't Fix
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ap_verify
    • Labels:
      None
    • Story Points:
      4
    • Sprint:
      AP S18-5, AP S18-6
    • Team:
      Alert Production

      Description

      Something in the ingestion and/or image processing steps of ap_pipe loses information about which visits are in which bands. This breaks the use of coadd templates unless a filter is explicitly provided as part of the data ID.

      While we can work around this bug with a more specific data ID, it will make it harder to scale up ap_pipe to generic datasets (see DM-12535). Fixing this bug may also be a good opportunity to make our repository handling more idiomatic/flexible.


            Activity

            Krzysztof Findeisen added a comment:

            Now that DM-13451 is merged, repository issues require changes to ap_verify, not ap_pipe.

            Krzysztof Findeisen added a comment:

            Meredith Rawls found a lot of extra info in the registry that seems to relate different datasets to each other. A good regression test for this issue would try to query this relationship info in addition to filters.
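            A minimal sketch of such a regression test for the filter part, in Python. It assumes, hypothetically, that the registry is an SQLite database whose raw table has visit, ccd, and filter columns; the schema, the sample rows, and the function names are illustrative and are not taken from obs_decam or the Butler.

```python
import sqlite3


def make_registry():
    """Build an in-memory stand-in for a registry (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw (visit INTEGER, ccd INTEGER, filter TEXT)")
    conn.executemany(
        "INSERT INTO raw VALUES (?, ?, ?)",
        [(1, 25, "g"), (1, 26, "g"), (2, 25, "r")],
    )
    return conn


def filters_by_visit(conn):
    """Return {visit: set of filters} as recorded in the raw table."""
    result = {}
    for visit, band in conn.execute("SELECT DISTINCT visit, filter FROM raw"):
        result.setdefault(visit, set()).add(band)
    return result


def check_one_filter_per_visit(conn):
    """Fail if any visit has no filter or more than one filter."""
    for visit, bands in filters_by_visit(conn).items():
        assert len(bands) == 1, f"visit {visit} has filters {bands!r}"
        assert next(iter(bands)), f"visit {visit} has an empty filter"


conn = make_registry()
check_one_filter_per_visit(conn)  # passes for the sample rows above
```

            Querying the registry directly, rather than inspecting data IDs after processing, checks that the visit-to-band information survives ingestion independently of how any task builds its data IDs.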

            Krzysztof Findeisen added a comment (edited):

            The bug appears to be the result of too-simplistic handling of the data ID in ApPipeTask and ap_verify, rather than anything to do with the registry. That explains why it doesn't appear when running ImageDifferenceTask directly from the command line (at least, on HSC data). Many thanks to Yusra AlSayyad for her patience with my questions!

            Krzysztof Findeisen added a comment (edited):

            Further investigation shows that, in fact, ImageDifferenceTask cannot handle DECam data if the user doesn't provide a filter. Since it's not clear how or even whether to fix this behavior in ImageDifferenceTask and/or obs_decam (see below), and I can't think of a workaround in ap_verify that wouldn't require special-case code, I'm marking this issue as Won't Fix and opening DM-14359 to address the data ID handling problems I found in the meantime.

            For the record, image differencing runs without an explicit filter fail on DECam through the following sequence:

            • ImageDifferenceTask's argument parser interprets data IDs as references to calexps.
            • obs_decam gives calexps a URI that depends only on the visit and ccd.
            • Based on parsing the mapping information, obs_base considers the "keys" for calexp data IDs to be visit and ccd.
            • The Butler defines complete dataIds as those that have values for all their keys.
            • Therefore, data IDs for DECam calexps never have a filter added, even though this information can be extracted from the repository. (Note that HSC's mapper does include the filter in a calexp's URI, so by the preceding logic a complete HSC calexp dataref has an explicit filter.)
            • ip_diffim tries to look up templates (in my case, a deepCoaddPsfMatched) based on the calexp dataref, supplemented with tract and patch IDs.
            • obs_base identifies coadds using tract, patch, and filter. The Butler tries to look up the missing filter information (based on all available information, i.e. visit, ccd, patch, and tract), but there are no tables listed in which to do the lookup. The information is provided in raw, but coadds are not, themselves, raw data, and not all lookups for coadds are guaranteed to have a visit in their dataId.
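            The failing chain can be modeled in a few lines of Python. This is a hypothetical, simplified sketch: the key sets and the empty lookup-table list are assumptions chosen to mirror the steps listed, and is_complete and lookup_missing are illustrative names, not obs_base or Butler functions.

```python
# Hypothetical model: the key sets and table list below are illustrative
# assumptions, not values read from obs_decam or obs_base.
CALEXP_KEYS = {"visit", "ccd"}             # DECam calexp URI uses only visit and ccd
COADD_KEYS = {"tract", "patch", "filter"}  # coadds are identified by tract, patch, filter
COADD_LOOKUP_TABLES = []                   # no registry tables listed for coadd lookups


def is_complete(data_id, keys):
    """A data ID counts as complete once it has a value for every key."""
    return keys.issubset(data_id)


def lookup_missing(data_id, keys, tables):
    """Try to fill each missing key from the listed tables.

    Returns the completed data ID, or None if some key cannot be found.
    """
    filled = dict(data_id)
    for key in sorted(keys.difference(data_id)):
        for table in tables:
            # Values of `key` in rows whose shared columns agree with the data ID.
            values = {
                row[key]
                for row in table
                if key in row
                and all(row[k] == v for k, v in data_id.items() if k in row)
            }
            if len(values) == 1:
                filled[key] = values.pop()
                break
        else:
            return None  # no listed table could supply this key
    return filled


calexp_id = {"visit": 1, "ccd": 25}
# The calexp ID is already "complete", so no filter is ever added to it.
print(is_complete(calexp_id, CALEXP_KEYS))  # True

# Template lookup: calexp ID supplemented with tract and patch (values made up).
template_id = {**calexp_id, "tract": 0, "patch": "1,1"}
print(lookup_missing(template_id, COADD_KEYS, COADD_LOOKUP_TABLES))  # None
```

            Each function behaves sensibly on its own terms; the lookup only fails because the calexp keys, the coadd keys, and the (empty) list of lookup tables do not line up.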

            This is a contract mismatch problem: each step makes sense in isolation, but the pieces don't work together. Since there's no single piece that's obviously wrong, and changing any of the behaviors I listed above will almost certainly have side effects, I defer the question of what, if anything, can be done about it to people more familiar with the framework than I.

            Colin Slater added a comment:

            FWIW, I believe this is the same as or related to DM-9148.

            Krzysztof Findeisen added a comment (edited):

            I'm not convinced it's the same (my stack trace does not contain CameraMapper._setFilter, and the registry does have all necessary info), but it's similar enough to note.

            Kian-Tat Lim added a comment:

            The intent of the Gen2 Butler design was to eventually be able to determine that filter can be derived from visit in the dataId using (e.g.) the raw table. I'm not sure that the current policy is sufficient to accomplish this for coadds; it might involve a combination of adding "tables: raw" and something for the apparently undocumented "columns:", if it is indeed possible.
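            To illustrate deriving filter from visit via the raw table, here is a minimal Python sketch against a hypothetical SQLite registry whose raw table has visit, ccd, and filter columns. The helper add_filter_from_raw is invented for illustration; it is not the Gen2 Butler's lookup code and does not model the "tables:"/"columns:" policy entries. It raises an error when the data ID has no visit, since not all coadd lookups are guaranteed to carry one.

```python
import sqlite3


def add_filter_from_raw(conn, data_id):
    """Return a copy of data_id with 'filter' filled in from the raw table.

    Hypothetical helper: raises LookupError if the data ID has no visit,
    or if the visit maps to zero or several filters.
    """
    if "filter" in data_id:
        return dict(data_id)
    if "visit" not in data_id:
        raise LookupError("no visit in data ID; cannot derive filter")
    rows = conn.execute(
        "SELECT DISTINCT filter FROM raw WHERE visit = ?", (data_id["visit"],)
    ).fetchall()
    if len(rows) != 1:
        raise LookupError(f"expected one filter for visit {data_id['visit']}, got {rows}")
    return {**data_id, "filter": rows[0][0]}


# Hypothetical registry contents for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (visit INTEGER, ccd INTEGER, filter TEXT)")
conn.execute("INSERT INTO raw VALUES (1, 25, 'g')")

template_id = {"visit": 1, "ccd": 25, "tract": 0, "patch": "1,1"}
print(add_filter_from_raw(conn, template_id))
# {'visit': 1, 'ccd': 25, 'tract': 0, 'patch': '1,1', 'filter': 'g'}
```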


              People

              Assignee:
              krzys Krzysztof Findeisen
              Reporter:
              krzys Krzysztof Findeisen
              Watchers:
              Colin Slater, Eric Bellm, Kian-Tat Lim, Krzysztof Findeisen, Meredith Rawls, Yusra AlSayyad
              Votes:
              0
              Watchers:
              6

