  Data Management / DM-30349

Source count metrics include fake sources


    Details

    • Story Points:
      16
    • Sprint:
      AP F21-1 (June), AP F21-2 (July)
    • Team:
      Alert Production
    • Urgent?:
      No

      Description

      Per DM with Eric Bellm, our source count metrics should be agnostic to whether or not we are running fakes processing. However, the pipeline does not distinguish between fake and natural sources; even ProcessCcdWithFakesTask forgets this information as soon as it modifies the image. There is a FAKE mask plane and corresponding catalog flags, but these flags are not suitable for source filtering.

      The current best way to identify fake sources is to cross-match them to the original fakes catalog, as is done for the existing fakes metrics. This adds a dependency on a dataset that does not exist in non-fakes pipelines, though the dependency can be turned on and off in pipeline configurations at the ap_verify level. Add support for such cross-matching to the existing metrics, preferably in a way that keeps the Diffim and SFP metrics portable across pipelines.
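      A minimal sketch of the cross-matching step described above, assuming both the detected sources and the injected fakes are available as plain (RA, Dec) pairs in degrees. The function names and the 0.5 arcsec match radius are hypothetical choices for illustration, not the values used by the existing fakes metrics, and the brute-force loop stands in for whatever spatial matching the stack actually uses.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical representation: (ra_deg, dec_deg) for a detected source or an injected fake.
Coord = Tuple[float, float]


def angular_sep_arcsec(a: Coord, b: Coord) -> float:
    """Great-circle separation between two (ra, dec) points, using the haversine formula."""
    ra1, dec1 = map(math.radians, a)
    ra2, dec2 = map(math.radians, b)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing sqrt(h) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def drop_matched_fakes(sources: Iterable[Coord], fakes: Iterable[Coord],
                       radius_arcsec: float = 0.5) -> List[Coord]:
    """Keep only sources farther than radius_arcsec from every injected fake.

    Brute force, O(len(sources) * len(fakes)); a real implementation would use
    a spatial index. The 0.5 arcsec default radius is an assumption.
    """
    fake_list = list(fakes)
    return [s for s in sources
            if all(angular_sep_arcsec(s, f) > radius_arcsec for f in fake_list)]


if __name__ == "__main__":
    sources = [(10.0, -5.0), (10.001, -5.0), (20.0, 3.0)]
    fakes = [(10.0, -5.0 + 1e-5)]  # about 0.036 arcsec from the first source
    real = drop_matched_fakes(sources, fakes)
    print(len(real))  # prints 2; a fakes-agnostic source count would use this
```

      The source count metric then only needs the filtered list, so the same metric code could run on non-fakes pipelines by passing an empty fakes list.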


            Activity

            Krzysztof Findeisen added a comment

            I've encountered a problem with the four ap_association metrics affected by this issue. Three of these, numNewDiaObjects, numUnassociatedDiaObjects, and fracUpdatedDiaObjects, are actually computed by AssociationTask during the matching process. To support these with my current approach, I'd have to cross-match the fakes to both the DIASources and the DIAObjects (separately from the "official" match to associated DIASources, which must be run after AssociationTask), then pass both sets of matches into DiaPipelineTask and thence to AssociationTask.

            While I can do this, it would be a very intrusive change to ap_association. On the other hand, I don't see a way to compute these three metrics from only data products; you can tell that a DIAObject was associated with a DIASource, but the new/updated distinction requires knowledge of the association order.

            Chris Morrison [X], do you know of some other way to figure out which DIASources/DIAObjects are fakes within AssociationTask?

            Krzysztof Findeisen added a comment (edited)

            For Eric Bellm, another alternative would be to give up on making the metrics fakes-invariant, and handle this at the SQuaSH level, by running each dataset with and without fakes and comparing the numbers. But that would eat more into our (still vague?) space and runtime budget.

            Chris Morrison [X] (Inactive) added a comment

            Krzysztof Findeisen, right now we don't write the per-visit/ccd DiaObject table to disk. If we did, you could do a bit of post-processing that combines it with the DiaSource fakes matches from that visit, to remove the part of the number counts that comes from fakes.
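            If a per-visit DiaObject table were written out, the post-processing suggested above could look roughly like this for numNewDiaObjects. Everything here is hypothetical: it assumes the visit's DIASource-to-DIAObject association is available as a dict, that the set of DIAObjects existing before the visit is known, and that the IDs of DIASources matched to the fakes catalog are available. None of these names correspond to an existing ap_association API.

```python
from typing import Dict, Set


def num_new_real_dia_objects(source_to_object: Dict[int, int],
                             preexisting_objects: Set[int],
                             fake_source_ids: Set[int]) -> int:
    """Count DIAObjects created in this visit, excluding those seeded by a matched fake.

    source_to_object maps each DIASource ID in the visit to the DIAObject ID it
    was associated with; an object is "new" if it is not in preexisting_objects.
    """
    new_objects = {obj for obj in source_to_object.values()
                   if obj not in preexisting_objects}
    # Objects that received a fake DIASource in this visit (new or pre-existing).
    fake_objects = {source_to_object[src] for src in fake_source_ids
                    if src in source_to_object}
    # Only new objects are counted, so subtracting fake_objects drops fake-seeded ones.
    return len(new_objects - fake_objects)


if __name__ == "__main__":
    association = {1: 100, 2: 101, 3: 102, 4: 103}
    print(num_new_real_dia_objects(association, preexisting_objects={100},
                                   fake_source_ids={3}))  # prints 2
```

            This only works if the association result for the visit is persisted, which is why the per-visit table matters: the new/updated distinction cannot be recovered from the final APDB state alone.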

            Eric Bellm added a comment

            I was going to echo Chris Morrison [X]'s comment about reconstructing the association order in post-processing (and then JIRA ate my comment).

            However, I'm not sure it's worth the effort, at least right now. Those specific metrics are largely meant to catch gross failures in the pipelines; the baseline itself is not particularly informative. While changing the number of inserted fakes will cause a step in our metrics, we can annotate those rare changes with our usual tools.

            As we continue to build up metrics that use fakes, sky objects, and known SSObjects I expect that we'll rely less and less on these relatively simple metrics anyway.

            (Also, you mentioned problems with four metrics but only listed three. Which was the fourth?)

            Krzysztof Findeisen added a comment (edited)

            The fourth is totalUnassociatedDiaObjects, which is unusual because it needs to be calculated from the APDB. However, I think I can use the standard fakes match catalog for it, so I'm not sure why I thought it would be a problem.

            So what's the verdict? Should I try to use the per-visit tables (though I don't see a config flag for that), should I stop with what I have (ip_diffim metrics converted, but ap_association ones untouched), should I abandon this ticket entirely...?
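            For totalUnassociatedDiaObjects, the standard fakes match catalog could in principle be applied after the APDB query, roughly as sketched below. This assumes the metric counts DIAObjects with exactly one associated DIASource (an nDiaSources value of 1), and that the fakes match catalog has already been reduced to a set of matched DIAObject IDs; the rows are plain dicts standing in for real APDB query results.

```python
from typing import Iterable, Mapping, Set


def total_unassociated_real(dia_objects: Iterable[Mapping[str, int]],
                            fake_object_ids: Set[int]) -> int:
    """Count single-detection DIAObjects that were not matched to an injected fake.

    Assumes each row has "diaObjectId" and "nDiaSources" keys, as a simplified
    stand-in for an APDB DiaObject table.
    """
    return sum(1 for row in dia_objects
               if row["nDiaSources"] == 1 and row["diaObjectId"] not in fake_object_ids)


if __name__ == "__main__":
    rows = [
        {"diaObjectId": 1, "nDiaSources": 1},
        {"diaObjectId": 2, "nDiaSources": 3},
        {"diaObjectId": 3, "nDiaSources": 1},  # matched to a fake below
    ]
    print(total_unassociated_real(rows, fake_object_ids={3}))  # prints 1
```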

            Krzysztof Findeisen added a comment

            Discussed at sprint planning today; the decision was to drop the ap_association metrics from the scope of the ticket, and to finish integrating the modified ip_diffim metrics into the ApVerifyWithFakes pipeline.

            Krzysztof Findeisen added a comment (edited)

            The eventual solution was completely different, because it turns out that the pipeline does distinguish fake and real sources through SFP, putting them into distinct datasets. Most metrics were already using non-fakes datasets, so the only ip_diffim or pipe_tasks metric that needed fixing was fracDiaSourcesToSciSources. I've added a new difference imaging task that creates a clean diaSource catalog. I still don't think this trick can be extended to the diaPipe metrics, because we only have one APDB.

            Since even this much simpler solution involved some hacking of ApVerifyWithFakes, I suggest either Chris Morrison [X] or Eric Bellm as the reviewer, to check that I haven't inadvertently broken the intent of that pipeline.
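            A toy illustration of why fracDiaSourcesToSciSources needs a fake-free diaSource catalog: if the science-source count comes from a non-fakes dataset but the diaSource count includes detected fakes, the ratio is inflated. The counts are invented for illustration, and the metric is assumed here to be a simple ratio of catalog lengths.

```python
def frac_dia_to_sci(n_dia_sources: int, n_sci_sources: int) -> float:
    """Ratio of DIASources to science sources (assumed definition of the metric)."""
    if n_sci_sources <= 0:
        raise ValueError("need at least one science source")
    return n_dia_sources / n_sci_sources


if __name__ == "__main__":
    n_sci = 1000      # from the non-fakes science source catalog
    n_real_dia = 100  # diaSources from the clean (fake-free) catalog
    n_fake_dia = 20   # detected fakes that a mixed catalog would also count
    print(frac_dia_to_sci(n_real_dia, n_sci))               # prints 0.1
    print(frac_dia_to_sci(n_real_dia + n_fake_dia, n_sci))  # prints 0.12, biased by fakes
```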

            Chris Morrison [X] (Inactive) added a comment

            Looks good. I tried to have some semi-coherent thoughts about how we could possibly set up two separate APDBs and use them in different DiaPipe tasks. Not sure if any of it was useful.


              People

              Assignee:
              krzys Krzysztof Findeisen
              Reporter:
              krzys Krzysztof Findeisen
              Reviewers:
              Chris Morrison [X] (Inactive)
              Watchers:
              Chris Morrison [X] (Inactive), Eric Bellm, Ian Sullivan, Krzysztof Findeisen
              Votes:
              0
              Watchers:
              4
