Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-32128

Investigate astrometry failure in w_2021_40 processing of DC2

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Story Points:
      2
    • Epic Link:
    • Sprint:
      DRP S21b
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      Investigate the astrometry failure encountered during the Gen3 bps DC2 processing on DM-32024.  The error (reported on Slack is:

      _calibrate_265317_8.5407061.err:    raise pipeBase.TaskError(_calibrate_265317_8.5407061.err:lsst.pipe.base.task.TaskError: Fit failed: median scatter on sky = 13.437 arcsec > 10.000 config.maxScatterArcsec
      

      and this is associated with:

      {instrument: 'LSSTCam-imSim', detector: 8, visit: 265317, ...}
      

      (which is z band and R01 S22).

      Note: this was not seen in previous Gen3 runs as the single frame processing for those ran only on the visit/detector combinations that explicitly overlap one of the two tracts (and this particular dataId was not included). It is not seen in recent Gen2 runs (w_2021_40: DM-32071, w_2021_36: DM-31665, w_2021_32: DM-31351) as it overlaps with the 3828 tract that was recently dropped from the weekly processing (sigh...). However, interestingly (and perhaps somewhat frighteningly), this detector did get a successful fit in the w_2021_28 Gen2 run of DM-31067. Figure out where and what changed between then and now!

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            I'm all for shrinking the existing threshold for declaring failures.

            I'm less certain that adding a mag cut right now is a good idea - it seems safe, but we have a feature freeze precisely because lots of things that seem safe aren't.

            I'm also not certain we need DM-32128 now, unless there examples where the existing logic (with the new threshold) would not have caught a catastrophic failure but the one proposed there would have; I get that it'd be nice to use the same definitions in various places, but again, I want to aim for minimal possible changes for DP0.2 at this point. I am quite skeptical that giving the fitter additional iterations will ever allow it to recover; it's a greedy optimizer, so once it starts down into a bad local minimum, it's very hard for it to get out.

            Show
            jbosch Jim Bosch added a comment - I'm all for shrinking the existing threshold for declaring failures. I'm less certain that adding a mag cut right now is a good idea - it seems safe, but we have a feature freeze precisely because lots of things that seem safe aren't. I'm also not certain we need DM-32128 now, unless there examples where the existing logic (with the new threshold) would not have caught a catastrophic failure but the one proposed there would have; I get that it'd be nice to use the same definitions in various places, but again, I want to aim for minimal possible changes for DP0.2 at this point. I am quite skeptical that giving the fitter additional iterations will ever allow it to recover; it's a greedy optimizer, so once it starts down into a bad local minimum, it's very hard for it to get out.
            Hide
            lauren Lauren MacArthur added a comment -

            Thanks, Jim.  I agree that the mag cut on the reference catalog may be a tad risky at this point as it should really undergo reasonably substantial testing to calibrate what band/threshold should be set.
             
            As for DM-32129 (I think that’s what you meant to refer to above), I have no problem with abandoning it if that’s what you think is best, but with a bit of a caveat. While I would feel very comfortable with the proposal there, based largely on the fact that I have a significant sample of actual data — simulations and real — providing general (“math”-confirmed!) justification as well as explicit guidelines for the selected threshold and what its effect will be per dataset, I have no such comfort level for the existing intra-fitter threshold config (that doesn’t actually exist in all of our fitters, e.g. fitAffineWcs). While I am almost certain all you say above is absolutely true, the “almost” qualifier is a sticking point for someone as paranoid as myself...I would need to actually see the numbers that come out and their distributions (and possible changes with iteration) to feel confident on the consequences of setting a number (especially for edgy-cases, which is sort of what brought this on). Since that stat is never persisted anywhere, obtaining the above dataset would require a fair amount of special purposed processing, and I just don’t have the cycles for that at the moment (i.e. on the timescales of DP0.2).

            To summarize, I’m a-ok with whatever path you feel is best, but if it’s not DM-32129, then I’ll leave it to you/others to find a more suitable (prolly just less paranoid!) candidate to set the numbers press the buttons!

            Show
            lauren Lauren MacArthur added a comment - Thanks, Jim.  I agree that the mag cut on the reference catalog may be a tad risky at this point as it should really undergo reasonably substantial testing to calibrate what band/threshold should be set.   As for  DM-32129 (I think that’s what you meant to refer to above), I have no problem with abandoning it if that’s what you think is best, but with a bit of a caveat. While I would feel very comfortable with the proposal there, based largely on the fact that I have a significant sample of actual data — simulations and real — providing general (“math”-confirmed!) justification as well as explicit guidelines for the selected threshold and what its effect will be per dataset, I have no such comfort level for the existing intra-fitter threshold config (that doesn’t actually exist in all of our fitters, e.g. fitAffineWcs ). While I am almost certain all you say above is absolutely true, the “almost” qualifier is a sticking point for someone as paranoid as myself...I would need to actually see the numbers that come out and their distributions (and possible changes with iteration) to feel confident on the consequences of setting a number (especially for edgy-cases, which is sort of what brought this on). Since that stat is never persisted anywhere, obtaining the above dataset would require a fair amount of special purposed processing, and I just don’t have the cycles for that at the moment (i.e. on the timescales of DP0.2). To summarize, I’m a-ok with whatever path you feel is best, but if it’s not DM-32129 , then I’ll leave it to you/others to find a more suitable (prolly just less paranoid!) candidate to set the numbers press the buttons!
            Hide
            jbosch Jim Bosch added a comment -

            Yes, DM-32129 is what I meant, and the piece that was missing in my mind was the fact that you know what the threshold should be in that new logic but don't know what it should be with the existing logic. Please proceed with that ticket; no more objection from me.

            Show
            jbosch Jim Bosch added a comment - Yes, DM-32129 is what I meant, and the piece that was missing in my mind was the fact that you know what the threshold should be in that new logic but don't know what it should be with the existing logic. Please proceed with that ticket; no more objection from me.
            Hide
            lauren Lauren MacArthur added a comment -

            Ok, great!  Thanks again.

            Show
            lauren Lauren MacArthur added a comment - Ok, great!  Thanks again.
            Hide
            erykoff Eli Rykoff added a comment -

            I think this is cleared up now, but an additional relevant point is that maxScatterArcsec is an internal configuration of the particular wcs fitter we are using (fitTanSipWcs), while the change we're talking about here is a general and reproducible check on the final output of whatever astrometry matcher and fitter we are configured to use.
            Lauren noted this above ("the existing intra-fitter threshold config (that doesn’t actually exist in all of our fitters, e.g. fitAffineWcs)"), but I wanted to pull this out as what I see as the primary reason why this check is necessary.

            Show
            erykoff Eli Rykoff added a comment - I think this is cleared up now, but an additional relevant point is that maxScatterArcsec is an internal configuration of the particular wcs fitter we are using ( fitTanSipWcs ), while the change we're talking about here is a general and reproducible check on the final output of whatever astrometry matcher and fitter we are configured to use. Lauren noted this above ("the existing intra-fitter threshold config (that doesn’t actually exist in all of our fitters, e.g. fitAffineWcs)"), but I wanted to pull this out as what I see as the primary reason why this check is necessary.

              People

              Assignee:
              lauren Lauren MacArthur
              Reporter:
              lauren Lauren MacArthur
              Reviewers:
              Jim Bosch
              Watchers:
              Brock Brendal [X] (Inactive), Eli Rykoff, Jim Bosch, Lauren MacArthur, Yusra AlSayyad
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.