Data Management / DM-17843

A new set of processCcd failures in HSC-RC2 reprocessing

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Comparing the HSC-RC2 reprocessing runs with stack versions w_2019_02 and w_2019_06, we've got a new set of processCcd errors. They fail at calibrate.astrometry.matcher with:

        File "/software/lsstsw/stack_20181012/stack/miniconda3-4.5.4-fcd27eb/Linux64/meas_astrom/16.0-24-gfa57b64+1/python/lsst/meas/astrom/matchPessimisticB.py", line 278, in matchObjectsToSources
          raise RuntimeError("Unable to match sources")
      

      The 60 data ids of the new failure set are as follows:

      --id visit=34338 ccd=75
      --id visit=34362 ccd=38
      --id visit=34382 ccd=43
      --id visit=34384 ccd=88
      --id visit=34402 ccd=61
      --id visit=34422 ccd=42
      --id visit=34424 ccd=2
      --id visit=34448 ccd=79
      --id visit=34450 ccd=25
      --id visit=34478 ccd=21
      --id visit=34482 ccd=50
      --id visit=34644 ccd=72
      --id visit=34690 ccd=39
      --id visit=36140 ccd=30
      --id visit=36170 ccd=18
      --id visit=36182 ccd=80
      --id visit=36212 ccd=34
      --id visit=36234 ccd=82
      --id visit=36240 ccd=0
      --id visit=36258 ccd=73
      --id visit=36428 ccd=61
      --id visit=36432 ccd=23
      --id visit=36492 ccd=50
      --id visit=34942 ccd=44
      --id visit=36762 ccd=18
      --id visit=36774 ccd=24
      --id visit=36792 ccd=34
      --id visit=36808 ccd=26
      --id visit=36828 ccd=16
      --id visit=36828 ccd=73
      --id visit=26046 ccd=32
      --id visit=26050 ccd=74
      --id visit=26058 ccd=34
      --id visit=26060 ccd=70
      --id visit=26072 ccd=12
      --id visit=26080 ccd=66
      --id visit=26084 ccd=64
      --id visit=23864 ccd=12
      --id visit=1308 ccd=40
      --id visit=23224 ccd=59
      --id visit=23232 ccd=32
      --id visit=27106 ccd=33
      --id visit=27106 ccd=41
      --id visit=27128 ccd=26
      --id visit=27134 ccd=28
      --id visit=440 ccd=92
      --id visit=452 ccd=35
      --id visit=452 ccd=42
      --id visit=472 ccd=30
      --id visit=1246 ccd=19
      --id visit=19696 ccd=20
      --id visit=30500 ccd=20
      --id visit=1180 ccd=89
      --id visit=1184 ccd=26
      --id visit=1194 ccd=26
      --id visit=360 ccd=26
      --id visit=22628 ccd=18
      --id visit=22664 ccd=26
      --id visit=23046 ccd=26
      --id visit=23050 ccd=84
      

            Activity

            jbosch Jim Bosch added a comment -

            Agree that this is a release blocker.  I'd be open to reverting to OptimisticB for just HSC to avoid impacting the v17 timeline, though a real fix would of course be even better.

            cmorrison Chris Morrison [X] (Inactive) added a comment -

            Hey, I believe I've locked in on the problem after rerunning the 60 failed ccds.

            The pessimistic matcher computes an automated match tolerance. In some unlucky cases this automated tolerance gets set very low, and it is not allowed to soften enough to get a match. By my estimates it would never be allowed to exceed 1 arcsec. I also found a bug in this tolerance that lets it be set to very small, unphysical values.

            I'll fix the bug and see if I can't tweak things slightly to get all 60 to succeed.
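
            The failure mode described above can be illustrated with a toy loop. This is a minimal sketch, not the meas_astrom implementation: `match_with_softening`, `try_match`, `soften_factor`, `max_iterations`, and `min_tol_arcsec` are hypothetical names, and doubling the tolerance on each iteration is an assumption chosen only to show how a tolerance that starts too small can run out of iterations before it reaches a usable value.

```python
def match_with_softening(try_match, initial_tol_arcsec, max_iterations,
                         soften_factor=2.0, min_tol_arcsec=0.0):
    """Retry a matcher with a progressively softer distance tolerance.

    All names are hypothetical; the real matcher's softening rule may differ.
    Returns (result, tolerance) on success, raises RuntimeError otherwise.
    """
    # Clamp the starting tolerance so it cannot begin at an unphysically
    # small value (the kind of bug mentioned in the comment above).
    tol = max(initial_tol_arcsec, min_tol_arcsec)
    for _ in range(max_iterations):
        result = try_match(tol)
        if result is not None:
            return result, tol
        # No match at this tolerance: soften it and try again.
        tol *= soften_factor
    raise RuntimeError("Unable to match sources")


def toy_matcher(tol_arcsec, needed_arcsec=0.5):
    """Stand-in matcher: succeeds only once the tolerance reaches needed_arcsec."""
    return "matched" if tol_arcsec >= needed_arcsec else None
```

            With a starting tolerance of 0.01 arcsec, three iterations try 0.01, 0.02, and 0.04 and then give up, while eight iterations are enough for the toy matcher to succeed. Raising the floor on the starting tolerance has a similar effect with fewer iterations.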

            cmorrison Chris Morrison [X] (Inactive) added a comment -

            Okay, so I made a tweak to the distance tolerance softening iterations and was able to get all 60 ccds to succeed. As I said previously, the problem was that the match distance was never softened with enough iterations to produce a match given the distortions in HSC. I've run the same settings through validate_drp and everything checks out as fine. I'll clean up the ticket once I head into the office. John Swinbank could you find me a reviewer in the meantime please?

            cmorrison Chris Morrison [X] (Inactive) added a comment - edited

            Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29442/pipeline

            ctslater Colin Slater added a comment -

            Fixes look good to me. The transition to multiple-pattern consensus mode above 2000 stars in the refcat seems reasonable, but it might be worth pulling that number out into a config option; that would at least give it some visibility and a doc string, etc (though it seems unlikely to need to be changed much).

            To record some offline discussions here: because the main change here is to allow more iterations of softening the match tolerances before giving up, we think this is unlikely to break any other CCDs, and the testing on validation_data_* hasn't shown any other breakage or degradation of the metrics. 
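
            The suggestion to expose the 2000-star threshold could look roughly like the sketch below. It uses a plain Python dataclass as a stand-in for the stack's configuration classes; the class name `MatcherConfigSketch`, the field name `num_refs_for_consensus`, and the helper `use_consensus_mode` are all hypothetical. Only the value 2000 and the "above" comparison come from the comment above.

```python
from dataclasses import dataclass


@dataclass
class MatcherConfigSketch:
    """Hypothetical config holder; not the real matcher configuration class."""

    # Reference-catalog size above which multiple-pattern consensus is used.
    # The default of 2000 is the value quoted in the review comment.
    num_refs_for_consensus: int = 2000


def use_consensus_mode(n_ref_stars, config):
    """Return True when the refcat has more stars than the configured threshold."""
    return n_ref_stars > config.num_refs_for_consensus
```

            Keeping the number in a named, documented field means it can be overridden per camera without editing the matcher code, for example `use_consensus_mode(3000, MatcherConfigSketch(num_refs_for_consensus=5000))`.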


              People

              Assignee:
              cmorrison Chris Morrison [X] (Inactive)
              Reporter:
              hchiang2 Hsin-Fang Chiang
              Reviewers:
              Colin Slater
              Watchers:
              Chris Morrison [X] (Inactive), Colin Slater, Eric Bellm, Hsin-Fang Chiang, Jim Bosch, John Swinbank
