Data Management / DM-32246

Trace problem that caused diaPipe to think it had nothing to do

Details

    • Story
    • Status: Done
    • Resolution: Done
    • 8
    • AP F21-5 (October), AP F21-6 (November)
    • Alert Production

    Description

      In DM-30703, nearly everything ran successfully in bps, but diaPipe silently concluded it had nothing to do and wrote nothing to the PostgreSQL APDB. This ticket is to figure out what happened.

          Activity

            mrawls Meredith Rawls added a comment - The issue happened further upstream, in lsst.ip.diffim.getTemplate.GetMultiTractCoaddTemplateTask. This task takes constituents from an upstream output (calexp.wcs and calexp.bbox). As a result, no difference images, DIA source catalogs, or APDB records were created, even though bps thought it had succeeded. Jim and Nate say they are aware of this bug and will be fixing it later this week.
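
A run like this one, where bps reports success but downstream products are missing, could be caught by comparing per-dataset-type output counts against what the pipeline was expected to produce. The following is only a minimal sketch under stated assumptions: the dataset type names and the `counts` mapping are hypothetical placeholders, not values from this run and not calls into the Butler API.

```python
from typing import Iterable, List, Mapping


def find_empty_outputs(counts: Mapping[str, int], expected: Iterable[str]) -> List[str]:
    """Return the expected dataset types for which no outputs exist."""
    # A dataset type that is absent from `counts` is treated as having zero outputs.
    return [name for name in expected if counts.get(name, 0) == 0]


# Hypothetical per-dataset-type output counts gathered after a run.
counts = {"calexp": 120, "differenceExp": 0, "diaSourceCatalog": 0}

# Hypothetical list of products the pipeline should have produced.
expected = ["calexp", "differenceExp", "diaSourceCatalog", "apdbRecords"]

missing = find_empty_outputs(counts, expected)
if missing:
    print("No outputs found for:", ", ".join(missing))
```

In practice the counts would have to come from the repository and the APDB themselves; the point of the sketch is only that a zero count for an expected product is worth flagging even when every job exited cleanly.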

            mrawls Meredith Rawls added a comment - The ticket in question is DM-31769. I was the first one to encounter this problem, and Nate says that ticket should resolve it as a bonus "side effect." I want to wait until that ticket is merged and run a small test pipeline to verify this before closing this ticket.

            mrawls Meredith Rawls added a comment - Can confirm this problem is resolved now! (In w_2021_45 for sure, and likely whichever weekly DM-31769 first made it into.)

            As a precursor to DM-32245, I ran four tracts of HiTS through the same AP pipeline I used for DM-30703 (by submitting 4 separate jobs to bps, since DM-31964 isn't quite ready yet).

            The result was the same small number of characterize failures as before, and a pipeline which generally completed. One notable exception - I intentionally chose two pairs of overlapping tracts (9812 & 9813, as well as 8604 & 8605). Whichever of these tracts ran second had hundreds of diaPipe quanta fail due to unique constraint failures (and a helpful error message! "Duplicate DiaObjects created after association. This is likely due to re-running data with an already populated Apdb. If this was not the case then there was an unexpected failure in Association while matching and creating new DiaObjects and should be reported. Exiting.")

            The fraction of failed diaPipe quanta appears at a glance to be proportionate to the fraction of tract overlap, which makes sense. It will be interesting to see if this problem manifests differently in DM-32245, when multiple tracts are run at the same time instead of in separate jobs.
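
The overlapping-tract failure described above can be illustrated with a toy database. This is a sketch under loud assumptions: SQLite stands in for the Postgres APDB, the one-column table is a hypothetical simplification of the real schema, the ID ranges are made up, and rows are rejected one at a time, whereas the real diaPipe fails the whole quantum.

```python
import sqlite3


def insert_dia_objects(conn, object_ids):
    """Insert IDs one by one and return (inserted, rejected) counts."""
    inserted = rejected = 0
    for object_id in object_ids:
        try:
            conn.execute(
                "INSERT INTO DiaObject (diaObjectId) VALUES (?)", (object_id,)
            )
            inserted += 1
        except sqlite3.IntegrityError:
            # The primary key is unique, so an ID that is already present is rejected.
            rejected += 1
    conn.commit()
    return inserted, rejected


conn = sqlite3.connect(":memory:")
# Hypothetical, minimal stand-in for the APDB object table.
conn.execute("CREATE TABLE DiaObject (diaObjectId INTEGER PRIMARY KEY)")

first_tract = range(0, 100)    # made-up IDs from the tract processed first
second_tract = range(75, 175)  # 25 of these 100 IDs overlap the first tract

first_result = insert_dia_objects(conn, first_tract)
second_result = insert_dia_objects(conn, second_tract)
print(first_result)   # (100, 0)
print(second_result)  # (75, 25)
```

In this toy setup the job that runs second sees a rejection rate equal to its overlap with the first (25 of 100), which mirrors the observation that the fraction of failed quanta tracks the fraction of tract overlap.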

            mrawls Meredith Rawls added a comment - There is no code to review here, but I can confirm things seem to be working now - please sign off if you agree, Ian.

            sullivan Ian Sullivan added a comment - Yes, this looks like you have things working, and have identified the multi-tract issue as the remaining problem. If there are new failures after that is being used, we can file a new ticket.

            People

              mrawls Meredith Rawls
              mrawls Meredith Rawls
              Ian Sullivan
              Ian Sullivan, Meredith Rawls