DM-19915

Run Gen3 pipelines tract=9615


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Attempt runs using the HSC-RC2 repo built in DM-18829.

          Activity

          hchiang2 Hsin-Fang Chiang added a comment -

          Skipping ForcedPhotCcdTask because of the DM-19942 bug.

          hchiang2 Hsin-Fang Chiang added a comment -

          Tract-scale runs have been failing because of the DM-19988 bug.

          hchiang2 Hsin-Fang Chiang added a comment -

          For the report on DMLT Jun 4:

          • No skyCorrection or jointcal
          • Used stack w_2019_21 plus daf_butler tickets/DM-19808 + DM-19851
          • Some manual fixups, such as adding dataset types, were needed
          • Used lsst-dm/ctrl_bps branch tickets/DM-19846; the quantum graph was generated separately
          • Used partitionable HTCondor slots
          hchiang2 Hsin-Fang Chiang added a comment -

          The most recent attempts were based on the w_2019_28 stack (after DM-19808 merged), the ctrl_bps tickets/DM-20348 branch, and the new gen3-hsc-rc2 config files. I will also:

          • Move data to /scratch 
          • Increase memory requirements as needed 
          • Ignore failed patches of DM-20695 
          hchiang2 Hsin-Fang Chiang added a comment -

          The MergeCoaddMeasurements error affecting 19/79 patches is DM-20705

          hchiang2 Hsin-Fang Chiang added a comment -

          Also filed DM-20758 for the issues from quantum pickles

          hchiang2 Hsin-Fang Chiang added a comment -

          The last run, with the DM-20705 fix, finished successfully. The outputs are at /scratch/hchiang2/G3M19c_000001_035/ and /project/hchiang2/bps/submit/G3M19c/000001/drp/035/. The old butler repo from June (/project/hchiang2/gen3repos/w_2019_20/repo/) was used with the new Oracle environment at lsst-oradb-test.ncsa.illinois.edu. The software stack was w_2019_28 plus the DM-20705 fix. The run used the latest gen3-hsc-rc2 package (116a2fa), including the config files and pipeline.sh, to generate a quantum graph pickle file, which was then used by ctrl_bps. Partitionable HTCondor slots were used.

          Compared to the Gen2 run, the main differences include:

          • Two troublesome patches for tract=9615 (DM-20695 Selector) were ignored for assembleCoadd, using -d "tract=9615 and patch!=28 and patch!=72".
          • No Gen3 skyCorrection or jointcal (DM-19470)
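
The patch exclusion expression passed with -d can also be assembled programmatically. This is a minimal sketch, assuming a hypothetical helper named build_data_query that is not part of the LSST stack; it only formats the string and does not check it against a butler repo.

```python
def build_data_query(tract, excluded_patches=()):
    """Build a data query string selecting one tract and skipping some patches.

    Hypothetical helper for illustration only; it formats the expression
    and performs no validation.
    """
    clauses = [f"tract={tract}"]
    clauses.extend(f"patch!={patch}" for patch in excluded_patches)
    return " and ".join(clauses)


# Reproduce the expression used for this run.
query = build_data_query(9615, excluded_patches=[28, 72])
print(query)
```
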

          P.S. Neither run includes ForcedPhotCcdTask (DM-19942).

           

          The execution breakdown from pegasus-statistics (times in seconds):

          Transformation           Count     Succeeded Failed  Min       Max       Mean          Total      
          cit                      5050      5050      0       7.401     325.836   96.669        488178.172 
          ct                       5050      5050      0       3.989     83.851    41.16         207858.989 
          cwact                    396       396       0       2.912     324.306   198.656       78667.866  
          dagman::post             22616     22616     0       0.0       16.0      1.664         37640.0    
          dcsst                    396       396       0       2.468     420.753   221.337       87649.595  
          dcst                     396       396       0       3.207     71.902    55.128        21830.835  
          fpct                     396       396       0       3.426     3532.421  2149.721      851289.45  
          isr                      5050      5050      0       5.637     49.916    28.363        143234.315 
          mdt                      80        80        0       2.868     50.633    36.501        2920.078   
          mmcst                    396       396       0       4.032     4077.043  2401.424      950963.914 
          mmt                      80        80        0       2.456     31.86     22.111        1768.894   
          mwt                      3266      3266      0       3.025     151.34    87.323        285197.588 
          pegasus::dirmanager      1         1         0       2.184     2.184     2.184         2.184      
          pegasus::transfer        2059      2059      0       3.237     13.994    8.891         18307.199   
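
To see where the time went, the rows above can be ranked by total runtime. The sketch below copies the Count and Total columns from the table by hand. It assumes, for illustration only, that transformation names containing "::" (dagman::post, pegasus::dirmanager, pegasus::transfer) are workflow overhead rather than pipeline tasks.

```python
# (transformation, count, total seconds), copied from the table above.
ROWS = [
    ("cit", 5050, 488178.172),
    ("ct", 5050, 207858.989),
    ("cwact", 396, 78667.866),
    ("dagman::post", 22616, 37640.0),
    ("dcsst", 396, 87649.595),
    ("dcst", 396, 21830.835),
    ("fpct", 396, 851289.45),
    ("isr", 5050, 143234.315),
    ("mdt", 80, 2920.078),
    ("mmcst", 396, 950963.914),
    ("mmt", 80, 1768.894),
    ("mwt", 3266, 285197.588),
    ("pegasus::dirmanager", 1, 2.184),
    ("pegasus::transfer", 2059, 18307.199),
]


def rank_by_total(rows):
    """Return the rows sorted by total runtime, largest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def mean_seconds(count, total):
    """Recompute the Mean column as Total / Count."""
    return total / count


def task_count(rows):
    """Count jobs whose name has no '::' (assumed here to be pipeline tasks)."""
    return sum(count for name, count, _ in rows if "::" not in name)


for name, count, total in rank_by_total(ROWS)[:3]:
    hours = total / 3600
    print(f"{name:8s} total={hours:7.1f} h  mean={mean_seconds(count, total):9.3f} s")
print("pipeline task count:", task_count(ROWS))
```

Under that assumption, mmcst and fpct dominate the total runtime, and the task count comes to 20556, the same figure the pegasus-statistics summary reports for Tasks.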
          

          hchiang2 Hsin-Fang Chiang added a comment -

          ------------------------------------------------------------------------------
          Type           Succeeded Failed  Incomplete  Total     Retries   Total+Retries
          Tasks          20556     0       0           20556     0         20556        
          Jobs           22616     0       0           22616     0         22616        
          Sub-Workflows  0         0       0           0         0         0            
          ------------------------------------------------------------------------------
           
          Workflow wall time                                       : 8 hrs, 50 mins
          Cumulative job wall time                                 : 36 days, 7 hrs
          Cumulative job wall time as seen from submit side        : 36 days, 9 hrs
          Cumulative job badput wall time                          : 0.0 secs
          Cumulative job badput wall time as seen from submit side : 0.0 secs
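
Dividing the cumulative job wall time by the workflow wall time gives a rough estimate of how many jobs ran concurrently on average. A minimal sketch of that arithmetic follows; parse_duration is a hypothetical helper written for this example, and the inputs are the rounded strings printed above, so the result is approximate.

```python
import re

# Seconds per unit, keyed by the singular form used in the summary strings.
UNIT_SECONDS = {"day": 86400, "hr": 3600, "min": 60, "sec": 1}


def parse_duration(text):
    """Convert strings such as '8 hrs, 50 mins' or '36 days, 7 hrs' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(day|hr|min|sec)s?", text):
        total += float(value) * UNIT_SECONDS[unit]
    return total


workflow_wall = parse_duration("8 hrs, 50 mins")
cumulative_wall = parse_duration("36 days, 7 hrs")
average_concurrency = cumulative_wall / workflow_wall
print(f"average concurrency: about {average_concurrency:.1f} jobs")
```

With the figures reported here, this works out to roughly 98.6 jobs running at once on average over the 8 hr 50 min workflow.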
          


            People

            Assignee:
            hchiang2 Hsin-Fang Chiang
            Reporter:
            hchiang2 Hsin-Fang Chiang