Data Management / DM-22521

enable partial image reads in cp_pipe combine to avoid memory issues


Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Component: cp_pipe

    Description

      cpCombineTask currently loads all input images as full images. It would be safer for large numbers of inputs if it could defer loading and iterate over chunks instead (as with the "rows" option in the previous generation).
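A minimal sketch of the chunked idea in plain NumPy (the real task works on afw Exposure objects read via the butler; `combine_chunked`, `chunk_rows`, and the median combine here are illustrative assumptions, not the task's actual code):

```python
import numpy as np

def combine_chunked(images, chunk_rows=512):
    """Median-combine a list of 2-D arrays one horizontal stripe at a time.

    Only len(images) * chunk_rows * width pixels are resident at once,
    instead of every full input image.
    """
    height, width = images[0].shape
    out = np.empty((height, width), dtype=np.float64)
    for start in range(0, height, chunk_rows):
        stop = min(start + chunk_rows, height)
        # Stack just this stripe from every input, then reduce across inputs.
        stack = np.stack([img[start:stop, :] for img in images])
        out[start:stop, :] = np.median(stack, axis=0)
    return out
```

In the deferred-loading version, each stripe would be read from disk on demand rather than sliced from arrays already in memory; the combine itself is unchanged, which is why the outputs match bit for bit.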


          Activity

            czw Christopher Waters added a comment:

            After working with erykoff during pair coding, this should soon be in review. A comparison between a quick baseline bias run and the same run with this code produces identical images (based on both image stats and MD5 sums). This should go into review pending the regular CI checks.

            czw Christopher Waters added a comment:

            https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/36377/pipeline/
            lskelvin Lee Kelvin added a comment (edited):

            Thanks Chris, this is a fantastic ticket. I've tested it by building some Merian DECam biases using this pipetask run:

            INSTTYPE="instrument='DECam' AND exposure.observation_type='zero'"
            DATERANGE="exposure.day_obs > 20210901 AND exposure.day_obs < 20210930"
             
            pipetask --long-log run --register-dataset-types -j 12 \
            -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
            -i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
            --output u/lskelvin/scratch/DM-22521-ticket \
            -p $CP_PIPE_DIR/pipelines/cpBias.yaml \
            -d "$INSTTYPE AND $DATERANGE AND detector=1" \
            -c isr:overscan.fitType='MEDIAN_PER_ROW' \
            -c isr:doEmpiricalReadNoise=True
            

            I ran the above twice: once on w16, and once on w16+DM-22521. As you reported above, the output bias images look identical before and after. I was keen to check the memory usage, in Python:

            refs = list(set(butler.registry.queryDatasets(
                'cpBiasCombine_metadata',
                collections='u/lskelvin/scratch/DM-22521-vanilla',
                components=False)))
            meta = butler.getDirect(refs[0])
            mem_start_vanilla = meta['quantum.startMaxResidentSetSize']
            mem_end_vanilla = meta['quantum.endMaxResidentSetSize']
            mem_used_vanilla = (mem_end_vanilla - mem_start_vanilla)
            print(f'{mem_used_vanilla = }')
            

            finding these results:

            mem_used_vanilla = 9129881600
            mem_used_ticket = 550572032
            mem_used_ticket / mem_used_vanilla = 0.060
            

            So, all things considered, a huge success. I'm hopeful that this ticket will significantly reduce out-of-memory issues previously experienced which necessitated the use of afterburners to clear up any failed quanta. Thanks for putting this together! I don't have any significant comments on the PR, and as Jenkins doesn't complain, then I think this looks good to merge to me - congrats!

            erykoff Eli Rykoff added a comment:

            Lee, thanks for your detailed report! I'm glad it uses about 5% of the memory, as expected from the default config. I know that Jim was also wondering about the running-time comparison. Is that in the metadata as well?
            lskelvin Lee Kelvin added a comment (edited):

            erykoff, checking the running time using something similar to:

            meta['quantum.endCpuTime'] - meta['quantum.startCpuTime']
            meta['quantum.endUserTime'] - meta['quantum.startUserTime']
            

            gives these times in w16:

            CPU: 93.02
            Usr: 89.80
            

            and these on w16+DM-22521:

            CPU: 119.29
            Usr: 116.46
            

            I should also add, with regard to memory usage, that there has been some discussion as to whether we should subtract the initial start values, or simply report the end values rather than the difference. If I limit myself to the end values only, the memory usage numbers change to this:

            mem_end_vanilla = 9491832832
            mem_end_ticket = 912302080
            mem_end_ticket / mem_end_vanilla = 0.096
            

            So, about a 30% increase in data processing time, but anywhere from a 90-95% reduction in memory usage. I think the trade-off is worth it here, but I'm keen to hear your thoughts.

            erykoff Eli Rykoff added a comment:

            Thanks! We were just curious. I definitely think the trade-off is worth it, but we didn't know exactly what that trade-off was.
            lskelvin Lee Kelvin added a comment:

            PS: I should add, the above is for one DECam detector combining 88 'zero' raws. I'm not sure how much these numbers will vary with different instruments or with a larger number of input datasets.

            People

              Assignee: Christopher Waters
              Reporter: Christopher Waters
              Reviewers: Lee Kelvin
              Watchers (4): Christopher Waters, Eli Rykoff, James Chiang, Lee Kelvin
              Votes: 0

