Data Management / DM-22521

enable partial image reads in cp_pipe combine to avoid memory issues


    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: cp_pipe
    • Labels:

      Description

cpCombineTask currently loads all input images as full images. It would be safer for large numbers of inputs if it could defer loading and iterate over chunks instead (as with the "rows" option in the previous generation).
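For context, the Gen3 butler supports this pattern through deferred dataset handles, which can read just a sub-region of an exposure via the bbox read parameter. A minimal illustrative sketch of row-chunked reading (the helper name and chunk size are hypothetical, not the actual cpCombine implementation):

import lsst.geom as geom

# Illustrative sketch: `handle` is an lsst.daf.butler.DeferredDatasetHandle
# for one input exposure, and `full_bbox` is the detector bounding box.
def iter_row_chunks(handle, full_bbox, n_rows=512):
    """Yield subimages of the input exposure, one strip of rows at a time."""
    for y0 in range(full_bbox.getMinY(), full_bbox.getMaxY() + 1, n_rows):
        height = min(n_rows, full_bbox.getMaxY() - y0 + 1)
        strip = geom.Box2I(geom.Point2I(full_bbox.getMinX(), y0),
                           geom.Extent2I(full_bbox.getWidth(), height))
        # Only this strip is read from disk; the full image is never resident.
        yield handle.get(parameters={"bbox": strip})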

            Activity

czw Christopher Waters added a comment -

After working with Eli Rykoff during pair coding, this should soon be ready for review. A comparison between a quick baseline bias run and the same run with this code yields identical images (based on both image statistics and MD5 sums). It should go into review once the regular CI checks pass.
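An image-identity check of that sort can be done along these lines; a hedged sketch, assuming two combined-bias FITS files on disk (the paths are hypothetical):

import hashlib
import lsst.afw.image as afwImage

# Hypothetical paths to the baseline and ticket-branch combined biases.
for path in ('bias_baseline.fits', 'bias_ticket.fits'):
    pixels = afwImage.ImageF(path).getArray()
    # Identical stats and an identical pixel-level MD5 => identical images.
    print(path, pixels.mean(), pixels.std(),
          hashlib.md5(pixels.tobytes()).hexdigest())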

czw Christopher Waters added a comment -

https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/36377/pipeline/
lskelvin Lee Kelvin added a comment - edited

            Thanks Chris, this is a fantastic ticket. I've tested it by building some Merian DECam biases using this pipetask run:

INSTTYPE="instrument='DECam' AND exposure.observation_type='zero'"
DATERANGE="exposure.day_obs > 20210901 AND exposure.day_obs < 20210930"

pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
    --output u/lskelvin/scratch/DM-22521-ticket \
    -p $CP_PIPE_DIR/pipelines/cpBias.yaml \
    -d "$INSTTYPE AND $DATERANGE AND detector=1" \
    -c isr:overscan.fitType='MEDIAN_PER_ROW' \
    -c isr:doEmpiricalReadNoise=True
            

            I ran the above twice: once on w16, and once on w16+DM-22521. As you reported above, the output bias images look identical before and after. I was keen to check the memory usage, in Python:

            refs = list(set(butler.registry.queryDatasets(
                'cpBiasCombine_metadata',
                collections='u/lskelvin/scratch/DM-22521-vanilla',
                components=False)))
            meta = butler.getDirect(refs[0])
            mem_start_vanilla = meta['quantum.startMaxResidentSetSize']
            mem_end_vanilla = meta['quantum.endMaxResidentSetSize']
            mem_used_vanilla = (mem_end_vanilla - mem_start_vanilla)
            print(f'{mem_used_vanilla = }')
            
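The ticket-run figure was presumably obtained the same way against the ticket output collection from the run above, along the lines of:

refs = list(set(butler.registry.queryDatasets(
    'cpBiasCombine_metadata',
    collections='u/lskelvin/scratch/DM-22521-ticket',
    components=False)))
meta = butler.getDirect(refs[0])
mem_used_ticket = (meta['quantum.endMaxResidentSetSize']
                   - meta['quantum.startMaxResidentSetSize'])
print(f'{mem_used_ticket = }')
print(f'{mem_used_ticket / mem_used_vanilla = :.3f}')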

            finding these results:

            mem_used_vanilla = 9129881600
            mem_used_ticket = 550572032
            mem_used_ticket / mem_used_vanilla = 0.060
            

So, all things considered, a huge success. I'm hopeful that this ticket will significantly reduce the out-of-memory issues previously experienced, which necessitated afterburner runs to clean up failed quanta. Thanks for putting this together! I don't have any significant comments on the PR, and as Jenkins doesn't complain, this looks good to merge to me - congrats!

            erykoff Eli Rykoff added a comment -

Lee - thanks for your detailed report! I'm glad it uses about 5% of the memory, as expected from the default config. I know that Jim was also wondering about the running-time comparison. Is that in the metadata as well?

lskelvin Lee Kelvin added a comment - edited

            Eli Rykoff, checking the running time using something similar to:

cpu_time = meta['quantum.endCpuTime'] - meta['quantum.startCpuTime']
user_time = meta['quantum.endUserTime'] - meta['quantum.startUserTime']
            

gives these times (in seconds) on w16:

            CPU: 93.02
            Usr: 89.80
            

            and these on w16+DM-22521:

            CPU: 119.29
            Usr: 116.46
            

I should also add, regarding memory usage, that there has been some discussion as to whether we should subtract the initial 'start' values, or just report the 'end' values instead of the difference. If I limit myself to looking at the end values only, the memory usage numbers change to this:

            mem_end_vanilla = 9491832832
            mem_end_ticket = 912302080
            mem_end_ticket / mem_end_vanilla = 0.096
            
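The end-only figures come straight from the metadata, without the subtraction; for example:

# End-of-quantum maximum RSS only; no subtraction of the starting value.
mem_end_vanilla = meta['quantum.endMaxResidentSetSize']
print(f'{mem_end_vanilla = }')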

So, about a 30% increase in data processing time, but a 90-94% reduction in memory usage, depending on how it's measured. I think the trade-off is worth it here, but I'm keen to hear your thoughts.

            erykoff Eli Rykoff added a comment -

            Thanks! We were just curious. I definitely think the trade-off is worth it, but we didn't know exactly what that trade-off was.

            lskelvin Lee Kelvin added a comment -

            PS: I should add, the above is for one DECam detector combining 88 'zero' raws. I'm not sure how much these numbers may vary when moving to different instruments, or with a larger number of input datasets.


              People

              Assignee:
              czw Christopher Waters
              Reporter:
              czw Christopher Waters
              Reviewers:
              Lee Kelvin
              Watchers:
              Christopher Waters, Eli Rykoff, James Chiang, Lee Kelvin
Votes:
0


                  Jenkins

                  No builds found.