Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-22139

AP association bug with >1000 diaSources

    Details

      Description

      I'm seeing a bug in ap_pipe when running association if the number of diaSources is large (>1000). I've posted the log starting with image differencing for one observation where the bug appeared, though this was not the only observation that failed. Other observations that had only ~980 or fewer diaSources ran successfully, while every observation with more than 1000 diaSources that I looked at failed.

      I took my best guess at the story points and epic, so please correct those if I got them wrong!

      apPipe.differencer INFO: Subtracting images
      apPipe.differencer.subtract INFO: Template Wcs : 2.619998,0.050177 -> 2.614735,0.052809
      apPipe.differencer.subtract INFO: Science Wcs : 2.614754,0.052794 -> 2.619984,0.050191
      apPipe.differencer.subtract INFO: Astrometrically registering template to science image
      apPipe.differencer.subtract INFO: templateFwhmPix: 4.78053065883556
      apPipe.differencer.subtract INFO: scienceFwhmPix: 3.957219530945321
      ip.diffim.generateAlardLuptonBasisList INFO: Target psf fwhm is the greater, deconvolution mode
      apPipe.differencer.subtract.selectDetection INFO: Detected 1020 positive peaks in 898 footprints to 10 sigma
      apPipe.differencer.subtract.selectMeasurement INFO: Measuring 898 sources (898 parents, 0 children) 
      apPipe.differencer.subtract INFO: Growing 898 kernel candidate stars by 21 pixels
      apPipe.differencer.subtract INFO: Selected 272 / 898 sources for KernelCandidacy
      apPipe.differencer.subtract INFO: Matching Psf FWHM 4.78 -> 3.96 pix
      ip.diffim.generateAlardLuptonBasisList INFO: Target psf fwhm is the greater, deconvolution mode
      apPipe.differencer.subtract INFO: Final spatial kernel sum 26.409
      apPipe.differencer.subtract INFO: Spatial model condition number 6.205e+09
      apPipe.differencer.subtract INFO: Doing stats of kernel candidates used in the spatial fit.
      apPipe.differencer.subtract INFO: 272 candidates total, 16 rejected, 237 used
      apPipe.differencer.subtract INFO: Spatial kernel model well constrained; 237 candidates, 3 terms, 30 bases
      apPipe.differencer.subtract INFO: Spatial background model appears well constrained; 237 candidates, 6 terms
      apPipe.differencer INFO: Computing diffim PSF
      apPipe.differencer.decorrelate INFO: Running A&L decorrelation: spatiallyVarying=False
      ip_diffim_decorrelateALKernel INFO: Using matching kernel computed at (1024, 2048)
      ip_diffim_decorrelateALKernel INFO: Variance (science, template): (92.492551, 0.004940)
      ip_diffim_decorrelateALKernel INFO: Variance (uncorrected diffim): 98.240851
      ip_diffim_decorrelateALKernel INFO: Variance (corrected diffim): 97.975441
      apPipe.differencer INFO: Running diaSource detection
      apPipe.differencer.detection INFO: Detected 7429 positive peaks in 1708 footprints and 139 negative peaks in 56 footprints to 5 sigma
      apPipe.differencer INFO: Merging detections into 1228 sources
      apPipe.differencer INFO: Running diaSource measurement: newDipoleFitting=True
      apPipe.differencer.measurement INFO: Measuring 1228 sources (1228 parents, 0 children) 
      apPipe.differencer.forcedMeasurement INFO: Performing forced measurement on 1228 sources
      apPipe INFO: Running Association...
      apPipe.associator.diaCalculation WARN: Input diaSourceCat is indexed on column(s) incompatible with this task. Should be indexed on 'multi-index, ['diaObjectId', 'filterName', 'diaSourceId'].
      apPipe FATAL: Failed on dataId={'ccdnum': 5, 'filter': 'g', 'visit': 288970, 'date': '2014-03-01', 'hdu': 31, 'object': 'Blind14A_04'}: KeyError: 124094082061435674
      /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pex_config/18.1.0-5-gc286bb7+4/python/lsst/pex/config/config.py:1279: FutureWarning: Config field ccdProcessor.isr.doAddDistortionModel is deprecated: Camera geometry is incorporated when reading the raw files. This option no longer is used, and will be removed after v19.
        FutureWarning)
      /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/mapApData.py:185: YAMLLoadWarning: calling yaml.load_all() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
        table_list = list(yaml.load_all(yaml_stream))
      /software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/astropy/units/function/logarithmic.py:46: RuntimeWarning: invalid value encountered in log10
        return dex.to(self._function_unit, np.log10(x))
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
          return self._engine.get_loc(key)
        File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
        File "pandas/_libs/index.pyx", line 127, in pandas._libs.index.IndexEngine.get_loc
        File "pandas/_libs/index.pyx", line 153, in pandas._libs.index.IndexEngine._get_loc_duplicates
        File "pandas/_libs/index_class_helper.pxi", line 122, in pandas._libs.index.Int64Engine._maybe_get_bool_indexer
      KeyError: 124094082061435674
       
      During handling of the above exception, another exception occurred:
       
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/cmdLineTask.py", line 388, in __call__
          result = self.runTask(task, dataRef, kwargs)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/cmdLineTask.py", line 447, in runTask
          return task.runDataRef(dataRef, **kwargs)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
          res = func(self, *args, **keyArgs)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_pipe/18.1.0-4-gcd8f52c+37/python/lsst/ap/pipe/ap_pipe.py", line 218, in runDataRef
          associationResults = self.runAssociation(calexpRef)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
          res = func(self, *args, **keyArgs)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_pipe/18.1.0-4-gcd8f52c+37/python/lsst/ap/pipe/ap_pipe.py", line 306, in runAssociation
          results = self.associator.run(dia_sources, diffim, self.ppdb)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
          res = func(self, *args, **keyArgs)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/association.py", line 139, in run
          ppdb)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
          res = func(self, *args, **keyArgs)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/association.py", line 315, in update_dia_objects
          filter_name)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
          res = func(self, *args, **keyArgs)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/diaCalculation.py", line 331, in run
          filterName)
        File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/diaCalculation.py", line 392, in callCompute
          objDiaSources = diaSourceCat.loc[objId]
        File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexing.py", line 1500, in __getitem__
          return self._getitem_axis(maybe_callable, axis=axis)
        File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexing.py", line 1913, in _getitem_axis
          return self._get_label(key, axis=axis)
        File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexing.py", line 141, in _get_label
          return self.obj._xs(label, axis=axis)
        File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/generic.py", line 3585, in xs
          loc = self.index.get_loc(key)
        File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
          return self._engine.get_loc(self._maybe_cast_indexer(key))
        File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
        File "pandas/_libs/index.pyx", line 127, in pandas._libs.index.IndexEngine.get_loc
        File "pandas/_libs/index.pyx", line 153, in pandas._libs.index.IndexEngine._get_loc_duplicates
        File "pandas/_libs/index_class_helper.pxi", line 122, in pandas._libs.index.Int64Engine._maybe_get_bool_indexer
      KeyError: 124094082061435674
      

        Attachments

          Activity

          Hide
          cmorrison Chris Morrison added a comment -

          I don't think this is due to the length of the diaSource catalog. If you look at the output associateor.diaCalculation throws a WARN about the catalog not being indexed properly. Could you check what the indexing of this dataset is?

          If you could please add a pdb to line 310 in association.py. Once there you can ask what dia_sources.index is. 

          Show
          cmorrison Chris Morrison added a comment - I don't think this is due to the length of the diaSource catalog. If you look at the output associateor.diaCalculation throws a WARN about the catalog not being indexed properly. Could you check what the indexing of this dataset is? If you could please add a pdb to line 310 in association.py. Once there you can ask what dia_sources.index is. 
          Hide
          sullivan Ian Sullivan added a comment - - edited

          print(dia_sources.index) after line 310 on that observation gives:

          Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                      ...
                      218, 219, 220, 221, 222, 223, 224, 225, 226, 227],
                     dtype='int64', length=1228)
          

          Show
          sullivan Ian Sullivan added a comment - - edited print(dia_sources.index) after line 310 on that observation gives: Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 218, 219, 220, 221, 222, 223, 224, 225, 226, 227], dtype='int64', length=1228)
          Hide
          cmorrison Chris Morrison added a comment -

          Cool, there's the culprit. Not sure why it's setting that as Int64 instead of RangeIndex but maybe that's a pandas versioning thing? This still produces the crash I assume?

          Anywho, I've added a check for this index now and pushed it to this ticket branch. Pull it and let me know if it resolves the issue.

          Show
          cmorrison Chris Morrison added a comment - Cool, there's the culprit. Not sure why it's setting that as Int64 instead of RangeIndex but maybe that's a pandas versioning thing? This still produces the crash I assume? Anywho, I've added a check for this index now and pushed it to this ticket branch. Pull it and let me know if it resolves the issue.
          Show
          cmorrison Chris Morrison added a comment - Jenkins:  https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30720/pipeline
          Hide
          sullivan Ian Sullivan added a comment -

          I've verified that this fixes the original bug. It's a little worrisome that it could be either a Int64Index or a RangeIndex, so if it was straightforward to enforce the type at an earlier step I would suggest you do that instead. However, I don't think this is worth investing very much time in, so I'm fine with it as it is.

          Show
          sullivan Ian Sullivan added a comment - I've verified that this fixes the original bug. It's a little worrisome that it could be either a Int64Index or a RangeIndex, so if it was straightforward to enforce the type at an earlier step I would suggest you do that instead. However, I don't think this is worth investing very much time in, so I'm fine with it as it is.
          Hide
          cmorrison Chris Morrison added a comment -

          Sorry for the delay. If simplified the `if` statement on pull request to check more edge cases that isinstance likely wouldn't catch. I'm about to add a unittest for the case of reading >1000 diaSources from the ppdb but do you mind trying your pipeline again to see the new change performs as expect for you, Ian Sullivan

          Show
          cmorrison Chris Morrison added a comment - Sorry for the delay. If simplified the `if` statement on pull request to check more edge cases that isinstance likely wouldn't catch. I'm about to add a unittest for the case of reading >1000 diaSources from the ppdb but do you mind trying your pipeline again to see the new change performs as expect for you, Ian Sullivan
          Hide
          sullivan Ian Sullivan added a comment -

          I've tested the latest version you pushed, and it also fixes the bug I was seeing.

          For my own future reference, the commands I'm running to test this are:

          output="/project/sullivan/test_hits2014/ap_pipe/cw_processed"
          make_ppdb.py -c ppdb.isolation_level=READ_UNCOMMITTED -c ppdb.db_url="sqlite:///$output/association.db"
           
          rerun="/project/sullivan/test_hits2014/ap_pipe/cw_processed"
          calib="/project/mrawls/hits2014/ingested/calib2"
          template="/project/sullivan/hits2015_template/cwcoadds_processed"
          db_config=(-c ppdb.db_url="sqlite:////project/sullivan/test_hits2014/ap_pipe/cw_processed/association.db" -c ppdb.isolation_level="READ_UNCOMMITTED" -c ppdb.connection_timeout=240)
          more_config=(-C /project/sullivan/test_hits2014/ap_pipe_cw_config.py)  # UPDATE THIS FILE WITH YOUR CONFIGS!!!
           
          # Command to run
          ap_pipe.py /project/mrawls/hits2014/ingested/rerun/processed2/ --calib /project/mrawls/hits2014/ingested/calib2 --template /project/sullivan/hits2015_template/cwcoadds_processed --rerun /project/sullivan/test_hits2014/ap_pipe/cw_processed -c ppdb.db_url="sqlite:////project/sullivan/test_hits2014/ap_pipe/cw_processed/association.db" -c ppdb.isolation_level="READ_UNCOMMITTED" -c ppdb.connection_timeout=240 -C /project/sullivan/test_hits2014/ap_pipe_cw_config.py --id ccdnum=5 filter=g visit=288970 --clobber-versions --clobber-config
          

          Show
          sullivan Ian Sullivan added a comment - I've tested the latest version you pushed, and it also fixes the bug I was seeing. For my own future reference, the commands I'm running to test this are: output="/project/sullivan/test_hits2014/ap_pipe/cw_processed" make_ppdb.py -c ppdb.isolation_level=READ_UNCOMMITTED -c ppdb.db_url="sqlite:///$output/association.db"   rerun="/project/sullivan/test_hits2014/ap_pipe/cw_processed" calib="/project/mrawls/hits2014/ingested/calib2" template="/project/sullivan/hits2015_template/cwcoadds_processed" db_config=(-c ppdb.db_url="sqlite:////project/sullivan/test_hits2014/ap_pipe/cw_processed/association.db" -c ppdb.isolation_level="READ_UNCOMMITTED" -c ppdb.connection_timeout=240) more_config=(-C /project/sullivan/test_hits2014/ap_pipe_cw_config.py) # UPDATE THIS FILE WITH YOUR CONFIGS!!!   # Command to run ap_pipe.py /project/mrawls/hits2014/ingested/rerun/processed2/ --calib /project/mrawls/hits2014/ingested/calib2 --template /project/sullivan/hits2015_template/cwcoadds_processed --rerun /project/sullivan/test_hits2014/ap_pipe/cw_processed -c ppdb.db_url="sqlite:////project/sullivan/test_hits2014/ap_pipe/cw_processed/association.db" -c ppdb.isolation_level="READ_UNCOMMITTED" -c ppdb.connection_timeout=240 -C /project/sullivan/test_hits2014/ap_pipe_cw_config.py --id ccdnum=5 filter=g visit=288970 --clobber-versions --clobber-config
          Hide
          cmorrison Chris Morrison added a comment - - edited

          New Jenkins run after simpler test: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30772/pipeline

           

          I fully understand why this was happening. dax_ppdb loads diaSources by diaObjectId in batches of 1000. Greater than that and the code concatenates them dataframes together hence the strange indexing that was causing the bug.

          Show
          cmorrison Chris Morrison added a comment - - edited New Jenkins run after simpler test:  https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30772/pipeline   I fully understand why this was happening. dax_ppdb loads diaSources by diaObjectId in batches of 1000. Greater than that and the code concatenates them dataframes together hence the strange indexing that was causing the bug.

            People

            • Assignee:
              cmorrison Chris Morrison
              Reporter:
              sullivan Ian Sullivan
              Reviewers:
              Ian Sullivan
              Watchers:
              Chris Morrison, Ian Sullivan
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Summary Panel