# AP association bug with >1000 diaSources


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 2
• Team:

#### Description

I'm seeing a bug in ap_pipe when running association if the number of diaSources is large (>1000). I've posted the log starting with image differencing for one observation where the bug appeared, though this was not the only observation that failed. Other observations that had only ~980 or fewer diaSources ran successfully, while every observation with more than 1000 diaSources that I looked at failed.

I took my best guess at the story points and epic, so please correct those if I got them wrong!

```
apPipe.differencer INFO: Subtracting images
apPipe.differencer.subtract INFO: Template Wcs : 2.619998,0.050177 -> 2.614735,0.052809
apPipe.differencer.subtract INFO: Science Wcs : 2.614754,0.052794 -> 2.619984,0.050191
apPipe.differencer.subtract INFO: Astrometrically registering template to science image
apPipe.differencer.subtract INFO: templateFwhmPix: 4.78053065883556
apPipe.differencer.subtract INFO: scienceFwhmPix: 3.957219530945321
ip.diffim.generateAlardLuptonBasisList INFO: Target psf fwhm is the greater, deconvolution mode
apPipe.differencer.subtract.selectDetection INFO: Detected 1020 positive peaks in 898 footprints to 10 sigma
apPipe.differencer.subtract.selectMeasurement INFO: Measuring 898 sources (898 parents, 0 children)
apPipe.differencer.subtract INFO: Growing 898 kernel candidate stars by 21 pixels
apPipe.differencer.subtract INFO: Selected 272 / 898 sources for KernelCandidacy
apPipe.differencer.subtract INFO: Matching Psf FWHM 4.78 -> 3.96 pix
ip.diffim.generateAlardLuptonBasisList INFO: Target psf fwhm is the greater, deconvolution mode
apPipe.differencer.subtract INFO: Final spatial kernel sum 26.409
apPipe.differencer.subtract INFO: Spatial model condition number 6.205e+09
apPipe.differencer.subtract INFO: Doing stats of kernel candidates used in the spatial fit.
apPipe.differencer.subtract INFO: 272 candidates total, 16 rejected, 237 used
apPipe.differencer.subtract INFO: Spatial kernel model well constrained; 237 candidates, 3 terms, 30 bases
apPipe.differencer.subtract INFO: Spatial background model appears well constrained; 237 candidates, 6 terms
apPipe.differencer INFO: Computing diffim PSF
apPipe.differencer.decorrelate INFO: Running A&L decorrelation: spatiallyVarying=False
ip_diffim_decorrelateALKernel INFO: Using matching kernel computed at (1024, 2048)
ip_diffim_decorrelateALKernel INFO: Variance (science, template): (92.492551, 0.004940)
ip_diffim_decorrelateALKernel INFO: Variance (uncorrected diffim): 98.240851
ip_diffim_decorrelateALKernel INFO: Variance (corrected diffim): 97.975441
apPipe.differencer INFO: Running diaSource detection
apPipe.differencer.detection INFO: Detected 7429 positive peaks in 1708 footprints and 139 negative peaks in 56 footprints to 5 sigma
apPipe.differencer INFO: Merging detections into 1228 sources
apPipe.differencer INFO: Running diaSource measurement: newDipoleFitting=True
apPipe.differencer.measurement INFO: Measuring 1228 sources (1228 parents, 0 children)
apPipe.differencer.forcedMeasurement INFO: Performing forced measurement on 1228 sources
apPipe INFO: Running Association...
apPipe.associator.diaCalculation WARN: Input diaSourceCat is indexed on column(s) incompatible with this task. Should be indexed on 'multi-index, ['diaObjectId', 'filterName', 'diaSourceId'].
apPipe FATAL: Failed on dataId={'ccdnum': 5, 'filter': 'g', 'visit': 288970, 'date': '2014-03-01', 'hdu': 31, 'object': 'Blind14A_04'}: KeyError: 124094082061435674
/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pex_config/18.1.0-5-gc286bb7+4/python/lsst/pex/config/config.py:1279: FutureWarning: Config field ccdProcessor.isr.doAddDistortionModel is deprecated: Camera geometry is incorporated when reading the raw files. This option no longer is used, and will be removed after v19.
  FutureWarning)
/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/mapApData.py:185: YAMLLoadWarning: calling yaml.load_all() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  table_list = list(yaml.load_all(yaml_stream))
/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/astropy/units/function/logarithmic.py:46: RuntimeWarning: invalid value encountered in log10
  return dex.to(self._function_unit, np.log10(x))
Traceback (most recent call last):
  File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 127, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 153, in pandas._libs.index.IndexEngine._get_loc_duplicates
  File "pandas/_libs/index_class_helper.pxi", line 122, in pandas._libs.index.Int64Engine._maybe_get_bool_indexer
KeyError: 124094082061435674

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/cmdLineTask.py", line 388, in __call__
    result = self.runTask(task, dataRef, kwargs)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/cmdLineTask.py", line 447, in runTask
    return task.runDataRef(dataRef, **kwargs)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
    res = func(self, *args, **keyArgs)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_pipe/18.1.0-4-gcd8f52c+37/python/lsst/ap/pipe/ap_pipe.py", line 218, in runDataRef
    associationResults = self.runAssociation(calexpRef)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
    res = func(self, *args, **keyArgs)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_pipe/18.1.0-4-gcd8f52c+37/python/lsst/ap/pipe/ap_pipe.py", line 306, in runAssociation
    results = self.associator.run(dia_sources, diffim, self.ppdb)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
    res = func(self, *args, **keyArgs)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/association.py", line 139, in run
    ppdb)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
    res = func(self, *args, **keyArgs)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/association.py", line 315, in update_dia_objects
    filter_name)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/pipe_base/18.1.0-9-gee19f03+2/python/lsst/pipe/base/timer.py", line 150, in wrapper
    res = func(self, *args, **keyArgs)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/diaCalculation.py", line 331, in run
    filterName)
  File "/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/ap_association/18.1.0-14-g371438c+2/python/lsst/ap/association/diaCalculation.py", line 392, in callCompute
    objDiaSources = diaSourceCat.loc[objId]
  File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexing.py", line 1500, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexing.py", line 1913, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexing.py", line 141, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/generic.py", line 3585, in xs
    loc = self.index.get_loc(key)
  File "/software/lsstsw/stack_20191101/python/miniconda3-4.5.12/envs/lsst-scipipe/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 127, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 153, in pandas._libs.index.IndexEngine._get_loc_duplicates
  File "pandas/_libs/index_class_helper.pxi", line 122, in pandas._libs.index.Int64Engine._maybe_get_bool_indexer
KeyError: 124094082061435674
```

#### Activity

Chris Morrison added a comment -

I don't think this is due to the length of the diaSource catalog. If you look at the output, associator.diaCalculation throws a WARN about the catalog not being indexed properly. Could you check what the indexing of this dataset is?

Could you please add a pdb breakpoint at line 310 in association.py? Once there, you can check what dia_sources.index is.
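At that breakpoint, inspecting `dia_sources.index` tells you immediately whether the catalog carries the expected multi-index. A toy illustration (the two-column dataframe here is just a stand-in for the real catalog, which has many more columns):

```python
import pandas as pd

# Stand-in for the diaSource catalog loaded in association.py.
dia_sources = pd.DataFrame({"diaObjectId": [1, 2], "totFlux": [10.0, 20.0]})

# What `print(dia_sources.index)` at the breakpoint distinguishes:
# a flat default index versus the MultiIndex diaCalculation expects.
print(type(dia_sources.index).__name__)              # RangeIndex
print(isinstance(dia_sources.index, pd.MultiIndex))  # False for a flat index
```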

Ian Sullivan added a comment - - edited

print(dia_sources.index) after line 310 on that observation gives:

```
Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            218, 219, 220, 221, 222, 223, 224, 225, 226, 227],
           dtype='int64', length=1228)
```

Chris Morrison added a comment -

Cool, there's the culprit. Not sure why it's being set as an Int64Index instead of a RangeIndex, but maybe that's a pandas versioning thing? This still produces the crash, I assume?

Anywho, I've added a check for this index now and pushed it to this ticket branch. Pull it and let me know if it resolves the issue.
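The guard on the ticket branch isn't reproduced here, but a minimal sketch of the idea reads roughly like this (the function name, the `drop=False` choice, and the toy catalog are illustrative, not the actual patch):

```python
import pandas as pd

# Index levels that diaCalculation expects, per the WARN in the log.
EXPECTED_LEVELS = ["diaObjectId", "filterName", "diaSourceId"]

def ensure_dia_index(cat: pd.DataFrame) -> pd.DataFrame:
    """Return `cat` indexed on (diaObjectId, filterName, diaSourceId)."""
    if isinstance(cat.index, pd.MultiIndex) and list(cat.index.names) == EXPECTED_LEVELS:
        return cat
    # drop=False keeps the columns available for code that reads them directly.
    return cat.set_index(EXPECTED_LEVELS, drop=False)

# Toy catalog with a flat default index, like the one seen at the breakpoint.
cat = pd.DataFrame({
    "diaObjectId": [1, 1, 2],
    "filterName": ["g", "g", "g"],
    "diaSourceId": [10, 11, 12],
})
fixed = ensure_dia_index(cat)
print(list(fixed.index.names))  # ['diaObjectId', 'filterName', 'diaSourceId']
print(len(fixed.loc[1]))        # 2 sources for diaObjectId 1
```

With the multi-index in place, `diaSourceCat.loc[objId]` selects all sources for a diaObject instead of falling back to positional labels.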

Chris Morrison added a comment -

Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30720/pipeline
Ian Sullivan added a comment -

I've verified that this fixes the original bug. It's a little worrisome that it could be either an Int64Index or a RangeIndex, so if it were straightforward to enforce the type at an earlier step I would suggest doing that instead. However, I don't think this is worth investing very much time in, so I'm fine with it as it is.

Chris Morrison added a comment -

Sorry for the delay. I've simplified the if statement on the pull request to check more edge cases that isinstance likely wouldn't catch. I'm about to add a unit test for the case of reading >1000 diaSources from the ppdb, but do you mind trying your pipeline again to see if the new change performs as expected for you, Ian Sullivan?

Ian Sullivan added a comment -

I've tested the latest version you pushed, and it also fixes the bug I was seeing.

For my own future reference, the commands I'm running to test this are:

```shell
output="/project/sullivan/test_hits2014/ap_pipe/cw_processed"
make_ppdb.py -c ppdb.isolation_level=READ_UNCOMMITTED -c ppdb.db_url="sqlite:///$output/association.db"

rerun="/project/sullivan/test_hits2014/ap_pipe/cw_processed"
calib="/project/mrawls/hits2014/ingested/calib2"
template="/project/sullivan/hits2015_template/cwcoadds_processed"
db_config=(-c ppdb.db_url="sqlite:////project/sullivan/test_hits2014/ap_pipe/cw_processed/association.db" -c ppdb.isolation_level="READ_UNCOMMITTED" -c ppdb.connection_timeout=240)
more_config=(-C /project/sullivan/test_hits2014/ap_pipe_cw_config.py)  # UPDATE THIS FILE WITH YOUR CONFIGS!!!

# Command to run
ap_pipe.py /project/mrawls/hits2014/ingested/rerun/processed2/ --calib /project/mrawls/hits2014/ingested/calib2 --template /project/sullivan/hits2015_template/cwcoadds_processed --rerun /project/sullivan/test_hits2014/ap_pipe/cw_processed -c ppdb.db_url="sqlite:////project/sullivan/test_hits2014/ap_pipe/cw_processed/association.db" -c ppdb.isolation_level="READ_UNCOMMITTED" -c ppdb.connection_timeout=240 -C /project/sullivan/test_hits2014/ap_pipe_cw_config.py --id ccdnum=5 filter=g visit=288970 --clobber-versions --clobber-config
```
Chris Morrison added a comment - - edited

New Jenkins run after simpler test: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30772/pipeline

I now fully understand why this was happening. dax_ppdb loads diaSources by diaObjectId in batches of 1000. With more than that, the code concatenates the resulting dataframes together, hence the strange indexing that was causing the bug.
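That failure mode is easy to reproduce in isolation (the column name and batch contents are illustrative; only the batch size of 1000 comes from dax_ppdb):

```python
import pandas as pd

# Two "batches" of loaded diaSources, each with its own fresh 0-based index.
batch1 = pd.DataFrame({"diaObjectId": range(1000)})        # index 0..999
batch2 = pd.DataFrame({"diaObjectId": range(1000, 1228)})  # index 0..227

# Concatenating without resetting the index keeps the stale per-batch labels,
# matching the Int64Index(..., length=1228) printed earlier on this ticket.
combined = pd.concat([batch1, batch2])
print(combined.index.is_unique)  # False

# Looking up a diaObjectId as if it were the index then fails,
# just like KeyError: 124094082061435674 in the traceback.
try:
    combined.loc[1200]
except KeyError:
    print("KeyError, as in the traceback")
```

`pd.concat(..., ignore_index=True)` would give a unique RangeIndex, and rebuilding the expected multi-index after loading also avoids the stale per-batch labels.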


#### People

• Assignee: Chris Morrison
• Reporter: Ian Sullivan
• Reviewers: Ian Sullivan
• Watchers: Chris Morrison, Ian Sullivan