# Fix unexpected floating point values in drpAssociation task

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Team: Data Release Production

#### Description

As brought up on this Slack thread, when running step5 for tract 3828 on IDF (monitoring output at https://panda-doma.cern.ch/jobs/?jeditaskid=7004&jobstatus=failed&display_limit=100 ), a large number of drpAssociation jobs (10 of 49) failed due to unexpected floating point values. An example stderr is here, with the most relevant lines being the following:

    File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.7.0/Linux64/pipe_tasks/21.0.0-147-g0e635eb1+1acddb5be5/python/lsst/pipe/tasks/simpleAssociation.py", line 191, in run
      diaSources.set_index("diaSourceId", inplace=True, verify_integrity=True)
    File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.7.0/lib/python3.8/site-packages/pandas/core/frame.py", line 4779, in set_index
      raise ValueError(f"Index has duplicate keys: {duplicates}")
    ValueError: Index has duplicate keys: Float64Index([1.0374117693849605e+17, 1.0374117693849606e+17,
                  1.1895663545548805e+17, 1.1895663545548806e+17,
                  2.4602324290764803e+17, 2.4602324290764806e+17,
                  1.0374117747536699e+17, 1.03741177475367e+17,
                  1.0374117747536702e+17, 2.4602324344451894e+17,
                  ...
                  1.0163888976822278e+17, 1.2930653275501376e+17,
                  1.4244108015396456e+17, 1.4244108015396458e+17,
                  1.0163889137883547e+17, 1.0163889137883549e+17,
                  1.424410806908355e+17, 1.4244108069083552e+17,
                  1.4244108069083554e+17, 1.4244108069083555e+17],
                 dtype='float64', name='diaSourceId', length=593)

The Slack discussion indicated a connection to the problem in RFC-808 (Set sentinel values for non-floating point columns for missing bands in Object tables), but in a different context.
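
One plausible reading of the traceback, sketched here for illustration: the keys in the error are around 1e17, well above 2**53 (about 9.0e15), the largest range in which float64 can represent every integer exactly. If an integer diaSourceId column ends up as float64, neighbouring IDs can round to the same value and `set_index(..., verify_integrity=True)` then reports duplicates. The IDs below are hypothetical values chosen just above 2**56, not IDs from the failing jobs, and the explicit `astype` only stands in for whatever upcast happened in the pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical IDs just above 2**56, where adjacent float64 values are
# 16 apart. These are NOT taken from the failing jobs.
base = 1 << 56
int_ids = np.array([base, base + 1, base + 2], dtype=np.int64)

# Casting to float64 rounds all three distinct integers to the same value.
float_ids = int_ids.astype(np.float64)
n_unique_int = len(np.unique(int_ids))
n_unique_float = len(np.unique(float_ids))

# With an int64 column the index is unique, so verify_integrity passes.
ok = pd.DataFrame({"diaSourceId": int_ids}).set_index(
    "diaSourceId", verify_integrity=True)

# With the float64 column the same call raises ValueError.
try:
    pd.DataFrame({"diaSourceId": float_ids}).set_index(
        "diaSourceId", verify_integrity=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Checking `df["diaSourceId"].dtype` before calling `set_index` is a cheap way to tell this failure mode apart from genuinely duplicated IDs.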

#### Activity

Brock Brendal [X] (Inactive) added a comment -

I also encountered this problem with DC2 w40, again for tract 3828. Running tract 3829 now. Here's an example:

    File "/software/lsstsw/stack_20210813/stack/miniconda3-py38_4.9.2-0.7.0/Linux64/pipe_tasks/21.0.0-151-g12957622+e8b61e2e81/python/lsst/pipe/tasks/simpleAssociation.py", line 191, in run
      diaSources.set_index("diaSourceId", inplace=True, verify_integrity=True)
    File "/software/lsstsw/stack_20210813/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe/lib/python3.8/site-packages/pandas/core/frame.py", line 4779, in set_index
      raise ValueError(f"Index has duplicate keys: {duplicates}")
    ValueError: Index has duplicate keys: Float64Index([ 2.021647709739424e+16, 2.0217044569948304e+16,
                  2.0217045106819164e+16, 2.021704510681917e+16,
                  2.0217045106819172e+16, 2.0217045106819176e+16,
                  2.021704510681918e+16, 2.021704618056096e+16,
                  2.0217046717431844e+16, 2.021704671743185e+16,
                  ...
                  2.5749409553291677e+17, 2.6680346667489702e+17,
                  2.668034672117679e+17, 2.6680346828550963e+17,
                  2.6680346828550966e+17, 2.668034688223806e+17,
                  2.668190289520231e+17, 2.6681902895202314e+17,
                  2.668190327101195e+17, 2.6681903271011952e+17],
                 dtype='float64', name='diaSourceId', length=529)

Path to the jobs is /scratch/brendal4/bps-gen3-dc2_OLD/submit/2.2i/runs/test-med-1/w_2021_40/DM-32024/20211014T205139Z/jobs/drpAssociation/3828

Yusra AlSayyad added a comment -

Assigning to me because I spent some time trying to reproduce this on Friday and am going to assume that, as the DM-38125 author, I could probably fix it the fastest.

Yusra AlSayyad added a comment -

What's different between w_2021_36 and w_2021_40 is the schema of the goodSeeingDiff_diaSrcTable tables when they have no rows.

Looks like I screwed something up in DM-31825, because they were fine in w36 and have funny column names in w40:
w36 vs w40 empty goodSeeing_diaSrcTables.html

Removing the line in drpAssociation fixes up this symptom and makes this reported failure go away.
I also want to fix up the bad transforms on empty diaSource tables.

Yusra AlSayyad added a comment - edited

This ticket simply adds a couple of layers of robustness. The root cause was actually the pixelId functor producing something not shaped like a column when no rows were present. The pixelId functor was removed between w_2021_41 and w_2021_42, so we wouldn't see this again even without this ticket. We need to backport DM-32046 too. Note, the referenceBand functor would suffer the same fate, but I'll take care of that on DM-32306.

Robustness layer #1) Add a check that the functor outputs something that is shaped like a column. Raise a RuntimeError if it isn't. This would prevent the empty table with the garbage schema from being written out in the first place.

Robustness layer #2) In drpAssociation, remove the line that concatenates an empty DataFrame if no diaSources overlap the patch. It is not necessary.

Ran this on all of DC2 step4 and step5 combined with DM-32124.

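
A minimal sketch of what robustness layer #1 could look like, assuming "shaped like a column" means a pandas Series with one entry per row of the table. The helper name `check_column_like`, its arguments, and the exact conditions are hypothetical and are not the code merged into pipe_tasks.

```python
import pandas as pd


def check_column_like(name, values, index):
    """Return ``values`` unchanged if it looks like one column for ``index``.

    Hypothetical helper for illustration; the real check may differ.
    """
    if not isinstance(values, pd.Series):
        raise RuntimeError(
            f"Functor {name!r} returned {type(values).__name__}, not a Series")
    if len(values) != len(index):
        raise RuntimeError(
            f"Functor {name!r} returned {len(values)} values "
            f"for {len(index)} rows")
    return values


empty_index = pd.Index([], dtype="int64", name="diaSourceId")

# A well-formed result for an empty table: zero rows, but still a column.
good = check_column_like(
    "ra", pd.Series([], dtype="float64", index=empty_index), empty_index)

# A malformed result (here an empty DataFrame) is rejected before any
# table with a garbage schema could be written out.
try:
    check_column_like("pixelId", pd.DataFrame(), empty_index)
    rejected = False
except RuntimeError:
    rejected = True
```

Failing loudly at the functor boundary keeps the bad schema from reaching downstream tasks such as drpAssociation, where the symptom is much harder to trace back.
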
Yusra AlSayyad added a comment -

Chris, do you have time to review this either today or tomorrow? I'd like to get it in for the w_2021_44 weekly.

Note the ONLY changes are on pipe_tasks.

Jenkins passes here: https://ci.lsst.codes/job/stack-os-matrix/35232/display/redirect

Chris Morrison [X] (Inactive) added a comment -

One more Jenkins with ci_hsc run and you should be good per our pair coding conversation.

Yusra AlSayyad added a comment -

After testing how the downstream tasks handle the empty tables, it turns out that drpDiaCalculation isn't ready to take empty tables and turn them into a goodSeeingDiff_fullDiaObjTable with the same schema as those with data. It returns the goodSeeingDiff_fullDiaObjTable as an empty Series: Index([], dtype='object') Series([], dtype: object)

I'm going to flip the new doWriteEmptyTables to False by default and re-Jenkins.
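
For reference, a small sketch of what "an empty table with the same schema" means in pandas terms: a zero-row DataFrame that still carries the column names and dtypes of a populated one. The column names below are hypothetical placeholders, and this is not how drpDiaCalculation builds its output.

```python
import pandas as pd

# Hypothetical columns, for illustration only.
populated = pd.DataFrame({
    "diaObjectId": pd.Series([1, 2], dtype="int64"),
    "ra": pd.Series([10.0, 10.1], dtype="float64"),
    "nDiaSources": pd.Series([3, 5], dtype="int64"),
})

# Slicing off zero rows keeps the column names and dtypes intact.
empty = populated.iloc[:0]

same_columns = list(empty.columns) == list(populated.columns)
same_dtypes = empty.dtypes.equals(populated.dtypes)
```

An object that comes back as a bare `Series([], dtype: object)` has lost both the column names and the dtypes, which is why it cannot be treated like the populated tables.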

