# Fix unexpected floating point values in drpAssociation task

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels:
• Story Points: 2
• Team: Data Release Production
• Urgent?: No

#### Description

As brought up on this Slack thread, when running step5 for tract 3828 on IDF (monitoring output at https://panda-doma.cern.ch/jobs/?jeditaskid=7004&jobstatus=failed&display_limit=100 ), a large number of drpAssociation jobs (10 of 49) failed due to unexpected floating point values. An example stderr is here; the most relevant lines are the following:

    File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.7.0/Linux64/pipe_tasks/21.0.0-147-g0e635eb1+1acddb5be5/python/lsst/pipe/tasks/simpleAssociation.py", line 191, in run
      diaSources.set_index("diaSourceId", inplace=True, verify_integrity=True)
    File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.7.0/lib/python3.8/site-packages/pandas/core/frame.py", line 4779, in set_index
      raise ValueError(f"Index has duplicate keys: {duplicates}")
    ValueError: Index has duplicate keys: Float64Index([1.0374117693849605e+17, 1.0374117693849606e+17,
                  1.1895663545548805e+17, 1.1895663545548806e+17,
                  2.4602324290764803e+17, 2.4602324290764806e+17,
                  1.0374117747536699e+17, 1.03741177475367e+17,
                  1.0374117747536702e+17, 2.4602324344451894e+17,
                  ...
                  1.0163888976822278e+17, 1.2930653275501376e+17,
                  1.4244108015396456e+17, 1.4244108015396458e+17,
                  1.0163889137883547e+17, 1.0163889137883549e+17,
                  1.424410806908355e+17, 1.4244108069083552e+17,
                  1.4244108069083554e+17, 1.4244108069083555e+17],
                 dtype='float64', name='diaSourceId', length=593)

The Slack discussion indicated a connection to the problem in RFC-808 (Set sentinel values for non-floating point columns for missing bands in Object tables), but in a different context.
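
For context on why a float64 diaSourceId index can report duplicate keys: float64 represents integers exactly only up to 2**53 (about 9.0e15), and the IDs in the traceback are around 1e17, where adjacent float64 values are 16 apart. Distinct 64-bit integer IDs can therefore collapse to the same float. A minimal sketch in plain Python follows; the ID values are made up for illustration and are not taken from the failing jobs.

```python
# Every integer n with |n| <= 2**53 is exactly representable as a float64.
FLOAT64_EXACT_LIMIT = 2**53


def collides_as_float64(id_a: int, id_b: int) -> bool:
    """Return True if two distinct integer IDs map to the same float64."""
    return id_a != id_b and float(id_a) == float(id_b)


# Hypothetical 64-bit IDs of the same magnitude as those in the traceback.
id_a = 103741176938496050
id_b = id_a + 1

print(id_a > FLOAT64_EXACT_LIMIT)        # True: beyond the exact range
print(collides_as_float64(id_a, id_b))   # True: both round to the same float
print(collides_as_float64(1000, 1001))   # False: small IDs stay distinct
```

Under this reading, set_index with verify_integrity=True is reporting duplicates that exist only because the ID column ended up as float64 rather than as an integer type.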

#### Activity

Brock Brendal [X] (Inactive) added a comment -

I also encountered this problem with DC2 w40, again for tract 3828. Running tract 3829 now. Here's an example:

    File "/software/lsstsw/stack_20210813/stack/miniconda3-py38_4.9.2-0.7.0/Linux64/pipe_tasks/21.0.0-151-g12957622+e8b61e2e81/python/lsst/pipe/tasks/simpleAssociation.py", line 191, in run
      diaSources.set_index("diaSourceId", inplace=True, verify_integrity=True)
    File "/software/lsstsw/stack_20210813/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe/lib/python3.8/site-packages/pandas/core/frame.py", line 4779, in set_index
      raise ValueError(f"Index has duplicate keys: {duplicates}")
    ValueError: Index has duplicate keys: Float64Index([ 2.021647709739424e+16, 2.0217044569948304e+16,
                  2.0217045106819164e+16, 2.021704510681917e+16,
                  2.0217045106819172e+16, 2.0217045106819176e+16,
                  2.021704510681918e+16, 2.021704618056096e+16,
                  2.0217046717431844e+16, 2.021704671743185e+16,
                  ...
                  2.5749409553291677e+17, 2.6680346667489702e+17,
                  2.668034672117679e+17, 2.6680346828550963e+17,
                  2.6680346828550966e+17, 2.668034688223806e+17,
                  2.668190289520231e+17, 2.6681902895202314e+17,
                  2.668190327101195e+17, 2.6681903271011952e+17],
                 dtype='float64', name='diaSourceId', length=529)

Path to the jobs is /scratch/brendal4/bps-gen3-dc2_OLD/submit/2.2i/runs/test-med-1/w_2021_40/DM-32024/20211014T205139Z/jobs/drpAssociation/3828

Yusra AlSayyad added a comment -

Assigning to me because I spent some time trying to reproduce this on Friday and am going to assume that as the DM-31825 author I could prob fix it the fastest.
Yusra AlSayyad added a comment -

What's different between w_2021_36 and w_2021_40 is the schema of the goodSeeingDiff_diaSrcTable tables when they have no rows.

Looks like I screwed something up in DM-31825, because they were fine in w36 and have funny column names in w40:
w36 vs w40 empty goodSeeing_diaSrcTables.html

Removing the relevant line in drpAssociation fixes up this symptom and makes the reported failure go away.
I also want to fix up the bad transforms on empty diaSource tables.
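
A comparison of two empty tables like the w36 vs w40 report can be sketched as a small schema diff in pandas. The helper name schema_diff and the column names are hypothetical, chosen only for illustration; they are not taken from pipe_tasks or from the actual tables.

```python
import pandas as pd


def schema_diff(expected: pd.DataFrame, actual: pd.DataFrame) -> dict:
    """Report column-name and dtype differences between two tables."""
    expected_cols = set(expected.columns)
    actual_cols = set(actual.columns)
    shared = expected_cols & actual_cols
    return {
        "missing": sorted(expected_cols - actual_cols, key=str),
        "unexpected": sorted(actual_cols - expected_cols, key=str),
        "dtype_mismatch": sorted(
            (col for col in shared if expected[col].dtype != actual[col].dtype),
            key=str,
        ),
    }


# Hypothetical empty tables: one with a sensible schema, one with a bad one.
good_empty = pd.DataFrame({
    "diaSourceId": pd.Series(dtype="int64"),
    "ra": pd.Series(dtype="float64"),
})
bad_empty = pd.DataFrame({
    "diaSourceId": pd.Series(dtype="float64"),
    "0": pd.Series(dtype="object"),
})

print(schema_diff(good_empty, bad_empty))
```

Because the check looks only at column names and dtypes, it works on tables with zero rows, which is the case that differed here.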
Yusra AlSayyad added a comment - edited

This ticket simply adds a couple of layers of robustness. The root cause was actually the pixelId functor producing something not shaped like a column when no rows were present. The pixelId functor was removed between w_2021_41 and w_2021_42, so we wouldn't see this again even without this ticket. We need to backport DM-32046 too. Note that the referenceBand functor would suffer the same fate, but I'll take care of that on DM-32306.

Robustness layer #1) Add a check that the functor outputs something that is shaped like a column, and raise a RuntimeError if it doesn't. This would prevent the empty table with the garbage schema from being written out in the first place.

Robustness layer #2) In drpAssociation, remove the line that concatenates an empty DataFrame if no diaSources overlap the patch. It is not necessary.

Ran this on all of DC2 step4 and step5 combined with DM-32124.
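
Robustness layer #1 might look roughly like the following. This is a sketch, not the pipe_tasks change itself: check_column_shaped is a hypothetical helper, and treating "shaped like a column" as "a pandas Series with one value per input row" is an assumption.

```python
import pandas as pd


def check_column_shaped(name: str, values, table: pd.DataFrame) -> pd.Series:
    """Return `values` unchanged if it looks like a column of `table`.

    Raises RuntimeError otherwise, so a malformed table is never written.
    """
    if not isinstance(values, pd.Series):
        raise RuntimeError(
            f"Functor {name!r} returned {type(values).__name__}, expected a Series"
        )
    if len(values) != len(table):
        raise RuntimeError(
            f"Functor {name!r} returned {len(values)} values for {len(table)} rows"
        )
    return values


# An empty input table with a single, hypothetical column.
empty = pd.DataFrame({"x": pd.Series(dtype="float64")})

# A well-behaved functor yields an empty Series for an empty table.
ok = check_column_shaped("good", empty["x"] * 2.0, empty)

# A misbehaving (hypothetical) functor yields a DataFrame instead of a column.
try:
    check_column_shaped("bad", pd.DataFrame(), empty)
except RuntimeError as err:
    print(err)
```

Failing loudly at this point keeps the bad output from reaching downstream tasks such as drpAssociation, where the symptom is much harder to trace back.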
Yusra AlSayyad added a comment -

Chris, do you have time to review this either today or tomorrow? I'd like to get it in for the w_2021_44 weekly.

Note the ONLY changes are on pipe_tasks.

Jenkins passes here: https://ci.lsst.codes/job/stack-os-matrix/35232/display/redirect
Chris Morrison [X] (Inactive) added a comment -

One more Jenkins with ci_hsc run and you should be good per our pair coding conversation.

Yusra AlSayyad added a comment -

After testing how the downstream tasks handle the empty tables, it turns out that drpDiaCalculation isn't ready to take empty tables and turn them into goodSeeingDiff_fullDiaObjTable with the same schemas as those with data. It returns a goodSeeingDiff_fullDiaObjTable as a Series: Index([], dtype='object') Series([], dtype: object)

I'm going to flip the new doWriteEmptyTables to False by default and re-Jenkins.
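
This is not what the ticket does (it turns doWriteEmptyTables off by default instead), but for reference, an empty pandas table can carry the same schema as a populated one if it is built from explicit dtypes. A minimal sketch; the helper empty_table and the column names and dtypes in SCHEMA are hypothetical and may not match the real goodSeeingDiff_fullDiaObjTable.

```python
import pandas as pd

# Hypothetical schema; the real table's columns and dtypes may differ.
SCHEMA = {"diaObjectId": "int64", "ra": "float64", "decl": "float64"}


def empty_table(schema: dict) -> pd.DataFrame:
    """Build a zero-row DataFrame whose columns have the requested dtypes."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
    )


table = empty_table(SCHEMA)
print(type(table).__name__, len(table))
print(table.dtypes.to_dict())
```

A table built this way is a DataFrame rather than a Series, and its ID column stays int64 even though it has no rows.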

#### People

Assignee:
Reporter: Huan Lin
Reviewers: Chris Morrison [X] (Inactive)
Watchers: Brock Brendal [X] (Inactive), Chris Morrison [X] (Inactive), Eli Rykoff, Hsin-Fang Chiang, Huan Lin, Kenneth Herner, Yusra AlSayyad