  1. Data Management
  2. DM-32238

Fix unexpected floating point values in drpAssociation task

    Details

    • Story Points:
      2
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      As brought up in a Slack thread, when running step5 for tract 3828 on the IDF (monitoring output at https://panda-doma.cern.ch/jobs/?jeditaskid=7004&jobstatus=failed&display_limit=100), a large number of drpAssociation jobs (10 of 49) failed due to unexpected floating point values. The most relevant lines of an example stderr are the following:

      File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.7.0/Linux64/pipe_tasks/21.0.0-147-g0e635eb1+1acddb5be5/python/lsst/pipe/tasks/simpleAssociation.py", line 191, in run
          diaSources.set_index("diaSourceId", inplace=True, verify_integrity=True)
        File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.7.0/lib/python3.8/site-packages/pandas/core/frame.py", line 4779, in set_index
          raise ValueError(f"Index has duplicate keys: {duplicates}")
      ValueError: Index has duplicate keys: Float64Index([1.0374117693849605e+17, 1.0374117693849606e+17,
                    1.1895663545548805e+17, 1.1895663545548806e+17,
                    2.4602324290764803e+17, 2.4602324290764806e+17,
                    1.0374117747536699e+17,   1.03741177475367e+17,
                    1.0374117747536702e+17, 2.4602324344451894e+17,
                    ...
                    1.0163888976822278e+17, 1.2930653275501376e+17,
                    1.4244108015396456e+17, 1.4244108015396458e+17,
                    1.0163889137883547e+17, 1.0163889137883549e+17,
                     1.424410806908355e+17, 1.4244108069083552e+17,
                    1.4244108069083554e+17, 1.4244108069083555e+17],
                   dtype='float64', name='diaSourceId', length=593)
      

      The slack discussion indicated a connection to the problem in RFC-808 (Set sentinel values for non-floating point columns for missing bands in Object tables), but in a different context.
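
The duplicate keys in the traceback are what one would expect if 64-bit integer diaSourceId values had been converted to float64, which represents integers exactly only up to 2**53. The following is a minimal sketch of that effect using only the standard library; the ID values are made up for illustration and are not taken from the failing jobs.

```python
# Hypothetical 64-bit IDs: three consecutive integers starting at 2**56.
# In the range [2**56, 2**57) adjacent float64 values are 16 apart,
# so these three integers cannot all be represented exactly.
base = 1 << 56
ids = [base, base + 1, base + 2]

# Convert each ID to a Python float (an IEEE 754 double, i.e. float64).
as_float = [float(i) for i in ids]

distinct_ints = len(set(ids))          # the integers are all different
distinct_floats = len(set(as_float))   # the floats collapse together

# Largest integer below which every integer is exactly representable.
exact_limit = 2 ** 53

print(distinct_ints, distinct_floats, base > exact_limit)
```

Once distinct IDs collapse onto the same float, a uniqueness check such as `set_index(..., verify_integrity=True)` has no way to tell them apart.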

        Attachments

          Activity

          brendal4 Brock Brendal [X] (Inactive) added a comment -

          I also encountered this problem with DC2 w40, again for tract 3828. Running tract 3829 now. Here's an example:

            File "/software/lsstsw/stack_20210813/stack/miniconda3-py38_4.9.2-0.7.0/Linux64/pipe_tasks/21.0.0-151-g12957622+e8b61e2e81/python/lsst/pipe/tasks/simpleAssociation.py", line 191, in run
              diaSources.set_index("diaSourceId", inplace=True, verify_integrity=True)
            File "/software/lsstsw/stack_20210813/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe/lib/python3.8/site-packages/pandas/core/frame.py", line 4779, in set_index
              raise ValueError(f"Index has duplicate keys: {duplicates}")
          ValueError: Index has duplicate keys: Float64Index([ 2.021647709739424e+16, 2.0217044569948304e+16,
                        2.0217045106819164e+16,  2.021704510681917e+16,
                        2.0217045106819172e+16, 2.0217045106819176e+16,
                         2.021704510681918e+16,  2.021704618056096e+16,
                        2.0217046717431844e+16,  2.021704671743185e+16,
                        ...
                        2.5749409553291677e+17, 2.6680346667489702e+17,
                         2.668034672117679e+17, 2.6680346828550963e+17,
                        2.6680346828550966e+17,  2.668034688223806e+17,
                         2.668190289520231e+17, 2.6681902895202314e+17,
                         2.668190327101195e+17, 2.6681903271011952e+17],
                       dtype='float64', name='diaSourceId', length=529)
          
          

          Path to the jobs is /scratch/brendal4/bps-gen3-dc2_OLD/submit/2.2i/runs/test-med-1/w_2021_40/DM-32024/20211014T205139Z/jobs/drpAssociation/3828

          yusra Yusra AlSayyad added a comment -

          Assigning to me because I spent some time trying to reproduce this on Friday, and I assume that as the DM-31825 author I could probably fix it the fastest.

          yusra Yusra AlSayyad added a comment -

          What's different in w_2021_36 and w_2021_40 is the schema of the {{goodSeeingDiff_diaSrcTable}}s when they have no rows.

          Looks like I screwed something up in DM-31825, because they were fine in w36 and have funny column names in w40:
          w36 vs w40 empty goodSeeing_diaSrcTables.html

          Removing the offending line in drpAssociation fixes this symptom and makes the reported failure go away.
          I also want to fix up the bad transforms on empty diaSource tables.

          yusra Yusra AlSayyad added a comment - - edited

          This ticket simply adds a couple of layers of robustness.

          The root cause was actually the pixelID functor producing something not shaped like a column when no rows were present. The pixelId functor was removed between w_2021_41 and w_2021_42, so we wouldn't see this again even without this ticket. We need to backport DM-32046 too. Note, the referenceBand functor would suffer the same fate, but I'll take care of that on DM-32306.

          Robustness layer #1) Add a check that the functor outputs something that is shaped like a column. Raise a RuntimeError if it isn't. This would prevent the empty table with the garbage schema from being written out in the first place.

          Robustness layer #2) In drpAssociation, remove the line that concatenates an empty DataFrame when no diaSources overlap the patch. It is not necessary.

          Ran this on all of DC2 step4 and step5 combined with DM-32124.
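
The two robustness layers described above can be sketched roughly as follows. This is only an illustration, not the code merged on this ticket: `check_column_like` and `append_if_any` are hypothetical helper names, and the column names are invented.

```python
import pandas as pd


def check_column_like(result, df, name):
    """Hypothetical helper: raise unless `result` looks like one column of `df`."""
    if not isinstance(result, pd.Series) or len(result) != len(df):
        raise RuntimeError(
            f"Functor {name!r} did not return a column-shaped result: "
            f"got {type(result).__name__}"
        )
    return result


def append_if_any(base, extra):
    """Hypothetical helper: concatenate `extra` onto `base` only if it has rows."""
    if len(extra) == 0:
        # Nothing to add, so return the original table untouched.
        return base
    return pd.concat([base, extra])


# Layer 1: an empty input table with made-up columns.
empty = pd.DataFrame({"ra": pd.Series(dtype="float64"),
                      "dec": pd.Series(dtype="float64")})
good = check_column_like(empty["ra"] * 2.0, empty, "good")
try:
    # A result that is a whole DataFrame is not shaped like a column.
    check_column_like(empty[["ra", "dec"]], empty, "bad")
    bad_raised = False
except RuntimeError:
    bad_raised = True

# Layer 2: skipping the concatenation leaves the integer IDs as they were.
sources = pd.DataFrame({"diaSourceId": [1 << 56, (1 << 56) + 1]})
result = append_if_any(sources, pd.DataFrame())
print(bad_raised, result["diaSourceId"].dtype)
```

The guard in `append_if_any` avoids relying on how `pd.concat` chooses dtypes when one of its inputs is empty, which has varied between pandas versions.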

          yusra Yusra AlSayyad added a comment -

          Chris, do you have time to review this either today or tomorrow? I'd like to get it in for the w_2021_44 weekly.

          Note the ONLY changes are on pipe_tasks.

          Jenkins passes here: https://ci.lsst.codes/job/stack-os-matrix/35232/display/redirect

          cmorrison Chris Morrison [X] (Inactive) added a comment -

          One more Jenkins with ci_hsc run and you should be good per our pair coding conversation.

          yusra Yusra AlSayyad added a comment -

          After testing how the downstream tasks handle the empty tables, it turns out that drpDiaCalculation isn't ready to take empty tables and turn them into `goodSeeingDiff_fullDiaObjTable`s with the same schemas as those with data. It returns a `goodSeeingDiff_fullDiaObjTable` as an empty Series: `Index([], dtype='object') Series([], dtype: object)`

          I'm going to flip the new doWriteEmptyTables to False by default and re-Jenkins.
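
For reference, one way an empty output could keep the same schema as a populated one is to build it from an explicit column-to-dtype mapping. The sketch below assumes a made-up schema (`SCHEMA`) and hypothetical helpers; the real `goodSeeingDiff_fullDiaObjTable` columns are not listed in this ticket, and this is not what drpDiaCalculation currently does.

```python
import pandas as pd

# Hypothetical schema; the real column names and dtypes are not given here.
SCHEMA = {
    "diaObjectId": "int64",
    "ra": "float64",
    "decl": "float64",
    "nDiaSources": "int64",
}


def empty_table(schema):
    """Build a zero-row DataFrame whose columns and dtypes follow `schema`."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
    )


def has_schema(table, schema):
    """Return True if `table` is a DataFrame with the expected columns and dtypes."""
    if not isinstance(table, pd.DataFrame):
        return False
    actual = {name: str(dtype) for name, dtype in table.dtypes.items()}
    return actual == schema


empty = empty_table(SCHEMA)
# An empty object Series, like the output quoted in the comment above.
bare_series = pd.Series([], dtype="object")

print(has_schema(empty, SCHEMA), has_schema(bare_series, SCHEMA))
```

A check like `has_schema` could be run before writing, so that a bare empty Series is caught instead of being persisted as a table.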


            People

            Assignee:
            yusra Yusra AlSayyad
            Reporter:
            hlin Huan Lin
            Reviewers:
            Chris Morrison [X] (Inactive)
            Watchers:
            Brock Brendal [X] (Inactive), Chris Morrison [X] (Inactive), Eli Rykoff, Hsin-Fang Chiang, Huan Lin, Kenneth Herner, Yusra AlSayyad