Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-36058

Fix untested Pandas deprecation warnings in ap_association

    XMLWordPrintable

    Details

    • Story Points:
      1
    • Sprint:
      AP F22-5 (October)
    • Team:
      Alert Production
    • Urgent?:
      No

      Description

      There are a number of references to the deprecated DataFrame.append method that don't get picked up by unit tests. The conversion itself is trivial (see branch), but it breaks some mock-based tests in test_diaPipe.py. I can't figure out exactly where the break is coming from, especially since the code does not behave the same way inside a debugger.

      In pytest:

      self = <lsst.ap.association.diaPipe.DiaPipelineTask object at 0x7f7284351ab0>
      diaSourceTable = <MagicMock spec='DataFrame' id='140129819267504'>
      solarSystemObjectTable = <MagicMock spec='DataFrame' id='140129822736096'>
      diffIm = <Mock spec='ExposureF' id='140129819266736'>, exposure = <Mock spec='ExposureF' id='140129819267264'>
      template = <Mock spec='ExposureF' id='140129819267552'>, ccdExposureIdBits = 32, band = 'g'
       
          @timeMethod
          def run(self,
                  diaSourceTable,
                  solarSystemObjectTable,
                  diffIm,
                  exposure,
                  template,
                  ccdExposureIdBits,
                  band):
              """Process DiaSources and DiaObjects.
       
              Load previous DiaObjects and their DiaSource history. Calibrate the
              values in the diaSourceCat. Associate new DiaSources with previous
              DiaObjects. Run forced photometry at the updated DiaObject locations.
              Store the results in the Alert Production Database (Apdb).
       
              Parameters
              ----------
              diaSourceTable : `pandas.DataFrame`
                  Newly detected DiaSources.
              diffIm : `lsst.afw.image.ExposureF`
                  Difference image exposure in which the sources in ``diaSourceCat``
                  were detected.
              exposure : `lsst.afw.image.ExposureF`
                  Calibrated exposure differenced with a template to create
                  ``diffIm``.
              template : `lsst.afw.image.ExposureF`
                  Template exposure used to create diffIm.
              ccdExposureIdBits : `int`
                  Number of bits used for a unique ``ccdVisitId``.
              band : `str`
                  The band in which the new DiaSources were detected.
       
              Returns
              -------
              results : `lsst.pipe.base.Struct`
                  Results struct with components.
       
                  - ``apdbMaker`` : Marker dataset to store in the Butler indicating
                    that this ccdVisit has completed successfully.
                    (`lsst.dax.apdb.ApdbConfig`)
                  - ``associatedDiaSources`` : Catalog of newly associated
                    DiaSources. (`pandas.DataFrame`)
              """
              # Load the DiaObjects and DiaSource history.
              loaderResult = self.diaCatalogLoader.run(diffIm, self.apdb)
       
              # Associate new DiaSources with existing DiaObjects.
              assocResults = self.associator.run(diaSourceTable,
                                                 loaderResult.diaObjects)
              if self.config.doSolarSystemAssociation:
                  ssoAssocResult = self.solarSystemAssociator.run(
                      assocResults.unAssocDiaSources,
                      solarSystemObjectTable,
                      diffIm)
                  createResults = self.createNewDiaObjects(
                      ssoAssocResult.unAssocDiaSources)
                  associatedDiaSources = pd.concat(
                      [assocResults.matchedDiaSources,
                       ssoAssocResult.ssoAssocDiaSources,
                       createResults.diaSources])
                  nTotalSsObjects = ssoAssocResult.nTotalSsObjects
                  nAssociatedSsObjects = ssoAssocResult.nAssociatedSsObjects
              else:
                  createResults = self.createNewDiaObjects(
                      assocResults.unAssocDiaSources)
                  associatedDiaSources = pd.concat(
                      [assocResults.matchedDiaSources,
                       createResults.diaSources])
                  nTotalSsObjects = 0
                  nAssociatedSsObjects = 0
       
              # Create new DiaObjects from unassociated diaSources.
              self._add_association_meta_data(assocResults.nUpdatedDiaObjects,
                                              assocResults.nUnassociatedDiaObjects,
                                              createResults.nNewDiaObjects,
                                              nTotalSsObjects,
                                              nAssociatedSsObjects)
              # Index the DiaSource catalog for this visit after all associations
              # have been made.
              updatedDiaObjectIds = associatedDiaSources["diaObjectId"][
                  associatedDiaSources["diaObjectId"] != 0].to_numpy()
              associatedDiaSources.set_index(["diaObjectId",
                                              "filterName",
                                              "diaSourceId"],
                                             drop=False,
                                             inplace=True)
       
              # Append new DiaObjects and DiaSources to their previous history.
      >       diaObjects = pd.concat(
                  [loaderResult.diaObjects,
                   createResults.newDiaObjects.set_index("diaObjectId", drop=False)],
                  sort=True)
      E       TypeError: TestDiaPipelineTask._testRun.<locals>.concatMock() got an unexpected keyword argument 'sort'
      

      In PDB:

      (Pdb) pd.concat([loaderResult.diaObjects, createResults.newDiaObjects.set_index("diaObjectId", drop=False)], sort=True)
      *** TypeError: cannot concatenate object of type '<class 'unittest.mock.MagicMock'>'; only Series and DataFrame objs are valid
      

      Fix the tests so that we can remove append entirely. Note that the change only breaks unit tests; the modified code works fine in ap_verify.

        Attachments

          Activity

          Hide
          Parejkoj John Parejko added a comment -

          I'm very concerned about our use of pandas here (particularly pd.concat's changing numbers to float when one of the inputs is missing the field), but this change doesn't actually alter that particular bug. I'll be filing another ticket to deal with that problem; I think your changes here are fine.

          Show
          Parejkoj John Parejko added a comment - I'm very concerned about our use of pandas here (particularly pd.concat 's changing numbers to float when one of the inputs is missing the field), but this change doesn't actually alter that particular bug. I'll be filing another ticket to deal with that problem; I think your changes here are fine.

            People

            Assignee:
            krzys Krzysztof Findeisen
            Reporter:
            krzys Krzysztof Findeisen
            Reviewers:
            John Parejko
            Watchers:
            Eric Bellm, Ian Sullivan, John Parejko, Krzysztof Findeisen
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Jenkins

                No builds found.