  Data Management / DM-13827

ScienceSourceSelectorTask is slowly appending to a table when it can simply do the selection


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: meas_algorithms
    • Labels:
      None
    • Story Points:
      0.5
    • Sprint:
      DRP S18-4
    • Team:
      Data Release Production

      Description

      ScienceSourceSelectorTask builds its output by slowly appending selected records to a table in a loop, when it could simply apply the selection to the catalog directly. The loop at https://github.com/lsst/meas_algorithms/blob/f7dca96402cda034104c615be7ef821ab3aa9ea9/python/lsst/meas/algorithms/sourceSelector.py#L451 is unnecessary.
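      A minimal sketch of the difference follows. It uses a NumPy structured array as a stand-in for the afw source catalog. The field names `id` and `flag_bad` and both helper functions are hypothetical and are not the meas_algorithms code. Both functions return the same records, but the mask version avoids the per-record Python loop.

```python
import numpy as np

# Hypothetical stand-in for a source catalog: a NumPy structured array with
# illustrative fields, not the real afw.table schema.
catalog = np.zeros(5, dtype=[("id", "i8"), ("flag_bad", "?")])
catalog["id"] = np.arange(5)
catalog["flag_bad"] = [False, True, False, True, False]


def select_by_loop(cat, keep):
    """Slow pattern: visit every record and append the survivors one by one."""
    out = []
    for record, ok in zip(cat, keep):
        if ok:
            out.append(record.item())  # record.item() gives a plain tuple
    return np.array(out, dtype=cat.dtype)


def select_by_mask(cat, keep):
    """Fast pattern: index the whole catalog with the boolean mask in one step."""
    return cat[keep]


keep = ~catalog["flag_bad"]
print(select_by_mask(catalog, keep)["id"])  # [0 2 4]
```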


          Activity

          Eli Rykoff added a comment:

          In this ticket I added RequireIsolated to ScienceSourceSelectorTask. It selects isolated sources, i.e. those where ((catalog['parent'] == 0) & (catalog['deblend_nChild'] == 0)). John Parejko, Jim Bosch and I discussed on Slack (https://lsstc.slack.com/archives/C2JPMCF5X/p1521218208000586 ) whether a loop over footprints was necessary. Such a loop was discussed in DM-7100 and is implemented in AstrometrySourceSelectorTask: https://github.com/lsst/meas_algorithms/blob/f7dca96402cda034104c615be7ef821ab3aa9ea9/python/lsst/meas/algorithms/astrometrySourceSelector.py#L107 .

          However, the loop appears to be redundant once the deblender has been run, and the deblender must have been run for the deblend_nChild field to exist at all. My implementation therefore skips this loop, which saves time both in loading the catalog and in looping over sources. I have noted in the docstring that RequireIsolated can only be used on catalogs that have been run through the deblender.
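          The isolation criterion can be sketched as a single boolean mask. This sketch assumes a NumPy structured array with hypothetical `id`, `parent` and `deblend_nChild` fields and a hypothetical `select_isolated` helper. It is not the RequireIsolated implementation itself. Raising an error when `deblend_nChild` is absent mirrors the docstring caveat that this selection only makes sense after the deblender has run.

```python
import numpy as np


def select_isolated(catalog):
    """Return a boolean mask of isolated sources (no parent, no children).

    Hypothetical helper: assumes a NumPy structured array with 'parent' and
    'deblend_nChild' fields as a stand-in for a real source catalog.
    """
    names = catalog.dtype.names or ()
    if "deblend_nChild" not in names:
        # The field only exists (and is only meaningful) after deblending.
        raise KeyError("deblend_nChild missing: run the deblender first")
    return (catalog["parent"] == 0) & (catalog["deblend_nChild"] == 0)


cat = np.array(
    [(1, 0, 0),   # isolated: no parent, no children
     (2, 0, 2),   # blended parent with two children
     (3, 2, 0),   # child of record 2
     (4, 2, 0),   # child of record 2
     (5, 0, 0)],  # isolated
    dtype=[("id", "i8"), ("parent", "i8"), ("deblend_nChild", "i4")],
)
isolated = cat[select_isolated(cat)]
print(isolated["id"])  # [1 5]
```

Because the mask is computed directly from two columns, no per-source loop over footprints is needed.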

          Eli Rykoff added a comment:

          The final (pre-PR) version is now in Jenkins. As for the speed-up: when selecting 187 of 1685 sources in /datasets/hsc/repo/rerun/RC/w_2018_10/DM-13647 visit=36446, ccd=1, the selection used to take ~2.3 milliseconds and now takes ~700 microseconds, a speed-up of more than 3x. Admittedly this is tiny compared to the time needed to read in the catalog, but it can add up.

          When selecting on just one flag, so that 1560 of 1685 sources pass, the speed-up is roughly 60x (14.5 milliseconds vs. 250 microseconds). The more sources you select, the more time this saves.
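          The following rough timing sketch shows why the savings grow with the number of selected sources. It assumes a synthetic NumPy structured array whose only link to the comment is its size of 1685 rows. The `flux` field and the `select_by_loop` helper are hypothetical. Absolute timings will differ from the numbers quoted above and from real afw catalogs, so no expected output is given. The loop visits every record and does extra work, a tuple conversion and an append, for each record it keeps. The mask selection is a single vectorized indexing operation.

```python
import timeit

import numpy as np

# Synthetic catalog; only the size (1685 rows) mirrors the comment above.
rng = np.random.default_rng(42)
n = 1685
catalog = np.zeros(n, dtype=[("id", "i8"), ("flux", "f8")])
catalog["id"] = np.arange(n)
catalog["flux"] = rng.random(n)


def select_by_loop(cat, keep):
    """Visits every record and pays extra for each one that is kept."""
    out = []
    for record, ok in zip(cat, keep):
        if ok:
            out.append(record.item())
    return np.array(out, dtype=cat.dtype)


for threshold in (0.1, 0.9):  # keep roughly 10%, then roughly 90%, of rows
    keep = catalog["flux"] < threshold
    t_loop = timeit.timeit(lambda: select_by_loop(catalog, keep), number=20) / 20
    t_mask = timeit.timeit(lambda: catalog[keep], number=20) / 20
    print(f"kept {keep.sum()}/{n}: loop {t_loop * 1e3:.2f} ms, "
          f"mask {t_mask * 1e6:.1f} us, ratio {t_loop / t_mask:.0f}x")
```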

          Paul Price added a comment:

          Very nice, thanks!


            People

            Assignee:
            Eli Rykoff
            Reporter:
            Eli Rykoff
            Reviewers:
            Paul Price
            Watchers:
            Eli Rykoff, John Parejko, Paul Price
            Votes:
            0

              Dates

              Created:
              Updated:
              Resolved:
