Data Management / DM-15221

parallelize validate_drp ingest

    Details

    • Type: Improvement
    • Status: Won't Fix
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: validate_drp, Verification
    • Labels:
      None

      Description

      Most of the time spent running validate_drp's MatchedVisitsMetricsTask goes to ingest and to building the MultiMatch catalog. This should be a trivially parallelizable step (modulo MultiMatch's thread safety), and could drastically reduce validate_drp's runtime on large datasets.

      First step is to try a ThreadPool (to see if I/O parallelization gains us anything), and the next is to try a ProcessPool (and see whether the returned objects are pickleable).
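The two experiments described above could be sketched roughly as follows. This is a minimal illustration using Python's standard `concurrent.futures`, not code from validate_drp: `load_one_catalog` here is a hypothetical stand-in that sleeps to imitate I/O, and `is_pickleable` and `load_all` are names invented for this example.

```python
import pickle
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def load_one_catalog(visit_id):
    """Hypothetical stand-in for reading one source catalog from disk."""
    time.sleep(0.01)  # imitate waiting on I/O
    return {"visit": visit_id, "n_sources": 100 + visit_id}


def is_pickleable(obj):
    """Return True if obj survives a pickle round trip unchanged."""
    try:
        return pickle.loads(pickle.dumps(obj)) == obj
    except Exception:
        # Any failure here means the object cannot cross a process boundary.
        return False


def load_all(visit_ids, executor_cls, max_workers=4):
    """Load every catalog with the given executor class, in input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(load_one_catalog, visit_ids))


if __name__ == "__main__":
    visits = list(range(8))
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        catalogs = load_all(visits, cls)
        print(cls.__name__, f"{time.perf_counter() - start:.3f}s")
    # A process pool can only return objects that pickle cleanly.
    print(all(is_pickleable(c) for c in catalogs))
```

Threads only help if the loader releases the GIL while waiting on I/O. Processes sidestep the GIL but require every returned object to be pickleable, which is why the round-trip check is worth running first.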

            Activity

            Parejkoj John Parejko added a comment -

            Note: I played with this a bit more after rebasing (see the updated branch), and it's clear that the missing component is pickling of the afw table Schema. I filed a ticket to do that. It looks like we should be able to get a big speed bump just by parallelizing the file reads, leaving multimatch to be done in serial (with the hope of replacing multimatch with something much faster in the future). My evidence for this is some line_profiler runs on the purely serial code, where loadOneCatalog and mmatch.add account for roughly 60% and 25%, respectively, of the total runtime of _loadAndMatchCatalogs.
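The split proposed in this comment (parallel reads, serial matching) might look something like the sketch below. It is not taken from the ticket branch: `load_one_catalog` and `SerialMatcher` are hypothetical stand-ins for the real reader and for MultiMatch, and `amdahl_speedup` is a helper added here to put a rough ceiling on the gain, assuming the read share parallelizes perfectly and everything else stays serial.

```python
from concurrent.futures import ThreadPoolExecutor


def load_one_catalog(path):
    """Hypothetical reader: returns (path, rows) for one catalog file."""
    return path, [f"{path}:src{i}" for i in range(3)]


class SerialMatcher:
    """Hypothetical stand-in for the matcher; add() is assumed not thread-safe."""

    def __init__(self):
        self.rows = []

    def add(self, catalog):
        self.rows.extend(catalog)


def load_and_match(paths, max_workers=4):
    """Read catalogs concurrently, then feed them to the matcher one at a time."""
    matcher = SerialMatcher()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so matching stays deterministic.
        for _path, catalog in pool.map(load_one_catalog, paths):
            matcher.add(catalog)  # only the calling thread touches the matcher
    return matcher


def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on overall speedup when only parallel_fraction scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)
```

Under those assumptions, a read share of about 60% gives `amdahl_speedup(0.6, 8)` of roughly 2.1, and no worker count can exceed 1 / 0.4 = 2.5. Disk or network contention would lower the real figure.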

            wmwood-vasey Michael Wood-Vasey added a comment -

            Is the fundamental issue:

            1. That the afw.table reading is slow in the first place?
            2. The calculation of the calibrated values (and the required reading of the calexp metadata)?

            If it's #1, then speeding up the original reading seems like the key thing to do.
            If it's #2, then persisting intermediate files may be a useful step.

            Parejkoj John Parejko added a comment -

            Most of the time is spent reading the catalog and metadata: the calibration calculations are less than 5% of the runtime.

            lguy Leanne Guy added a comment -

            validate_drp has been replaced with the faro package for computing metrics.


              People

              Assignee:
              jcarlin Jeffrey Carlin
              Reporter:
              Parejkoj John Parejko
              Watchers:
              Jim Bosch, John Parejko, John Swinbank, Leanne Guy, Michael Wood-Vasey, Simon Krughoff
