  1. Data Management
  2. DM-15221

parallelize validate_drp ingest

    Details

    • Type: Improvement
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: validate_drp, Verification
    • Labels: None

      Description

      Most of the runtime of validate_drp's MatchedVisitsMetricsTask is spent on ingest and on building the MultiMatch catalog. This step should be trivially parallelizable (modulo MultiMatch's thread safety), and parallelizing it could drastically reduce validate_drp's runtime on large datasets.

      The first step is to try a ThreadPool (to see whether I/O parallelization gains us anything); the next is to try a ProcessPool (and see whether the returned objects are picklable).
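
A rough sketch of the two-step experiment, using only the Python standard library. Everything here is an assumption for illustration: `load_catalog` is a hypothetical stand-in for the per-visit catalog read (it does not call the LSST stack), and `ingest` and `is_picklable` are made-up helper names, not validate_drp functions. The merge into the MultiMatch catalog is deliberately left out and would stay serial until its thread safety is known.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def load_catalog(data_id):
    """Hypothetical stand-in for reading one visit's source catalog.

    Returns a plain dict so the sketch runs without the LSST stack.
    """
    return {"data_id": data_id, "n_sources": 100 + data_id}


def is_picklable(obj):
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        return False
    return True


def ingest(data_ids, executor_cls, max_workers=4):
    """Load all catalogs in parallel; results keep the input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(load_catalog, data_ids))


if __name__ == "__main__":
    # Step 1: threads, to see whether overlapping I/O helps at all.
    catalogs = ingest(range(8), ThreadPoolExecutor)
    # Step 2 precondition: a process pool must pickle every returned object.
    print(all(is_picklable(cat) for cat in catalogs))
```

If the picklability check passes for the real catalog objects, passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` is the only change needed for the second step, provided the worker function is defined at module level so that it can be pickled too.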

              People

              • Assignee: Jeffrey Carlin (jcarlin)
              • Reporter: John Parejko (Parejkoj)
              • Watchers: Jim Bosch, John Parejko, John Swinbank, Michael Wood-Vasey, Simon Krughoff
