Data Management / DM-27097

Enable -j in butler convert


    Details

    • Story Points:
      2
    • Team:
      Ops Middleware
    • Urgent?:
      No

      Description

      In PREOPS-102 we were reminded that even though butler ingest-raws supports multiprocessing, butler convert does not. This inconsistency is causing slowdowns that will impact DP0.1. For that reason I am calling this a Data Preview ticket.

            Activity

            tjenness Tim Jenness added a comment -

            I've done the low-hanging fruit of adding -j and forwarding that parameter to define-visits and ingest-raws since those are the parts of butler convert that are already parallelized. The skymap registration is slow but is a fixed time per conversion and doesn't scale with the number of datasets. Improving expandDataId performance will need a larger refactoring and so will have to go on a different ticket.
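A minimal sketch of the forwarding pattern described above. Everything here is hypothetical: `convert`, `ingest_raws`, `define_visits`, `main`, and the `-j/--processes` option are stand-ins built on `argparse` and `multiprocessing.Pool`, not the actual `butler convert` code, which is not shown in this ticket. It only illustrates the idea that one process count becomes one pool, which is handed down to the steps that already know how to use it while everything else stays serial.

```python
import argparse
from multiprocessing import Pool
from typing import Iterable, List


# Hypothetical per-item workers. They are module-level functions so that
# multiprocessing can pickle them by reference when sending work to a pool.
def _ingest_one(path: str) -> str:
    return f"ingested:{path}"


def _define_one(exposure: str) -> str:
    return f"visit:{exposure}"


def ingest_raws(files: Iterable[str], pool=None) -> List[str]:
    """Stand-in for an ingest step that uses a pool if one is supplied."""
    mapper = pool.map if pool is not None else map
    return list(mapper(_ingest_one, files))


def define_visits(exposures: Iterable[str], pool=None) -> List[str]:
    """Stand-in for a visit-definition step that uses a pool if one is supplied."""
    mapper = pool.map if pool is not None else map
    return list(mapper(_define_one, exposures))


def convert(files: List[str], processes: int = 1):
    """Hypothetical driver: create one pool and forward it to the parallel steps."""
    pool = Pool(processes) if processes > 1 else None
    try:
        # Fixed-cost serial steps (e.g. skymap registration) would run here
        # without the pool; only the already-parallel steps receive it.
        ingested = ingest_raws(files, pool=pool)
        visits = define_visits(ingested, pool=pool)
    finally:
        if pool is not None:
            pool.close()
            pool.join()
    return ingested, visits


def main(argv=None):
    parser = argparse.ArgumentParser(prog="convert-sketch")
    parser.add_argument("files", nargs="*")
    parser.add_argument("-j", "--processes", type=int, default=1,
                        help="Number of worker processes")
    args = parser.parse_args(argv)
    return convert(args.files, processes=args.processes)
```

For example, `main(["-j", "4", "raw1.fits", "raw2.fits"])` runs both parallel steps on a four-process pool. On platforms that start workers with `spawn`, such a call must sit behind an `if __name__ == "__main__":` guard.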

            Kian-Tat Lim, this is mostly a trivial change: adding pool and processes arguments in a number of places and forwarding them to lower levels. The tricky part (and why I'd like your opinion) is the rather bad hack needed to get this working at all. Pickling a subtask requires pickling its parent task, the repo conversion tasks seem to be quite tricky to pickle, and it's not clear why we would want to pickle them anyway. My hack is to try to pickle the parent task and, if it won't pickle, to ignore it in the real pickling...
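To make the pickling problem concrete, here is a hypothetical reconstruction of the kind of workaround described: a subtask whose `__reduce__` first tries to pickle its parent and drops the parent if that fails. The class names, and the lock used to make the parent unpicklable, are invented for illustration; this is not the code from the PR. It also exposes the side effect: the unpickled subtask loses its hierarchical name, and with it any logger naming derived from the parent.

```python
import logging
import pickle
import threading


class Task:
    """Hypothetical minimal task that keeps a reference to its parent object."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.log = logging.getLogger(self.full_name)

    @property
    def full_name(self):
        return f"{self.parent.full_name}.{self.name}" if self.parent else self.name


class ConvertTask(Task):
    """Stand-in for a repo conversion task holding unpicklable state."""

    def __init__(self, name, parent=None):
        super().__init__(name, parent)
        self._lock = threading.Lock()  # locks cannot be pickled


class IngestSubTask(Task):
    """Subtask using the 'try to pickle the parent, else drop it' workaround."""

    def __reduce__(self):
        parent = self.parent
        if parent is not None:
            try:
                pickle.dumps(parent)  # trial pickle; result is thrown away
            except Exception:
                parent = None  # silently forget an unpicklable parent
        # If the parent survived, it is pickled a second time here.
        return (type(self), (self.name, parent))
```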

            ktl Kian-Tat Lim added a comment -

            Besides the undesirable (recursive) double-pickling, I think what you will need to do is identify what role the parent task is actually playing (including the logger naming that I mention in the PR) and develop a way of fulfilling that role. If this means we should actually change the interface to use only the parent task's name rather than the entire object, so be it.
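One way to read this suggestion is sketched below, assuming the parent task is needed only to build the child's hierarchical (logger) name. All class and method names are hypothetical, and this does not claim to reproduce the fix made in DM-27131. The child stores the parent's full name as a string, so pickling the child never touches the parent object and no trial pickling is required.

```python
import logging
import pickle
import threading
from typing import Optional


class Task:
    """Hypothetical task that records only its parent's name, not the parent."""

    def __init__(self, name: str, parent_name: Optional[str] = None):
        self.name = name
        self.parent_name = parent_name
        self.log = logging.getLogger(self.full_name)

    @property
    def full_name(self) -> str:
        # Hierarchical name such as "convert.ingest", built from strings only.
        return f"{self.parent_name}.{self.name}" if self.parent_name else self.name

    def make_subtask(self, cls, name: str):
        # The child receives a string, never a reference to this object.
        return cls(name, parent_name=self.full_name)


class ConvertTask(Task):
    """Stand-in for a conversion task holding unpicklable state."""

    def __init__(self, name: str, parent_name: Optional[str] = None):
        super().__init__(name, parent_name)
        self._lock = threading.Lock()


class IngestSubTask(Task):
    """Subtask that pickles with default machinery; its parent is never involved."""
```

The trade-off is that a subtask can no longer reach back into its parent at run time; anything beyond the name that it genuinely needs would have to be passed in explicitly.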

            tjenness Tim Jenness added a comment -

            I have a proper fix for the pickling problem in DM-27131 so I will assume this ticket is approved to merge if that ticket is approved.

              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Kian-Tat Lim
              Watchers:
              Eric Neilsen, Jim Bosch, Kian-Tat Lim, Tim Jenness
              Votes:
              0

                Dates

                Created:
                Updated:
                Resolved:
