Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-19616

Make IngestIndexReferenceObjectsTask multiprocessing capable

    Details

    • Story Points:
      8
    • Epic Link:
    • Sprint:
      AP S19-6, AP F19-1
    • Team:
      Alert Production

      Description

      In order to be able to generate a Gaia (or PS1-DR2) reference catalog in a reasonable amount of time, we need a way to process files in parallel. The existing IngestIndexReferenceObjectsTask is not capable of running in parallel. One approach that should be fairly simple is to use multiprocessing plus file locking: lock each output catalog before appending.

        Attachments

          Issue Links

            Activity

            Parejkoj John Parejko created issue -
            Parejkoj John Parejko made changes -
            Field Original Value New Value
            Epic Link DM-19473 [ 263512 ]
            Parejkoj John Parejko made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            Parejkoj John Parejko made changes -
            Attachment test_filelock.py [ 38045 ]
            Hide
            Parejkoj John Parejko added a comment -

            Attached a sample script that should follow my basic plan for how to do file and counter locks in IngestIndexReferenceTask.

            Show
            Parejkoj John Parejko added a comment - Attached a sample script that should follow my basic plan for how to do file and counter locks in IngestIndexReferenceTask.
            Parejkoj John Parejko made changes -
            Attachment test_filelock.py [ 38047 ]
            Parejkoj John Parejko made changes -
            Attachment test_filelock.py [ 38045 ]
            Hide
            Parejkoj John Parejko added a comment -

            It's currently a bit hacky (and requires patching esutil to provide pickling support), but this branch of meas_algorithms now has a multiprocessing capable refcat ingester. Testing it with 2000 Gaia DR2 files took ~900 seconds on LSST dev with 8 cores (and ~240s of overhead to create the Locks ahead of time), vs. ~5000 seconds for the no-multiprocessing version, which is a speedup of about ~7x after subtracting overhead. Not bad!

            I'm going to try a run on the full Gaia catalog, which should take about 2 hours on a node, if my math is right.

            Show
            Parejkoj John Parejko added a comment - It's currently a bit hacky (and requires patching esutil to provide pickling support), but this branch of meas_algorithms now has a multiprocessing capable refcat ingester. Testing it with 2000 Gaia DR2 files took ~900 seconds on LSST dev with 8 cores (and ~240s of overhead to create the Locks ahead of time), vs. ~5000 seconds for the no-multiprocessing version, which is a speedup of about ~7x after subtracting overhead. Not bad! I'm going to try a run on the full Gaia catalog, which should take about 2 hours on a node, if my math is right.
            Hide
            Parejkoj John Parejko added a comment -

            John Swinbank: are you still ok to review this? It's probably easiest to review the meas_algorithms commits separately, as they should be atomic. I'm not entirely happy with the approach here, but the parallelization does work pretty well.

            Should I send an upstream PR for the esutil patch? meas.algorithms.HTMIndexer is the only place in the stack outside fgcm that I know of that uses esutil: we could conceivably replace it with lsst.sphgeom.HtmPixelization with a few additions to the latter class.

            Show
            Parejkoj John Parejko added a comment - John Swinbank : are you still ok to review this? It's probably easiest to review the meas_algorithms commits separately, as they should be atomic. I'm not entirely happy with the approach here, but the parallelization does work pretty well. Should I send an upstream PR for the esutil patch? meas.algorithms.HTMIndexer is the only place in the stack outside fgcm that I know of that uses esutil: we could conceivably replace it with lsst.sphgeom.HtmPixelization with a few additions to the latter class.
            Parejkoj John Parejko made changes -
            Reviewers John Swinbank [ swinbank ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            Show
            Parejkoj John Parejko added a comment - Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29832/pipeline
            Hide
            swinbank John Swinbank added a comment -

            I'm happy to look at this, but it won't be until the second half of this week. Is that ok?

            Show
            swinbank John Swinbank added a comment - I'm happy to look at this, but it won't be until the second half of this week. Is that ok?
            Parejkoj John Parejko made changes -
            Link This issue is triggering DM-19761 [ DM-19761 ]
            Parejkoj John Parejko made changes -
            Link This issue is triggering DM-19768 [ DM-19768 ]
            Parejkoj John Parejko made changes -
            Status In Review [ 10004 ] In Progress [ 3 ]
            Hide
            Parejkoj John Parejko added a comment -

            Taking this out of review for the moment, while I debug the strangeness in the jointcal run.

            Show
            Parejkoj John Parejko added a comment - Taking this out of review for the moment, while I debug the strangeness in the jointcal run.
            Parejkoj John Parejko made changes -
            Link This issue is blocked by DM-20019 [ DM-20019 ]
            Parejkoj John Parejko made changes -
            Link This issue is triggering DM-20025 [ DM-20025 ]
            Hide
            Parejkoj John Parejko added a comment - - edited

            John Swinbank, I think this is ready for review again. It grew to be much larger (~1000 lines) because I had to significantly expand the test suite to make it actually test the contents of the ingested test refcats. Quite a bit of the code in ingestIndexTestBase.py and ingestIndexManager.py was moved from other files: I'll leave it you to decide whether to review all of the code, or to try to sort out what was moved.

            New Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29940/pipeline

            Show
            Parejkoj John Parejko added a comment - - edited John Swinbank , I think this is ready for review again. It grew to be much larger (~1000 lines) because I had to significantly expand the test suite to make it actually test the contents of the ingested test refcats. Quite a bit of the code in ingestIndexTestBase.py and ingestIndexManager.py was moved from other files: I'll leave it you to decide whether to review all of the code, or to try to sort out what was moved. New Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29940/pipeline
            Parejkoj John Parejko made changes -
            Status In Progress [ 3 ] In Review [ 10004 ]
            swinbank John Swinbank made changes -
            Sprint AP S19-6 [ 834 ] AP S19-6, AP F19-1 [ 834, 923 ]
            Hide
            Parejkoj John Parejko added a comment -

            Christopher Waters: John S. said he won't be able to get to this for a while, so I'm switching reviewers. Please see the comments above for note about the bits to be reviewed.

            Show
            Parejkoj John Parejko added a comment - Christopher Waters : John S. said he won't be able to get to this for a while, so I'm switching reviewers. Please see the comments above for note about the bits to be reviewed.
            Parejkoj John Parejko made changes -
            Reviewers John Swinbank [ swinbank ] Christopher Waters [ cwaters ]
            Status In Review [ 10004 ] In Review [ 10004 ]
            Hide
            czw Christopher Waters added a comment -

            Everything largely looked fine to me, except for some style issues.  For the moved code, I checked the PR versions, and then spot checked that they matched the removed versions (method signatures, etc).

            Show
            czw Christopher Waters added a comment - Everything largely looked fine to me, except for some style issues.  For the moved code, I checked the PR versions, and then spot checked that they matched the removed versions (method signatures, etc).
            czw Christopher Waters made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            Hide
            Parejkoj John Parejko added a comment -

            Did my approach to implementing multiprocessing seem reasonable to you?

            Show
            Parejkoj John Parejko added a comment - Did my approach to implementing multiprocessing seem reasonable to you?
            Hide
            czw Christopher Waters added a comment -

            I had to stop and think about the locking, but this seemed like a reasonable solution.  I have a suspicion that a randomized list of inputs would run faster than one that's sequential (as processes in the pool shouldn't have lock collisions), but that probably isn't a big deal in the long run.

            Show
            czw Christopher Waters added a comment - I had to stop and think about the locking, but this seemed like a reasonable solution.  I have a suspicion that a randomized list of inputs would run faster than one that's sequential (as processes in the pool shouldn't have lock collisions), but that probably isn't a big deal in the long run.
            Hide
            Parejkoj John Parejko added a comment - - edited

            Watching the CPU load on my desktop when generating my Gaia test set, I could see file contention appear as occasional dips, but it was pretty consistent otherwise. When I made some estimates of speed, it was roughly a factor of 7 faster with 8 process than with 1 process, so it seems to be doing a decent job.

            Show
            Parejkoj John Parejko added a comment - - edited Watching the CPU load on my desktop when generating my Gaia test set, I could see file contention appear as occasional dips, but it was pretty consistent otherwise. When I made some estimates of speed, it was roughly a factor of 7 faster with 8 process than with 1 process, so it seems to be doing a decent job.
            Hide
            Parejkoj John Parejko added a comment -

            Thanks for the quick review, Christopher Waters.

            Merged and done.

            Show
            Parejkoj John Parejko added a comment - Thanks for the quick review, Christopher Waters . Merged and done.
            Parejkoj John Parejko made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]

              People

              • Assignee:
                Parejkoj John Parejko
                Reporter:
                Parejkoj John Parejko
                Reviewers:
                Christopher Waters
                Watchers:
                Christopher Waters, Colin Slater, Jim Bosch, John Parejko, John Swinbank, Paul Price, Simon Krughoff
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel