Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-21886

Create PipelineTask driver for ap_pipe tasks that interact with the APDB

    Details

    • Story Points:
      8
    • Sprint:
      AP F19-6 (November), AP S20-1 (December)
    • Team:
      Alert Production

      Description

      Create a PipelineTask whose runQuantum method includes all of the logic in ApPipeTask.runAssociation.

      This should probably be done after DM-21877, so it can use the same approach to signal completion of APDB writes to downstream PipelineTasks.

      It's not clear what to do about ownership of ApPipeTask.ppdb. It must belong to the new task in Gen 3, but doing so will break config overrides in Gen 2 (and the DB location is always overridden).

        Attachments

          Issue Links

            Activity

            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Fortunately, I don't think we need runQuantum for this. All the subtasks take unpersisted objects, and ApPipeTask.runAssociation does not do anything with its input dataref besides a bunch of butler.get calls.

            It may be possible to make this a regular Gen 2 task (subtask of ApPipeTask), then convert to a PipelineTask once the upstream subtasks are ready. Though either way the argument list to run will be... rather lengthy.

            Show
            krzys Krzysztof Findeisen added a comment - - edited Fortunately, I don't think we need runQuantum for this. All the subtasks take unpersisted objects, and ApPipeTask.runAssociation does not do anything with its input dataref besides a bunch of butler.get calls. It may be possible to make this a regular Gen 2 task (subtask of ApPipeTask ), then convert to a PipelineTask once the upstream subtasks are ready. Though either way the argument list to run will be... rather lengthy.
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            I've split this ticket into DM-21944, so that we can do at least the essence of it without needing to wait for DM-21862 or DM-21874 (I expect the latter in particular to take a long time).

            Show
            krzys Krzysztof Findeisen added a comment - - edited I've split this ticket into DM-21944 , so that we can do at least the essence of it without needing to wait for DM-21862 or DM-21874 (I expect the latter in particular to take a long time).
            Hide
            krzys Krzysztof Findeisen added a comment -

            Following discussion with John Parejko, we've decided that DM-21862 and DM-21874 won't actually improve our ability to (unit) test the new task. Therefore, this work can be done as soon as DM-21877 is finished, and a Gen 2 only step (DM-21944) is unneccessary.

            Show
            krzys Krzysztof Findeisen added a comment - Following discussion with John Parejko , we've decided that DM-21862 and DM-21874 won't actually improve our ability to (unit) test the new task. Therefore, this work can be done as soon as DM-21877 is finished, and a Gen 2 only step ( DM-21944 ) is unneccessary.
            Hide
            ebellm Eric Bellm added a comment - - edited

            Chris Morrison suggests that as part of this ticket we ensure that the data flow is compatible with pre-caching queries to the APDB before the rest of ap_pipe runs. This could be split in two tickets if needed.

            Show
            ebellm Eric Bellm added a comment - - edited Chris Morrison suggests that as part of this ticket we ensure that the data flow is compatible with pre-caching queries to the APDB before the rest of ap_pipe runs. This could be split in two tickets if needed.
            Hide
            cmorrison Chris Morrison added a comment -

            Jim Bosch Krzysztof Findeisen Eric Bellm

            Just wanted to do a quick summary of what this ticket entails. I'm to create a pipeline task that replaces the functionality of runAssociation in ap_pipe, likely creating a subtask for accessing the apdb, modifying AssociationTask slightly, creating a SubTask to take car of measuring diaForcedSources and writing to the apdb, wrapping the whole thing in a pipeline task. a few quick questions:

            • What package will the new pipeline task live in? pipe_tasks?
            • The SubTasks will all live in ap_association I assume? Do we want any of them to live elsewhere?
            Show
            cmorrison Chris Morrison added a comment - Jim Bosch Krzysztof Findeisen Eric Bellm Just wanted to do a quick summary of what this ticket entails. I'm to create a pipeline task that replaces the functionality of runAssociation in ap_pipe, likely creating a subtask for accessing the apdb, modifying AssociationTask slightly, creating a SubTask to take car of measuring diaForcedSources and writing to the apdb, wrapping the whole thing in a pipeline task. a few quick questions: What package will the new pipeline task live in? pipe_tasks? The SubTasks will all live in ap_association I assume? Do we want any of them to live elsewhere?
            Hide
            jbosch Jim Bosch added a comment -

            I suspect Krzysztof Findeisen has kept the design for this in his head better than I have, so I think his sign-off on your summary is probably more important than mine.

            In terms of packaging, I'd probably lean towards putting the PipelineTask in ap_association or maybe ap_pipe; this is not reflected in pipe_tasks' current state, so it's just an idea, but if we ever want to get to a state where we have separate homes for:

            • PipelineTasks used by both AP and DRP
            • PipelineTasks used only by AP
            • PipelineTasks used only by DRP

            then I think pipe_tasks is probably closest to being the first of those, and I think the PipelineTask written on this ticket is in the second category, given that it interacts (indirectly) with the APDB?

            Show
            jbosch Jim Bosch added a comment - I suspect Krzysztof Findeisen has kept the design for this in his head better than I have, so I think his sign-off on your summary is probably more important than mine. In terms of packaging, I'd probably lean towards putting the PipelineTask in ap_association or maybe ap_pipe; this is not reflected in pipe_tasks' current state, so it's just an idea, but if we ever want to get to a state where we have separate homes for: PipelineTasks used by both AP and DRP PipelineTasks used only by AP PipelineTasks used only by DRP then I think pipe_tasks is probably closest to being the first of those, and I think the PipelineTask written on this ticket is in the second category, given that it interacts (indirectly) with the APDB?
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            I don't have a design in my head, since I don't know how to handle pre-caching.

            That said, I'm a bit surprised that you would need so many new subtasks, but since all three existing existing subtasks (MapDiaSourceTask, AssociationTask, and DiaForcedSourceTask) are from ap_association, it would at least make sense (and not cause dependency-order problems) for an umbrella task and any new tasks to also go in ap_association.

            Putting everything in one package for now would also make it easier to move things around later once we have a clearer picture of how database access "should work" in Gen 3. This ticket was originally proposed as a hack to keep AP Pipeline conversion from getting blocked on that question.

            Show
            krzys Krzysztof Findeisen added a comment - - edited I don't have a design in my head, since I don't know how to handle pre-caching. That said, I'm a bit surprised that you would need so many new subtasks, but since all three existing existing subtasks ( MapDiaSourceTask , AssociationTask , and DiaForcedSourceTask ) are from ap_association , it would at least make sense (and not cause dependency-order problems) for an umbrella task and any new tasks to also go in ap_association . Putting everything in one package for now would also make it easier to move things around later once we have a clearer picture of how database access "should work" in Gen 3. This ticket was originally proposed as a hack to keep AP Pipeline conversion from getting blocked on that question.
            Hide
            cmorrison Chris Morrison added a comment -

            I mostly just want to separate work that conceptually goes together into their own task. Thanks for reminding me about the MapApTask. I had forgotten about it.

            I'm going to leave the pre-caching for another ticket, but I am curious/worried if the PipelineTask will allow for pre-caching. Can you pass in memory data between different PipelineTasks without having to persist it through the Butler?

            Show
            cmorrison Chris Morrison added a comment - I mostly just want to separate work that conceptually goes together into their own task. Thanks for reminding me about the MapApTask. I had forgotten about it. I'm going to leave the pre-caching for another ticket, but I am curious/worried if the PipelineTask will allow for pre-caching. Can you pass in memory data between different PipelineTasks without having to persist it through the Butler?
            Hide
            jbosch Jim Bosch added a comment -

            The plan for that in other contexts is to pass things in-memory through the butler - we'd configure certain dataset types to not actually be written but just held inside the butler until they're asked for next.

            Show
            jbosch Jim Bosch added a comment - The plan for that in other contexts is to pass things in-memory through the butler - we'd configure certain dataset types to not actually be written but just held inside the butler until they're asked for next.
            Hide
            cmorrison Chris Morrison added a comment -

            Cool, happy to know there is a plan. 

            Show
            cmorrison Chris Morrison added a comment - Cool, happy to know there is a plan. 
            Hide
            cmorrison Chris Morrison added a comment - - edited

            As per the review of DM-22478, I'll move all database writing out of the subtasks AssociationTask and DiaForcedSourceTask and into the new pipeline task. Also remove all database calls from the AssociationTask unittesting.

            Show
            cmorrison Chris Morrison added a comment - - edited As per the review of DM-22478 , I'll move all database writing out of the subtasks AssociationTask and DiaForcedSourceTask and into the new pipeline task. Also remove all database calls from the AssociationTask unittesting.
            Show
            cmorrison Chris Morrison added a comment - - edited Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30929/pipeline
            Hide
            cmorrison Chris Morrison added a comment -

            Hey Jim Bosch I ended up "finishing" this ticket pretty simply. One thing I don't have yet is a mocked up Gen3 butler for unittesting the runQuantum method. Are there any examples in the stack currently for doing this or maybe something in a demo or doc?

            Show
            cmorrison Chris Morrison added a comment - Hey Jim Bosch I ended up "finishing" this ticket pretty simply. One thing I don't have yet is a mocked up Gen3 butler for unittesting the runQuantum method. Are there any examples in the stack currently for doing this or maybe something in a demo or doc?
            Hide
            cmorrison Chris Morrison added a comment -

            Hey,  Nate Lust, do you have an answer for the question above? Thanks!

            Show
            cmorrison Chris Morrison added a comment - Hey,  Nate Lust , do you have an answer for the question above? Thanks!
            Hide
            sullivan Ian Sullivan added a comment -

            I left a few relatively minor comments in ap_association the pull requests.

            Show
            sullivan Ian Sullivan added a comment - I left a few relatively minor comments in ap_association  the pull requests.
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            I'm not sure this ticket is quite finished. The point was to hide ApPipeTask.apdb inside a PipelineTask, so that ApPipeTask wouldn't do anything that couldn't be done by the Gen 3 framework (enabling DM-21939 to make a Pipeline that's just the various subtasks and sub-pipelines of ApPipeTask). While the new diaCatalogLoader task eliminates some of the references to apdb, runAssociation still has four left – these can't be translated to Gen 3, as far as I know.

            Show
            krzys Krzysztof Findeisen added a comment - - edited I'm not sure this ticket is quite finished. The point was to hide ApPipeTask.apdb inside a PipelineTask , so that ApPipeTask wouldn't do anything that couldn't be done by the Gen 3 framework (enabling DM-21939 to make a Pipeline that's just the various subtasks and sub-pipelines of ApPipeTask ). While the new diaCatalogLoader task eliminates some of the references to apdb , runAssociation still has four left – these can't be translated to Gen 3, as far as I know.
            Hide
            cmorrison Chris Morrison added a comment -

            Hey Krzysztof Findeisen, did you checkout the new DiaPipelineTask method that comes along with this ticket? That's what's going to be "runAssociation" in the future. I kept ap_pipe as is (and slightly modified) so that it still works with Gen2 for now. I'm assuming that once we are ready, ap_pipe will just become a yaml file that will have DiaPipelineTask as one if it's pipelines or am I miss reading what is happening?

            Show
            cmorrison Chris Morrison added a comment - Hey Krzysztof Findeisen , did you checkout the new DiaPipelineTask method that comes along with this ticket? That's what's going to be "runAssociation" in the future. I kept ap_pipe as is (and slightly modified) so that it still works with Gen2 for now. I'm assuming that once we are ready, ap_pipe will just become a yaml file that will have DiaPipelineTask as one if it's pipelines or am I miss reading what is happening?
            Hide
            krzys Krzysztof Findeisen added a comment -

            I did miss the new DiaPipelineTask, sorry about that.

            Given that it exists, I guess my question is why you can't make it a subtask of the Gen 2 ApPipeTask and call it from runAssociation (all the inputs for DiaPipelineTask.run correspond to sensorRef.get calls in the old code). That would avoid the current code duplication and make the remaining migration steps much clearer. No matter what we do, ApPipeTask and the new yaml file will have to coexist for a bit (up to mid-2020, as of this week's middleware demo), so code duplication could be a real problem.

            If you're not going to make DiaPipelineTask a subtask of ApPipeTask, please add a comment about it to DM-21939, since it's not very discoverable at the moment.

            Show
            krzys Krzysztof Findeisen added a comment - I did miss the new DiaPipelineTask , sorry about that. Given that it exists, I guess my question is why you can't make it a subtask of the Gen 2 ApPipeTask and call it from runAssociation (all the inputs for DiaPipelineTask.run correspond to sensorRef.get calls in the old code). That would avoid the current code duplication and make the remaining migration steps much clearer. No matter what we do, ApPipeTask and the new yaml file will have to coexist for a bit (up to mid-2020, as of this week's middleware demo), so code duplication could be a real problem. If you're not going to make DiaPipelineTask a subtask of ApPipeTask , please add a comment about it to DM-21939 , since it's not very discoverable at the moment.
            Hide
            cmorrison Chris Morrison added a comment -

            Why don't I make a ticket to make DiaPipelineTask a subtask? There are a lot of specific calls to the associator and other subtasks throughout the code and more than a few settings in the Apdb that need to be taken care of here and in ap_verify. Seems like a good idea to separate that work a little.

            Show
            cmorrison Chris Morrison added a comment - Why don't I make a ticket to make DiaPipelineTask a subtask? There are a lot of specific calls to the associator and other subtasks throughout the code and more than a few settings in the Apdb that need to be taken care of here and in ap_verify. Seems like a good idea to separate that work a little.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Fine by me, as long as it's a blocker for DM-21939.

            Show
            krzys Krzysztof Findeisen added a comment - Fine by me, as long as it's a blocker for DM-21939 .
            Show
            cmorrison Chris Morrison added a comment - New Jenkins:  https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30932/pipeline

              People

              • Assignee:
                cmorrison Chris Morrison
                Reporter:
                jbosch Jim Bosch
                Reviewers:
                Ian Sullivan
                Watchers:
                Chris Morrison, Eric Bellm, Ian Sullivan, Jim Bosch, Krzysztof Findeisen
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel