Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-11372

Create CommandLineTask utility in `ap_pipe`

    XMLWordPrintable

    Details

    • Story Points:
      6
    • Epic Link:
    • Sprint:
      Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-2
    • Team:
      Alert Production

      Description

      Currently we envision ap_pipe as a library of functions that may be called by ap_verify or other drivers. It will likely be useful to be able to test ap_pipe independently of ap_verify. This ticket is to add a CommandLineTask utility within the ap_pipe package that utilizes the package library functions.

        Attachments

          Issue Links

            Activity

            ebellm Eric Bellm created issue -
            ebellm Eric Bellm made changes -
            Field Original Value New Value
            Epic Link DM-10773 [ 32739 ]
            ebellm Eric Bellm made changes -
            Component/s Alert Production [ 13900 ]
            ebellm Eric Bellm made changes -
            Description Currently we envision `ap_pipe` as a library of functions that may be called by `verify_ap` or other drivers. It will likely be useful to be able to test `ap_pipe` independently of `verify_ap`. This ticket is to add a CommandLineTask utility within the `ap_pipe` package that utilizes the package library functions. Currently we envision {{ap_pipe}} as a library of functions that may be called by {{verify_ap}} or other drivers. It will likely be useful to be able to test {{ap_pipe}} independently of {{verify_ap}}. This ticket is to add a CommandLineTask utility within the {{ap_pipe}} package that utilizes the package library functions.
            mrawls Meredith Rawls made changes -
            Assignee Meredith Rawls [ mrawls ]
            Hide
            ebellm Eric Bellm added a comment -

            It may be useful to have this emulate the structure of the `pipe_driver` scripts.

            Show
            ebellm Eric Bellm added a comment - It may be useful to have this emulate the structure of the `pipe_driver` scripts.
            mrawls Meredith Rawls made changes -
            Sprint Alert Production F17 - 9 [ 639 ]
            Hide
            mrawls Meredith Rawls added a comment -

            While ap_pipe doesn't have an official CommandLineTask utility at present, it does have the runPipelineAlone function which handles basic argument parsing and lets the user do exactly what it sounds like (run ap_pipe from the command line without touching ap_verify). I plan to keep this basic functionality as ap_pipe evolves. An example of how to run ap_pipe is given in DMTN-039.

            Do we still think it is necessary to make this functionality a proper CommandLineTask? I'm concerned that the fact that ap_pipe begins by ingesting a non-Butler dataset of raw images would make this challenging and require changing things in pipe_base a fair bit.

            Show
            mrawls Meredith Rawls added a comment - While ap_pipe doesn't have an official CommandLineTask utility at present, it does have the runPipelineAlone function which handles basic argument parsing and lets the user do exactly what it sounds like (run ap_pipe from the command line without touching ap_verify ). I plan to keep this basic functionality as ap_pipe evolves. An example of how to run ap_pipe is given in DMTN-039 . Do we still think it is necessary to make this functionality a proper CommandLineTask ? I'm concerned that the fact that ap_pipe begins by ingesting a non-Butler dataset of raw images would make this challenging and require changing things in pipe_base a fair bit.
            Hide
            ebellm Eric Bellm added a comment -

            I think ap_pipe needs to be a real, bona-fide CommandLineTask--we shouldn't reinvent the wheel here. If ingestion is causing the trouble let's refactor so we can deal with ingestion separately.

            Show
            ebellm Eric Bellm added a comment - I think ap_pipe needs to be a real, bona-fide CommandLineTask --we shouldn't reinvent the wheel here. If ingestion is causing the trouble let's refactor so we can deal with ingestion separately.
            Hide
            krzys Krzysztof Findeisen added a comment -

            This sounds like something Simon Krughoff might have an opinion on.

            Show
            krzys Krzysztof Findeisen added a comment - This sounds like something Simon Krughoff might have an opinion on.
            Hide
            krughoff Simon Krughoff added a comment -

            I thought from the beginning that going to CommandLineTask would probably make sense eventually. I thought the overhead initially would keep us from getting to a prototype system quickly, so I didn't suggest we do it from the outset. I'm all for promoting it to a CommandLineTask if it makes sense to do so.

            Show
            krughoff Simon Krughoff added a comment - I thought from the beginning that going to CommandLineTask would probably make sense eventually. I thought the overhead initially would keep us from getting to a prototype system quickly, so I didn't suggest we do it from the outset. I'm all for promoting it to a CommandLineTask if it makes sense to do so.
            Hide
            krzys Krzysztof Findeisen added a comment -

            I meant the scope change involved in making ingestion not part of ap_pipe, since IIRC you were pushing for ingestion being included.

            Show
            krzys Krzysztof Findeisen added a comment - I meant the scope change involved in making ingestion not part of ap_pipe , since IIRC you were pushing for ingestion being included.
            Hide
            krughoff Simon Krughoff added a comment -

            Oh, sorry. I didn't catch that.

            It seems like having something that does everything with the push of a button is necessary for putting this in CI, but it could certainly just be a shell script that calls two command line tasks. I don't know what the challenges are with putting the ingest and processing in the same CommandLineTask. Naively, it should "Just Work"™ since they are all CommandLineTasks, but maybe IngestTask is different enough that it matters.

            Show
            krughoff Simon Krughoff added a comment - Oh, sorry. I didn't catch that. It seems like having something that does everything with the push of a button is necessary for putting this in CI, but it could certainly just be a shell script that calls two command line tasks. I don't know what the challenges are with putting the ingest and processing in the same CommandLineTask . Naively, it should "Just Work"™ since they are all CommandLineTasks , but maybe IngestTask is different enough that it matters.
            Hide
            krzys Krzysztof Findeisen added a comment -

            IngestTask is not a CommandLineTask because CommandLineTasks require ingested data as their starting point – you can't specify their input except by using the Butler. That's the whole issue here.

            Show
            krzys Krzysztof Findeisen added a comment - IngestTask is not a CommandLineTask because CommandLineTasks require ingested data as their starting point – you can't specify their input except by using the Butler. That's the whole issue here.
            Hide
            krughoff Simon Krughoff added a comment -

            O.K. thanks for the explanation. I have no strong objection to ingest and processing being two different things. My only concern from an architecture stand point is that we make it possible to put some version of this in the CI system. I think the easiest way to do this is to have a single executable that the CI system can call, but I don't want to dictate the design of that.

            Show
            krughoff Simon Krughoff added a comment - O.K. thanks for the explanation. I have no strong objection to ingest and processing being two different things. My only concern from an architecture stand point is that we make it possible to put some version of this in the CI system. I think the easiest way to do this is to have a single executable that the CI system can call, but I don't want to dictate the design of that.
            Hide
            mrawls Meredith Rawls added a comment -

            Thanks for this discussion. Based on what has been said, I propose refactoring the current ap_pipe.runPipelineAlone into two pieces: one for camera-specific non-command-line task operations (ingestion and calib ingestion) and one for camera-agnostic command-line task operations (processing and difference imaging). The first piece would probably not be a CommandLineTask but the second piece would be.

            Advantages:

            • Allows ap_pipe to be run from the command line or by the CI system, albeit in two pieces rather than one (e.g., there would be two functions called sequentially in ap_pipe/bin.src/ap_pipe.py)
            • Lets the non-ingestion-related components of ap_pipe be a CommandLineTask as desired
            • Sets us up to extend the ingestion-related components of ap_pipe to support more than just DECam in the future

            Outstanding issues:

            • ap_pipe.doProcessCcd has some DECam-specific configs that are complex enough to require the use of run instead of parseAndRun
            • ap_pipe.doDiffIm will need to be updated to handle coadd templates by default first (see DM-11422)
            • ap_pipe.doIngest and ap_pipe.doIngestCalibs are both necessarily very DECam-specific; we should think about the best way to extend their functionalities to other cameras
            Show
            mrawls Meredith Rawls added a comment - Thanks for this discussion. Based on what has been said, I propose refactoring the current ap_pipe.runPipelineAlone into two pieces: one for camera-specific non -command-line task operations (ingestion and calib ingestion) and one for camera-agnostic command-line task operations (processing and difference imaging). The first piece would probably not be a CommandLineTask but the second piece would be. Advantages: Allows ap_pipe to be run from the command line or by the CI system, albeit in two pieces rather than one (e.g., there would be two functions called sequentially in ap_pipe/bin.src/ap_pipe.py ) Lets the non-ingestion-related components of ap_pipe be a CommandLineTask as desired Sets us up to extend the ingestion-related components of ap_pipe to support more than just DECam in the future Outstanding issues: ap_pipe.doProcessCcd has some DECam-specific configs that are complex enough to require the use of run instead of parseAndRun ap_pipe.doDiffIm will need to be updated to handle coadd templates by default first (see DM-11422 ) ap_pipe.doIngest and ap_pipe.doIngestCalibs are both necessarily very DECam-specific; we should think about the best way to extend their functionalities to other cameras
            mrawls Meredith Rawls made changes -
            Link This issue has to be done after DM-11422 [ DM-11422 ]
            mrawls Meredith Rawls made changes -
            Description Currently we envision {{ap_pipe}} as a library of functions that may be called by {{verify_ap}} or other drivers. It will likely be useful to be able to test {{ap_pipe}} independently of {{verify_ap}}. This ticket is to add a CommandLineTask utility within the {{ap_pipe}} package that utilizes the package library functions. Currently we envision {{ap_pipe}} as a library of functions that may be called by {{ap_verify}} or other drivers. It will likely be useful to be able to test {{ap_pipe}} independently of {{ap_verify}}. This ticket is to add a CommandLineTask utility within the {{ap_pipe}} package that utilizes the package library functions.
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            While it's outside the scope of this ticket, it's worth asking how this redesign would affect ap_verify:

            • Will ap_verify still call individual pipeline steps (CCD processing, difference imaging, source association), or will it call ap_pipe as a monolithic (err, bilithic) Task? The latter approach would remove a lot of design constraints from ap_pipe, and make it more sensible to turn ap_verify into a Task, but would make it harder for ap_verify to be defensive about measurements from failed/incomplete pipeline executions.
            • Currently we're duplicating information about how datasets are organized in both ap_verify and ap_pipe. IIRC Eric Bellm said that ap_pipe should not be specific to the Dataset format, which was created to standardize verification; will the future "extend the ingestion-related components of ap_pipe" step take care of this?
            Show
            krzys Krzysztof Findeisen added a comment - - edited While it's outside the scope of this ticket, it's worth asking how this redesign would affect ap_verify : Will ap_verify still call individual pipeline steps (CCD processing, difference imaging, source association), or will it call ap_pipe as a monolithic (err, bilithic) Task? The latter approach would remove a lot of design constraints from ap_pipe , and make it more sensible to turn ap_verify into a Task, but would make it harder for ap_verify to be defensive about measurements from failed/incomplete pipeline executions. Currently we're duplicating information about how datasets are organized in both ap_verify and ap_pipe . IIRC Eric Bellm said that ap_pipe should not be specific to the Dataset format, which was created to standardize verification; will the future "extend the ingestion-related components of ap_pipe " step take care of this?
            Hide
            mrawls Meredith Rawls added a comment -

            Great questions Krzysztof Findeisen. My preference would be to keep ap_verify as calling each step of ap_pipe one at a time so it is easier to pinpoint problem areas metric-wise on a per-task basis. That said, if the new command-line/CI interface is so great that we want to switch, and we can still do what we want in terms of metrics, that would be fine too. Keep in mind that ap_pipe will eventually have additional steps post-difference-imaging.

            I'm not sure if the ingestion-related ap_pipe components are the best places for dataset wrangling or not. I think it will depend on how the obs_ packages evolve and how many camera-specific config settings we have to deal with to ingest images and calibration products. It might make sense to have an ap_pipe.loadDataset or something that runs prior to ingestion and not have ap_verify care where the dataset comes from at all? The best approach here isn't obvious to me, but I agree we should minimize duplicating functionality as much as possible.

            Show
            mrawls Meredith Rawls added a comment - Great questions Krzysztof Findeisen . My preference would be to keep ap_verify as calling each step of ap_pipe one at a time so it is easier to pinpoint problem areas metric-wise on a per-task basis. That said, if the new command-line/CI interface is so great that we want to switch, and we can still do what we want in terms of metrics, that would be fine too. Keep in mind that ap_pipe will eventually have additional steps post-difference-imaging. I'm not sure if the ingestion-related ap_pipe components are the best places for dataset wrangling or not. I think it will depend on how the obs_ packages evolve and how many camera-specific config settings we have to deal with to ingest images and calibration products. It might make sense to have an ap_pipe.loadDataset or something that runs prior to ingestion and not have ap_verify care where the dataset comes from at all? The best approach here isn't obvious to me, but I agree we should minimize duplicating functionality as much as possible.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Actually, wait – I think the two-piece script design still has the same basic problem. A CmdLineTask takes, as a command-line argument, the input data repository. Presumably what we actually pass as the input of ApPipeTask is the output of the ingestion process, but what do we tell users of the script (who need to specify the location of the data to be ingested either instead of or in addition to the repository)? The command line interface for the script is different from the interface for either of the two Tasks.

            This question applies not only to formal documentation, but also to the --help command-line argument.

            Show
            krzys Krzysztof Findeisen added a comment - Actually, wait – I think the two-piece script design still has the same basic problem. A CmdLineTask takes, as a command-line argument, the input data repository. Presumably what we actually pass as the input of ApPipeTask is the output of the ingestion process, but what do we tell users of the script (who need to specify the location of the data to be ingested either instead of or in addition to the repository)? The command line interface for the script is different from the interface for either of the two Tasks. This question applies not only to formal documentation, but also to the --help command-line argument.
            swinbank John Swinbank made changes -
            Sprint Alert Production F17 - 9 [ 639 ] Alert Production F17 - 9, Alert Production F17 - 10 [ 639, 643 ]
            swinbank John Swinbank made changes -
            Rank Ranked higher
            krzys Krzysztof Findeisen made changes -
            Component/s ap_pipe [ 14281 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue relates to DM-12314 [ DM-12314 ]
            swinbank John Swinbank made changes -
            Sprint Alert Production F17 - 9, Alert Production F17 - 10 [ 639, 643 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ]
            swinbank John Swinbank made changes -
            Rank Ranked higher
            krzys Krzysztof Findeisen made changes -
            Link This issue blocks DM-12549 [ DM-12549 ]
            Hide
            krzys Krzysztof Findeisen added a comment -

            It looks like we can effectively remove command-line arguments from those imposed by pipe.base.ArgumentParser. However, this is would violate the substitution principle if implemented as a subclass of ArgumentParser, and it's likely that CmdLineTask assumes as part of its initialization code or as part of the TaskRunner framework that an input repository has been defined.

            Still, perhaps this can be used to salvage the two-piece design – we provide a strictly correct ArgumentParser for the command-line task portion, but use a modified version (via an object decorator?) in the calling script to supply ingestion-specific arguments and ensure the --help argument reflects the real UI.

            It's still hacky, of course, but I assume making ap_pipe a CmdLineTask is a temporary measure until Pipeline becomes available.

            Show
            krzys Krzysztof Findeisen added a comment - It looks like we can effectively remove command-line arguments from those imposed by pipe.base.ArgumentParser . However, this is would violate the substitution principle if implemented as a subclass of ArgumentParser , and it's likely that CmdLineTask assumes as part of its initialization code or as part of the TaskRunner framework that an input repository has been defined. Still, perhaps this can be used to salvage the two-piece design – we provide a strictly correct ArgumentParser for the command-line task portion, but use a modified version (via an object decorator?) in the calling script to supply ingestion-specific arguments and ensure the --help argument reflects the real UI. It's still hacky, of course, but I assume making ap_pipe a CmdLineTask is a temporary measure until Pipeline becomes available.
            krzys Krzysztof Findeisen made changes -
            Link This issue relates to DM-12853 [ DM-12853 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue relates to DM-12867 [ DM-12867 ]
            swinbank John Swinbank made changes -
            Epic Link DM-10773 [ 32739 ] DM-12711 [ 36308 ]
            swinbank John Swinbank made changes -
            Sprint Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-1 [ 639, 643, 644, 669 ]
            swinbank John Swinbank made changes -
            Rank Ranked higher
            swinbank John Swinbank made changes -
            Sprint Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-1 [ 639, 643, 644, 669 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ]
            ebellm Eric Bellm made changes -
            Story Points 3 6
            ebellm Eric Bellm made changes -
            Sprint Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-2 [ 639, 643, 644, 677 ]
            ebellm Eric Bellm made changes -
            Rank Ranked lower
            Hide
            ebellm Eric Bellm added a comment -

            After much offline discussion, we have agreed to refactor ap_pipe so that ingestion is a separate script within the package but the core processing steps are a proper CmdLineTask. Accordingly I am going to close this ticket and make a new one that more explicitly (DM-13163).

            Show
            ebellm Eric Bellm added a comment - After much offline discussion, we have agreed to refactor ap_pipe so that ingestion is a separate script within the package but the core processing steps are a proper CmdLineTask . Accordingly I am going to close this ticket and make a new one that more explicitly ( DM-13163 ).
            ebellm Eric Bellm made changes -
            Resolution Done [ 10000 ]
            Status To Do [ 10001 ] Won't Fix [ 10405 ]

              People

              Assignee:
              mrawls Meredith Rawls
              Reporter:
              ebellm Eric Bellm
              Watchers:
              Eric Bellm, John Swinbank, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: