Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-11372

Create CommandLineTask utility in `ap_pipe`

    Details

    • Story Points:
      6
    • Epic Link:
    • Sprint:
      Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-2
    • Team:
      Alert Production

      Description

      Currently we envision ap_pipe as a library of functions that may be called by ap_verify or other drivers. It will likely be useful to be able to test ap_pipe independently of ap_verify. This ticket is to add a CommandLineTask utility within the ap_pipe package that utilizes the package library functions.

        Attachments

          Issue Links

            Activity

            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            While it's outside the scope of this ticket, it's worth asking how this redesign would affect ap_verify:

            • Will ap_verify still call individual pipeline steps (CCD processing, difference imaging, source association), or will it call ap_pipe as a monolithic (err, bilithic) Task? The latter approach would remove a lot of design constraints from ap_pipe, and make it more sensible to turn ap_verify into a Task, but would make it harder for ap_verify to be defensive about measurements from failed/incomplete pipeline executions.
            • Currently we're duplicating information about how datasets are organized in both ap_verify and ap_pipe. IIRC Eric Bellm said that ap_pipe should not be specific to the Dataset format, which was created to standardize verification; will the future "extend the ingestion-related components of ap_pipe" step take care of this?
            Show
            krzys Krzysztof Findeisen added a comment - - edited While it's outside the scope of this ticket, it's worth asking how this redesign would affect ap_verify : Will ap_verify still call individual pipeline steps (CCD processing, difference imaging, source association), or will it call ap_pipe as a monolithic (err, bilithic) Task? The latter approach would remove a lot of design constraints from ap_pipe , and make it more sensible to turn ap_verify into a Task, but would make it harder for ap_verify to be defensive about measurements from failed/incomplete pipeline executions. Currently we're duplicating information about how datasets are organized in both ap_verify and ap_pipe . IIRC Eric Bellm said that ap_pipe should not be specific to the Dataset format, which was created to standardize verification; will the future "extend the ingestion-related components of ap_pipe " step take care of this?
            Hide
            mrawls Meredith Rawls added a comment -

            Great questions Krzysztof Findeisen. My preference would be to keep ap_verify as calling each step of ap_pipe one at a time so it is easier to pinpoint problem areas metric-wise on a per-task basis. That said, if the new command-line/CI interface is so great that we want to switch, and we can still do what we want in terms of metrics, that would be fine too. Keep in mind that ap_pipe will eventually have additional steps post-difference-imaging.

            I'm not sure if the ingestion-related ap_pipe components are the best places for dataset wrangling or not. I think it will depend on how the obs_ packages evolve and how many camera-specific config settings we have to deal with to ingest images and calibration products. It might make sense to have an ap_pipe.loadDataset or something that runs prior to ingestion and not have ap_verify care where the dataset comes from at all? The best approach here isn't obvious to me, but I agree we should minimize duplicating functionality as much as possible.

            Show
            mrawls Meredith Rawls added a comment - Great questions Krzysztof Findeisen . My preference would be to keep ap_verify as calling each step of ap_pipe one at a time so it is easier to pinpoint problem areas metric-wise on a per-task basis. That said, if the new command-line/CI interface is so great that we want to switch, and we can still do what we want in terms of metrics, that would be fine too. Keep in mind that ap_pipe will eventually have additional steps post-difference-imaging. I'm not sure if the ingestion-related ap_pipe components are the best places for dataset wrangling or not. I think it will depend on how the obs_ packages evolve and how many camera-specific config settings we have to deal with to ingest images and calibration products. It might make sense to have an ap_pipe.loadDataset or something that runs prior to ingestion and not have ap_verify care where the dataset comes from at all? The best approach here isn't obvious to me, but I agree we should minimize duplicating functionality as much as possible.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Actually, wait – I think the two-piece script design still has the same basic problem. A CmdLineTask takes, as a command-line argument, the input data repository. Presumably what we actually pass as the input of ApPipeTask is the output of the ingestion process, but what do we tell users of the script (who need to specify the location of the data to be ingested either instead of or in addition to the repository)? The command line interface for the script is different from the interface for either of the two Tasks.

            This question applies not only to formal documentation, but also to the --help command-line argument.

            Show
            krzys Krzysztof Findeisen added a comment - Actually, wait – I think the two-piece script design still has the same basic problem. A CmdLineTask takes, as a command-line argument, the input data repository. Presumably what we actually pass as the input of ApPipeTask is the output of the ingestion process, but what do we tell users of the script (who need to specify the location of the data to be ingested either instead of or in addition to the repository)? The command line interface for the script is different from the interface for either of the two Tasks. This question applies not only to formal documentation, but also to the --help command-line argument.
            Hide
            krzys Krzysztof Findeisen added a comment -

            It looks like we can effectively remove command-line arguments from those imposed by pipe.base.ArgumentParser. However, this is would violate the substitution principle if implemented as a subclass of ArgumentParser, and it's likely that CmdLineTask assumes as part of its initialization code or as part of the TaskRunner framework that an input repository has been defined.

            Still, perhaps this can be used to salvage the two-piece design – we provide a strictly correct ArgumentParser for the command-line task portion, but use a modified version (via an object decorator?) in the calling script to supply ingestion-specific arguments and ensure the --help argument reflects the real UI.

            It's still hacky, of course, but I assume making ap_pipe a CmdLineTask is a temporary measure until Pipeline becomes available.

            Show
            krzys Krzysztof Findeisen added a comment - It looks like we can effectively remove command-line arguments from those imposed by pipe.base.ArgumentParser . However, this is would violate the substitution principle if implemented as a subclass of ArgumentParser , and it's likely that CmdLineTask assumes as part of its initialization code or as part of the TaskRunner framework that an input repository has been defined. Still, perhaps this can be used to salvage the two-piece design – we provide a strictly correct ArgumentParser for the command-line task portion, but use a modified version (via an object decorator?) in the calling script to supply ingestion-specific arguments and ensure the --help argument reflects the real UI. It's still hacky, of course, but I assume making ap_pipe a CmdLineTask is a temporary measure until Pipeline becomes available.
            Hide
            ebellm Eric Bellm added a comment -

            After much offline discussion, we have agreed to refactor ap_pipe so that ingestion is a separate script within the package but the core processing steps are a proper CmdLineTask. Accordingly I am going to close this ticket and make a new one that more explicitly (DM-13163).

            Show
            ebellm Eric Bellm added a comment - After much offline discussion, we have agreed to refactor ap_pipe so that ingestion is a separate script within the package but the core processing steps are a proper CmdLineTask . Accordingly I am going to close this ticket and make a new one that more explicitly ( DM-13163 ).

              People

              • Assignee:
                mrawls Meredith Rawls
                Reporter:
                ebellm Eric Bellm
                Watchers:
                Eric Bellm, John Swinbank, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel