# Create CommandLineTask utility in ap_pipe

XMLWordPrintable

#### Details

• Type: Story
• Status: Won't Fix
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
6
• Sprint:
Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-2
• Team:

#### Description

Currently we envision ap_pipe as a library of functions that may be called by ap_verify or other drivers. It will likely be useful to be able to test ap_pipe independently of ap_verify. This ticket is to add a CommandLineTask utility within the ap_pipe package that utilizes the package library functions.

#### Activity

Eric Bellm created issue -
Field Original Value New Value
Epic Link DM-10773 [ 32739 ]
 Component/s Alert Production [ 13900 ]
 Description Currently we envision ap_pipe as a library of functions that may be called by verify_ap or other drivers. It will likely be useful to be able to test ap_pipe independently of verify_ap. This ticket is to add a CommandLineTask utility within the ap_pipe package that utilizes the package library functions. Currently we envision {{ap_pipe}} as a library of functions that may be called by {{verify_ap}} or other drivers. It will likely be useful to be able to test {{ap_pipe}} independently of {{verify_ap}}. This ticket is to add a CommandLineTask utility within the {{ap_pipe}} package that utilizes the package library functions.
 Assignee Meredith Rawls [ mrawls ]
Hide
Eric Bellm added a comment -

It may be useful to have this emulate the structure of the pipe_driver scripts.

Show
Eric Bellm added a comment - It may be useful to have this emulate the structure of the pipe_driver scripts.
 Sprint Alert Production F17 - 9 [ 639 ]
Hide
Meredith Rawls added a comment -

While ap_pipe doesn't have an official CommandLineTask utility at present, it does have the runPipelineAlone function which handles basic argument parsing and lets the user do exactly what it sounds like (run ap_pipe from the command line without touching ap_verify). I plan to keep this basic functionality as ap_pipe evolves. An example of how to run ap_pipe is given in DMTN-039.

Do we still think it is necessary to make this functionality a proper CommandLineTask? I'm concerned that the fact that ap_pipe begins by ingesting a non-Butler dataset of raw images would make this challenging and require changing things in pipe_base a fair bit.

Show
Meredith Rawls added a comment - While ap_pipe doesn't have an official CommandLineTask utility at present, it does have the runPipelineAlone function which handles basic argument parsing and lets the user do exactly what it sounds like (run ap_pipe from the command line without touching ap_verify ). I plan to keep this basic functionality as ap_pipe evolves. An example of how to run ap_pipe is given in DMTN-039 . Do we still think it is necessary to make this functionality a proper CommandLineTask ? I'm concerned that the fact that ap_pipe begins by ingesting a non-Butler dataset of raw images would make this challenging and require changing things in pipe_base a fair bit.
Hide
Eric Bellm added a comment -

I think ap_pipe needs to be a real, bona-fide CommandLineTask--we shouldn't reinvent the wheel here. If ingestion is causing the trouble let's refactor so we can deal with ingestion separately.

Show
Eric Bellm added a comment - I think ap_pipe needs to be a real, bona-fide CommandLineTask --we shouldn't reinvent the wheel here. If ingestion is causing the trouble let's refactor so we can deal with ingestion separately.
Hide
Krzysztof Findeisen added a comment -

This sounds like something Simon Krughoff might have an opinion on.

Show
Krzysztof Findeisen added a comment - This sounds like something Simon Krughoff might have an opinion on.
Hide
Simon Krughoff added a comment -

I thought from the beginning that going to CommandLineTask would probably make sense eventually. I thought the overhead initially would keep us from getting to a prototype system quickly, so I didn't suggest we do it from the outset. I'm all for promoting it to a CommandLineTask if it makes sense to do so.

Show
Simon Krughoff added a comment - I thought from the beginning that going to CommandLineTask would probably make sense eventually. I thought the overhead initially would keep us from getting to a prototype system quickly, so I didn't suggest we do it from the outset. I'm all for promoting it to a CommandLineTask if it makes sense to do so.
Hide
Krzysztof Findeisen added a comment -

I meant the scope change involved in making ingestion not part of ap_pipe, since IIRC you were pushing for ingestion being included.

Show
Krzysztof Findeisen added a comment - I meant the scope change involved in making ingestion not part of ap_pipe , since IIRC you were pushing for ingestion being included.
Hide
Simon Krughoff added a comment -

Oh, sorry. I didn't catch that.

It seems like having something that does everything with the push of a button is necessary for putting this in CI, but it could certainly just be a shell script that calls two command line tasks. I don't know what the challenges are with putting the ingest and processing in the same CommandLineTask. Naively, it should "Just Work"™ since they are all CommandLineTasks, but maybe IngestTask is different enough that it matters.

Show
Simon Krughoff added a comment - Oh, sorry. I didn't catch that. It seems like having something that does everything with the push of a button is necessary for putting this in CI, but it could certainly just be a shell script that calls two command line tasks. I don't know what the challenges are with putting the ingest and processing in the same CommandLineTask . Naively, it should "Just Work"™ since they are all CommandLineTasks , but maybe IngestTask is different enough that it matters.
Hide
Krzysztof Findeisen added a comment -

IngestTask is not a CommandLineTask because CommandLineTasks require ingested data as their starting point – you can't specify their input except by using the Butler. That's the whole issue here.

Show
Krzysztof Findeisen added a comment - IngestTask is not a CommandLineTask because CommandLineTasks require ingested data as their starting point – you can't specify their input except by using the Butler. That's the whole issue here.
Hide
Simon Krughoff added a comment -

O.K. thanks for the explanation. I have no strong objection to ingest and processing being two different things. My only concern from an architecture stand point is that we make it possible to put some version of this in the CI system. I think the easiest way to do this is to have a single executable that the CI system can call, but I don't want to dictate the design of that.

Show
Simon Krughoff added a comment - O.K. thanks for the explanation. I have no strong objection to ingest and processing being two different things. My only concern from an architecture stand point is that we make it possible to put some version of this in the CI system. I think the easiest way to do this is to have a single executable that the CI system can call, but I don't want to dictate the design of that.
Hide
Meredith Rawls added a comment -

Thanks for this discussion. Based on what has been said, I propose refactoring the current ap_pipe.runPipelineAlone into two pieces: one for camera-specific non-command-line task operations (ingestion and calib ingestion) and one for camera-agnostic command-line task operations (processing and difference imaging). The first piece would probably not be a CommandLineTask but the second piece would be.

• Allows ap_pipe to be run from the command line or by the CI system, albeit in two pieces rather than one (e.g., there would be two functions called sequentially in ap_pipe/bin.src/ap_pipe.py)
• Lets the non-ingestion-related components of ap_pipe be a CommandLineTask as desired
• Sets us up to extend the ingestion-related components of ap_pipe to support more than just DECam in the future

Outstanding issues:

• ap_pipe.doProcessCcd has some DECam-specific configs that are complex enough to require the use of run instead of parseAndRun
• ap_pipe.doDiffIm will need to be updated to handle coadd templates by default first (see DM-11422)
• ap_pipe.doIngest and ap_pipe.doIngestCalibs are both necessarily very DECam-specific; we should think about the best way to extend their functionalities to other cameras
Show
Meredith Rawls added a comment - Thanks for this discussion. Based on what has been said, I propose refactoring the current ap_pipe.runPipelineAlone into two pieces: one for camera-specific non -command-line task operations (ingestion and calib ingestion) and one for camera-agnostic command-line task operations (processing and difference imaging). The first piece would probably not be a CommandLineTask but the second piece would be. Advantages: Allows ap_pipe to be run from the command line or by the CI system, albeit in two pieces rather than one (e.g., there would be two functions called sequentially in ap_pipe/bin.src/ap_pipe.py ) Lets the non-ingestion-related components of ap_pipe be a CommandLineTask as desired Sets us up to extend the ingestion-related components of ap_pipe to support more than just DECam in the future Outstanding issues: ap_pipe.doProcessCcd has some DECam-specific configs that are complex enough to require the use of run instead of parseAndRun ap_pipe.doDiffIm will need to be updated to handle coadd templates by default first (see DM-11422 ) ap_pipe.doIngest and ap_pipe.doIngestCalibs are both necessarily very DECam-specific; we should think about the best way to extend their functionalities to other cameras
 Link This issue has to be done after DM-11422 [ DM-11422 ]
 Description Currently we envision {{ap_pipe}} as a library of functions that may be called by {{verify_ap}} or other drivers. It will likely be useful to be able to test {{ap_pipe}} independently of {{verify_ap}}. This ticket is to add a CommandLineTask utility within the {{ap_pipe}} package that utilizes the package library functions. Currently we envision {{ap_pipe}} as a library of functions that may be called by {{ap_verify}} or other drivers. It will likely be useful to be able to test {{ap_pipe}} independently of {{ap_verify}}. This ticket is to add a CommandLineTask utility within the {{ap_pipe}} package that utilizes the package library functions.
Hide
Krzysztof Findeisen added a comment - - edited

While it's outside the scope of this ticket, it's worth asking how this redesign would affect ap_verify:

• Will ap_verify still call individual pipeline steps (CCD processing, difference imaging, source association), or will it call ap_pipe as a monolithic (err, bilithic) Task? The latter approach would remove a lot of design constraints from ap_pipe, and make it more sensible to turn ap_verify into a Task, but would make it harder for ap_verify to be defensive about measurements from failed/incomplete pipeline executions.
• Currently we're duplicating information about how datasets are organized in both ap_verify and ap_pipe. IIRC Eric Bellm said that ap_pipe should not be specific to the Dataset format, which was created to standardize verification; will the future "extend the ingestion-related components of ap_pipe" step take care of this?
Show
Krzysztof Findeisen added a comment - - edited While it's outside the scope of this ticket, it's worth asking how this redesign would affect ap_verify : Will ap_verify still call individual pipeline steps (CCD processing, difference imaging, source association), or will it call ap_pipe as a monolithic (err, bilithic) Task? The latter approach would remove a lot of design constraints from ap_pipe , and make it more sensible to turn ap_verify into a Task, but would make it harder for ap_verify to be defensive about measurements from failed/incomplete pipeline executions. Currently we're duplicating information about how datasets are organized in both ap_verify and ap_pipe . IIRC Eric Bellm said that ap_pipe should not be specific to the Dataset format, which was created to standardize verification; will the future "extend the ingestion-related components of ap_pipe " step take care of this?
Hide
Meredith Rawls added a comment -

Great questions Krzysztof Findeisen. My preference would be to keep ap_verify as calling each step of ap_pipe one at a time so it is easier to pinpoint problem areas metric-wise on a per-task basis. That said, if the new command-line/CI interface is so great that we want to switch, and we can still do what we want in terms of metrics, that would be fine too. Keep in mind that ap_pipe will eventually have additional steps post-difference-imaging.

I'm not sure if the ingestion-related ap_pipe components are the best places for dataset wrangling or not. I think it will depend on how the obs_ packages evolve and how many camera-specific config settings we have to deal with to ingest images and calibration products. It might make sense to have an ap_pipe.loadDataset or something that runs prior to ingestion and not have ap_verify care where the dataset comes from at all? The best approach here isn't obvious to me, but I agree we should minimize duplicating functionality as much as possible.

Show
Meredith Rawls added a comment - Great questions Krzysztof Findeisen . My preference would be to keep ap_verify as calling each step of ap_pipe one at a time so it is easier to pinpoint problem areas metric-wise on a per-task basis. That said, if the new command-line/CI interface is so great that we want to switch, and we can still do what we want in terms of metrics, that would be fine too. Keep in mind that ap_pipe will eventually have additional steps post-difference-imaging. I'm not sure if the ingestion-related ap_pipe components are the best places for dataset wrangling or not. I think it will depend on how the obs_ packages evolve and how many camera-specific config settings we have to deal with to ingest images and calibration products. It might make sense to have an ap_pipe.loadDataset or something that runs prior to ingestion and not have ap_verify care where the dataset comes from at all? The best approach here isn't obvious to me, but I agree we should minimize duplicating functionality as much as possible.
Hide
Krzysztof Findeisen added a comment -

Actually, wait – I think the two-piece script design still has the same basic problem. A CmdLineTask takes, as a command-line argument, the input data repository. Presumably what we actually pass as the input of ApPipeTask is the output of the ingestion process, but what do we tell users of the script (who need to specify the location of the data to be ingested either instead of or in addition to the repository)? The command line interface for the script is different from the interface for either of the two Tasks.

This question applies not only to formal documentation, but also to the --help command-line argument.

Show
Krzysztof Findeisen added a comment - Actually, wait – I think the two-piece script design still has the same basic problem. A CmdLineTask takes, as a command-line argument, the input data repository. Presumably what we actually pass as the input of ApPipeTask is the output of the ingestion process, but what do we tell users of the script (who need to specify the location of the data to be ingested either instead of or in addition to the repository)? The command line interface for the script is different from the interface for either of the two Tasks. This question applies not only to formal documentation, but also to the --help command-line argument.
 Sprint Alert Production F17 - 9 [ 639 ] Alert Production F17 - 9, Alert Production F17 - 10 [ 639, 643 ]
 Rank Ranked higher
 Component/s ap_pipe [ 14281 ]
 Link This issue relates to DM-12314 [ DM-12314 ]
 Sprint Alert Production F17 - 9, Alert Production F17 - 10 [ 639, 643 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ]
 Rank Ranked higher
 Link This issue blocks DM-12549 [ DM-12549 ]
Hide
Krzysztof Findeisen added a comment -

It looks like we can effectively remove command-line arguments from those imposed by pipe.base.ArgumentParser. However, this is would violate the substitution principle if implemented as a subclass of ArgumentParser, and it's likely that CmdLineTask assumes as part of its initialization code or as part of the TaskRunner framework that an input repository has been defined.

Still, perhaps this can be used to salvage the two-piece design – we provide a strictly correct ArgumentParser for the command-line task portion, but use a modified version (via an object decorator?) in the calling script to supply ingestion-specific arguments and ensure the --help argument reflects the real UI.

It's still hacky, of course, but I assume making ap_pipe a CmdLineTask is a temporary measure until Pipeline becomes available.

Show
Krzysztof Findeisen added a comment - It looks like we can effectively remove command-line arguments from those imposed by pipe.base.ArgumentParser . However, this is would violate the substitution principle if implemented as a subclass of ArgumentParser , and it's likely that CmdLineTask assumes as part of its initialization code or as part of the TaskRunner framework that an input repository has been defined. Still, perhaps this can be used to salvage the two-piece design – we provide a strictly correct ArgumentParser for the command-line task portion, but use a modified version (via an object decorator?) in the calling script to supply ingestion-specific arguments and ensure the --help argument reflects the real UI. It's still hacky, of course, but I assume making ap_pipe a CmdLineTask is a temporary measure until Pipeline becomes available.
 Link This issue relates to DM-12853 [ DM-12853 ]
 Link This issue relates to DM-12867 [ DM-12867 ]
 Epic Link DM-10773 [ 32739 ] DM-12711 [ 36308 ]
 Sprint Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-1 [ 639, 643, 644, 669 ]
 Rank Ranked higher
 Sprint Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-1 [ 639, 643, 644, 669 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ]
 Story Points 3 6
 Sprint Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11 [ 639, 643, 644 ] Alert Production F17 - 9, Alert Production F17 - 10, Alert Production F17 - 11, AP S18-2 [ 639, 643, 644, 677 ]
 Rank Ranked lower
Hide
Eric Bellm added a comment -

After much offline discussion, we have agreed to refactor ap_pipe so that ingestion is a separate script within the package but the core processing steps are a proper CmdLineTask. Accordingly I am going to close this ticket and make a new one that more explicitly (DM-13163).

Show
Eric Bellm added a comment - After much offline discussion, we have agreed to refactor ap_pipe so that ingestion is a separate script within the package but the core processing steps are a proper CmdLineTask . Accordingly I am going to close this ticket and make a new one that more explicitly ( DM-13163 ).
 Resolution Done [ 10000 ] Status To Do [ 10001 ] Won't Fix [ 10405 ]

#### People

Assignee:
Meredith Rawls
Reporter:
Eric Bellm
Watchers:
Eric Bellm, John Swinbank, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff