# IsrTask should use regular Input for raw data


## Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
1
• Team:
Data Release Production
• Urgent?:
No

## Description

Currently, all IsrTask connections are defined as PrerequisiteInput. At least the "raw" connection should instead be a regular Input, so that the task's inputs are constrained to datasets that actually exist; otherwise every possible combination of visits/detectors will be used.
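The difference can be sketched with a toy model in plain Python. This does not use the real middleware: the function names `plan_prerequisite_only` and `plan_with_regular_raw`, and the sample exposures, detectors, and raws, are hypothetical and only illustrate the behaviour described above.

```python
from itertools import product

# Hypothetical dimension records known to the registry.
exposures = [100, 101, 102]
detectors = [0, 1]

# Hypothetical (exposure, detector) pairs for which a raw dataset actually exists.
existing_raws = {(100, 0), (100, 1), (101, 0)}


def plan_prerequisite_only(exposures, detectors):
    """Toy model of the current behaviour: nothing constrains the data IDs,
    so every exposure/detector combination is generated."""
    return list(product(exposures, detectors))


def plan_with_regular_raw(exposures, detectors, existing_raws):
    """Toy model of the requested behaviour: the existence of a raw dataset
    filters the combinations down to what is actually present."""
    return [
        data_id
        for data_id in product(exposures, detectors)
        if data_id in existing_raws
    ]


print(len(plan_prerequisite_only(exposures, detectors)))
print(plan_with_regular_raw(exposures, detectors, existing_raws))
```

In this sketch the unconstrained plan contains all six combinations, while the constrained plan contains only the three pairs that have a raw.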

## Activity

Andy Salnikov added a comment -

Christopher Waters added a comment -

I have a feeling that I'm the only major opponent to the PrerequisiteInput -> Input migration, and so will implement this ticket without an RFC.  I have also filed DM-23765 to point out that this can create unwanted massive processing jobs.

Krzysztof Findeisen added a comment -

The big drawback of that approach is that it makes it easy to accidentally... ask for all data in a huge input collection to be processed; the big advantage is that it allows one to define a small collection of inputs to be processed and then use it with no data ID expression at all. I'd certainly love to have others think on how to avoid the former while permitting the latter.

I suggest putting this question to an RFC. In my (also limited) experience we usually do want to process datasets in bulk, and one of the big selling points of Gen 3 was the claim that we would not need to configure pipelines with long lists of data IDs, like we sometimes need to in Gen 2.

Jim Bosch added a comment -

For a little more background, the Cartesian-product logic that's causing problems here exists because:

- for other dimensions, starting from a Cartesian product is what we want, as it's precisely what lets us generate reasonable output data IDs before the datasets for them ever could exist (e.g. make coadds for all combinations of tract+patch+filter that could be produced from otherwise-constrained inputs);

- I would prefer not to add special-casing to the exposure and/or detector dimensions to make them behave differently, especially because we already have a dataset (raw) whose existence constrains the Cartesian product down to what actually exists in the repository and collection.
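As a rough illustration of the first point, the following plain-Python sketch starts from a Cartesian product of tract+patch and filter values and keeps only the combinations reachable from already-constrained inputs. It is a toy model, not the actual query logic; `coadd_data_ids` and all sample values are hypothetical.

```python
from itertools import product

# Hypothetical dimension values; none of the coadd outputs exist yet.
patches = [(0, 0), (0, 1), (0, 2)]  # (tract, patch) pairs
filters = ["g", "r"]

# Hypothetical constrained inputs: visit -> (filter, set of patches it overlaps).
visits = {
    1: ("g", {(0, 0), (0, 1)}),
    2: ("r", {(0, 1)}),
}


def coadd_data_ids(patches, filters, visits):
    """Return (tract, patch, filter) IDs for coadds that could be produced.

    Start from every patch/filter combination, then keep only those to
    which at least one constrained input visit contributes.
    """
    reachable = {
        (patch_id, band)
        for band, overlapped in visits.values()
        for patch_id in overlapped
    }
    return [
        (tract, patch, band)
        for (tract, patch), band in product(patches, filters)
        if ((tract, patch), band) in reachable
    ]


print(coadd_data_ids(patches, filters, visits))
```

The output data IDs are generated purely from dimension values and input constraints, which is why no coadd needs to exist beforehand.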

Jim Bosch added a comment -

Christopher Waters and I discussed this offline, and while I think there are arguments for both sides, my preference is probably to do as this ticket requests, and make raw a regular (non-prerequisite) input of IsrTask.  The big drawback of that approach is that it makes it easy to accidentally (i.e. by leaving off the -d argument entirely) ask for all data in a huge input collection to be processed; the big advantage is that it allows one to define a small collection of inputs to be processed and then use it with no data ID expression at all.  I'd certainly love to have others think on how to avoid the former while permitting the latter.  One possibility is to make the -d option required and have an explicit special expression that means "everything"; while the current approach in which no expression implies "everything" is mathematically natural (given that nontrivial expressions represent constraints, and a lack of constraints implies everything), perhaps practicality should trump naturalness here.

I also think making raw a regular input makes IsrTask feel like it behaves somewhat more like other PipelineTasks (also good), though it's always going to be somewhat special in that raw is never going to be produced by another PipelineTask.
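The "required -d with an explicit everything expression" idea could look roughly like the sketch below, written with the standard `argparse` module. This is not the real command-line tool: the program name `toy-pipetask`, the `EVERYTHING` sentinel `"*"`, and the helper `parse_data_query` are all assumptions made up for illustration.

```python
import argparse

# Hypothetical sentinel meaning "process everything in the input collection".
EVERYTHING = "*"


def build_parser():
    """Build a toy parser in which -d is mandatory."""
    parser = argparse.ArgumentParser(prog="toy-pipetask")
    parser.add_argument(
        "-d",
        dest="data_query",
        required=True,  # leaving -d off is now an error, not "everything"
        help=f"data ID expression, or {EVERYTHING!r} to process everything",
    )
    return parser


def parse_data_query(argv):
    """Return the constraint expression, or None when everything is requested."""
    args = build_parser().parse_args(argv)
    if args.data_query == EVERYTHING:
        return None  # explicit opt-in to the unconstrained case
    return args.data_query


print(parse_data_query(["-d", "exposure = 100"]))
print(parse_data_query(["-d", EVERYTHING]))
```

With this arrangement, omitting `-d` makes `argparse` exit with a usage error, so processing a whole collection requires typing the sentinel deliberately.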


## People

• Assignee:
Christopher Waters
• Reporter:
Andy Salnikov
• Reviewers:
Andy Salnikov
• Watchers:
Andy Salnikov, Christopher Waters, Jim Bosch, John Parejko, Krzysztof Findeisen, Tim Jenness