Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels:
- Story Points: 4
- Epic Link:
- Sprint: BG3_F18_07
- Team: Data Access and Database
Description
The default implementation of SuperTask.runQuantum currently passes additional output data ID information to run, as this is necessary in at least some contexts in which run needs to be able to group input datasets. However, the way this is passed is confusing (the kwargs generated do not have names that suggest that they are IDs), and the need for these IDs may be sufficiently rare that most SuperTasks should not be required to accept them.
One possibility for how to address this would be:
- runQuantum always passes just a single data ID (the quantum data ID, not the data ID of either inputs or outputs) to run, as an always-optional dataId keyword argument (i.e. SuperTasks must permit this argument to be None). That will at least meet the needs of SuperTasks that want to use the data ID for diagnostic or custom-provenance purposes (see also DM-14821).
- SuperTasks that need to do data ID grouping in run should override runQuantum themselves.
- To make the above easier / less verbose, we should look for ways to make some of the logic in the default implementation of runQuantum available to subclasses that override that method (e.g. via utility methods that do some of the work).
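To make the proposal concrete, here is a minimal sketch of the three points above using toy stand-ins. It is not the actual SuperTask API: Quantum, ToyButler, loadInputs, and saveOutputs are hypothetical names introduced only for illustration, and the dataset bookkeeping is simplified to plain dicts and lists.

```python
from collections import defaultdict


class Quantum:
    """Hypothetical unit of work: one quantum data ID plus dataset references.

    inputs/outputs map a dataset type name to a list of (dataId, key) pairs;
    dataId is a plain dict and key identifies the dataset in the toy butler.
    """

    def __init__(self, dataId, inputs, outputs):
        self.dataId = dataId
        self.inputs = inputs
        self.outputs = outputs


class ToyButler:
    """Hypothetical in-memory stand-in for a data butler."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store[key]

    def put(self, key, value):
        self.store[key] = value


class SuperTask:
    """Toy base class (assumed shape, not the real one)."""

    def run(self, dataId=None, **inputs):
        raise NotImplementedError

    # Hypothetical utility methods that subclasses overriding runQuantum can reuse.
    def loadInputs(self, quantum, butler):
        return {name: [butler.get(key) for _, key in refs]
                for name, refs in quantum.inputs.items()}

    def saveOutputs(self, quantum, butler, results):
        for name, refs in quantum.outputs.items():
            for (_, key), value in zip(refs, results[name]):
                butler.put(key, value)

    def runQuantum(self, quantum, butler):
        # Default: pass only the quantum data ID, as an optional keyword.
        inputs = self.loadInputs(quantum, butler)
        results = self.run(dataId=quantum.dataId, **inputs)
        self.saveOutputs(quantum, butler, results)


class SumTask(SuperTask):
    """Common case: run accepts dataId=None and uses it only for diagnostics."""

    def run(self, values, dataId=None):
        self.lastDataId = dataId
        return {"total": [sum(values)]}


class GroupedSumTask(SuperTask):
    """Rare case: grouping by input data ID is done in an overridden runQuantum."""

    def run(self, groups, dataId=None):
        return {"total": [sum(groups[visit]) for visit in sorted(groups)]}

    def runQuantum(self, quantum, butler):
        groups = defaultdict(list)
        for inputDataId, key in quantum.inputs["values"]:
            groups[inputDataId["visit"]].append(butler.get(key))
        results = self.run(dict(groups), dataId=quantum.dataId)
        self.saveOutputs(quantum, butler, results)
```

In this sketch, SumTask.run can also be called directly without any data ID, and GroupedSumTask only has to write the grouping step itself because output handling is delegated to the shared helper.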
I'm open to other ideas as well, and I should note that I have not thought much about how this proposal would change which of our concrete SuperTasks would need to override runQuantum.
Issue Links
- blocks: DM-14816 Convert all concrete CmdLineTasks to PipelineTasks (Invalid)
Thanks for the review! Jenkins passed for the CentOS jobs and is still in progress for OSX. Merged both packages; done.