The default implementation of SuperTask.runQuantum currently passes additional output data ID information to run, because at least some implementations of run need it to group input datasets. However, the way this is passed is confusing (the generated kwargs have names that do not suggest they are data IDs), and the need for these IDs may be rare enough that most SuperTasks should not be required to accept them.
One way to address this would be:
- runQuantum always passes just a single data ID (the quantum data ID, not the data ID of either inputs or outputs) to run, as an always-optional dataId keyword argument (i.e. SuperTasks must permit this argument to be None). That will at least meet the needs of SuperTasks that want to use the data ID for diagnostic or custom-provenance purposes (see also
- SuperTasks that need to do data ID grouping in run should override runQuantum themselves.
- To make the above easier / less verbose, we should look for ways to make some of the logic in the default implementation of runQuantum available to subclasses that override that method (e.g. via utility methods that do some of the work).
I'm open to other ideas as well, and I should note that I have not thought much about how this proposal would change which of our concrete SuperTasks would need to override runQuantum.