# Clean up handling of extra data ID information in SuperTask.run

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels:
• Story Points:
4
• Epic Link:
• Sprint:
BG3_F18_07
• Team:
Data Access and Database

#### Description

The default implementation of SuperTask.runQuantum currently passes additional output data ID information to run, as this is necessary in at least some contexts in which run needs to be able to group input datasets.  However, the way this is passed is confusing (the kwargs generated do not have names that suggest that they are IDs), and the need for these IDs may be sufficiently rare that most SuperTasks should not be required to accept them.

One possibility for how to address this would be:

• runQuantum always passes just a single data ID (the quantum data ID, not the data of either inputs or outputs) to run, as an always-optional dataId keyword argument (i.e. SuperTasks must permit this argument to be None).  That will at least meet the needs of SuperTasks that want to use the data ID for diagnostic or custom-provenance purposes (see also DM-14821).
• SuperTasks that need to do data ID grouping in run should override runQuantum themselves.
• To make the above easier / less verbose, we should look for ways to make some of the logic in the default implementation of runQuantum available to subclasses that override that method (e.g. via utility methods that do some of the work).

I'm open to other ideas as well, and I should note that I have not thought much about how this proposal would change which of our concrete SuperTasks would need to override runQuantum.

#### Activity

Hide
Andy Salnikov added a comment - - edited

I think that if we want output units to be optional (not used by most concrete tasks) then run() method cannot even have them in the argument list (but we still want all input data there). I can imagine few ways to design/implement that:

1. have run() method to only take inputs, most normal tasks will override that method. In addition to run() method have some other method, e.g. runWithOutputs(inputData: dict, outputUnits: dict) which has a default implementation calling run():

  def runWithOutputs(self, inputData, outputUnits):  """default implementation calls run() with only input data as kw arguments"""  return self.run(**inputData)    def run(self, **kw):  # subclass will override this with actual input args, e.g. "def run(self, input1, input2):" 

The runQuantum() will call this new method instead of run(). Tasks that need access to output units would have to override runWithOutputs() method.

1. variation of this is to skip runWithOutputs() entirely and let people override runQuantum() instead. But most or all of the logic in runQuantum() has to be reimplemented anyways, so I think that intermediate method is a little bit cleaner approach.
2. one more option is to provide separate method or property which returns output units for a currently processed Quanta, the units will be remembered by runQuantum() before calling run() method. I'm not particularly happy with this idea, I would prefer passing all context via method arguments, but if this simplifies life for many users then it's probably OK (and I don't think we'll ever use same Task instance to process more than one Quanta in parallel).
Show
Andy Salnikov added a comment - - edited I think that if we want output units to be optional (not used by most concrete tasks) then run() method cannot even have them in the argument list (but we still want all input data there). I can imagine few ways to design/implement that: have run() method to only take inputs, most normal tasks will override that method. In addition to run() method have some other method, e.g. runWithOutputs(inputData: dict, outputUnits: dict) which has a default implementation calling run() : def runWithOutputs( self , inputData, outputUnits): "" "default implementation calls run() with only input data as kw arguments" "" return self .run( * * inputData)   def run( self , * * kw): # subclass will override this with actual input args, e.g. "def run(self, input1, input2):" The runQuantum() will call this new method instead of run() . Tasks that need access to output units would have to override runWithOutputs() method. variation of this is to skip runWithOutputs() entirely and let people override runQuantum() instead. But most or all of the logic in runQuantum()  has to be reimplemented anyways, so I think that intermediate method is a little bit cleaner approach. one more option is to provide separate method or property which returns output units for a currently processed Quanta, the units will be remembered by runQuantum() before calling run()  method. I'm not particularly happy with this idea, I would prefer passing all context via method arguments, but if this simplifies life for many users then it's probably OK (and I don't think we'll ever use same Task instance to process more than one Quanta in parallel).
Hide
Andy Salnikov added a comment -

Jim Bosch, if you have no other preference I'll go ahead with my option 1 from the list above.

Show
Andy Salnikov added a comment - Jim Bosch , if you have no other preference I'll go ahead with my option 1 from the list above.
Hide
Jim Bosch added a comment -

Sorry I didn't get to this earlier.  (1) would at least be okay; it certainly solves this problem.  The option that looks like it was supposed to be (2) (but is also labeled (1)) might be better if we can find some way to refactor some of the logic in runQuantum into utility methods in order to make it easier to override runQuantum directly.  Nate Lust and I have already identified (on DM-14819) a piece of shared code (for reading reference catalogs) that does some specialized I/O; that would naturally go best in an override of runQuantum, so we have at least one other use case for wanting to override runQuantum.  But we haven't yet looked to see if that kind of refactoring into utility methods is feasible.

Show
Jim Bosch added a comment - Sorry I didn't get to this earlier.  (1) would at least be okay; it certainly solves this problem.  The option that looks like it was supposed to be (2) (but is also labeled (1)) might be better if we can find some way to refactor some of the logic in runQuantum into utility methods in order to make it easier to override runQuantum directly.  Nate Lust and I have already identified (on DM-14819 ) a piece of shared code (for reading reference catalogs) that does some specialized I/O; that would naturally go best in an override of runQuantum , so we have at least one other use case for wanting to override runQuantum .  But we haven't yet looked to see if that kind of refactoring into utility methods is feasible.
Hide
Andy Salnikov added a comment -

Here is my attempt to improve things. I'm not particularly happy with it but I think we can at least freeze API for run() method at this point as we are preparing to merge it on Friday. Any suggestions for better name of runWithOutputs() method are welcome.

I also think we may want to do something with Struct in case when task returns output data for multiple DataIds. Right now code expects it to be a list in the same order as DataRefs in Quantum, but that feels somewhat fragile. It may be better to return a dict with DataIds as keys, but that would complicate things for single-DataId case.

Show
Andy Salnikov added a comment - Here is my attempt to improve things. I'm not particularly happy with it but I think we can at least freeze API for run() method at this point as we are preparing to merge it on Friday. Any suggestions for better name of runWithOutputs()  method are welcome.  I also think we may want to do something with Struct  in case when task returns output data for multiple DataIds. Right now code expects it to be a list in the same order as DataRefs in Quantum, but that feels somewhat fragile. It may be better to return a dict with DataIds as keys, but that would complicate things for single-DataId case.
Hide
Jim Bosch added a comment -

Comments on the PR.  I'd also like Nate Lust to take a quick look, as he may have some ideas about how to make runQuantum easier to override.

Show
Jim Bosch added a comment - Comments on the PR.  I'd also like Nate Lust to take a quick look, as he may have some ideas about how to make runQuantum easier to override.
Hide
Andy Salnikov added a comment -

Ready to merge (after Jenkins finishes for this branch), Jim Bosch can you mark JIRA as reviewed if there are no more questions/suggestions?

Show
Andy Salnikov added a comment - Ready to merge (after Jenkins finishes for this branch), Jim Bosch can you mark JIRA as reviewed if there are no more questions/suggestions?
Hide
Jim Bosch added a comment -

Review complete!  Sorry I forgot to do that here before.

Show
Jim Bosch added a comment - Review complete!  Sorry I forgot to do that here before.
Hide
Andy Salnikov added a comment -

Thanks for review! Jenkins passed for Centos jobs, still in progress for osx. Merged both packages, done.

Show
Andy Salnikov added a comment - Thanks for review! Jenkins passed for Centos jobs, still in progress for osx. Merged both packages, done.

#### People

Assignee:
Andy Salnikov
Reporter:
Jim Bosch
Reviewers:
Jim Bosch
Watchers:
Andy Salnikov, Jim Bosch
Votes:
0 Vote for this issue
Watchers:
2 Start watching this issue

#### Dates

Created:
Updated:
Resolved:

#### CI Builds

No builds found.