# PipelineTask should always use overridable methods to get DatasetTypes

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
1
• Sprint:
BG3_F18_10
• Team:
Data Access and Database

#### Description

It looks like PipelineTask.runQuantum (at least) uses iteration through the Config instance to find all of the input and output DatasetTypes.  This at least partially ignores the get*DatasetType methods, which should be permitted to add and modify the set of DatasetTypes processed.

Fixing this may involving changing the signatures of the get*DatasetType methods a bit - there is at least some information (e.g. "scalar") that is currently only available from the configs, and that probably needs to be forwarded through the get*DatasetType methods so they can be considered the final source of truth by PipelineTask and all activators.

#### Activity

Hide
Andy Salnikov added a comment -

I'm having hard time imagining how can I change get*DatasetType to also return extra info from configuration object without making it overly complicated and keeping it future-proof.

How about adding extra method to PipelineTask which returns the value of "scalar" for given field with default implementation that sub-classes can override as well, e.g.:

  @classmethod  def getDatasetIsScalar(cls, config, key):  field = config.get(key)  if isinstance(field, (InputDatasetConfig, OutputDatasetConfig)):  return field.scalar  return False 

I do not particularly like this, if we need to add something else to configuration it would mean new method like this will be needed. Should we just rename get*DatasetType() to get*DatasetConfig() and return configuration objects from those (and let sub-classes override those)?

Show
Andy Salnikov added a comment - I'm having hard time imagining how can I change get*DatasetType to also return extra info from configuration object without making it overly complicated and keeping it future-proof. How about adding extra method to PipelineTask which returns the value of "scalar" for given field with default implementation that sub-classes can override as well, e.g.: @classmethod def getDatasetIsScalar( cls , config, key): field = config.get(key) if isinstance (field, (InputDatasetConfig, OutputDatasetConfig)): return field.scalar return False I do not particularly like this, if we need to add something else to configuration it would mean new method like this will be needed. Should we just rename get*DatasetType() to get*DatasetConfig() and return configuration objects from those (and let sub-classes override those)?
Hide
Jim Bosch added a comment -

Should we just rename get*DatasetType() to get*DatasetConfig() and return configuration objects from those (and let sub-classes override those)?

I think this is a bit closer to what we want.  I'm a bit worried it may be confusing to use a Config class as the return element type of these methods, despite the fact that they're likely to have all of the attributes we need*.*

Should we invent a new class for "DatasetType as used by a PipelineTask" that could be returned instead?  Those could have a DatasetType instance, and a way to construct from *DatasetConfig (taking over the logic in PipelineTask.makeDatasetType), and a scalar attribute, and whatever else we need them to have in the future.  We could start with a simple namedtuple for now.

Show
Jim Bosch added a comment - Should we just rename get*DatasetType() to get*DatasetConfig() and return configuration objects from those (and let sub-classes override those)? I think this is a bit closer to what we want.  I'm a bit worried it may be confusing to use a Config class as the return element type of these methods, despite the fact that they're likely to have all of the attributes we need*.* Should we invent a new class for "DatasetType as used by a PipelineTask" that could be returned instead?  Those could have a DatasetType instance, and a way to construct from *DatasetConfig (taking over the logic in PipelineTask.makeDatasetType), and a scalar attribute, and whatever else we need them to have in the future.  We could start with a simple namedtuple for now.
Hide
Andy Salnikov added a comment -

I was trying to avoid introducing new class (semantically it should be ~equivalent to *DatasetConfig classes) but it may be the safest option for the future extensions. I'll try to implement something not too complicated.

Show
Andy Salnikov added a comment - I was trying to avoid introducing new class (semantically it should be ~equivalent to *DatasetConfig classes) but it may be the safest option for the future extensions. I'll try to implement something not too complicated.
Hide
Andy Salnikov added a comment -

I think this is ready for review. I ran Jenkins yesterday with success, and re-running it now.

I have decided to keep method names (they are already quite long) but they now return instances of new class DatasetTypeDescriptor instead of DatasetType. All code that uses those methods is updated and I switched runQuantum() to use those methods instead of doing config introspection.

Show
Andy Salnikov added a comment - I think this is ready for review. I ran Jenkins yesterday with success, and re-running it now. I have decided to keep method names (they are already quite long) but they now return instances of new class DatasetTypeDescriptor instead of DatasetType . All code that uses those methods is updated and I switched runQuantum() to use those methods instead of doing config introspection.
Hide
Andy Salnikov added a comment -

Thanks for review! I think I have taken care of all comments. Re-ran Jenkins, centos builds completed OK, osx is still running but I presume it's going to be success Merged and closing.

Show
Andy Salnikov added a comment - Thanks for review! I think I have taken care of all comments. Re-ran Jenkins, centos builds completed OK, osx is still running but I presume it's going to be success Merged and closing.

#### People

Assignee:
Andy Salnikov
Reporter:
Jim Bosch
Reviewers:
Jim Bosch
Watchers:
Andy Salnikov, Jim Bosch, Nate Lust, Vaikunth Thukral