Data Management / DM-16797

Add template string names and formatters to PipelineTask configs

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: pipe_base
    • Labels:
      None

      Description

      There is a common pattern in many of our tasks where dataset types share a common substring. This ticket will introduce a way to template these names and format them at configuration time.

            Activity

            nlust Nate Lust added a comment -

            This is the ticket I was talking about Friday. I thought I had sent it out for review already, but it seems not. Sorry about that, and thanks for taking a look at it.

            yusra Yusra AlSayyad added a comment -

            Can I see an example of this in action? I'm having trouble following how it's used.

            nlust Nate Lust added a comment -

            Here is how it is being used on a ticket branch for a pipeline task. https://github.com/lsst/pipe_tasks/blob/tickets/DM-15845/python/lsst/pipe/tasks/mergeDetections.py#L62 . The use in setDefaults is one possible way to use this, but we envision it being much more useful in the future as a method to call on configs in general from config override files.

            yusra Yusra AlSayyad added a comment -

            Thank you! The usage made it very clear. OK, old fashioned Jira review:

            • The docstring for the namedTemplate wasn't enough for me to really know what the point was. Starting with "Name of a string..." sounds weird. How about something starting with "Template for the `name` field which ..."
            • Say there WERE still goodSeeing coadd datasets in obs_base: --config coaddName=goodSeeing won't change the input and output dataset types. Presumably the way the names were being translated before, in getOutputDatasetTypes(), WOULD honor user-supplied configs. So this isn't really a replacement for what you had before. If it doesn't solve that problem, what problem does it solve?
            nlust Nate Lust added a comment - - edited

            I really debated, in the task I linked you to, whether to couple the setting of the templates to that config variable or to write out the string "deep" again. This primarily arises because of the shared CmdlineTask/PipelineTask code base. I am not sure which is the best way to go in that case.

            Since linking to that I have finished converting the deblending tasks here.

            The idea going forward is that you could have a few strings that are templated together, and set that default in the config setDefaults. Then in config overrides you could do something like

            config = DeblendCoaddSourcesSingleConfig()
            # Set all names to some common substring for bulk changes, for instance running
            # single-frame processing normally, and then a second time, but with every
            # output name prepended with fakes
            config.formatTemplateNames('Princeton4Life')
            # override just the output catalog name, because something special is planned
            config.measureCatalog.name = "UWRulesPrincetonDrools"
            

            This is different to the example I gave at the PCW with getOutputDatasetTypes(). The problem with that method is that very often many different methods needed to be overridden in every class just to format these strings. This new way of doing it pushes where you supply the name string into a function call instead of a config parameter, but simplifies writing PipelineTasks.

            I will note that the nameTemplate does not go away after calling formatTemplateNames, so if there were some other, more specialized workflow, it would still be possible to implement a getOutputDatasetTypes method that looks at a config parameter to set that string a second time. This change just makes it simpler to set, in bulk, many parameters that share a common string.
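
For readers following the thread, the pattern described in this comment can be sketched with a small self-contained toy. This is not the pipe_base implementation: ToyDatasetField, ToyConfig, the template strings, and the single-string argument to formatTemplateNames are all assumptions made for illustration, modelled only on the usage shown above.

```python
from dataclasses import dataclass


@dataclass
class ToyDatasetField:
    """Hypothetical stand-in for a dataset-type config field."""
    nameTemplate: str  # template kept around even after formatting
    name: str = ""     # concrete dataset type name


class ToyConfig:
    """Hypothetical config whose dataset names share a common substring."""

    def __init__(self):
        # Template strings here are invented for the example.
        self.inputCatalog = ToyDatasetField("{}Coadd_deblendedFlux")
        self.measureCatalog = ToyDatasetField("{}Coadd_meas")
        # Plays the role of setting the default in setDefaults.
        self.formatTemplateNames("deep")

    def formatTemplateNames(self, value):
        # Fill every field's template with the same substring.
        for attr in vars(self).values():
            if isinstance(attr, ToyDatasetField):
                attr.name = attr.nameTemplate.format(value)


config = ToyConfig()
# Bulk change: every templated name picks up the new substring.
config.formatTemplateNames("fakes_deep")
# A single name can still be overridden afterwards.
config.measureCatalog.name = "specialCatalog"
```

Because the template is stored separately from the formatted name, the bulk call and the single-field override do not interfere with each other, and the template remains available for a later, more specialized formatting step.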


              People

              Assignee:
              nlust Nate Lust
              Reporter:
              nlust Nate Lust
              Reviewers:
              Yusra AlSayyad
              Watchers:
              Nate Lust, Yusra AlSayyad