Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-22851

Remove extraneous line breaks in pipeline --show=config output

    Details

    • Story Points:
      1
    • Sprint:
      DB_S20_01
    • Team:
      Data Access and Database

      Description

      The output from pipeline ... --show=config occasionally can contain some extra newlines in it, here are examples.

      This is an expected output:

      $ pipetask build -t lsst.ctrl.mpexec.examples.calexpToCoaddTask.CalexpToCoaddTask --show=config
      ### Configuration for task `CalexpToCoaddTask'
      import lsst.ctrl.mpexec.examples.calexpToCoaddTask
      assert type(config)==lsst.ctrl.mpexec.examples.calexpToCoaddTask.CalexpToCoaddTaskConfig, 'config is of type %s.%s instead of lsst.ctrl.mpexec.examples.calexpToCoaddTask.CalexpToCoaddTaskConfig' % (type(config).__module__, type(config).__name__)
      import lsst.pipe.base.config
      # Flag to enable/disable metadata saving for a task, enabled by default.
      config.saveMetadata=True
       
      # name for connection calexp
      config.connections.calexp='calexp'
       
      # name for connection coadd
      config.connections.coadd='deepCoadd_calexp'
      
      

      and this one shows those extra line breaks:

      $ pipetask build -t lsst.ctrl.mpexec.examples.calexpToCoaddTask.CalexpToCoaddTask --show=config='*'
      ### Configuration for task `CalexpToCoaddTask'
       
      import lsst.ctrl.mpexec.examples.calexpToCoaddTask
       
      assert type(config)==lsst.ctrl.mpexec.examples.calexpToCoaddTask.CalexpToCoaddTaskConfig, 'config is of type %s.%s
       
      instead of lsst.ctrl.mpexec.examples.calexpToCoaddTask.CalexpToCoaddTaskConfig' % (type(config).__module__, type(config).__name__)
       
      import lsst.pipe.base.config
       
      # Flag to enable/disable metadata saving for a task, enabled by default.
      config.saveMetadata=True
       
      # name for connection calexp
      config.connections.calexp='calexp'
       
      # name for connection coadd
      config.connections.coadd='deepCoadd_calexp'
      

      Difference between two commands is a filter added to --show=config option, I guess there is a code that is broken when tat filtering option is used.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment - - edited

            When dumping configuration without a filter the code just calls config.saveToStream() for each task (tasks are filtered separately). With filter we still use the same config.saveToStream() method but we pass it a special "smart" stream (FilteredStream) object that knows how to parse output and print only those fields that are requested. That filter stream class was copied from pipe_base.argumentParser and apparently it is slightly broken. The class only implements write() method and it knows how config.saveToStream() is implemented and how it calls that write() method, in particular for printing config fields it does something like

                file.write(f"# {field.doc}\n{fullFieldName} = {field.value}\n")
            

            FilteredStream knows how to parse that and check field name against the pattern, if that does not match then it skips whole thing. That works more or less OK for the fields and their values, but there is other output (import list and assert) which break assumptions and that results in those extraneous line breaks.

            I think that whole FilteredStream thing is a bad design, we need something less fragile to support field filtering before it is sent to the stream. That probably has to happen at pex_config level rather than in ctrl_mpexec but I'm afraid to touch anything in pex_config. I need to look at how saveToStream() is implemented, maybe I can duplicate part of its logic in ctrl_mpexec.

            Also for many purposes we probably don't care about imports and that assert statement, we only want to see field values (e.g. DM-22301 probably wants just that). I can imagine that --show=config[=...] would only print fields and their values (and doc strings as comments) and we could have special option to dump config in full format with all imports and asserts.

            Show
            salnikov Andy Salnikov added a comment - - edited When dumping configuration without a filter the code just calls config.saveToStream() for each task (tasks are filtered separately). With filter we still use the same config.saveToStream() method but we pass it a special "smart" stream (FilteredStream) object that knows how to parse output and print only those fields that are requested. That filter stream class was copied from pipe_base.argumentParser and apparently it is slightly broken. The class only implements write() method and it knows how config.saveToStream() is implemented and how it calls that write() method, in particular for printing config fields it does something like file.write(f"# {field.doc}\n{fullFieldName} = {field.value}\n") FilteredStream knows how to parse that and check field name against the pattern, if that does not match then it skips whole thing. That works more or less OK for the fields and their values, but there is other output (import list and assert) which break assumptions and that results in those extraneous line breaks. I think that whole FilteredStream thing is a bad design, we need something less fragile to support field filtering before it is sent to the stream. That probably has to happen at pex_config level rather than in ctrl_mpexec but I'm afraid to touch anything in pex_config. I need to look at how saveToStream() is implemented, maybe I can duplicate part of its logic in ctrl_mpexec. Also for many purposes we probably don't care about imports and that assert statement, we only want to see field values (e.g. DM-22301 probably wants just that). I can imagine that --show=config [=...] would only print fields and their values (and doc strings as comments) and we could have special option to dump config in full format with all imports and asserts.
            Hide
            salnikov Andy Salnikov added a comment -

            Getting rid of the line breaks appears to be a trivial two-line change. This is actually related closely to DM-22301, I think it makes sense to do both of these tickets together, so I'll do everything on DM-22301 branch.

            Show
            salnikov Andy Salnikov added a comment - Getting rid of the line breaks appears to be a trivial two-line change. This is actually related closely to DM-22301 , I think it makes sense to do both of these tickets together, so I'll do everything on DM-22301 branch.

              People

              • Assignee:
                salnikov Andy Salnikov
                Reporter:
                salnikov Andy Salnikov
                Watchers:
                Andy Salnikov
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel