Data Management / DM-34190

Missing str/repr for middleware objects


    Details

    • Team:
      Architecture
    • Urgent?:
      No

      Description

      Quite a few middleware-related Python objects don't have a useful string representation. I've listed the ones I know about below, with suggestions for how to implement stringification.

      obs_base

      • Instrument
      • FilterDefinitions: repr could duplicate str?

      pipe_base

      • Pipeline: the str could be the instrument and description fields?
      • PipelineTaskConnections
      • QuantumGraph
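A minimal sketch of the Pipeline suggestion above, assuming a hypothetical `ToyPipeline` class whose `description`, `instrument` and `tasks` attributes are illustrative and not the real pipe_base API. `__str__` returns a single line built from the instrument and the first line of the description, and the full multi-line form moves to a separately named (also hypothetical) `dump()` method.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not the pipe_base Pipeline."""

    description: str
    instrument: Optional[str] = None
    # Hypothetical mapping of task label -> task class name.
    tasks: Dict[str, str] = field(default_factory=dict)

    def __str__(self) -> str:
        # One line only: the first line of the description plus the
        # instrument, so the result can be embedded in other messages.
        stripped = self.description.strip()
        first_line = stripped.splitlines()[0] if stripped else "<no description>"
        return f"{first_line} ({self.instrument or 'no instrument'})"

    def dump(self) -> str:
        # The full, multi-line form, for when a human wants everything.
        lines = [f"description: {self.description}"]
        if self.instrument:
            lines.append(f"instrument: {self.instrument}")
        lines.append("tasks:")
        lines.extend(f"  {label}: {cls}" for label, cls in self.tasks.items())
        return "\n".join(lines)


pipeline = ToyPipeline(
    description="Single-frame processing.\nLonger notes follow here.",
    instrument="HSC",
    tasks={"isr": "mypackage.IsrTask"},  # hypothetical task class path
)
print(f"Running pipeline {pipeline}...")
# prints: Running pipeline Single-frame processing. (HSC)...
```

The dataclass still generates its own `__repr__`, so only `__str__` is customized here; whether the first description line is informative enough is exactly the trade-off debated in the comments.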

      daf_butler

      • Butler: at minimum, repr should match str.
      • Registry: repr almost looks more useful to me; str is just a long string with the path.
      • Datastore: str/repr exist, but could be more useful[1].
      • StorageClassFactory: repr could duplicate str?
      • RegistryDefaults

      1)

      >>> str(self.butler.datastore)
      'file:///var/folders/1s/x4hsw5kj4pdbnlvfw75zzkv40000gq/T/tmpccyrt726/'
      >>> repr(self.butler.datastore)
      'FileDatastore@<butlerRoot>'
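For the "repr should match str" and "repr could duplicate str" suggestions, note that Python's default `object.__str__` already delegates to `repr()`, so a class that defines only `__repr__` gets the same one-line form from both. A minimal sketch with a hypothetical `ToyDatastore` (its `name` and `root` attributes are assumptions, not the daf_butler Datastore API):

```python
class ToyDatastore:
    """Hypothetical datastore; attribute names are illustrative only."""

    def __init__(self, name: str, root: str) -> None:
        self.name = name
        self.root = root

    def __repr__(self) -> str:
        # Class name, short name and root URI on a single line. Using
        # type(self) means subclasses report their own class name.
        return f"{type(self).__name__}(name={self.name!r}, root={self.root!r})"


datastore = ToyDatastore("FileDatastore", "file:///tmp/repo/")
# No __str__ is defined, so str() falls back to __repr__.
print(str(datastore))
# prints: ToyDatastore(name='FileDatastore', root='file:///tmp/repo/')
```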
      


          Activity

          Nate Lust added a comment:

          For Pipeline, do you mean you would like the repr to be something different? The string form is the pipeline itself.

          Krzysztof Findeisen added a comment (edited):

          The string form is far too long to use in another message (e.g., "Running pipeline x...", "Object[pipeline=x]"). __str__ implementations are supposed to be concise. Having them be multi-line is bad enough, and this one is multi-page, completely derailing any output foolish enough to use it.

          Nate Lust added a comment:

          There is nothing that says a `__str__` representation must be concise, even on the page you linked. The exact text is "there is no expectation that `__str__()` return a valid Python expression: a more convenient or concise representation can be used". You can say that it is not a convenient representation for your use case, and that is fine; for other use cases it is convenient. Which needs take precedence can of course be debated, and I am open to change.

          However, this is exactly why it is not `__repr__`, which should be closer to some representation of the object. Can you think of a good representation? Almost any field may be omitted. A pipeline can be constructed from a file, or it can be constructed interactively (which is what happens from the pipetask command line).

          There are two fields that a pipeline requires: description and tasks. The description might be a good candidate for repr, but it is often multi-line, which is something you do not want. As for tasks, do you print the labels? Those can point to any task, so they are not deterministic. Even if you did, that is not really what goes into initializing a pipeline. Pipelines are also not `final`, so any repr might change each time you call it (except for the description attribute, but as discussed that is multi-line). We could print the number of tasks or something similar, but that really has little to do with the repr of the object. Likewise, even printing something like the pipeline URI does not really tell you about the object per se, as the pipeline can be modified after the file is loaded. Configuration is also really important for determining what is going on: two pipelines with the same tasks, same labels and same instrument might still differ because they have different configs applied to them.
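To make the trade-off above concrete, here is a sketch of one of the options mentioned (the number of tasks plus their labels), assuming a hypothetical `MutablePipeline` class with an `add_task()` method; neither is the pipe_base API. It uses the Python convention of a `<...>` string for a repr that is not a valid expression. As the comment points out, the output changes as the object is mutated and says nothing about configuration.

```python
from typing import Dict


class MutablePipeline:
    """Hypothetical mutable pipeline, used only to illustrate the trade-off."""

    def __init__(self, description: str) -> None:
        self.description = description
        self._tasks: Dict[str, str] = {}  # label -> task class name

    def add_task(self, label: str, task_class: str) -> None:
        self._tasks[label] = task_class

    def __repr__(self) -> str:
        # Rebuilt from the current state on every call, so it changes as
        # tasks are added. Config overrides are not represented at all.
        labels = ", ".join(sorted(self._tasks))
        return f"<{type(self).__name__} tasks={len(self._tasks)} labels=[{labels}]>"


p = MutablePipeline("demo")
print(repr(p))  # prints: <MutablePipeline tasks=0 labels=[]>
p.add_task("isr", "mypackage.IsrTask")  # hypothetical task class paths
p.add_task("calibrate", "mypackage.CalibrateTask")
print(repr(p))  # prints: <MutablePipeline tasks=2 labels=[calibrate, isr]>
```

Two such pipelines with identical labels but different configs would print identically, which is the objection raised above.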

          I am certainly open to suggestions, but it is not that this has not been thought through. Anything you pick will have trade-offs and be uninformative in one way or another. The most informative you can get about the pipeline is something a human can read and understand in its complete form (or some subset of information about the pipeline).

          I would contend that in this case there is no real answer for what is best, so the task should not define a single function that determines what people will see. In my opinion, specific logging code (or whatever) should define its own function, along the lines of "in this context, x and y are the important information given the constraints of the application", that can be used to create a representation.
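A minimal sketch of the context-specific approach suggested above, assuming a hypothetical helper `describe()` that is not part of any LSST package. The caller chooses which attributes matter, and the helper forces the result onto one line in the `Object[field=value]` style from the earlier comment.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def describe(obj: Any, fields: Iterable[str], max_len: int = 60) -> str:
    """Hypothetical helper: format only the attributes one context cares about."""
    parts = []
    for name in fields:
        # Missing attributes show up as "None" rather than raising.
        value = getattr(obj, name, None)
        # Collapse newlines and runs of whitespace so the result is one line.
        text = " ".join(str(value).split())
        if len(text) > max_len:
            text = text[: max_len - 3] + "..."
        parts.append(f"{name}={text}")
    return f"{type(obj).__name__}[{', '.join(parts)}]"


# A stand-in object; a real caller would pass its own pipeline instance.
pipeline = SimpleNamespace(instrument="HSC", description="Line one\nline two")
print(describe(pipeline, ["instrument"]))
# prints: SimpleNamespace[instrument=HSC]
```

A logging call site could ask for `["instrument"]` while a provenance report asks for more fields, without the class itself committing to either.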

          However, I would really welcome being wrong and having someone point out an optimal representation.


            People

            Assignee:
            Unassigned
            Reporter:
            John Parejko
            Watchers:
            Jim Bosch, John Parejko, Krzysztof Findeisen, Nate Lust, Tim Jenness
            Votes:
            0

              Dates

              Created:
              Updated:
