# Documenting Butler DatasetTypes


#### Details

• Type: Bug
• Status: To Do
• Resolution: Unresolved
• Fix Version/s: None
• Component/s:
• Labels:
• Urgent?: No

#### Description

Following up on a question Dominique Boutigny posted on Slack about the difference between the deepCoadd_calexp and deepCoadd products, I asked whether we have documentation of the different kinds of datasets in gen3, like the description fields in obs_base's datasets.yaml and exposures.yaml (added in DM-13756). Jim Bosch said that the only place such docs might live in gen3 would be "associated with the PipelineTask connections that produce them," but most of those Connections do not currently include any description of what the dataset is.

If those Connections are to be where we document PipelineTask output, we need a way to aggregate that information (shades of DM-6655) and we need to ensure that all of the descriptive information in the obs_base yaml files is copied over to the relevant Connections docs.

#### Activity

Tim Jenness added a comment -

At first glance this is possibly a request for the DatasetType constructor to take an optional description string (as we support for collections) and for that string to be stored in the registry when the DatasetType is registered. This would still require pipeline tasks to include these strings, but it would allow the butler query-dataset-types command to report the description string. Storing it would require a registry schema change and so is not immediately trivial, but we could easily add the summary string to DatasetType so that people could at least start adding the documentation strings to the code, even if they aren't persisted yet.

I don't think obs_base is involved in this.
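A minimal sketch of the proposal, assuming a hypothetical optional `description` field on a simplified stand-in `DatasetType` and a toy in-memory registry that keeps the string when the type is registered. Nothing here is the real `lsst.daf.butler` API. `ToyRegistry`, `registerDatasetType`, and `describe` are illustrative names only, and the dimensions, storage class, and description used below are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DatasetType:
    """Simplified stand-in for a butler DatasetType (hypothetical)."""

    name: str
    dimensions: Tuple[str, ...]
    storageClass: str
    # Hypothetical optional summary string, analogous to collection docs.
    description: Optional[str] = None


class ToyRegistry:
    """In-memory stand-in for a registry that persists descriptions."""

    def __init__(self) -> None:
        self._datasetTypes: Dict[str, DatasetType] = {}

    def registerDatasetType(self, datasetType: DatasetType) -> bool:
        """Register a dataset type; return True if it was newly added."""
        existing = self._datasetTypes.get(datasetType.name)
        if existing is not None:
            # The description is documentation only, so it is ignored when
            # checking whether a re-registration is consistent.
            if (existing.dimensions, existing.storageClass) != (
                datasetType.dimensions,
                datasetType.storageClass,
            ):
                raise ValueError(f"Conflicting definition for {datasetType.name!r}")
            return False
        self._datasetTypes[datasetType.name] = datasetType
        return True

    def describe(self, name: str) -> str:
        """Answer "what does dataset type <name> represent?"."""
        description = self._datasetTypes[name].description
        if description is None:
            return "(no description registered)"
        return description
```

For example, `ToyRegistry().registerDatasetType(DatasetType("deepCoadd_calexp", ("tract", "patch", "band", "skymap"), "ExposureF", description="placeholder text"))` stores the string, and `describe("deepCoadd_calexp")` returns it later without knowing which pipeline produced the dataset. Because the sketch ignores descriptions when it compares definitions, adding or editing documentation never conflicts with an existing registration. Whether a real schema change should behave that way is an open question.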

John Parejko added a comment -

obs_base is involved because that is where the existing descriptions live.

Nate Lust added a comment -

Connection types (for which a dataset type name is just an identifier) already support doc fields. In the past we have talked not about doing this at the registry level, but about having a command that reports all the dataset types and docs associated with a given pipeline. In principle I can't see any reason not to also store it in the registry, beyond a general reluctance to build one gigantic, complex thing.
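A minimal sketch of the per-pipeline report described above, assuming a pipeline can be reduced to a mapping from task label to that task's output connections. Each connection carries a dataset type `name` and a `doc` string. The `Connection` class, `collect_output_docs`, and `report` are hypothetical stand-ins, not the `lsst.pipe.base` API. The sketch also assumes that each dataset type has a single producing task within one pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple


@dataclass(frozen=True)
class Connection:
    """Stand-in for a PipelineTask output connection (hypothetical)."""

    name: str  # dataset type name: just an identifier
    doc: str   # the connection's doc field


def collect_output_docs(
    pipeline: Mapping[str, Mapping[str, Connection]]
) -> Dict[str, Tuple[str, str]]:
    """Map each output dataset type name to (producing task label, doc).

    ``pipeline`` maps a task label to that task's output connections.
    """
    docs: Dict[str, Tuple[str, str]] = {}
    for label, outputs in pipeline.items():
        for connection in outputs.values():
            if connection.name in docs:
                other = docs[connection.name][0]
                raise ValueError(
                    f"{connection.name!r} is produced by both {other!r} and {label!r}"
                )
            docs[connection.name] = (label, connection.doc)
    return docs


def report(docs: Mapping[str, Tuple[str, str]]) -> str:
    """Format one line per dataset type, sorted by name."""
    return "\n".join(
        f"{name} ({label}): {doc}" for name, (label, doc) in sorted(docs.items())
    )
```

This kind of report needs only the pipeline definition. It cannot answer questions about a dataset type that is already in a repository unless the pipeline that produced it is known.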

Tim Jenness added a comment -

Okay, but this ticket doesn't involve any work on obs_base.

Tim Jenness added a comment - edited

There are two things, then. One is "for pipeline X, describe to me the output DatasetTypes that this pipeline creates," which just needs the pipeline. The other is "I see there is a dataset type in the butler repository called Y; what does it represent?" The second is much easier to answer if registering a dataset type also registers a short summary string for it. The only alternative is clever provenance-tracking code that looks up a dataset of that type, then its run, then that run's provenance to find the pipeline that produced it, and finally loads that pipeline and asks for the definition.

Jim Bosch added a comment -

The other place (and maybe the primary place) where this documentation should land is pipelines.lsst.io. Ideally, we'd put together some Sphinx (etc.) magic so that one could designate a pipeline (YAML file) in a package as sufficiently important that its output dataset types should be rendered into the static docs. The doc build would then pull the information from that pipeline and put it in a table somewhere.
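A minimal sketch of the rendering step only, assuming the doc build already has a mapping from dataset type name to doc string for the designated pipeline. The function emits a reStructuredText `list-table` that Sphinx can include. The Sphinx extension or directive that would read the pipeline YAML and call something like this is not shown, and `render_list_table` is a hypothetical name.

```python
from typing import Mapping


def render_list_table(docs: Mapping[str, str], title: str = "Output dataset types") -> str:
    """Render dataset-type docs as a reStructuredText ``list-table``."""
    lines = [
        f".. list-table:: {title}",
        "   :header-rows: 1",
        "",
        "   * - Dataset type",
        "     - Description",
    ]
    for name in sorted(docs):
        # Collapse internal whitespace so each description stays on one line.
        description = " ".join(docs[name].split())
        lines.append(f"   * - ``{name}``")
        lines.append(f"     - {description}")
    return "\n".join(lines) + "\n"
```

Sorting by name keeps the generated table stable between doc builds, so diffs of the rendered page reflect only real documentation changes.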

#### People

Assignee:
Unassigned
Reporter:
John Parejko
Watchers:
Ian Sullivan, Jim Bosch, John Parejko, Meredith Rawls, Nate Lust, Tim Jenness, Yusra AlSayyad