Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-14191

Redesign and refactor YAML config usage in Gen3 Butler

    XMLWordPrintable

    Details

    • Story Points:
      6
    • Team:
      Architecture

      Description

      Our YAML files are a bit of a mess, and they're very hard to use:

      • The defaults for StorageClass and the registry schema are not just nodes in the Butler config tree; there are also special nodes that point to files (e.g. "storageClasses.config") that make it impossible to use the defaults directly, and obfuscate what the real structure ought to be.
      • Each of the methods that constructs something from a Config has its own special handling for non-Config arguments (e.g. filenames).
      • StorageClassFactory has two public APIs for constructing StorageClasses from Config.
      • Because we never validate any of the structures, error messages due to bad configs are very opaque.
      • Config files contain references to other Config files that are relative to the current working directory (which is assumed to be DAF_BUTLER_DIR, since that's where tests are run).
      • Constructing most important daf_butler objects requires either creating a Config file on disk or a Config object in memory, rather than applying an automatically-located default configuration and overriding with kwargs.

      I think it's time to step back and actually design how we want the configuration system to work.

        Attachments

          Activity

          Hide
          pschella Pim Schellart [X] (Inactive) added a comment -

          While I agree with all the statements above the only thing I'm not sure about is the timing. Are we already at the stage where we know all we want to configure and the ways in which we want it to be?
          If yes, then great! Lets redesign. If not, then we might benefit from muddling on a tiny little bit longer until that becomes clear.

          Show
          pschella Pim Schellart [X] (Inactive) added a comment - While I agree with all the statements above the only thing I'm not sure about is the timing. Are we already at the stage where we know all we want to configure and the ways in which we want it to be? If yes, then great! Lets redesign. If not, then we might benefit from muddling on a tiny little bit longer until that becomes clear.
          Hide
          jbosch Jim Bosch added a comment - - edited

          I think we need to do something about some of this now, even if we don't tackle all of it - we at least need a place to put default configuration for PosixDatastore (in particular) and a way to override that in repos without tons of repetition, and a way to easily create repos and Butler instances.

          Here's a strawman for what the high-level interface could be:

           

          Filesystem Structure

          daf_butler
              config/
                  butler.yaml # includes StorageClasses and schema
                  gen2convert.yaml
                  datastores/
                      PosixDatastore.yaml # stuff specific to PosixDatastore (including formatters and templates). Does not include root.
                  registries/
                      SqlRegistry.yaml # stuff specific to SqlRegistry. Does not include DB connection string.
           
          some-repo-root/
              butler.yaml # includes all entires from butler.yaml, datastores/.yaml, registries/.yaml, as well as repo-specific strings.
              registry3.sqlite3   # to avoid clashes with gen2 registry.sqlite3
          

           

          APIs

          Butler.makeRepo(root, registry=None, datastore=None):

          Creates butler.yaml in root, copying in defaults from daf_butler for the given registry and datastore (which default to Sqlite and Posix), and adding root-specific entries to that config.

           

          Butler(root):

          Create Butler instance assuming makeRepo has already been called on the given root.

           

          Show
          jbosch Jim Bosch added a comment - - edited I think we need to do something about some of this now, even if we don't tackle all of it - we at least need a place to put default configuration for PosixDatastore (in particular) and a way to override that in repos without tons of repetition, and a way to easily create repos and Butler instances. Here's a strawman for what the high-level interface could be:   Filesystem Structure daf_butler config/ butler.yaml # includes StorageClasses and schema gen2convert.yaml datastores/ PosixDatastore.yaml # stuff specific to PosixDatastore (including formatters and templates). Does not include root. registries/ SqlRegistry.yaml # stuff specific to SqlRegistry. Does not include DB connection string.   some-repo-root/ butler.yaml # includes all entires from butler.yaml, datastores/.yaml, registries/.yaml, as well as repo-specific strings. registry3.sqlite3 # to avoid clashes with gen2 registry.sqlite3   APIs Butler.makeRepo(root, registry=None, datastore=None) : Creates butler.yaml in root, copying in defaults from daf_butler for the given registry and datastore (which default to Sqlite and Posix), and adding root-specific entries to that config.   Butler(root) : Create Butler instance assuming makeRepo has already been called on the given root.  
          Hide
          tjenness Tim Jenness added a comment -

          Config merging now works and I've had a go at handling the sub configs. One thing we should consider is changing RegistryConfig to refer to only the registry section and SchemaConfig only the schema section since they are now distinct. This would mean a small change to how Registry is configured, at least having two configs internally even if ButlerConfig is only way to create.

          Show
          tjenness Tim Jenness added a comment - Config merging now works and I've had a go at handling the sub configs. One thing we should consider is changing RegistryConfig to refer to only the registry section and SchemaConfig only the schema section since they are now distinct. This would mean a small change to how Registry is configured, at least having two configs internally even if ButlerConfig is only way to create.
          Hide
          jbosch Jim Bosch added a comment -

          While I'm just starting to look at the code now, your proposal to make RegistryConfig and SchemaConfig refer only to their own sections sounds like a good one to me.

          Show
          jbosch Jim Bosch added a comment - While I'm just starting to look at the code now, your proposal to make RegistryConfig and SchemaConfig refer only to their own sections sounds like a good one to me.
          Hide
          pschella Pim Schellart [X] (Inactive) added a comment -

          I added some more minor comments to the PR. Consider my review to be done. Feel free to merge with or without the suggested changes as you see fit.

          Show
          pschella Pim Schellart [X] (Inactive) added a comment - I added some more minor comments to the PR. Consider my review to be done. Feel free to merge with or without the suggested changes as you see fit.
          Hide
          jbosch Jim Bosch added a comment -

          My review is complete, too.  (Minor) comments on the PR.

          Show
          jbosch Jim Bosch added a comment - My review is complete, too.  (Minor) comments on the PR.
          Hide
          tjenness Tim Jenness added a comment -

          Merged.

          • Defaults read from DAF_BUTLER_DIR/config and DAF_BUTLER_CONFIG_PATH
          • Each config subclass now reads its own defaults
          • Butler config now asks each subclass config to populate defaults
          • Configs can now be created without arguments to get the defaults.
          • Extensive tests added for this behavior.
          • Also rewrote doImport to use python3.6 features allowing useful error messages.
          Show
          tjenness Tim Jenness added a comment - Merged. Defaults read from DAF_BUTLER_DIR/config and DAF_BUTLER_CONFIG_PATH Each config subclass now reads its own defaults Butler config now asks each subclass config to populate defaults Configs can now be created without arguments to get the defaults. Extensive tests added for this behavior. Also rewrote doImport to use python3.6 features allowing useful error messages.

            People

            Assignee:
            tjenness Tim Jenness
            Reporter:
            jbosch Jim Bosch
            Reviewers:
            Jim Bosch, Pim Schellart [X] (Inactive)
            Watchers:
            Jim Bosch, Pim Schellart [X] (Inactive), Tim Jenness
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Jenkins

                No builds found.