DM-23260

Problems initializing datastore values using DAF_BUTLER_CONFIG_PATH

    Details

    • Type: Bug
    • Status: Invalid
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: daf_butler
    • Labels:
      None
    • Team:
      Data Facility
    • Urgent?:
      No

      Description

      This may turn out not to be a bug, but rather my not using the right subdirectory or filename in DAF_BUTLER_CONFIG_PATH. I am using the current weekly (w_2020_04).

      I ran into a problem trying to set checksum to False for the RC2 bootstrap in an override config where I normally override the database connection string to point to Oracle. For the db connection string, I normally set the DAF_BUTLER_CONFIG_PATH variable to a path which contains a file called registry.yaml. makeButlerRepo.py picks up the db connection string just fine and puts it in the DATA/butler.yaml file. I added the datastore section with the checksum entry to the registry.yaml file, but the new DATA/butler.yaml file only got the new db connection string.

      Noticing the file name, I also tried the following filenames for the override yaml file in $DAF_BUTLER_CONFIG_PATH: butler.yaml, datastore.yaml, datastore/posixDatastore.yaml. None of them worked. (I completely removed the DATA directory between tests to ensure that existing butler yaml files were not causing a problem.)

      Works fine if I explicitly pass the registry.yaml file with both changes to the -c option of makeButlerRepo.py.  Trying the -o option didn't seem to make any difference when trying to use DAF_BUTLER_CONFIG_PATH.

      Not a blocker for the current RC2 ingest work because I am in control of running makeButlerRepo.py and then bootstrap.py. So I can manually edit the DATA/butler.yaml file in between, or I can explicitly pass the override yaml file to makeButlerRepo.py's -c option, which works.

      Here is the beginning of the output from running makeButlerRepo.py with the verbose option:

      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /scratch/mgower/rc2_ingest/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/storageClasses.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/datastore.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /scratch/mgower/rc2_ingest/datastore.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/datastores/posixDatastore.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML file via !include: file:///software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/datastores/formatters.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/composites.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/dimensions.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/repo_transfer_formats.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /scratch/mgower/rc2_ingest/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /scratch/mgower/rc2_ingest/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /scratch/mgower/rc2_ingest/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /scratch/mgower/rc2_ingest/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /scratch/mgower/rc2_ingest/registry.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/dimensions.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/storageClasses.yaml
      DEBUG:lsst.daf.butler.core.config:Opening YAML config file: /software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_butler/19.0.0-18-g6d6ca2d0+1/config/storageClasses.yaml
      DEBUG:lsst.daf.butler.core.storageClass:Setting default assembler for DataFrame

        Attachments

          Activity

          tjenness Tim Jenness added a comment -

          I've had a quick look at this.

          DAF_BUTLER_CONFIG_PATH is working:

          • Create a standalone repo and you will see that checksum will be pulled in.
          • Have DAF_BUTLER_CONFIG_PATH set at run time and the config will be modified properly (you can check this by running dumpButlerConfig.py -s datastore).

          What's happening is that the minimalist config created by makeRepo only includes config items that you have specified explicitly in the seed config plus the config parameters that have a root (via Config.updateParameters). Since DAF_BUTLER_CONFIG_PATH is only used when constructing the full config and not the config that is written out, you don't get it in the minimalist config stored in the repo. The only way to fix this would be for a Config to know where each item came from and to understand that items read from the environment should be persisted in the repo output.

          I think we can close this ticket as invalid.

          This investigation did remind me that the output file parameter in makeRepo is not doing anything.

          mgower Michelle Gower added a comment -

          Then why does makeButlerRepo.py output the new DB connection string in the butler.yaml file it creates? Is it a config parameter that has a root? The different behavior with makeButlerRepo.py is just confusing.

          tjenness Tim Jenness added a comment -

          By default it propagates the db item from registry config because by default that is one that includes a path (much like datastore.root) and makeRepo creates a sqlite registry in its normal usage. The data store records key is propagated presumably because for multiple datastores you need to make sure they are different to each other and it's clearer that way.
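
A toy model may make the two paths clearer. This is only a sketch using plain dictionaries, not the daf_butler implementation: `merge`, `full_config`, `minimal_config`, the `PROPAGATED_KEYS` tuple, and every value shown are hypothetical. It assumes the full config is built by layering defaults, override-path files, and the seed config, while the config written to the repo holds only the seed keys plus a fixed set of propagated keys.

```python
import copy

# Hypothetical defaults; all values are placeholders for illustration.
DEFAULTS = {
    "registry": {"db": "sqlite:///<butlerRoot>/example.sqlite3"},
    "datastore": {"root": "<butlerRoot>", "checksum": True},
}

# Keys assumed to always be copied into the written config.
PROPAGATED_KEYS = (("registry", "db"), ("datastore", "root"))


def merge(base, update):
    """Return a deep copy of base with update layered on top."""
    result = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def full_config(defaults, overrides, seed):
    """Config seen at run time: defaults, then overrides, then seed."""
    return merge(merge(defaults, overrides), seed)


def minimal_config(defaults, overrides, seed):
    """Config written to the repo: seed keys plus propagated keys only."""
    full = full_config(defaults, overrides, seed)
    minimal = copy.deepcopy(seed)
    for section, key in PROPAGATED_KEYS:
        minimal.setdefault(section, {})[key] = full[section][key]
    return minimal


overrides = {
    "registry": {"db": "oracle://example"},
    "datastore": {"checksum": False},
}

# Overrides found via the search path: db is written out, checksum is not.
via_env = minimal_config(DEFAULTS, overrides, seed={})

# Same values passed as the seed config: both are written out.
via_seed = minimal_config(DEFAULTS, {}, seed=overrides)
```

In this model `via_env` carries the overridden `registry.db` but no `checksum` entry, while `via_seed` carries both, which mirrors the difference reported between DAF_BUTLER_CONFIG_PATH and the -c option.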

          Are you worried about the choice of parameters that appear in the simplified config? We could make datastore.checksum be one of the special keys that we think everyone should think about editing but I assume that checksum is something that we don't want to make so visible.

          mgower Michelle Gower added a comment -

          Generically, I think I am fine.   The different behavior will just lead to questions like this ticket.

          tjenness Tim Jenness added a comment -

          As I said, the only way to fix this is for the Config class to start tracking which keys were read from override files and then include those fields in the minimalist output config created by makeRepo.
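
As a rough illustration of what such tracking could look like, here is a minimal sketch. It does not describe an existing daf_butler feature: the `TrackedConfig` class, its methods, the flat dotted keys, the source labels, and the values are all hypothetical.

```python
class TrackedConfig:
    """Flat key/value config that remembers where each value came from."""

    def __init__(self):
        self._values = {}
        self._sources = {}

    def update(self, values, source):
        """Layer new values on top, recording their source label."""
        for key, value in values.items():
            self._values[key] = value
            self._sources[key] = source

    def __getitem__(self, key):
        return self._values[key]

    def from_sources(self, *sources):
        """Return only the items whose latest value came from the given sources."""
        return {
            key: value
            for key, value in self._values.items()
            if self._sources[key] in sources
        }


config = TrackedConfig()
config.update(
    {
        "registry.db": "sqlite:///<butlerRoot>/example.sqlite3",
        "datastore.root": "<butlerRoot>",
        "datastore.checksum": True,
    },
    source="default",
)
config.update(
    {"registry.db": "oracle://example", "datastore.checksum": False},
    source="override",
)

# Items that a repo-creation step could choose to persist.
persisted = config.from_sources("override", "seed")
```

Here `persisted` contains the two overridden keys and omits `datastore.root`, which was never touched by an override file.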


            People

            • Assignee:
              tjenness Tim Jenness
              Reporter:
              mgower Michelle Gower
              Watchers:
              Michelle Gower, Tim Jenness