  1. Data Management
  2. DM-17025

Improve filename template mechanisms in PosixDatastore and Butler




      The current filename template system is missing support for a number of features we'll probably need:

      • We should validate that a template generates a unique expansion within a collection for a particular set of Dimensions. We may not be able to check this in general, but we should be able to at least identify the easy ways of getting it wrong. At present, a typo in an optional placeholder (or just forgetting to add a placeholder) will lead to the same filename being silently used for different data IDs (we should probably be more careful about clobbering in PosixDatastore, too).
      • We should be able to include a term only if it identifies a required dimension, not an implied one (i.e. a template should be able to say "use PhysicalFilter if Visit is not provided").
      • We should be able to override templates based on the set of dimensions used, in addition to the DatasetType and StorageClass level (I actually now think this is more important than StorageClass-level overrides). For example, we probably want to have a common pattern for everything identified by Skymap+Tract+Patch+AbstractFilter, but that may not be a subset of any more general pattern.
      • We should be able to override a template based on the value of a particular Dimension (e.g. a particular Instrument).
      • Templates need to be able to include fields beyond just the Dimension "link" primary keys. At present this doesn't work because those fields aren't in the key-value pairs of the dict-like interface the DataId object presents; instead they're in a separate entries attribute. That could be revisited in DM-17023, but regardless we also need a dotted-name syntax (e.g. "Detector.raft") for these fields, as we don't want to assume (as we currently do for links) that all field names are unique across all tables.
      • We should be able to expand a DataId to include the fields necessary to expand the template during put (or consider always maximally expanding DataIds to include everything they possibly could). This requires communication (presumably mediated by Butler) between Datastore and Registry, which seems to imply that we can't treat templates or filenames as being relevant only for PosixDatastore (note that subset/transfer may also push us towards making filenames a Butler-level concept to some degree).
      • We need to be able to pre-expand DataIds appropriately when transferring them to a limited Registry (i.e. on a worker node), because a limited Registry does not actually have the metadata necessary to expand them. That means we need to find out which template(s) the data IDs will be used with before that transfer begins.
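To make the "easy ways of getting it wrong" concrete, here is a minimal sketch of a placeholder check. It assumes templates use Python str.format-style placeholders, which this ticket does not confirm; the function names and the example dimension names are hypothetical and are not part of the existing Butler or PosixDatastore API. It flags required dimensions that a template never references, and treats a dotted field such as "Detector.raft" as a reference to "Detector".

```python
from string import Formatter
from types import SimpleNamespace


def template_fields(template):
    """Return the set of placeholder names used in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_dimensions(template, required):
    """Return the required dimension names the template never references.

    Hypothetical helper: a dotted field such as "Detector.raft" counts as
    referencing the "Detector" dimension.
    """
    referenced = {name.split(".")[0] for name in template_fields(template)}
    return set(required) - referenced


# A template that forgets Detector would reuse one filename for many data IDs.
bad = missing_dimensions("{datasetType}/{Visit}", ["Visit", "Detector"])
good = missing_dimensions("{datasetType}/{Visit}/{Detector.raft}", ["Visit", "Detector"])

# str.format already resolves dotted names by attribute lookup, so a record
# object (here a stand-in SimpleNamespace) can supply non-key fields.
path = "{Visit}/{Detector.raft}".format(Visit=42, Detector=SimpleNamespace(raft="R22"))
```

A check like this only catches missing placeholders. It cannot prove uniqueness in general, for instance when a referenced field such as a raft name does not by itself distinguish two detectors.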

      We may want to push back on some of these and/or work harder to identify whether we really need them if some turn out to be particularly hard. All I can say right now is that I'm pretty confident people will want all of the above.

      I do not think any of these improvements are necessary for the end-of-January milestone, but we will want them before retiring Gen2.

      I am a bit skeptical we can accomplish all of the above with just templates in config files; we may need to explore ways to select them programmatically. But there's a large body of work on macro/templating engines and DSLs that I'm mostly unfamiliar with, so I'm hoping Tim Jenness can tackle that question once he's back from vacation.
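As one illustration of what programmatic selection might look like, the sketch below tries progressively less specific keys: a per-Instrument override of a dataset type, then the dataset type itself, then the set of dimensions, then a default. The lookup order, the key shapes, the function name, and the example templates (including the "run" placeholder) are all assumptions made for illustration, not a proposal grounded in the current code.

```python
def select_template(templates, dataset_type, dimensions, data_id):
    """Return the most specific matching template (hypothetical lookup order)."""
    instrument = data_id.get("Instrument")
    candidates = [
        (dataset_type, instrument),  # per-Instrument override of a dataset type
        dataset_type,                # dataset-type-level template
        frozenset(dimensions),       # shared by everything with these dimensions
        "default",                   # fallback
    ]
    for key in candidates:
        if key in templates:
            return templates[key]
    raise KeyError(f"no template matches dataset type {dataset_type!r}")


coadd_dims = frozenset({"Skymap", "Tract", "Patch", "AbstractFilter"})
templates = {
    "default": "{run}/{datasetType}",
    coadd_dims: "{run}/{datasetType}/{Tract}/{Patch}/{AbstractFilter}",
    "raw": "{run}/raw/{Exposure}/{Detector}",
    ("raw", "HSC"): "{run}/raw/HSC/{Exposure}/{Detector}",
}

hsc_raw = select_template(templates, "raw", {"Instrument", "Exposure", "Detector"},
                          {"Instrument": "HSC", "Exposure": 1, "Detector": 2})
coadd = select_template(templates, "deepCoadd", coadd_dims, {"Tract": 0})
```

Keying on a frozenset of dimension names lets one pattern cover every dataset type identified by Skymap+Tract+Patch+AbstractFilter without requiring that pattern to be a subset of a more general one.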




              tjenness Tim Jenness
              jbosch Jim Bosch
              Jim Bosch
              Jim Bosch, Tim Jenness


