  1. Data Management
  2. DM-26691

Add command-line tool for Butler.export

    Details

    • Sprint:
      DB_S21_12, DB_F21_06, DB_S22_12
    • Team:
      Ops Middleware
    • Urgent?:
      No

      Description

      This will be a tricky command-line interface to write, because this method actually returns a context manager with an object with additional methods that must be called (with the outputs of Registry.queryDatasets and Registry.queryDataIds, usually) in order to actually do anything.

      I don't think we should ever try to capture all of the flexibility of that Python interface on the command-line, but there are a few common cases that we should try to support, such as exporting all datasets from a list of collections. Part of the work of this ticket is gathering input on what those common cases really are.
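One of the common cases mentioned above, exporting all datasets from a list of collections, might look roughly like the following in Python. This is only a sketch: `export_collections` is a hypothetical helper name, the keyword names passed to `export` are assumptions rather than a confirmed signature, and a literal `...` passed to `queryDatasets` is assumed to mean "all dataset types".

```python
def export_collections(butler, collections, *, filename, directory=None, transfer=None):
    """Export each named collection together with all of its datasets.

    ``butler`` is assumed to behave like lsst.daf.butler.Butler; the keyword
    names passed to ``export`` are assumptions, not a confirmed signature.
    """
    with butler.export(filename=filename, directory=directory, transfer=transfer) as context:
        for name in collections:
            context.saveCollection(name)
            # A literal Ellipsis is assumed to mean "all dataset types".
            refs = butler.registry.queryDatasets(..., collections=[name])
            context.saveDatasets(refs)
```

Taking an already-constructed butler as an argument keeps the helper independent of how the repository is opened, which is the part a command-line wrapper would have to supply.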

        Attachments

        1. help.txt
          4 kB
        2. help-1.txt
          4 kB

          Issue Links

            Activity

            npease Nate Pease [X] (Inactive) added a comment -

            butler export dimension-data isn't implemented yet.

            Tim Jenness, Jim Bosch: I'd like to get feedback on what I've done so far (at least a thumbs-up or thumbs-down on the way it's working so far), and then either proceed or backtrack. I've pushed all the changes to the ticket branch, so you can pull it down and play with it if you'd like.

            jbosch Jim Bosch added a comment -

            I admit that at this point I still think this might be a solution in search of a problem.

            npease Nate Pease [X] (Inactive) added a comment - - edited

            Sorry if it wasn't clear: you can chain the export subcommands together, resulting in one call to Butler.export. So, for example, butler export REPO collection --name foo datasets <query dataset options> will create one butler, call butler.export, and call saveCollection and saveDatasets on the context returned from export.
            I think that is what doing it this way buys you.

            jbosch Jim Bosch added a comment - - edited

            Yeah, but while I think that's at least as good as any other way of translating a relatively complete interface for this to the command-line that I could come up with, I still think that if someone put an invocation like that directly in a SConscript or shell script instead of just writing a little Python script of their own, it's probably a step backwards in terms of readability and maintainability. So (IMO) if we need a command-line interface at all for export, it's for something much simpler that is nevertheless super common; the best candidate I can think of is "export one collection and all of its datasets matching an expression", i.e. wrapping:

            from lsst.daf.butler import Butler

            def simple_export(repo, collection, where, *, filename, directory, transfer):
                butler = Butler(repo)
                # Export arguments are elided here, as in the original sketch.
                with butler.export(...) as context:
                    context.saveCollection(collection)
                    context.saveDatasets(butler.registry.queryDatasets(..., collections=[collection], where=where))
            

            We could have options to allow multiple collections and/or explicit dataset types, or maybe to control which dimension elements are exported (the "elements" kwarg to saveDatasets), but I think I'd start with the simplest case and see what options would need to be added for it to replace existing usage.
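For illustration, the simplest case described above (one collection plus a query expression) could map onto a command line along the following lines. This is a sketch under stated assumptions, not the ticket's implementation: it uses the standard-library argparse rather than whatever framework the butler command actually uses, the program name butler-simple-export and all option names are hypothetical, and the keyword arguments passed to export are assumed from the function signature in the sketch above.

```python
import argparse


def build_parser():
    """Build a parser for the hypothetical single-collection export command."""
    parser = argparse.ArgumentParser(
        prog="butler-simple-export",  # hypothetical name
        description="Export one collection and its datasets matching an expression.",
    )
    parser.add_argument("repo", help="Path or URI of the butler repository.")
    parser.add_argument("collection", help="Name of the collection to export.")
    parser.add_argument("--where", default=None, help="Query expression selecting datasets.")
    parser.add_argument("--filename", default="export.yaml", help="Export description file.")
    parser.add_argument("--directory", default=None, help="Directory to receive exported files.")
    parser.add_argument("--transfer", default=None, help="Transfer mode for dataset files.")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported lazily so the parser can be exercised without the LSST stack.
    from lsst.daf.butler import Butler

    butler = Butler(args.repo)
    # Keyword names below are assumptions, not a confirmed signature.
    with butler.export(
        filename=args.filename, directory=args.directory, transfer=args.transfer
    ) as context:
        context.saveCollection(args.collection)
        # A literal Ellipsis is assumed to mean "all dataset types".
        context.saveDatasets(
            butler.registry.queryDatasets(..., collections=[args.collection], where=args.where)
        )
```

A call such as main(["REPO", "foo", "--where", "<expression>"]) would then perform one export; each option suggested above (multiple collections, explicit dataset types, dimension elements) would be one more add_argument line, which gives a rough measure of how quickly the interface grows.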

            As a side note: the limitation on not being able to query calibration collections (something I need to fix but haven't gotten to) may make even that not very useful, because it breaks querying for all dataset types in what otherwise would be the most useful collections to be able to export. But hopefully it won't be too much longer before I can get back to the work that would fix that.

            npease Nate Pease [X] (Inactive) added a comment -

            Putting this on hold pending concise use cases or a decision not to write this tool. The thinking is that it makes more sense for a user to write a small script that accomplishes the task than to force a ton of variables through a poorly matched CLI API.

            Some discussion about it in Slack is here.


              People

              Assignee:
              Unassigned
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Jim Bosch, Tim Jenness
