Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-26685

Add command-line tool for Registry.queryDatasets

    XMLWordPrintable

    Details

    • Story Points:
      5
    • Sprint:
      DB_F20_09
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      Create a command-line tool for Registry.queryDatasets as a butler subcommand.

      This method has a large set of optional keyword arguments, and I'm not sure they all need to be included in the command-line interface, especially at first.  I would certainly skip the dataId argument, as from the command-line it's completely redundant with the where argument.  The dimensions argument is probably also less likely to be generally useful.

      The other consideration when defining this command-line interface is that we also have many Registry and Butler methods whose primary inputs are the outputs of queryDatasets - in other words, anything that takes an iterable of DatasetRefs in Python will probably need a command-line script that first runs queryDatasets and forwards the DatasetRefs to the method being wrapped.  That means all of those commands will probably need to have many of the same command-line option, but sometimes the names will need to change to avoid clashes.  For example, Registry.associate takes a "collection" argument of its own, so we'd probably want to map that to something like --output-collection while calling the collection(s) passed to queryDatasets --input-collections.

      Also, you (Nate Pease [X]) are more familiar with the command-line argument names already in use than I am, so please don't take anything I've written above as authoritative; consistency across commands is more important than anything I've written here (as long as consistency doesn't lead to ambiguity, of course).

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment - - edited

            One of us should change the Registry API either on this ticket or very soon after, and I think a few other places "deduplicate" appears in lower-level code as well. I would be happy to not be the one to do it, and in daf_butler a straightforward grep for "deduplicate" should find all occurrences. But there may be some downstream breakage from that, too.

            Show
            jbosch Jim Bosch added a comment - - edited One of us should change the Registry API either on this ticket or very soon after, and I think a few other places "deduplicate" appears in lower-level code as well. I would be happy to not be the one to do it, and in daf_butler a straightforward grep for "deduplicate" should find all occurrences. But there may be some downstream breakage from that, too.
            Hide
            tjenness Tim Jenness added a comment -

            Ok. Change the command line API and ticket the internals change. I can probably do the internals change.

            Show
            tjenness Tim Jenness added a comment - Ok. Change the command line API and ticket the internals change. I can probably do the internals change.
            Hide
            npease Nate Pease [X] (Inactive) added a comment -

            I created DM-27222 for the argument name change from deduplicate to findFirst

            Show
            npease Nate Pease [X] (Inactive) added a comment - I created DM-27222 for the argument name change from deduplicate to findFirst
            Hide
            npease Nate Pease [X] (Inactive) added a comment -

            Tim, I'm ready for review again.

            After all the discussion about ordered dict & doing some work on that, I decided that it didn't make sense to have the intermediate representation of the tables between the script function & the command function. I also didn't like that the column names were repeated for each row in that IR. So I moved the astropy table creation to the script function, and wrote a couple small helper classes so I could create each table and not modify it afterward. It also made testing easier because I could call the script function & verify using the astropy table, and so didn't have to do any text-based processing of the CLI output for verification. Hopefully you agree with all that when you see it.

            I think I've got all the other requested changes too. Thanks for looking.

            Show
            npease Nate Pease [X] (Inactive) added a comment - Tim, I'm ready for review again. After all the discussion about ordered dict & doing some work on that, I decided that it didn't make sense to have the intermediate representation of the tables between the script function & the command function. I also didn't like that the column names were repeated for each row in that IR. So I moved the astropy table creation to the script function, and wrote a couple small helper classes so I could create each table and not modify it afterward. It also made testing easier because I could call the script function & verify using the astropy table, and so didn't have to do any text-based processing of the CLI output for verification. Hopefully you agree with all that when you see it. I think I've got all the other requested changes too. Thanks for looking.
            Hide
            tjenness Tim Jenness added a comment -

            Two main comments:

            • I think we want to use the sorting from DM-26324 so that we get predictable output.
            • Don't ask for URIs unless people want them.

            Plus some other comments. I tried out the command line and it worked great.

            Show
            tjenness Tim Jenness added a comment - Two main comments: I think we want to use the sorting from DM-26324 so that we get predictable output. Don't ask for URIs unless people want them. Plus some other comments. I tried out the command line and it worked great.

              People

              Assignee:
              npease Nate Pease [X] (Inactive)
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Tim Jenness
              Watchers:
              Jim Bosch, Nate Pease [X] (Inactive), Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.