Status: To Do
Fix Version/s: None
Team: Data Access and Database
Currently the dax_obscore package is narrowly focused on writing a (potentially very large) export of a significant chunk of a repository to an external file, which can then be handed over to the Qserv ingest system (or, conceivably, to Postgres in some alternate models).
There's another set of use cases, though, for supplying ObsCore data for smaller sets of data from a repository, and for handling them in-memory.
Two reference use cases with different flavors:
I) Microservices for related data – imagine services along these lines:
- "Given this dataset (UUID or DatasetRef), tell me about the datasets of all available DatasetTypes with the same DataID" (e.g., for a calexp, return the raw, the diffim, the calibration images used for it, etc.);
- "Given this coadd tile dataset, tell me about all the single-epoch images that went into it"; and
- "Given this solar system object, tell me about all the single-epoch images (calexps) that should contain it".
Each of these services may assemble the DatasetRefs that represent its answer in a different way - some via a pure Butler query, others requiring additional calculations - but in each case the appropriate table to return would be an ObsCore table, so that Firefly and other client software can follow the access_url links to the actual images, or to a cutout service. The table could also be decorated with some additional service-specific columns.
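To make the first service shape concrete, here is a toy sketch of the "same DataID" lookup, with a plain dict standing in for the Butler registry. All names here are my invention for illustration; a real service would use the daf_butler query APIs instead.

```python
def datasets_with_same_data_id(registry, ref_id):
    """Given a dataset's ID, return all datasets (of any DatasetType)
    that share its DataID.  `registry` is a toy stand-in: a dict
    mapping dataset UUID -> {"type": ..., "data_id": {...}}."""
    target = registry[ref_id]
    return [
        d for d in registry.values()
        if d["data_id"] == target["data_id"]
    ]

# Toy registry: a calexp and its raw share a DataID; a third dataset
# from another visit does not.
registry = {
    "u1": {"type": "calexp", "data_id": {"visit": 42, "detector": 7}},
    "u2": {"type": "raw",    "data_id": {"visit": 42, "detector": 7}},
    "u3": {"type": "calexp", "data_id": {"visit": 43, "detector": 7}},
}
matches = datasets_with_same_data_id(registry, "u1")
```

The service would then feed the matching refs into the ObsCore conversion step and return the resulting table.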
II) Push to the Portal / Firefly directly from user Python code:
In this use case, a user's Python notebook has identified one or more images of interest - let's say, for instance, a list of images recently taken during commissioning with a certain type of problem identified by the notebook code. The user then wants to make those available in the usual image browser in the Portal.
If the user could take their list of problem DatasetRefs, hand it off to dax_obscore, get back an in-memory ObsCore table - perhaps with some additional columns for annotation - and then use the existing Firefly Python API to push that table to the Portal, they'd get an immediately useful GUI for reviewing those images. (Future planned features in Firefly would even let them pass data back from the GUI, like further user selections or annotations.)
With these use cases in mind, I'd like to explore refactoring the dax_obscore code to make it easy to do tasks like this. I don't think it will be all that difficult, but I'd like Andy Salnikov's opinion on that first, of course.
If you need some requirements quoted for backup, I can do that later.
- relates to DM-35850 Collect ideas and requirements for daf_butler obscore implementation
Gregory Dubois-Felsmann, how soon do you want this option available? It is likely that, as a result of DM-35532, we'll have to implement part of that logic in the butler itself. I can try to make it more reusable at that time if this can wait.
Andy Salnikov As an outer bound, I feel this needs to be deployed and working by ComCam-on-sky. I want it available for commissioning. But we could use it any time in both DP0.2 and AuxTel contexts before then. What kind of time scale are you thinking of for the
Gregory Dubois-Felsmann, we do not have any work planned at the moment; in fact there is no formal approval yet of any option from DM-35532, though I expect one of the options to be approved soon. And as usual I'm not very good at tracking various timelines - what is the ComCam-on-sky time scale?
To be a little more concrete, I think what would be useful is a function or class method that takes as input a Butler, a table of DatasetRefs - with the option of arbitrary additional columns in the input - and a dax_obscore configuration template (like the existing ones). Perhaps a class that takes the Butler and the configuration template as constructor parameters, and then a method to do the actual transformation of a table of refs?
As output, it would produce a tabular data structure in memory - not afw.table, so that we don't pick up a new dependency - containing the ObsCore data for each DatasetRef, as well as - copied over verbatim - any additional columns from the input.
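A rough sketch of that shape, kept self-contained with stdlib stand-ins for the Butler types (the class and method names, and the config keys, are my invention, not the real dax_obscore API; the output "table" is just a list of row dicts):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class DatasetRef:
    """Toy stand-in for lsst.daf.butler.DatasetRef."""
    id: str                  # dataset UUID
    dataset_type: str
    data_id: dict[str, Any]

@dataclass
class ObsCoreConverter:
    """Sketch of the proposed converter: constructed with a Butler-like
    object and a dax_obscore-style configuration, with one method that
    turns a list of refs (plus optional extra columns) into an
    in-memory ObsCore table."""
    butler: Any               # would be lsst.daf.butler.Butler
    config: dict[str, Any]    # would be an ObsCore configuration

    def refs_to_obscore(self, refs, extra_columns=None):
        extra_columns = extra_columns or {}
        rows = []
        for i, ref in enumerate(refs):
            # In a real implementation these values would come from the
            # Butler registry and the configuration's column mappings.
            row = {
                "dataproduct_type": "image",
                "obs_id": ref.id,
                "obs_collection": self.config.get("obs_collection", ""),
                "access_url": f"{self.config.get('access_base_url', '')}/{ref.id}",
            }
            # Copy any user-supplied annotation columns over verbatim.
            for name, values in extra_columns.items():
                row[name] = values[i]
            rows.append(row)
        return rows
```

A user (use case II) would then construct the converter once and call the method on whatever list of refs their notebook produced, passing their annotations as extra columns.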
This function could then be invoked either directly by a user (use case II above) or incorporated into a microservice built with Russ Allbery's VO service framework.
I have so many applications for this!