Details
- Type: Story
- Status: To Do
- Resolution: Unresolved
- Fix Version/s: None
- Component/s: Science Platform
- Labels:
- Team: Architecture
Description
The deliverable from this ticket is a DMTN or other comparable document describing access to the Engineering and Facilities Database (EFD) Large File Annex (LFA) in the context of the various deployments of the Science Platform.
The LFA is a collection of files created in the Observatory's Summit (and probably Base) operational systems. These files are treated as a form of telemetry: items too large to be transmitted by value through the SAL or stored in tables in the (relational) EFD, but which still conceptually arise as time-stamped streams of information.
The concept of the LFA includes the existence of tables in the (relational) EFD that record the usual time stamps of files "sent" via the LFA, with access to the file provided by a URI recorded in the relational table. (I'll call these "LFA metadata" tables below.)
At the Summit and Base, where the LFA is reached via the LFA metadata tables in the EFD, the URIs mentioned above are supposed to be usable throughout the Summit/Base computing complex to access the actual files.
When the EFD is ingested/ETL'd into the Transformed EFD, the tables of LFA metadata will be part of that process, and so in the Transformed EFD we can expect LFA metadata tables that allow us to go from image and/or visit IDs to one or more URIs for an LFA topic at timestamps in the range appropriate to the image (or visit) ID provided. These URIs are meant to provide access to the replicas of the LFA files that are held in the Transformed EFD. I don't think we have yet clearly specified whether the URIs themselves will be recast to point to the replicas, or whether the tools provided for resolving unmodified LFA URIs will just access the replicas when run in contexts in which the Transformed EFD is available.
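The lookup from a visit ID to LFA URIs described above might look like the following. This is only a sketch: the table names (`visit`, `lfa_metadata`), their columns, the sample rows, and the `lfa://` URI scheme are all assumptions made for illustration, and an in-memory SQLite database stands in for the Transformed EFD.

```python
import sqlite3

# Hypothetical schema: neither the table names nor the columns come from the
# actual Transformed EFD design; they only illustrate the time-range join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, t_start REAL, t_end REAL);
    CREATE TABLE lfa_metadata (topic TEXT, t_stamp REAL, uri TEXT);
""")
conn.execute("INSERT INTO visit VALUES (1001, 100.0, 130.0)")
conn.executemany(
    "INSERT INTO lfa_metadata VALUES (?, ?, ?)",
    [
        # Made-up rows; the 'lfa://' scheme is an assumption as well.
        ("guider_stamps", 95.0, "lfa://guider_stamps/0001.fits"),
        ("guider_stamps", 110.0, "lfa://guider_stamps/0002.fits"),
        ("guider_stamps", 125.0, "lfa://guider_stamps/0003.fits"),
        ("guider_stamps", 140.0, "lfa://guider_stamps/0004.fits"),
    ],
)


def lfa_uris_for_visit(conn, visit_id, topic):
    """Return URIs for `topic` whose time stamps fall within the visit."""
    rows = conn.execute(
        """
        SELECT m.uri
        FROM lfa_metadata AS m
        JOIN visit AS v ON m.t_stamp BETWEEN v.t_start AND v.t_end
        WHERE v.visit_id = ? AND m.topic = ?
        ORDER BY m.t_stamp
        """,
        (visit_id, topic),
    ).fetchall()
    return [uri for (uri,) in rows]
```

Whether the returned URIs point at the replicas or at the original Summit/Base files is exactly the open question raised above; the query itself is the same either way.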
In any event we are committed to making all this data available through the Science Platform. In the two DACs and the Science Validation instance, we provide access to the Transformed EFD and its LFA clone. In the Commissioning Cluster instance, we are still discussing whether direct access to the EFD's Base replica will be provided or whether a Base-hosted Transformed EFD will be available to the Commissioning Cluster.
Notwithstanding the above open issues, the main issue for this ticket is:
What are the access methods in the three Aspects of the Science Platform for LFA data?
- In the API Aspect, the (Transformed) EFD tables, including the LFA metadata tables, will be accessible via a TAP service. That means that the URIs from the LFA metadata tables will be returned as values in the VOTable or other result forms supported by the dbserv TAP interface. In a natural interpretation, then, it would seem that these URIs should be directly usable offsite, e.g., in programmatic HTTP GET operations, and via wget/curl. Can we confirm this? If that is not possible, then it would be useful to think about whether we could supply a simple rule for getting an offsite-usable URL from an LFA URI.
- In the Portal Aspect, it would be natural to make the LFA files' URIs clickable in the UI for tabular results returned from Portal queries against the LFA metadata tables. Again, for this purpose it would be useful for the URIs to be directly usable as URLs, with the Portal giving the user the option to view the file directly in the browser (which would be aided by the provision of suitable MIME types, where appropriate) or to download it. If an additional step is required to get a URL from the URI, the Portal could be taught that logic, as long as it is a universal rule usable for all LFA topics. It would also make sense to contemplate a system that would permit LSST-specific visualizations of selected LFA file types to be provided on an as-needed and as-effort-available basis (such visualizations are not in the Science Platform baseline).
- In the Notebook Aspect, where user Python processes are running within the DACs and have access to internal shared filesystems, it is possible that URIs obtained from the LFA metadata might be resolvable as POSIX file accesses (e.g., to a GPFS filesystem), not just as HTTP GETs to a service. This needs to be clarified, and the code support, if any, for this defined.
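To make the "simple rule" idea concrete, here is one possible shape for a resolver that turns an LFA URI into either an offsite-usable URL or a POSIX path. It is a sketch under stated assumptions, not a proposal for the actual format: the `lfa://<topic>/<file>` URI layout, the base URL `https://dac.example.org/lfa`, and the mount point `/mnt/lfa` are all hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import quote, urlsplit

# All three constants are hypothetical placeholders, not real LSST endpoints.
LFA_SCHEME = "lfa"
HTTP_BASE = "https://dac.example.org/lfa"
POSIX_ROOT = Path("/mnt/lfa")


def _relative_path(lfa_uri):
    """Split an assumed 'lfa://<topic>/<file>' URI into a relative path."""
    parts = urlsplit(lfa_uri)
    if parts.scheme != LFA_SCHEME:
        raise ValueError(f"not an LFA URI: {lfa_uri!r}")
    rel = PurePosixPath(parts.netloc) / parts.path.lstrip("/")
    if ".." in rel.parts:
        raise ValueError(f"unsafe path in URI: {lfa_uri!r}")
    return rel


def to_url(lfa_uri, base=HTTP_BASE):
    """Universal rule for offsite use: prepend a per-deployment base URL."""
    return f"{base.rstrip('/')}/{quote(_relative_path(lfa_uri).as_posix())}"


def to_posix_path(lfa_uri, root=POSIX_ROOT):
    """Rule for in-DAC use: map the URI onto a shared filesystem mount."""
    return root / _relative_path(lfa_uri)


def resolve(lfa_uri, root=POSIX_ROOT, base=HTTP_BASE):
    """Prefer a local file when it exists; otherwise fall back to a URL."""
    local = to_posix_path(lfa_uri, root)
    return str(local) if local.is_file() else to_url(lfa_uri, base)
```

Under these assumptions the API and Portal Aspects would only need `to_url`, while a Notebook Aspect helper could call `resolve` and transparently use the filesystem when it is mounted.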
In the end, we need a good solution in this area for our users for each Aspect.
Perhaps some initial discussion on this ticket and then a dedicated teleconference on the subject?
Attachments
Issue Links
I think it is simplest for the Transformed EFD to make the LFA files accessible as a filesystem and for DAX to stand up a web server (with appropriate authentication/authorization) that accepts the Transformed EFD URIs to retrieve those files.
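As a rough illustration of that arrangement, the sketch below uses Python's standard `http.server` to serve files from a filesystem root. The mount point `/mnt/lfa`, the handler name, and the port are hypothetical, the authentication/authorization layer is left out entirely, and this is not meant to represent how DAX would actually implement the service. It needs Python 3.9 or later for `Path.is_relative_to`.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from urllib.parse import unquote, urlsplit

LFA_ROOT = Path("/mnt/lfa")  # hypothetical mount point of the LFA replica


def path_for_request(request_path, root=LFA_ROOT):
    """Map a request path to a file under `root`, or None if it escapes."""
    rel = unquote(urlsplit(request_path).path).lstrip("/")
    candidate = (root / rel).resolve()
    if not candidate.is_relative_to(root.resolve()):
        return None
    return candidate


class LFAHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Authentication/authorization is deliberately omitted here; a real
        # service would check credentials before touching the filesystem.
        target = path_for_request(self.path)
        if target is None or not target.is_file():
            self.send_error(404, "No such LFA file")
            return
        data = target.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Run the sketch server until interrupted."""
    ThreadingHTTPServer(("", port), LFAHandler).serve_forever()
```

Calling `serve()` would start the server; the path check in `path_for_request` is what keeps a request such as `/../etc/passwd` from reaching files outside the LFA root.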
Would it be a problem for each DAC's Transformed EFD to have URIs specific to itself, or do they need to be identical between the two DACs (which could mean that directing a user to a particular DAC's LFA may require external information)?
In addition, Michelle Gower and I discussed ingesting some LFA files (such as guider postage stamp stacks) into the Data Backbone as Butler-accessible datasets. This ingestion would occur as a logical part of the EFD transformation process.