Data Management / DM-13860

Clarify access methods to the Large File Annex of the EFD in the Science Platform

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Science Platform
    • Labels:
    • Team:
      Architecture

      Description

      The deliverable from this ticket is a DMTN or other comparable document describing access to the Engineering and Facilities Database (EFD) Large File Annex (LFA) in the context of the various deployments of the Science Platform.


      The LFA is a collection of files created in the Observatory's Summit (and probably Base) operational systems. They are treated as a form of telemetry: too large to be transmitted by value through the SAL or stored in tables in the (relational DB) EFD, but still conceptually arising as time-stamped streams of information.

      The concept of the LFA includes the existence of tables in the (relational) EFD that record the usual time stamps of files "sent" via the LFA, with access to the file provided by a URI recorded in the relational table.  (I'll call these "LFA metadata" tables below.)

      In the context of the Summit and Base and access to the LFA via the LFA metadata tables in the EFD, the URIs mentioned above are supposed to be usable throughout the Summit/Base computing complex to access the actual files.

      When the EFD is ingested/ETL'd into the Transformed EFD, the tables of LFA metadata will be part of that process. In the Transformed EFD we can therefore expect LFA metadata tables that allow us to go from image and/or visit IDs to one or more URIs for an LFA topic at timestamps in the range appropriate to the image (or visit) ID provided. These URIs are meant to provide access to the replicas of the LFA files that are held in the Transformed EFD. I don't think we have yet clearly specified whether the URIs themselves will be recast to point to the replicas, or whether the tools provided for resolving unmodified LFA URIs will just access the replicas when run in contexts in which the Transformed EFD is available.
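
      As a minimal sketch of the lookup described above, assuming the LFA metadata rows for one topic have already been fetched as (timestamp, URI) pairs and that an image or visit is characterized by a start and end time. `LfaRecord`, `uris_for_interval`, and the `padding` parameter are hypothetical names for illustration, not part of any existing EFD interface:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class LfaRecord:
    """One row of a (hypothetical) LFA metadata table for a single topic."""
    timestamp: datetime
    uri: str


def uris_for_interval(
    records: Iterable[LfaRecord],
    start: datetime,
    end: datetime,
    padding: timedelta = timedelta(0),
) -> List[str]:
    """Return the URIs whose timestamps fall within [start - padding, end + padding].

    The result is ordered by timestamp, so callers see the files in the
    order they were produced.
    """
    lo, hi = start - padding, end + padding
    selected = [r for r in records if lo <= r.timestamp <= hi]
    return [r.uri for r in sorted(selected, key=lambda r: r.timestamp)]
```

      In a real deployment the same selection would presumably be expressed as a time-range query against the LFA metadata tables, with the image or visit time range obtained from the corresponding image metadata.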

      In any event we are committed to making all this data available through the Science Platform.  In the two DACs and the Science Validation instance, we provide access to the Transformed EFD and its LFA clone.  In the Commissioning Cluster instance, we are still discussing whether direct access to the EFD's Base replica will be provided or whether a Base-hosted Transformed EFD will be available to the Commissioning Cluster.

      Notwithstanding the above open issues, the main issue for this ticket is:

      What are the access methods in the three Aspects of the Science Platform for LFA data?

      1. In the API Aspect, the (Transformed) EFD tables, including the LFA metadata tables, will be accessible via a TAP service. That means that the URIs from the LFA metadata tables will be returned as values in the VOTable or other result forms supported by the dbserv TAP interface. In a natural interpretation, then, it would seem that these URIs should be directly usable offsite, e.g., in programmatic HTTP GET operations, and via wget/curl. Can we confirm this? If that is not possible, then it would be useful to think about whether we could supply a simple rule for getting an offsite-usable URL from an LFA URI.
      2. In the Portal Aspect, it would be natural to make the LFA files' URIs clickable in the UI for tabular results returned from Portal queries against the LFA metadata tables. Again, for this purpose it would be useful for the URIs to be directly usable as URLs, with the Portal providing the user the option to view the file directly in the browser (which would be aided by the provision of suitable MIME types, where appropriate) or download it. If an additional step is required to get a URL from the URI, the Portal could be taught that logic, as long as it were a universal rule usable for all LFA topics. It would also make sense to contemplate a system that would permit LSST-specific visualizations of selected LFA file types to be provided on an as-needed and as-effort-available basis (such visualizations are not in the Science Platform baseline).
      3. In the Notebook Aspect, where user Python processes are running within the DACs and have access to internal shared filesystems, it is possible that URIs obtained from the LFA metadata might be resolvable as POSIX file accesses (e.g., to a GPFS filesystem), not just as HTTP GETs to a service. This needs to be clarified, and the code support, if any, for this defined.
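
      To make the "simple rule" idea concrete, here is a rough sketch of what a universal URI-to-URL and URI-to-path mapping could look like. Everything specific in it is an assumption made for illustration: the `lfa://` scheme, the `HTTP_BASE` and `POSIX_ROOT` values, and the function names are hypothetical, since the actual form of LFA URIs has not been specified on this ticket.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical deployment settings; a real DAC would supply its own values.
HTTP_BASE = "https://dac.example.org/lfa"   # assumed offsite-visible file service
POSIX_ROOT = "/lfa"                         # assumed mount point inside the DAC


def lfa_relative_path(uri: str) -> str:
    """Extract the deployment-independent part of an assumed lfa:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "lfa":
        raise ValueError(f"not an LFA URI: {uri!r}")
    # Treat the authority component as the first path element (e.g. the topic).
    rel = PurePosixPath(parts.netloc) / parts.path.lstrip("/")
    if ".." in rel.parts:
        raise ValueError(f"refusing path traversal in {uri!r}")
    return str(rel)


def to_url(uri: str) -> str:
    """Rule for the API and Portal Aspects: a URL usable with HTTP GET."""
    return f"{HTTP_BASE}/{lfa_relative_path(uri)}"


def to_posix_path(uri: str) -> str:
    """Rule for the Notebook Aspect: a path on an internal shared filesystem."""
    return f"{POSIX_ROOT}/{lfa_relative_path(uri)}"
```

      With a rule of this kind, the same unmodified URI from the LFA metadata tables could be resolved differently per Aspect, which is one of the two options left open above (the other being to recast the URIs themselves).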

      In the end, we need a good solution in this area for our users for each Aspect.

      Perhaps some initial discussion on this ticket and then a dedicated teleconference on the subject?

            Activity

            ktl Kian-Tat Lim added a comment -

            I think it is simplest for the Transformed EFD to make the LFA files accessible as a filesystem and for DAX to stand up a web server (with appropriate authentication/authorization) that accepts the Transformed EFD URIs to retrieve those files.

            Would it be a problem for each DAC's Transformed EFD to have URIs specific to itself, or do they need to be identical between the two DACs (which could mean that directing a user to a particular DAC's LFA may require external information)?

            In addition, Michelle Gower and I discussed ingesting some LFA files (such as guider postage stamp stacks) into the Data Backbone as Butler-accessible datasets.  This ingestion would occur as a logical part of the EFD transformation process.

            tjenness Tim Jenness added a comment -

            Is the outcome of this ticket a tech note?

            It's really important that we have a list of Butler-accessible files from the LFA, because those are the files that will need sufficient metadata in them to allow them to be ingested.

            gpdf Gregory Dubois-Felsmann added a comment - edited

            Frossie Economou We need to find some time to talk this one through. I don't know what the current state of the LFA is, with the substantial evolution of the as-built EFD from the original design.

            Tim Jenness Yes, the outcome of this ticket would be a tech note or an addition to an existing EFD tech note.


              People

              Assignee:
              gpdf Gregory Dubois-Felsmann
              Reporter:
              gpdf Gregory Dubois-Felsmann
              Watchers:
              Brian Van Klaveren, Dave Mills, Fritz Mueller, Frossie Economou, Gregory Dubois-Felsmann, Kian-Tat Lim, Simon Krughoff, Tim Jenness, Xiuqin Wu [X] (Inactive)

                Dates

                Created:
                Updated:
