Data Management / DM-27476

Add support for metadata sidecar files for ingest

Details

    • 3
    • Ops Middleware
    • No

    Description

      In order to support ingest of files from object store with any reasonable performance we need to be able to pre-calculate the ingest metadata and then use that metadata for ingest. Files will likely be ingested repeatedly and each time we do not want to have to download either the full file or the first N bytes to parse FITS headers.

      Modify butler ingest-raws so that it can read a JSON metadata snippet. This can either be a .json file with the same name as the data file or, optionally, an update to ButlerURI that lets it read metadata from the object itself. ktl has suggested that we can use the object metadata (see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html), but I'm not yet sure whether the astronomy metadata fits in 2 kB, although we could probably encode it.
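
A rough sketch of the sidecar lookup described above, for illustration only. It is not the butler implementation: the helper names (`sidecar_path`, `read_sidecar`, `fits_in_object_metadata`) and the metadata keys are hypothetical, and it assumes "same name" means swapping the final extension for `.json`. The size check is a crude estimate against the 2 kB object-metadata limit, assuming the whole record is stored as one compact JSON string.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Limit on S3 user-defined object metadata, in bytes (the "2 kB" in the ticket).
S3_USER_METADATA_LIMIT = 2048


def sidecar_path(data_file: Path) -> Path:
    """Return the assumed sidecar location: same stem, ``.json`` extension.

    Only the final suffix is replaced, so ``raw.fits.fz`` would map to
    ``raw.fits.json``; a real implementation may choose differently.
    """
    return data_file.with_suffix(".json")


def read_sidecar(data_file: Path) -> Optional[Dict[str, Any]]:
    """Load pre-calculated metadata from the sidecar file, if one exists.

    Returns None when there is no sidecar, so the caller can fall back
    to reading the headers from the data file itself.
    """
    path = sidecar_path(data_file)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def fits_in_object_metadata(metadata: Dict[str, Any]) -> bool:
    """Estimate whether the metadata would fit in S3 object metadata.

    Encodes the record as compact JSON and compares its UTF-8 size to
    the limit; key names and per-entry overhead are not accounted for.
    """
    encoded = json.dumps(metadata, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= S3_USER_METADATA_LIMIT
```

An ingest loop using something like this would call `read_sidecar` first and only open the data file when it returns None.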

      Pre-calculating this metadata will be a different ticket.

      Since this is only really an issue for object stores, I am marking this as an IDF ticket.

          Activity

            tjenness Tim Jenness added a comment -

            Parejkoj, would you be able to review this ticket? There are two parts. The first adds support to raw ingest for index files and sidecar files as created by DM-28844. The second augments obs_base testing significantly so that we can test raw ingest and define visits without needing another obs package. I now create a proper dummy instrument and use the index files for ingest, so I don't need to write real data files that need a real metadata translator. I also added a test for the YAML camera (there wasn't one) so that I can add a simple YAML camera for the dummy instrument and we can define visits.

            tjenness Tim Jenness added a comment -

            Note that changing raw ingest to use ButlerURI (so it will work with object stores) will be on a separate ticket.

            Parejkoj John Parejko added a comment -

            Thanks for all the cleanups: this looks much nicer. A few more comments on the PR, mostly docs/comment related.

            People

              tjenness Tim Jenness
              tjenness Tim Jenness
              John Parejko
              Hsin-Fang Chiang, John Parejko, Tim Jenness