DM-27476 (Data Management)

Add support for metadata sidecar files for ingest

    Details

    • Story Points:
      3
    • Team:
      Ops Middleware
    • Urgent?:
      No

      Description

      To ingest files from an object store with reasonable performance, we need to be able to pre-calculate the ingest metadata and then use that metadata during ingest. Files will likely be ingested repeatedly, and we do not want to download either the full file or the first N bytes each time just to parse the FITS headers.

      Modify butler ingest-raws so that it can read a JSON metadata snippet. This can either be a .json file with the same name as the data file or, optionally, an update to ButlerURI that lets it read metadata from the object itself. Kian-Tat Lim has suggested that we could use the object metadata (see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html), but I'm not yet sure whether the astronomy metadata fits in 2 kB, although we could probably encode it.
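
      The two options above can be sketched as follows. This is a minimal illustration, not the butler implementation: the helper names (sidecar_path, read_sidecar, encode_for_object_metadata), the "ingest-metadata" key, and the choice of zlib plus base64 as the encoding are all assumptions made for the example. The 2048-byte limit is taken from the 2 kB figure quoted above and is applied to the key plus the encoded value.

```python
import base64
import json
import zlib
from pathlib import Path
from typing import Optional

# Assumed limit, based on the 2 kB figure quoted in the ticket.
OBJECT_METADATA_LIMIT = 2048


def sidecar_path(data_file: Path) -> Path:
    """Return the hypothetical sidecar path: same name, .json extension."""
    # Note: with_suffix replaces only the last suffix, so
    # "raw.fits.fz" maps to "raw.fits.json".
    return data_file.with_suffix(".json")


def read_sidecar(data_file: Path) -> Optional[dict]:
    """Load pre-calculated metadata if a sidecar exists, else return None.

    A caller would fall back to reading the FITS header when this
    returns None.
    """
    candidate = sidecar_path(data_file)
    if not candidate.exists():
        return None
    with candidate.open() as fh:
        return json.load(fh)


def encode_for_object_metadata(
    metadata: dict, key: str = "ingest-metadata"
) -> Optional[str]:
    """Compress and base64-encode metadata for storage as object metadata.

    Returns None when the key plus encoded value exceed the assumed limit.
    """
    raw = json.dumps(metadata, separators=(",", ":"), sort_keys=True)
    encoded = base64.b64encode(zlib.compress(raw.encode("utf-8"))).decode("ascii")
    if len(key.encode("utf-8")) + len(encoded) > OBJECT_METADATA_LIMIT:
        return None
    return encoded
```

      Whether real astronomy metadata fits under the limit after encoding would need to be checked against actual headers; the sketch only shows how such a check could be made.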

      Pre-calculating this metadata will be a different ticket.

      Since this is only really an issue for object stores, I am marking this as an IDF ticket.

            Activity

            tjenness Tim Jenness added a comment -

            John Parejko, would you be able to review this ticket? There are two parts. The first adds support to raw ingest for index files and sidecar files as created by DM-28844. The second augments obs_base testing significantly so that we can test raw ingest and define visits without needing another obs package. I now create a proper dummy instrument and use the index files for ingest, so I don't need to write real data files that need a real metadata translator. I also added a test for the yaml camera (there wasn't one) so that I can add a simple yaml camera for the dummy instrument and define visits.

            tjenness Tim Jenness added a comment -

            Note that changing raw ingest to use ButlerURI (so it will work with object stores) will be on a separate ticket.

            Parejkoj John Parejko added a comment -

            Thanks for all the cleanups: this looks much nicer. A few more comments on the PR, mostly docs/comment related.


              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              John Parejko
              Watchers:
              Hsin-Fang Chiang, John Parejko, Tim Jenness