  1. Data Management
  2. DM-23224

Cross-check the schema column names in the Object table

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ci_hsc
    • Labels:
      None

      Description

      Now that the schema of the output Object table parquet no longer depends on which filters the input data include (DM-23074), we can check the schema in ci_hsc, and the HSC-RC2 outputs should have the same schema.

      In this ticket I plan to add a cross-check of the schema (column names only) between the Object parquet file generated in ci_hsc and the YAML DDL in cat.

      It would be nice if the DAX team could use the YAML file in cat as the expected schema. So far nothing checks the schema of the pipeline outputs, and this ticket starts adding such checks. Science Pipelines can still change the schema from time to time, but that's okay, because a failed ci_hsc run should prompt the developers to update the cat YAML file.

      I plan to leave the data type checking for later (after the end-of-Feb DMLT).
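
As a rough illustration of the column-name cross-check described above, here is a minimal sketch. It is not the code added to ci_hsc: the function names and the example column names are hypothetical, and loading the two name lists is left out. In practice the produced names might come from something like `pyarrow.parquet.read_schema(path).names`, and the expected names from parsing the YAML file in cat, whose layout is not described in this ticket.

```python
def compare_column_names(parquet_columns, ddl_columns):
    """Compare column names only; data types are deliberately ignored.

    Returns a tuple (missing, unexpected):
      missing    - names in the expected schema but absent from the parquet
      unexpected - names in the parquet but absent from the expected schema
    Both lists are sorted so that failure messages are stable.
    """
    produced = set(parquet_columns)
    expected = set(ddl_columns)
    missing = sorted(expected - produced)
    unexpected = sorted(produced - expected)
    return missing, unexpected


def check_schema(parquet_columns, ddl_columns):
    """Raise AssertionError if the two sets of column names differ."""
    missing, unexpected = compare_column_names(parquet_columns, ddl_columns)
    if missing or unexpected:
        raise AssertionError(
            f"Object table schema mismatch: missing={missing}, "
            f"unexpected={unexpected}; the YAML file in cat may need updating"
        )


# Hypothetical column names, used only to exercise the functions.
produced_names = ["objectId", "coord_ra", "coord_dec", "extraColumn"]
expected_names = ["objectId", "coord_ra", "coord_dec", "parentObjectId"]
missing, unexpected = compare_column_names(produced_names, expected_names)
```

Comparing sets rather than ordered lists means column order is not checked, which matches the "column names only" scope; an order-sensitive check would compare the lists directly.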

              People

              • Assignee:
                hchiang2 Hsin-Fang Chiang
                Reporter:
                hchiang2 Hsin-Fang Chiang
                Reviewers:
                Colin Slater
                Watchers:
                Colin Slater, Hsin-Fang Chiang, Yusra AlSayyad