Data Management / DM-21795

Rework Registry provenance objects to match prototype

    Details

      Description

      (original description is no longer accurate; see comments)

      Registry's provenance tables (execution, run, and quantum) can't currently be fully populated without performing database updates. For example:

      • you can't insert a dataset until after its run has been inserted;
      • you can't insert a run until you've inserted the execution it inherits its ID from;
      • you can't insert an execution until after it's completed, because it has an end timestamp field.
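      The dependency chain above can be sketched with an in-memory SQLite database. This is a minimal sketch under stated assumptions: the table and column names are simplified, hypothetical stand-ins rather than the actual daf_butler schema, and the NOT NULL on end_time models the constraint described in the last bullet.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the provenance tables; this is not
# the real daf_butler schema.
SCHEMA = """
CREATE TABLE execution (
    id INTEGER PRIMARY KEY,
    start_time TEXT,
    end_time TEXT NOT NULL                              -- must be known at insert time
);
CREATE TABLE run (
    id INTEGER PRIMARY KEY REFERENCES execution (id),  -- inherits the execution ID
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES run (id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript(SCHEMA)


def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the INSERT succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT_EXECUTION = "INSERT INTO execution (id, start_time, end_time) VALUES (?, ?, ?)"
INSERT_RUN = "INSERT INTO run (id, name) VALUES (?, ?)"
INSERT_DATASET = "INSERT INTO dataset (id, run_id) VALUES (?, ?)"

# Each out-of-order or too-early insert is rejected:
dataset_first = try_insert(INSERT_DATASET, (1, 10))        # run 10 does not exist yet
run_first = try_insert(INSERT_RUN, (10, "demo/run"))       # execution 10 does not exist yet
execution_early = try_insert(INSERT_EXECUTION, (10, "2019-10-01T12:00", None))  # no end time
print(dataset_first, run_first, execution_early)  # False False False

# Only after the execution has finished can the whole chain be inserted.
ok = all([
    try_insert(INSERT_EXECUTION, (10, "2019-10-01T12:00", "2019-10-01T12:05")),
    try_insert(INSERT_RUN, (10, "demo/run")),
    try_insert(INSERT_DATASET, (1, 10)),
])
print(ok)  # True
```

      In this sketch nothing downstream of an execution can be recorded until that execution has finished, which is exactly the ordering problem the ticket describes.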

      Work through low-level use cases for these tables and ensure we can actually have all of their values when it's time to insert them, splitting up tables as necessary.

      Michelle Gower, Christopher Stephens [X]: this ticket exists because I'm assuming updates are a problem. Please let me know if they aren't, or if you have any expectations, requirements, or wisdom about when various provenance records should be inserted relative to the datasets they refer to.

            Activity

            cs2018 Christopher Stephens [X] (Inactive) added a comment -

            It looks to me like any updates would be to logging-type tables (mainly execution.end_time). I don't see any issues with allowing this.
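            Allowing that single update means the execution row can be inserted as soon as the execution starts, with its end time filled in later. A minimal sketch of that pattern follows; the schema is again a simplified, hypothetical stand-in for the real provenance tables, and the only departure from an insert-only design is that end_time is nullable.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the run/dataset references
conn.executescript("""
CREATE TABLE execution (
    id INTEGER PRIMARY KEY,
    start_time TEXT NOT NULL,
    end_time TEXT                  -- nullable: filled in by a later UPDATE
);
CREATE TABLE run (
    id INTEGER PRIMARY KEY REFERENCES execution (id),
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES run (id)
);
""")


def now() -> str:
    """Current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


with conn:  # commits on success, rolls back on error
    # 1. Record the execution when it starts; its end time is still unknown.
    conn.execute("INSERT INTO execution (id, start_time) VALUES (?, ?)", (10, now()))
    # 2. The run and its datasets can now be inserted in dependency order.
    conn.execute("INSERT INTO run (id, name) VALUES (?, ?)", (10, "demo/run"))
    conn.execute("INSERT INTO dataset (id, run_id) VALUES (?, ?)", (1, 10))

with conn:
    # 3. The only update: stamp the end time once the execution completes.
    conn.execute("UPDATE execution SET end_time = ? WHERE id = ?", (now(), 10))

row = conn.execute("SELECT end_time IS NOT NULL FROM execution WHERE id = 10").fetchone()
print(row[0])  # 1
```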

            jbosch Jim Bosch added a comment -

            I'm repurposing this ticket (slightly) to bring the handling of collections, runs, and quanta in line with the new prototype.  That will involve:

            • Removing the Run class, and always using strings to refer to runs in public APIs.
            • Removing the Execution class, and moving its attributes into Run and Quantum.
            • Inventing the high-level Registry APIs for working with these entities (the mid-level prototype APIs inform these, but do not specify them), and replacing the existing Registry APIs with them.
            • Adjusting the Quantum class to work with the new Registry APIs and the mid-level prototype APIs.
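            As a rough illustration of the first two bullets, a quantum that names its run by a plain string and carries the former Execution attributes directly might look like the sketch below. QuantumSketch and all of its field names are hypothetical; this is not the daf_butler Quantum API, and exactly which attributes move over from Execution is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class QuantumSketch:
    """Hypothetical quantum record; not the real daf_butler Quantum class."""

    task_name: str
    run: Optional[str] = None               # run referred to by name; no Run object
    # Attributes assumed to move here from the removed Execution class:
    host: Optional[str] = None
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    # Dataset type name -> dataset IDs (illustrative only).
    inputs: Dict[str, List[int]] = field(default_factory=dict)

    def duration(self) -> Optional[timedelta]:
        """Elapsed time, or None while the quantum has not finished."""
        if self.start_time is None or self.end_time is None:
            return None
        return self.end_time - self.start_time


q = QuantumSketch(task_name="isr", run="demo/run", host="node01",
                  start_time=datetime(2019, 10, 1, 12, 0, 0))
print(q.duration())  # None
q.end_time = datetime(2019, 10, 1, 12, 5, 0)
print(q.duration())  # 0:05:00
```

            Keeping the run as a string means code that talks to workflow tools only has to pass names around, with no intermediate Run or Execution objects to construct.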

            This will involve some minor changes to the schema, but the big changes will be reserved until the Registry backend work is out of the way. For now we'll probably just let the string name of a run be its primary key, and add a surrogate integer ID later. The focus for this ticket is on getting the primitives in shape so that we can develop the new Registry against them without breaking the existing one and, to a lesser extent, on getting the public Registry APIs closer to their final form.
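            A minimal sketch of the interim schema described here, in which a run's string name is its primary key and datasets reference runs by that name. The table and column names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE run (
    name TEXT NOT NULL PRIMARY KEY   -- interim: the string name is the key
);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    run TEXT NOT NULL REFERENCES run (name)
);
""")

with conn:
    conn.execute("INSERT INTO run (name) VALUES (?)", ("demo/run",))
    conn.execute("INSERT INTO dataset (id, run) VALUES (?, ?)", (1, "demo/run"))

# Callers pass and receive plain strings; no Run object or integer ID is needed.
runs = [r[0] for r in conn.execute("SELECT DISTINCT run FROM dataset WHERE id = ?", (1,))]
print(runs)  # ['demo/run']
```

            Adding a surrogate integer ID later would change the foreign-key column inside the schema; callers that already pass run names as strings should not need to change.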

            jbosch Jim Bosch added a comment -

            Mikolaj Kowalik, do you mind taking this review?  I'm trying to spread daf_butler reviews around more, and this one both doesn't require much previous knowledge of the codebase and it's a part that's relevant for interfacing with workflow-management code, so it seemed like a good one for you.

            Changes are in four packages:

            • daf_butler
            • ctrl_mpexec
            • pipe_base
            • ci_hsc_gen3

            Only the daf_butler changes are nontrivial, and even most of those are quite mechanical.

            mkowalik Mikolaj Kowalik added a comment -

            Sure. I'll review the changes.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Mikolaj Kowalik
              Watchers:
              Christopher Stephens [X] (Inactive), Jim Bosch, Mikolaj Kowalik