Data Management: DM-17491

Implement Butler deletion APIs

      Description

      Implement Registry.dissociate as well as a way to remove all Datasets from one or more Collections.
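To make the two requested operations concrete, here is a toy in-memory sketch. It assumes that "dissociate" removes only collection membership and leaves the datasets themselves in place, and that the "remove all Datasets from one or more Collections" operation empties the named collections in one call. `ToyRegistry`, `associate`, `dissociate_all`, and the integer-only `DatasetRef` are hypothetical stand-ins, not the daf_butler API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical stand-in for a Butler DatasetRef: just an integer ID."""
    id: int


@dataclass
class ToyRegistry:
    """Hypothetical registry: maps each collection name to the dataset IDs in it."""
    collections: Dict[str, Set[int]] = field(default_factory=dict)

    def associate(self, collection: str, refs: Iterable[DatasetRef]) -> None:
        # Add the datasets to the collection, creating it if needed.
        self.collections.setdefault(collection, set()).update(r.id for r in refs)

    def dissociate(self, collection: str, refs: Iterable[DatasetRef]) -> None:
        # Remove only the collection membership; the datasets themselves
        # are untouched and may still belong to other collections.
        members = self.collections.get(collection, set())
        members.difference_update(r.id for r in refs)

    def dissociate_all(self, *collections: str) -> None:
        # Hypothetical bulk form: empty every named collection in one call.
        for name in collections:
            self.collections.get(name, set()).clear()


registry = ToyRegistry()
a, b = DatasetRef(1), DatasetRef(2)
registry.associate("raw", [a, b])
registry.associate("calib", [a])
registry.dissociate("raw", [a])   # dataset 1 is still in "calib"
print(registry.collections)       # {'raw': {2}, 'calib': {1}}
```

Calling `registry.dissociate_all("raw", "calib")` afterwards leaves both collections empty; whether orphaned datasets should then be deleted outright is a separate design decision this sketch does not take.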

          Activity

          Jim Bosch added a comment:

          I'm not opposed to the OODS-is-a-Datastore model but would like to assume for the purposes of this ticket that we're using the OODS-uses-butler model unless it ends up being too problematic, just to keep that option open.

          In that approach, I imagine we'd add timestamps to the Butler Registry, and Steve Pietrowicz's query would look something like

          for oldRef in butler.registry.findOldDatasets(...):
              butler.remove(oldRef)

          In any case, I think the important point is that if you're doing the query via Butler, what you'll get back are DatasetRefs, so you'd probably be using the single-argument form of the new remove method.
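The timestamp-based cleanup loop described in this comment can be sketched as follows, assuming the registry records an ingest time per dataset. `ToyRegistry`, `ToyButler`, `find_old_datasets`, and the 30-day cutoff are all hypothetical; the point is only that the query yields `DatasetRef`s, which are then passed one at a time to a single-argument `remove`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical stand-in for a Butler DatasetRef."""
    id: int


class ToyRegistry:
    """Hypothetical registry that records an ingest timestamp per dataset."""

    def __init__(self) -> None:
        self.ingested: Dict[DatasetRef, datetime] = {}

    def add(self, ref: DatasetRef, when: datetime) -> None:
        self.ingested[ref] = when

    def find_old_datasets(self, cutoff: datetime) -> Iterator[DatasetRef]:
        # Build the result list up front so callers may delete while iterating.
        return iter([ref for ref, t in self.ingested.items() if t < cutoff])


class ToyButler:
    """Hypothetical butler exposing a registry and a single-argument remove."""

    def __init__(self) -> None:
        self.registry = ToyRegistry()

    def remove(self, ref: DatasetRef) -> None:
        # The DatasetRef alone identifies the dataset to delete.
        del self.registry.ingested[ref]


now = datetime(2000, 1, 1, tzinfo=timezone.utc)  # arbitrary fixed "current" time
butler = ToyButler()
butler.registry.add(DatasetRef(1), now - timedelta(days=40))
butler.registry.add(DatasetRef(2), now - timedelta(days=2))

# OODS-style cleanup: drop everything ingested more than 30 days ago.
for old_ref in butler.registry.find_old_datasets(now - timedelta(days=30)):
    butler.remove(old_ref)
```

Materializing the query result before deleting avoids mutating the dictionary while it is being iterated.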

          Steve Pietrowicz added a comment:

          Sounds fine to me.

          Christopher Stephens added a comment:

          Jim Bosch, "on delete cascade" isn't ideal from a performance perspective because it results in row-by-row processing. If deletes from the parent table(s) are limited to a relatively small number of rows, with a correspondingly small number of child rows, this will likely be fine. If the deletes involve a large amount of data, deleting from the child tables first and then from the parents will be faster.

          If the above comments concern you, maybe we can set up a realistic test to measure the difference in performance? 
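A starting point for such a test might look like the sketch below, which compares the two strategies on a parent table with an `ON DELETE CASCADE` child table. It uses in-memory SQLite purely so it runs anywhere; the table names and row counts are hypothetical, and SQLite timings will not be representative of the production database, so a realistic test would run the same comparison there.

```python
import sqlite3
import time


def make_db(n_parents: int, children_per_parent: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE dataset_collection ("
        " dataset_id INTEGER NOT NULL REFERENCES dataset (id) ON DELETE CASCADE,"
        " collection TEXT NOT NULL)"
    )
    # Index the FK column so each cascaded lookup is not a full table scan.
    conn.execute("CREATE INDEX idx_dc_dataset ON dataset_collection (dataset_id)")
    conn.executemany("INSERT INTO dataset (id) VALUES (?)",
                     ((i,) for i in range(n_parents)))
    conn.executemany(
        "INSERT INTO dataset_collection VALUES (?, ?)",
        ((i, f"c{j}") for i in range(n_parents) for j in range(children_per_parent)),
    )
    conn.commit()
    return conn


def delete_with_cascade(conn: sqlite3.Connection) -> None:
    # Delete parents only; the database removes matching child rows itself.
    conn.execute("DELETE FROM dataset")
    conn.commit()


def delete_children_first(conn: sqlite3.Connection) -> None:
    # Delete child rows explicitly, then parents; the cascade then finds no children.
    conn.execute("DELETE FROM dataset_collection")
    conn.execute("DELETE FROM dataset")
    conn.commit()


def child_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM dataset_collection").fetchone()[0]


for strategy in (delete_with_cascade, delete_children_first):
    conn = make_db(n_parents=5_000, children_per_parent=5)
    start = time.perf_counter()
    strategy(conn)
    elapsed = time.perf_counter() - start
    print(f"{strategy.__name__}: {elapsed:.3f} s, {child_rows(conn)} child rows left")
```

Both strategies should leave zero child rows; only the elapsed time is expected to differ, and by how much depends heavily on the database engine and indexing.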

          Jim Bosch added a comment:

          Thanks, Christopher Stephens. I'm not concerned about this right now, because these Python APIs already limit us to row-by-row processing on the parent tables. I'm sure we will want to add vectorized APIs for at least some operations in the future, and I'll probably have some follow-up questions on Slack when we get to that.
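The difference between the current per-`DatasetRef` style and a future vectorized API can be sketched with plain SQL. Both `remove` and `remove_many` below are hypothetical helpers, not existing Butler methods: the first issues one `DELETE` per dataset, while the second deletes IDs in chunks with a single `IN (...)` statement per chunk, keeping the number of bound parameters under SQLite's per-statement limit.

```python
import sqlite3
from typing import Iterable, List


def remove(conn: sqlite3.Connection, dataset_id: int) -> None:
    # Row-by-row style: one DELETE statement per dataset.
    conn.execute("DELETE FROM dataset WHERE id = ?", (dataset_id,))


def remove_many(conn: sqlite3.Connection, dataset_ids: Iterable[int],
                chunk_size: int = 500) -> None:
    # Vectorized style: one DELETE per chunk of IDs.
    ids: List[int] = list(dataset_ids)
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ", ".join("?" * len(chunk))  # "?, ?, ..., ?"
        conn.execute(f"DELETE FROM dataset WHERE id IN ({placeholders})", chunk)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO dataset (id) VALUES (?)", ((i,) for i in range(2000)))

for i in range(10):                 # what a per-ref Python API amounts to
    remove(conn, i)
remove_many(conn, range(10, 1500))  # a hypothetical bulk alternative
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM dataset").fetchone()[0])  # 500
```

The per-row version costs one database round trip per dataset, which is why cascade behavior on the child tables matters less while the Python API itself is row-by-row.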

          Tim Jenness added a comment:

          Looks okay to me.


            People

            • Assignee: Jim Bosch
            • Reporter: Jim Bosch
            • Reviewers: Tim Jenness
            • Watchers: Christopher Stephens, Jim Bosch, Steve Pietrowicz, Tim Jenness
