# Implement Butler deletion APIs

## Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 2
• Sprint: DRP S19-3
• Team: Data Release Production

## Description

Implement Registry.dissociate as well as a way to remove all Datasets from one or more Collections.

## Activity

Jim Bosch added a comment -

I'm not opposed to the OODS-is-a-Datastore model but would like to assume for the purposes of this ticket that we're using the OODS-uses-butler model unless it ends up being too problematic, just to keep that option open.

In that approach, I imagine we'd add timestamps to the Butler Registry, and Steve Pietrowicz's query would look something like

    for oldRef in butler.registry.findOldDatasets(...):
        butler.remove(oldRef)

In any case, I think the important point is that if you're doing the query via Butler, what you'll get back are DatasetRefs, so you'd probably be using the single-argument form of the new remove method.
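
A minimal sketch of that query-then-remove loop, using toy stand-ins rather than the real Butler classes. Everything here is hypothetical: `ToyDatasetRef`, `ToyRegistry`, `ToyButler`, `find_old_datasets`, and the per-dataset timestamp are assumptions made for illustration, since the ticket only imagines a `findOldDatasets` query and does not define it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ToyDatasetRef:
    """Hypothetical stand-in for a DatasetRef; it only carries an ID."""
    dataset_id: int


class ToyRegistry:
    """Hypothetical registry that records one timestamp per dataset."""

    def __init__(self):
        self._timestamps = {}

    def add(self, ref, timestamp):
        self._timestamps[ref] = timestamp

    def find_old_datasets(self, cutoff):
        # Return a list, not a live view, so callers may delete while iterating.
        return [ref for ref, ts in self._timestamps.items() if ts < cutoff]

    def forget(self, ref):
        del self._timestamps[ref]

    def __len__(self):
        return len(self._timestamps)


class ToyButler:
    """Hypothetical butler exposing only a single-argument remove."""

    def __init__(self, registry):
        self.registry = registry

    def remove(self, ref):
        # The ref alone identifies what to delete; no other arguments needed.
        self.registry.forget(ref)


def purge_older_than(butler, cutoff):
    """Remove every dataset older than cutoff and return the removed refs."""
    removed = []
    for old_ref in butler.registry.find_old_datasets(cutoff):
        butler.remove(old_ref)
        removed.append(old_ref)
    return removed


now = datetime(2019, 3, 1)
butler = ToyButler(ToyRegistry())
butler.registry.add(ToyDatasetRef(1), now - timedelta(days=40))
butler.registry.add(ToyDatasetRef(2), now - timedelta(days=1))
removed = purge_older_than(butler, cutoff=now - timedelta(days=30))
```

Because the query hands back refs, the caller never has to reconstruct a dataset type or data ID before deleting, which is the case the single-argument form is meant to serve.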

Steve Pietrowicz added a comment -

Sounds fine to me.

Christopher Stephens added a comment -

Jim Bosch, "on delete cascade" isn't ideal from a performance perspective, since it results in row-by-row processing. If the deletes from the parent table(s) are limited to a relatively small number of rows that also correspond to a relatively small number of child rows, this will likely be fine. If these deletes involve a large amount of data, deleting from the child tables first and then the parents will be faster.

If the above comments concern you, maybe we can set up a realistic test to measure the difference in performance?
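
The two deletion strategies being compared can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the Registry schema: the `dataset` and `dataset_collection` tables and their columns are hypothetical, and SQLite may behave quite differently from the production database, so the sketch shows only that both approaches reach the same end state and makes no claim about timing.

```python
import sqlite3

# Hypothetical two-table schema: one parent row per dataset, one child row
# per (dataset, collection) association.
SCHEMA = """
CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY);
CREATE TABLE dataset_collection (
    dataset_id INTEGER NOT NULL REFERENCES dataset (dataset_id) {on_delete},
    collection TEXT NOT NULL
);
"""


def make_db(on_delete=""):
    """Create an in-memory database with 100 datasets in one collection."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA.format(on_delete=on_delete))
    conn.executemany("INSERT INTO dataset VALUES (?)", [(i,) for i in range(100)])
    conn.executemany(
        "INSERT INTO dataset_collection VALUES (?, ?)",
        [(i, "run1") for i in range(100)],
    )
    conn.commit()
    return conn


def row_counts(conn):
    """Return (parent rows, child rows)."""
    parents = conn.execute("SELECT COUNT(*) FROM dataset").fetchone()[0]
    children = conn.execute("SELECT COUNT(*) FROM dataset_collection").fetchone()[0]
    return parents, children


def delete_with_cascade(conn, max_id):
    # Delete parents only; the database removes the matching child rows.
    with conn:
        conn.execute("DELETE FROM dataset WHERE dataset_id < ?", (max_id,))


def delete_children_first(conn, max_id):
    # Two set-based statements: children first, then parents.
    with conn:
        conn.execute("DELETE FROM dataset_collection WHERE dataset_id < ?", (max_id,))
        conn.execute("DELETE FROM dataset WHERE dataset_id < ?", (max_id,))


cascade_db = make_db(on_delete="ON DELETE CASCADE")
delete_with_cascade(cascade_db, 50)

explicit_db = make_db()
delete_children_first(explicit_db, 50)
```

A realistic performance test along the lines suggested would wrap the two delete functions in timers and run them against the target database at a representative row count.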

Jim Bosch added a comment -

Thanks, Christopher Stephens. I'm not concerned right now, because these Python APIs already limit us to row-by-row processing on the parent tables. I'm sure we will want to add vectorized APIs for at least some operations in the future, and I'll probably have some follow-up questions on Slack when we get to that.

Tim Jenness added a comment -

Looks okay to me.


## People

• Assignee: Jim Bosch
• Reporter: Jim Bosch
• Reviewers: Tim Jenness
• Watchers: Christopher Stephens, Jim Bosch, Steve Pietrowicz, Tim Jenness