Data Management / DM-23671

Improve dataset deletion in Gen3 butler


    Details

    • Story Points:
      6
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      I've discovered various ways in which dataset deletion is unsafe or incomplete on DM-21849, and as that's already ballooned into a megaticket, I'm punting a lot of that to here.

      The main reason this is hard is that deletion is a fight between transactionality and foreign keys; we generally need three steps.
      1) Use the Registry to identify a bunch of datasets to delete.
      2) Actually delete those from the Datastore.
      3) Actually delete those from Registry.

      We need to do (2) before (3) so we can be sure we've actually removed (e.g.) a file from storage before we delete its Registry representation; this is enforced by the foreign key from dataset_storage (aside: we should really rename that to dataset_location) to dataset, and the fact that it has no ON DELETE clause.
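
      For concreteness, here is a minimal sketch of that ordering; the helper names are placeholders rather than the real Registry/Datastore APIs, and the point is only the order of the steps:

      # Sketch only: queryDatasets/removeDatasets/remove stand in for whatever
      # the real interfaces provide; the ordering is what matters.
      def deleteDatasets(registry, datastore, where):
          refs = list(registry.queryDatasets(where))   # (1) identify datasets
          with registry.transaction():
              for ref in refs:
                  datastore.remove(ref)                # (2) delete artifacts first
              registry.removeDatasets(refs)            # (3) then delete Registry rows
              # If (3) fails after artifacts are gone, rolling back the
              # transaction restores the Registry rows but not the files.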

      But deleting from Datastore first is actually really scary:

      • Datastore atomicity and exception safety are totally ad hoc and implemented by us; they basically don't exist right now (e.g. https://github.com/lsst/daf_butler/blob/master/python/lsst/daf/butler/datastores/posixDatastore.py#L286). So if we delete one dataset artifact in Datastore, and then either Datastore fails on a later one or Registry fails due to some integrity constraint, we roll back the transaction, and BOOM: corrupt data repository.
      • Datasets have a ton of relationships in Registry, so later failures due to integrity constraints are actually pretty likely.  These include relationships to each other via composition and provenance and relationships to collections. Those are mostly well captured by foreign key constraints, and we can somewhat use ON DELETE rules to make sure step (3) doesn't fail as often. But it still can fail, for two reasons:
        • If an ON DELETE rule triggers on a dataset we didn't tell Datastore to delete in step (2), we'll violate (for good reason) the dataset_storage foreign key that intentionally lacks an ON DELETE rule.
        • While an ON DELETE rule can take care of the row that associates a dataset with a particular quantum, I think we need to delete the entire quantum that references a deleted dataset - otherwise our provenance is a lie, which is much worse than being nonexistent. As that's another table removed, I don't think there's an easy way to do this.

      I also think I've heard Christopher Stephens [X] say that he's not a huge fan of ON DELETE rules in the first place.

      I can think of a few angles with which to approach this:

      • We could add sophistication to step (1) to make it guarantee that it's accounted for all Registry relationships when determining what we need to send to Datastore.  It could even be responsible for actually deleting related non-dataset entities in a transaction in advance, passing that transaction onto steps (2) and (3) via a context manager.  It certainly means we'd have to do a lot of querying on the datasets we're told to delete, even if the user just did a bunch of similar querying to find them.
      • We can try to make at least PosixDatastore deletion more atomic, or at least make failures more recoverable (e.g. move files to temporary locations when "deleting them", then try to actually remove them during commit).
      • We could require Datastores to be robust against failure-to-delete instead of requiring that their deletion be atomic and transactional, and hence declare the dataset_storage table the only source of truth on whether a dataset exists in a Datastore (so e.g. orphaned files may exist, but must be ignored/clobbered by Datastore when encountered).  We'd then just commit the Registry deletion transaction before trying to delete from Datastore, and live with whatever level of success we can get there.

      The last one feels like it might have to be at least part of the solution if we really want strong guarantees, but there may be other angles, and if Kian-Tat Lim wants to get more involved in Registry development, this may be a good place to start - I imagine this is an area where he's got a lot more experience than I have.
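
      As a very rough sketch of that last option (method names assumed for illustration, not real interfaces): commit the Registry deletion first, then make a best-effort pass over the Datastore and report whatever couldn't be removed.

      # Sketch of the "dataset_storage is the only source of truth" option.
      def purgeDatasets(registry, datastore, refs):
          with registry.transaction():
              registry.removeDatasets(refs)     # committed before touching files
          leftovers = []
          for ref in refs:
              try:
                  datastore.remove(ref)         # best effort; a failure leaves
              except Exception:                 # an orphaned artifact behind
                  leftovers.append(ref)
          # Orphaned artifacts must be ignored/clobbered by Datastore later.
          return leftovers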

      More information can be found in the extensive TODO comments in Butler.purge (on the tickets/DM-21849 branch right now), and I'll probably be adding some to ctrl_mpexec as well: it looks like I'm going to have to add, but not implement, at least one command-line option there, because daf_butler lacks the deletion interfaces it will need (particularly deletion of entire RUN collections).

      I'm setting Team to Arch for now because I know I won't get to it soon, but I figure I'll steal it back if it's not done after I finish a few other Registry tickets.


            Activity

            cs2018 Christopher Stephens [X] (Inactive) added a comment -

            If the schema allows for "on delete cascade" FK constraints to declare what needs to be deleted and it plays well with potential datastore issues, I'm all for defining FK constraints like that. 

            I don't think this would play well with "on delete cascade", but could you take advantage of the "RETURNING INTO" clause to store the deleted data until you know what the result of the datastore operations is? If something goes wrong, you work out what data needs to be added back.
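
            For what it's worth, in SQLAlchemy terms that suggestion might look roughly like the following; the table object is illustrative rather than the real Registry schema, and RETURNING support is backend-specific.

            import sqlalchemy

            # Illustrative only: capture the deleted rows so they could be
            # re-inserted if the datastore operations later go wrong.
            def deleteWithUndo(connection, dataset, dataset_ids):
                stmt = (
                    sqlalchemy.delete(dataset)
                    .where(dataset.c.dataset_id.in_(dataset_ids))
                    .returning(*dataset.c)
                )
                saved = connection.execute(stmt).fetchall()
                return saved  # rows to re-insert if something fails downstream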

              

            ktl Kian-Tat Lim added a comment -

            Is it possible to add a "deleting/deleted" bit to the Registry schema or incorporate it into an existing field? Having an intermediate state (where the row is not gone but can be filtered from normal queries) may allow simplification.

            I will look into implementing this as a prototype.

            ktl Kian-Tat Lim added a comment - - edited

            My understanding is that the problem is that dataset deletion from the Datastore is difficult to make transactional, with rollback, and that the "dataset deleted" state is represented in the database either by Registry entry absent or Registry entry present and Datastore entry absent. My proposal is for dataset deletion to be handled transactionally in the Registry (with any required foreign key manipulation etc.) by setting a "Deleting/deleted" column. Removing Datastore entries and actual dataset physical representations can then be handled by non-transactional and garbage-collecting-style processes. If non-preservation of Registry entries is desired, once Datastore entries and files have been removed, the Registry entry can be removed. If necessary, Registry and Datastore can be handled together in the database transactionally, with only the dataset removal being non-transactional and external.
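
            In schema terms, the proposal amounts to something like the sketch below; the table, column names, and types are assumptions for illustration, not the actual daf_butler schema.

            import sqlalchemy

            metadata = sqlalchemy.MetaData()
            dataset = sqlalchemy.Table(
                "dataset", metadata,
                sqlalchemy.Column("dataset_id", sqlalchemy.BigInteger, primary_key=True),
            )
            dataset_location = sqlalchemy.Table(
                "dataset_location", metadata,
                sqlalchemy.Column("dataset_id", sqlalchemy.BigInteger,
                                  sqlalchemy.ForeignKey("dataset.dataset_id"),
                                  primary_key=True),
                sqlalchemy.Column("datastore_name", sqlalchemy.String(256),
                                  primary_key=True),
                # Rows with deleting=True are filtered out of normal queries; a
                # separate garbage-collection pass removes the artifacts and
                # only then removes the rows themselves.
                sqlalchemy.Column("deleting", sqlalchemy.Boolean, nullable=False,
                                  server_default=sqlalchemy.sql.false()),
            )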

            jbosch Jim Bosch added a comment -

            Sounds reasonable. It'd be ideal if we could limit the number of tables that required this treatment to just the dataset location/storage one (we're inconsistent about naming; need to fix that) manipulated by the datastores. I think that works, but I haven't thought it through in detail.

            tjenness Tim Jenness added a comment -

            So it sounds like you only want this sort of flagging to happen when deleting datasets where you need the deletion of the rows to be guaranteed if the file was also deleted: flag the relevant rows, delete the dataset from the file system, then complete the deletion of the rows, unless the file deletion failed, in which case the flags are reset.

            General registry deletions will always use their own sqlalchemy transactions since they aren't relying on non-database activity?

            jbosch Jim Bosch added a comment - - edited

            That's generally what I had in mind, and what I thought Kian-Tat Lim had in mind.

            But now that I think about it, I'm not sure adding that "removing" flag to just dataset_locations and using it as you just described does everything we want.  It might solve the most serious problem, which is making dataset deletions effectively atomic (at a single-dataset level) between Registry and Datastore; we can:

            1. Delete from Datastore first, just setting the "removing" flag in dataset_locations.  If deletion from Datastore fails, unset it.
            2. Delete from Registry (not just dataset_locations, but other tables as well) only datasets for which the "removing" flag is still set.

            But if we take that approach while trying to delete multiple files, as soon as we've actually deleted a file from the Datastore we can't undo it, even if something goes wrong with deleting a later dataset or deleting something from the Registry.  Maybe that's okay, but it's less than I'd hoped for.  Perhaps we need to choose between that "less than I'd hoped for" and adding a "removing" flag to all Registry tables that might be involved in a dataset-deletion operation.
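
            Spelled out with placeholder method names (not real daf_butler API), the two steps above would be roughly:

            # Sketch of the two-step flag protocol described above.
            def removeDataset(registry, datastore, ref):
                registry.setRemovingFlag(ref)        # flag the dataset_location row
                try:
                    datastore.remove(ref)            # physical deletion
                except Exception:
                    registry.clearRemovingFlag(ref)  # Datastore failed: unset it
                    raise
                # Only rows still flagged "removing" are deleted from the other
                # Registry tables; once the file is gone, this step cannot be
                # meaningfully rolled back.
                registry.removeFlaggedDataset(ref)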

            ktl Kian-Tat Lim added a comment -

            Do we really have a need for transactional (with rollback) deletion of sets of datasets?  If not, I think we're OK.

            If something later goes wrong with deleting something from the Registry, the "removing" flag has already been set, so there should be no semantic issues.

            The setting of the "removing" flag does have to be transactional with respect to modifications to any other Registry tables involved in the deletion.  If those also need to be tied to the physical deletion, then things may get more complex.

            jbosch Jim Bosch added a comment -

            I don't think we have a clear need for it. If we did have it, we wouldn't have to worry about partial deletes of sets of datasets, which also come with problems, but it sounds like that's still simpler than actually implementing transactional set deletion.

            Nothing in Registry is ever required to be deleted because physical deletion succeeds, but many related rows in Registry may not be deleted if physical deletion fails, unless datastores are required to treat the absence of a dataset_location row as an indicator that a file (for example) should not exist even if it does.

            tjenness Tim Jenness added a comment -

            The fundamental problem we have is that we open a database transaction, twiddle lots of tables, delete a file from disk, twiddle some more tables, and at some point during the final twiddling we get a problem and roll back all the transactions. The file itself cannot be recovered.

            The main aim here on this ticket seems to be to be able to do as much database work as possible before deleting the file, thereby minimizing the risk of a rollback. Putting flags in each row that should be deleted helps since it gets everything set up for the file delete. In theory there still seems to be a risk of a rollback regardless.

            In posixDatastore at least I do have the option of doing a rename to a /deleting part of the datastore and then doing a final clean up right at the end. You could even have two APIs in datastore – markForRemoval and remove so that registry can really sort itself out first and know that datastore has set the relevant deleting flags in tables and moved the dataset to a new location (I'm not entirely sure how I keep track of the new path but that's a different problem).
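
            A sketch of that rename-based idea for a POSIX-style datastore; the paths, layout, and method names are made up for illustration.

            import os
            import shutil

            # Illustrative two-phase removal: "deleting" is a rename into a
            # trash area inside the datastore root; the real removal only
            # happens on commit, and rollback restores the files.
            class TwoPhaseRemover:
                def __init__(self, root):
                    self.root = root
                    self.trashDir = os.path.join(root, "deleting")
                    os.makedirs(self.trashDir, exist_ok=True)
                    self._pending = []  # list of (originalPath, trashPath)

                def markForRemoval(self, relPath):
                    src = os.path.join(self.root, relPath)
                    dst = os.path.join(self.trashDir, relPath.replace(os.sep, "_"))
                    shutil.move(src, dst)       # cheap rename on the same filesystem
                    self._pending.append((src, dst))

                def commit(self):
                    for _, dst in self._pending:
                        os.remove(dst)          # point of no return
                    self._pending.clear()

                def rollback(self):
                    for src, dst in self._pending:
                        shutil.move(dst, src)   # restore the original files
                    self._pending.clear()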

            jbosch Jim Bosch added a comment -

            Having datastores move when "deleting" and then actually delete on commit seems like the easiest way out to me - that could even work with no "removing" flag or changes to the interfaces at all (the custom datastore transaction object we have should work). It seems viable for PosixDatastore, at least - any idea if it would be for S3?

            I still haven't been able to work out a way to use the flag column idea that doesn't look like point (1) in the ticket description, i.e. doable but painfully complex - basically because setting that flag isn't going to do anything about foreign key constraints, which only care about whether an ID is present. Having a dataset_location_removing table that we move rows to might solve that, though, if it doesn't have a foreign key to the dataset table. That means we could "orphan" datasets that are supposed to be deleted but weren't, letting us clean them up manually or with GC logic later.

            jbosch Jim Bosch added a comment -

            I should explain that last idea in more detail. We start with a bunch of datasets the user wants to delete, and maybe a collection - multi-dataset deletes are mostly deleting full runs, I think.
            1. Butler starts a Registry transaction. Within that, we delete from dataset, quantum, and optionally collection. We also delete from dataset_location, but insert identical rows into dataset_location_trash.
            2. Butler commits the Registry transaction (exiting the with block). ON DELETE CASCADE clauses clean up other Registry tables (like the pesky ones for composition) if it succeeds, but we can also roll back everything if something doesn't look right, as detected by any FKs without ON DELETE.

            3. If that succeeds, we are left with rows in dataset_location_trash that have no row in dataset, but we consider that okay. Butler can now tell datastores to delete everything in the trash. They remove the rows in the trash table when they succeed in deleting the physical artifacts. If they don't, they can try again later or ask the user for intervention - in either case, we know what we have in the repo, and the only kind of error that doesn't trigger a complete rollback is an internal datastore error - mistakes by users in specifying what to delete can be totally rolled back if the mistake is evident in a constraint violation.
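
            In SQLAlchemy terms, step 1 would look roughly like the sketch below; the table objects and column names are illustrative, not the real schema.

            import sqlalchemy

            # Inside one Registry transaction: copy the affected
            # dataset_location rows into dataset_location_trash (which has no
            # FK to dataset), then delete the originals and the dataset rows.
            def trashDatasets(connection, dataset, dataset_location,
                              dataset_location_trash, dataset_ids):
                with connection.begin():
                    selected = sqlalchemy.select(dataset_location).where(
                        dataset_location.c.dataset_id.in_(dataset_ids))
                    connection.execute(dataset_location_trash.insert().from_select(
                        list(dataset_location.c.keys()), selected))
                    connection.execute(sqlalchemy.delete(dataset_location).where(
                        dataset_location.c.dataset_id.in_(dataset_ids)))
                    # ON DELETE CASCADE cleans up the other related tables here.
                    connection.execute(sqlalchemy.delete(dataset).where(
                        dataset.c.dataset_id.in_(dataset_ids)))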

            tjenness Tim Jenness added a comment -

            This sounds like there are only two changes we need to make then:

            1. Registry changes the dataset_location table rather than Datastore (and moves rows to the _trash version)
            2. Datastore does a bulk delete internally based on everything that is present in the dataset_location_trash table. If something goes wrong in the datastore, the worst case is that you have a bunch of files left over that couldn't be deleted for some reason, but that's fine because we know what they are (they are still listed in the trash table) - registry won't ever ask for them again and we can clean up later.

            It still might be worth datastore doing the deletes as bulk renames, because that would prevent unpleasantness later on if the file is left hanging around for some reason and a later put attempts to re-store that dataset.

            For standalone datastore testing I would need to have some way of deleting a dataset but registry/butler don't really care how I do that.

            Given that this sounds mostly trivial from the registry side and most of the work is in datastore, maybe I should take on this ticket and we can give something else to Andy Salnikov.

            Someone needs to think still about managing in-memory datastore / registry relationship.

            jbosch Jim Bosch added a comment -

            :+1: I'm okay with deferring the in-memory datastore problem until after Gen2 deprecation.

            tjenness Tim Jenness added a comment -

            Ok. I was also thinking about deferring the chained/in-memory datastores. Caching datastores are going to be really important eventually but no-one is going to notice them in the short term.

            tjenness Tim Jenness added a comment -

            Final comment: ctrl_mpexec is listed as a relevant component on that ticket? Given the discussion we just had (add one table, only really change datastore) is that relevant any longer?

            jbosch Jim Bosch added a comment -

            Only because I deferred implementing some command-line options there because they were blocked on this ticket. Might make more sense to make a new ticket for them (blocked on this one).

            tjenness Tim Jenness added a comment -

            I have no view into what the ctrl_mpexec tickets might involve so separate tickets would be best.

            jbosch Jim Bosch added a comment -

            Created and linked DM-24414.

            tjenness Tim Jenness added a comment -

            Jim Bosch I think I've implemented what you asked for. In the end I did create a separate _trash table. Other than that it's mainly a trash() method followed by an emptyTrash().
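
            In simplified form (exact signatures in daf_butler may differ, and removeDatasets is a placeholder for the Registry-side delete), the flow is something like:

            def pruneDatasets(registry, datastore, refs):
                with registry.transaction():
                    for ref in refs:
                        datastore.trash(ref)       # move location rows to the _trash table
                    registry.removeDatasets(refs)  # placeholder for the Registry delete
                datastore.emptyTrash()             # best-effort removal of the artifacts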

            jbosch Jim Bosch added a comment -

            Comments on PRs. Overall, I'm still worried about concurrency in a few places, but this is a huge improvement already and may not be the ticket to try to lock things down completely.

            tjenness Tim Jenness added a comment -

            I've rewritten trash/emptyTrash to optionally ignore errors. If you do datastore.remove, errors are important; if you do butler.prune, errors are ignored. You probably should take another look before I merge.
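
            Roughly, with the keyword name assumed for illustration, the split is just:

            # A direct datastore removal should propagate failures, while
            # butler.prune tolerates them and leaves stragglers in the trash
            # table for later cleanup.
            def emptyTrashFor(datastore, pruning):
                if pruning:
                    datastore.emptyTrash(ignore_errors=True)   # butler.prune path
                else:
                    datastore.emptyTrash(ignore_errors=False)  # datastore.remove path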

            jbosch Jim Bosch added a comment -

            Looks good; I've got nothing new to add (though I did reply to some of the existing threads), and I think it ought to be concurrent now.

            If you don't want to tackle simplifying Butler.prune here, would you mind ticketing that?


              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Christopher Stephens [X] (Inactive), Christopher Waters, Jim Bosch, Kian-Tat Lim, Tim Jenness

                Dates

                Created:
                Updated:
                Resolved:
