Status: To Do
Fix Version/s: None
I believe we're implementing a plan in which raws are protected from expiration in OODS by adding them to TAGGED collections, but this may be based on a faulty recollection of how this would work.
At a low level, the database rows that store dataset membership in TAGGED collections have a foreign key to the dataset table marked with ON DELETE CASCADE, which means, "if the dataset is deleted just delete this membership row, too" - so there is no foreign key protection against TAGGED dataset deletion.
There may well be higher-level guards; some of our deletion APIs do a lot of querying in advance to figure out whether it's safe to delete something, and these may perform such a check. But if we want a rigorous guard against TAGGED dataset deletion, we should change this foreign key definition. I think we do want it: even aside from the OODS use case, this has long been an uncomfortable asymmetry with how CHAINED collection membership works, where the philosophy is that you shouldn't be able to blindly delete something you own without at least being told when someone else might be depending on it.
I'm not sure this ticket will be easy-ish (it may just be a two-line code change, but is definitely also a minor schema migration, and that's always a pain) or much harder; the hard case would enter because the TAGGED member table is also the RUN membership table, and we may be relying on ON DELETE CASCADE to remove datasets from RUN collections, and if we are, I don't know how hard it would be to change that.
- is blocked by
DM-34951 Redesign Registry/Datastore boundary w.r.t. deletion and location lookups
|Field||Original Value||New Value|
|Team||Architecture [ 10304 ]|
Tim Jenness, I think we'll want to include this in the upcoming schema migration batch.
I'm trying to work out what is actually needed for this ticket.
Butler.pruneDatasets() currently has a disassociate parameter which explicitly states that when True this dataset will be removed from listed tagged collections. Furthermore disassociate must be True when purge is True and the latter says that it ignores the list of tagged collections. I'm trying to understand how we want the API to look in the OODS case because it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection. What I'm trying to do is construct a test that fails now but will work after the fix on this ticket and I can't work out how to reproduce.
it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection
I think that's right and my expectation is that if we allow that argument combination, and then use that argument combination when trying to delete datasets from a RUN collection when some are also in a TAGGED collection, it will do so while also removing them from the TAGGED collection. The desired behavior is that an exception (probably sqlalchemy.exc.IntegrityError, unless we re-raise) be raised instead, and nothing is deleted.
But now I'm a bit worried that we'll trash all of those datasets anyway even if we do later get an exception, so it'll all be still be a moot point from OODS' perspective (but maybe a step in the right direction).
I told this Jim in a meeting just now, but I thought it was worth adding here: On cleanup: the OODS queries which datasets are deletable, and which files are TAGGED and does a set difference between the two. From that, it knows which files it can delete.