Fix Version/s: None
https://github.com/lsst/daf_butler/blob/master/python/lsst/daf/butler/datastores/s3Datastore.py#L199-L205 checks whether an object exists before writing it. With AWS, and likely other object stores, relaxed consistency guarantees mean there could be a race condition here if multiple simultaneous jobs are trying to write. Furthermore, the additional API call is pretty wasteful, since it will almost always report that the object does not exist, and it may help trigger rate limiting.
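The check-then-write race can be illustrated without S3 at all. The sketch below is hypothetical: it uses a plain dict as a stand-in for the object store and `threading.Event` to force the unlucky interleaving that an existence check cannot protect against.

```python
import threading

store = {}                     # stand-in for the object store (not the real S3Datastore)
a_checked = threading.Event()
b_done = threading.Event()
checks = []

def writer_a():
    # "Does the object exist?" check: sees nothing, then loses the CPU before writing.
    checks.append("key" in store)
    a_checked.set()
    b_done.wait()              # force the race window: B runs entirely in here
    store["key"] = "A"         # clobbers B's write despite the earlier check

def writer_b():
    a_checked.wait()
    checks.append("key" in store)   # also sees nothing: the check passed for both
    store["key"] = "B"
    b_done.set()

ta = threading.Thread(target=writer_a)
tb = threading.Thread(target=writer_b)
ta.start(); tb.start()
ta.join(); tb.join()

print(checks, store["key"])    # both existence checks passed, yet one write was silently lost
```

Both writers observed "does not exist" and proceeded, so the check bought nothing: the outcome is last-writer-wins either way, which is why simply dropping the extra API call costs no safety.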
We should consider whether we can live with overwriting of a dataset.
- is blocked by
DM-24247 butler validation error in ci_hsc_gen3
- is triggering
DM-26278 Investigate situation of deleting and adding to the same butler collection simultaneously
- To Do
Is it possible for a datastore.trash to happen on this dataset, with a new put occurring at the same time as some other process is doing the emptyTrash (which could remove a file that was just written)? This requires that the new put and the old dataset share the same dataId and collection.
I believe it is possible; once the dataset has been trashed, the registry will allow a new one with the same dataset_type, collection, and data ID. And the result would be that the datastore internal records and the locations table both claim the dataset exists, even though it really doesn't. That's a pretty mild form of repo corruption for something that can only happen if someone is already doing something pretty inadvisable, and I think it's totally recoverable in the sense that you can just trash and empty trash again on those data IDs. But it's also a fairly hard one to detect (you actually have to try to get the dataset to notice something has gone awry), so if we're concerned that someone is going to do something inadvisable, it may be worth working harder to address it.
The simple, brute-force way to fix this would be to always include the dataset_id in the filename template; that will just avoid the clash entirely.
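Why including the dataset_id removes the clash can be shown with plain format strings. These templates are illustrative only; the real Butler file templates are configured separately and have their own syntax.

```python
# Hypothetical templates: 'clashing' is keyed only on dataId/collection/type,
# 'unique' additionally embeds the dataset_id.
clashing = "{collection}/{datasetType}/{dataId}.fits"
unique = "{collection}/{datasetType}/{dataId}_{dataset_id}.fits"

# Old (trashed) dataset and a new put with the same dataId and collection,
# differing only in dataset_id (values are made up for the example).
old = dict(collection="runs/a", datasetType="calexp", dataId="visit903334", dataset_id=17)
new = dict(old, dataset_id=42)

same_path = clashing.format(**old) == clashing.format(**new)   # True: emptyTrash can delete the new file
distinct_path = unique.format(**old) != unique.format(**new)   # True: no file is shared, no race
print(same_path, distinct_path)
```

With the dataset_id in the path, emptying the trash for the old dataset can never touch the file the new put just wrote, at the cost of less human-readable filenames.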
Another approach would involve defining a UNIQUE constraint in the internal datastore records that would prohibit that concurrent put. The constraint would need to at least include the URI and whatever columns need to go with that to cover entries that should be allowed to have the same URI (e.g. for composition). With that in place, I think a concurrent put that happens just before the trash is emptied will fail at the Registry level (perhaps not before it actually writes to datastore storage, but then that's just one more case of "don't trust datastore", because the database transaction will still roll back both the internal datastore records insertion and the dataset_location insertion).
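The constraint idea can be sketched with sqlite3. The table and column names below are illustrative, not the actual daf_butler schema; the point is that the second insert fails inside the transaction, so both the datastore-records row and (in the real system) the dataset_location row would roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE datastore_records (
           dataset_id INTEGER,
           component  TEXT,
           uri        TEXT,
           -- same URI is allowed only for distinct components (e.g. composites)
           UNIQUE (uri, component)
       )"""
)
conn.execute("INSERT INTO datastore_records VALUES (1, 'full', 'a/b/c.fits')")
conn.commit()

# A concurrent put targeting the same URI and component now fails at the database;
# the 'with conn' transaction rolls back automatically on the error.
try:
    with conn:
        conn.execute("INSERT INTO datastore_records VALUES (2, 'full', 'a/b/c.fits')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT COUNT(*) FROM datastore_records").fetchone()[0]
print(rejected, rows)  # the duplicate put was rejected; only the original row remains
```

The datastore file itself may still have been written before the rollback, but as the comment above notes, that is just one more case of "don't trust datastore": the database remains consistent.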
Looks OK as far as it goes. I take no position on the potential conflicts with "trash" that I think are beyond the scope here.
I'll file a new ticket to ponder the "deleting from a collection in one process whilst replacing that data in the same collection from another" problem. It sounds fraught regardless.
Dino Bektesevic, ingest has been completely rewritten, so don't worry about it.