Details
- Type: Improvement
- Status: To Do
- Resolution: Unresolved
- Fix Version/s: None
- Component/s: daf_butler
- Labels:
- Team: External
Description
Hsin-Fang reports encountering a FileNotFound error during a run, even though s3CheckFileExists returns True and manually navigating to the offending Key in the Bucket clearly shows that it exists.
Two potential causes are:
- transient network/connectivity issue
- S3 Eventual Consistency model
The error appeared in Butler.ingest, following a Butler.put of the exact same dataset. It is not far-fetched that, because of the S3 consistency model, the Bucket had not yet been "updated" with the newly inserted Key in the interval between the Key being written in Butler.put and its existence being checked in Butler.ingest.
In this case, a waiting loop that gives the eventually consistent store time to catch up would fix the problem.
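A minimal sketch of such a waiting loop, not the actual daf_butler implementation. The name wait_until_exists and its parameters are hypothetical; the existence check is passed in as a zero-argument callable because the signature of s3CheckFileExists is not given in this ticket.

```python
import time
from typing import Callable


def wait_until_exists(
    exists: Callable[[], bool],
    timeout: float = 10.0,
    initial_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll ``exists()`` with exponential backoff.

    Returns True as soon as the check succeeds, or False once ``timeout``
    seconds have elapsed without success. ``sleep`` is injectable so the
    loop can be tested without real delays.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if exists():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Double the delay each round, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

A caller would wrap its own check, e.g. wait_until_exists(lambda: my_check(key)), where my_check stands in for whatever existence test the datastore uses, and treat a False result as a genuine missing-file error.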
In the case of a network/connectivity issue, it would be nicer to raise a more specific error.
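One way to separate the two failure modes is sketched below. All names here (DatastoreConnectivityError, DatasetNotFoundError, require_exists) are hypothetical and not part of daf_butler. The sketch also assumes the underlying check signals connectivity problems with the built-in ConnectionError or TimeoutError; a real S3 client raises its own exception types, which would have to be caught instead.

```python
from typing import Callable


class DatastoreConnectivityError(OSError):
    """Hypothetical error: the store could not be reached at all."""


class DatasetNotFoundError(FileNotFoundError):
    """Hypothetical error: the store was reachable but the key is absent."""


def require_exists(key: str, exists: Callable[[str], bool]) -> None:
    """Raise a specific error depending on why ``key`` cannot be confirmed."""
    try:
        found = exists(key)
    except (ConnectionError, TimeoutError) as exc:
        # Assumption: connectivity failures surface as these built-in types.
        raise DatastoreConnectivityError(
            f"Could not reach the store while checking {key!r}"
        ) from exc
    if not found:
        raise DatasetNotFoundError(f"Key {key!r} is not present in the store")
```

With this split, a caller can retry on DatastoreConnectivityError and report DatasetNotFoundError as a real missing dataset.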
Issue Links
- relates to DM-25818 S3Datastore tests existence before writing (Done)
Are we doing an existence check before the write? If so, a wait may be necessary, and it may be better to wait in the put method until the dataset is known to exist. If we're not doing an existence check first, a get after a put should be read-after-write consistent. A more explicit explanation can be found at, e.g., https://codeburst.io/quick-explanation-of-the-s3-consistency-model-6c9f325e3f82