Fix Version/s: None
Team:Data Access and Database
The current implementation of the Ingest service at Qserv workers has a loophole that makes it possible for successfully ingested table contributions to remain in a table after the corresponding super-transaction has been aborted. A typical scenario in which the problem appears is when an ingest workflow aborts a transaction while ingest requests made in the context of that transaction are still unfinished. This would typically happen in complex parallel workflows, especially when the workflow does not properly coordinate transaction management with contribution ingest.
The root cause of the problem was found in the following sequence of actions taken by the ingest service on a contribution request made by a client:
1. Check the status of the transaction. If it is ABORTED, report an error back to the client and exit.
2. Begin pulling the contribution: from the object store via HTTP, by reading from a locally mounted filesystem, or by receiving the contribution's data sent by the client over the binary protocol.
3. Process the contribution and store it in the temporary filesystem.
4. Create the MySQL partition corresponding to the transaction (whose status was validated at step 1) if none exists.
5. Upload the contribution into the table.
6. Report SUCCESS back to the client.
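The sequence above can be sketched as follows. This is a minimal, self-contained illustration (the class and function names, and the in-memory stand-in for the worker's MySQL backend, are all hypothetical, not the actual service code); the `during_pull` hook simulates the workflow aborting the transaction while steps 2-3 are in progress:

```python
from enum import Enum

class TransState(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    ABORTED = "ABORTED"

class FakeWorkerDb:
    """In-memory stand-in for a worker's MySQL backend (illustration only)."""
    def __init__(self):
        self.state = TransState.STARTED   # super-transaction state
        self.partitions = set()           # per-transaction MySQL partitions
        self.rows = []                    # (trans_id, row) pairs loaded so far

def ingest_contribution(db, trans_id, rows, during_pull=None):
    """The current (racy) sequence: the status is checked only at step 1."""
    # Step 1: reject the request if the transaction is already ABORTED.
    if db.state is TransState.ABORTED:
        return "ERROR"
    # Steps 2-3: pulling and staging may take a long time; 'during_pull'
    # simulates the workflow aborting the transaction inside that window.
    if during_pull:
        during_pull(db)
    # Step 4: create the MySQL partition for the transaction if none exists.
    db.partitions.add(trans_id)
    # Step 5: load the contribution into the table.
    db.rows.extend((trans_id, r) for r in rows)
    # Step 6: report success, even though the transaction may now be ABORTED.
    return "SUCCESS"

# The transaction is aborted mid-ingest, yet the contribution lands in the
# table and the client is told SUCCESS: orphaned rows from an aborted transaction.
db = FakeWorkerDb()
print(ingest_contribution(db, 42, ["row1", "row2"],
                          during_pull=lambda d: setattr(d, "state", TransState.ABORTED)))
print(db.state.name, len(db.rows))
```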
The algorithm allows a race condition between steps 1 and 5 of the sequence.
The proposed solution is to introduce an additional check after step 5 to ensure the transaction was not ABORTED since its status was checked at step 1. If it was, the service will remove the corresponding MySQL partition (and, with it, the just-loaded contribution).
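The proposed check can be sketched like this (again a self-contained toy model with hypothetical names, not the actual service code; steps 2-3 are condensed into the `during_pull` hook):

```python
from enum import Enum

class TransState(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    ABORTED = "ABORTED"

class FakeWorkerDb:
    """In-memory stand-in for a worker's MySQL backend (illustration only)."""
    def __init__(self):
        self.state = TransState.STARTED
        self.partitions = set()
        self.rows = []

def ingest_contribution_checked(db, trans_id, rows, during_pull=None):
    """Proposed sequence: re-check the transaction status after the load."""
    # Step 1: up-front status check, as before.
    if db.state is TransState.ABORTED:
        return "ERROR"
    if during_pull:                               # workflow aborts mid-ingest
        during_pull(db)
    db.partitions.add(trans_id)                   # step 4
    db.rows.extend((trans_id, r) for r in rows)   # step 5
    # Additional check: if the transaction was aborted after step 1, drop the
    # transaction's MySQL partition (which removes the just-loaded rows) and
    # report an error instead of SUCCESS.
    if db.state is TransState.ABORTED:
        db.partitions.discard(trans_id)
        db.rows = [(t, r) for (t, r) in db.rows if t != trans_id]
        return "ERROR"
    return "SUCCESS"
```

Because transaction states only move one way (see below), an ABORTED status observed at the final check can never revert to STARTED, so dropping the partition at that point is safe.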
The correctness of this check relies on the one-way approach to changing the states of "super-transactions" in the Replication/Ingest system:
- from STARTED to ABORTED, or
- from STARTED to FINISHED
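The one-way state model could be enforced with a transition table like the one below (a sketch with hypothetical names; the real system tracks transaction states in its persistent store):

```python
# Legal one-way transitions of a super-transaction's state. Once a transaction
# leaves STARTED it can never return, so an ABORTED status observed at any
# later point is final.
ALLOWED = {
    ("STARTED", "ABORTED"),
    ("STARTED", "FINISHED"),
}

def transition(current, requested):
    """Apply a state change, rejecting anything outside the one-way model."""
    if (current, requested) not in ALLOWED:
        raise ValueError(f"illegal transition {current} -> {requested}")
    return requested

print(transition("STARTED", "ABORTED"))   # legal
try:
    transition("ABORTED", "STARTED")      # aborts are irreversible
except ValueError as e:
    print(e)
```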
While it is clear what to do in the scenario presented above (a transaction aborted during ingest), it is less clear how to treat transactions that were successfully committed while ingests were still outstanding. One option would be to do nothing and assume the newly ingested contribution was legitimate. Another would be to report an error code to the workflow (and optionally delete the MySQL partition). In general, there is no perfect solution to some ordering mistakes that could be made by ingest workflows. Perhaps adding a QA mechanism to the system would help identify such issues a posteriori. The current implementation already keeps enough information in its persistent state to discover abnormalities after a catalog ingest is over, and the findings could be used for manual or semi-automated post-ingest fixes if needed.