Fix Version/s: None
Team:Data Access and Database
The current implementation of the Ingest system includes a binary tool, qserv-replica-file-ingest, that provides a batch mode for loading a list of contributions (read from input files) into various tables:
Here, the mandatory parameter <contributions> requires a JSON-formatted text file with an array of file contributions into potentially any tables, using multiple transactions. Here is an example of such a file:
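The example file referenced above is not included in the ticket text. The following is a hypothetical sketch of what such a file might look like; the attribute names (transaction_id, worker, database, table, path) are assumptions for illustration, not the tool's actual schema. The point to notice is that each entry encodes its own transaction identifier and table destination:

```json
[
  {"transaction_id": 100, "worker": "worker-1", "database": "test101",
   "table": "Object", "path": "/data/Object/chunk_0.txt"},
  {"transaction_id": 100, "worker": "worker-2", "database": "test101",
   "table": "Source", "path": "/data/Source/chunk_0.txt"},
  {"transaction_id": 101, "worker": "worker-1", "database": "test101",
   "table": "Object", "path": "/data/Object/chunk_1.txt"}
]
```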
Although this method provides a lot of flexibility in mixing contributions into any tables across different transactions, it creates a few inconveniences for implementing ingest workflows. One is caused by encoding transaction identifiers into the file: if, for some reason, a transaction fails and has to be restarted, the previous version of the contributions file becomes invalid and has to be regenerated to reference the new transaction. Another problem is that mixing transactions in the same file requires aborting all transactions mentioned in the file if any contribution fails to be ingested. And finally, in any realistic workflow it is impractical to mix contributions into different tables in the same file.
The proposed effort is meant to keep the current batch method and to add another one that addresses the above-mentioned issues:
In the new mode, each element of the JSON array in the <contributions> file will have fewer attributes. For example:
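As with the earlier example, the actual file is not shown in the ticket; the sketch below is hypothetical and reuses the same assumed attribute names. The key difference is that the transaction identifier (and the common table destination) is no longer encoded per entry, so the same file can be replayed under any transaction:

```json
[
  {"worker": "worker-1", "path": "/data/Object/chunk_0.txt"},
  {"worker": "worker-2", "path": "/data/Object/chunk_1.txt"}
]
```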
This file would produce the same result as the existing one if used as illustrated below:
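The invocation referenced here is likewise missing from the ticket text. A pseudocode sketch of the idea (the actual command-line syntax and parameter order of qserv-replica-file-ingest may differ) is that the transaction identifier and the target table would be supplied once on the command line rather than per entry in the file:

```
qserv-replica-file-ingest <transaction-id> <database> <table> <contributions>
```

Restarting a failed transaction would then only require rerunning the command with a new transaction identifier, with no change to the contributions file.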
Advantages of the new batch mode:
- the file of contributions is reusable across transactions
- the scope of failures during ingest is limited to a single transaction
- destinations where the table contributions are meant to be ingested do not change after aborting and restarting transactions