# Improved batch mode of ingesting table contributions into Qserv

#### Details

• Type: Improvement
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
3
• Sprint:
DB_F20_06
• Team:
Data Access and Database

#### Description

The current implementation of the Ingest system has a binary tool, qserv-replica-file-ingest, which provides a batch mode for loading a list of contributions (read from input files) into various tables:

 qserv-replica-file-ingest FILE-LIST <contributions> [--auth-key=] ... 

Where the mandatory parameter <contributions> is a JSON-formatted text file containing an array of file contributions, potentially into any tables and using multiple transactions. Here is an example of such a file:

 [{"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187107_overlap.txt"},
  {"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187107.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187108_overlap.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187108.txt"},
  {"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187109_overlap.txt"},
  {"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187109.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187110_overlap.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187110.txt"}
 ]

Although this method provides a lot of flexibility in mixing contributions into any tables and using different transactions, it creates a few inconveniences for implementing ingest workflows. One of them is caused by encoding transaction identifiers in the file. If, for some reason, a transaction fails and has to be restarted, the previous version of the contributions file becomes invalid and has to be regenerated for the new transaction. Another problem is that mixing transactions in the same file would require aborting all transactions mentioned in the file if any contribution fails to be ingested. And finally, in any realistic workflow it is impractical to mix contributions into different tables in the same file.
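To make the abort scope concrete, here is a minimal Python sketch that lists the transactions and tables referenced by a contributions file in the current format. The function name `abort_scope` is hypothetical and not part of Qserv, and the second entry of the sample (transaction 124, table Source) is invented purely to show a mixed file; only the attribute names are taken from the example above.

```python
import json


def abort_scope(contributions_json):
    """Return (transaction ids, table names) found in a FILE-LIST style
    contributions document given as a JSON string. Hypothetical helper."""
    entries = json.loads(contributions_json)
    transactions = sorted({e["transaction-id"] for e in entries})
    tables = sorted({e["table"] for e in entries})
    return transactions, tables


# Illustrative input: the second entry is made up to show a mixed file.
example = """[
 {"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187107.txt"},
 {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":124,"table":"Source","type":"P","path":"input/chunk_187108.txt"}
]"""

transactions, tables = abort_scope(example)
print(transactions, tables)
```

With a file like this, a failure of either contribution would mean aborting every transaction in the first list.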

The proposed effort is meant to keep the current batch method and to add another one that addresses the above-mentioned issues:

 qserv-replica-file-ingest FILE-LIST-TRANS [--auth-key=] ... 

Where the <contributions> file will have fewer attributes in each element of the JSON array. For example:

 [{"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187107_overlap.txt"},
  {"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187107.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187108_overlap.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187108.txt"},
  {"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187109_overlap.txt"},
  {"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187109.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187110_overlap.txt"},
  {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187110.txt"}
 ]

This file would produce the same result as the existing one when used as illustrated below:

 qserv-replica-file-ingest FILE-LIST-TRANS 123 Object P 
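The equivalence between the two formats can be sketched in Python as follows. This only illustrates the data relationship and makes no claim about how the tool implements it; the function name `expand` and its parameter names are hypothetical.

```python
def expand(entries, transaction_id, table, table_type):
    """Attach the transaction, table and type given on the command line
    to each reduced entry, yielding entries in the existing format."""
    return [
        {**entry, "transaction-id": transaction_id, "table": table, "type": table_type}
        for entry in entries
    ]


# Two entries in the reduced (FILE-LIST-TRANS) format from the example above.
reduced = [
    {"worker-host": "qserv-db01", "worker-port": 25002,
     "path": "input/chunk_187107_overlap.txt"},
    {"worker-host": "qserv-db01", "worker-port": 25002,
     "path": "input/chunk_187107.txt"},
]

# The first entry of the existing (FILE-LIST) format, for comparison.
legacy_first = {
    "worker-host": "qserv-db01", "worker-port": 25002,
    "transaction-id": 123, "table": "Object", "type": "P",
    "path": "input/chunk_187107_overlap.txt",
}

expanded = expand(reduced, 123, "Object", "P")
print(expanded[0] == legacy_first)
```

Because the transaction, table and type live outside the file, calling `expand` with a different transaction identifier reuses the same `reduced` list unchanged.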

Advantages of the new batch mode:

• the file of contributions is reusable across transactions
• the scope of failures during ingest is limited to a single transaction

The destinations into which the table contributions are meant to be ingested do not change after a transaction is aborted and restarted.
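A workflow that takes advantage of this could look like the following Python sketch. It is an assumption-laden outline, not the actual Qserv workflow: `ingest_with_retry` and the three callables it takes (`start_transaction`, `ingest`, `abort_transaction`) are hypothetical stand-ins that a caller would have to implement against the real services.

```python
def ingest_with_retry(contributions, start_transaction, ingest,
                      abort_transaction, max_attempts=3):
    """Ingest the same contributions under fresh transactions until one succeeds.

    All three callables are hypothetical and supplied by the caller:
      start_transaction()                   -> new transaction id
      ingest(transaction_id, contributions) -> raises an exception on failure
      abort_transaction(transaction_id)     -> None
    Returns the id of the transaction under which the ingest succeeded.
    """
    for _ in range(max_attempts):
        transaction_id = start_transaction()
        try:
            ingest(transaction_id, contributions)
            return transaction_id
        except Exception:
            # Only this one transaction is affected; the contributions
            # list is reused as-is on the next attempt.
            abort_transaction(transaction_id)
    raise RuntimeError(f"ingest failed after {max_attempts} attempts")


# Simulated run: the first transaction fails, the second succeeds.
ids = iter([123, 124])
aborted = []


def fake_ingest(transaction_id, contributions):
    if transaction_id == 123:
        raise IOError("simulated failure")


contributions = [{"worker-host": "qserv-db01", "worker-port": 25002,
                  "path": "input/chunk_187107.txt"}]
result = ingest_with_retry(contributions, lambda: next(ids),
                           fake_ingest, aborted.append)
print(result, aborted)
```

The sketch catches every exception for brevity; a real workflow would likely distinguish retryable failures from fatal ones.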

#### Activity

Igor Gaponenko added a comment -

John Gates PR: https://github.com/lsst/qserv/pull/568

John Gates added a comment -

It looks good to me.

#### People

Assignee:
Igor Gaponenko
Reporter:
Igor Gaponenko
Reviewers:
John Gates
Watchers:
Fritz Mueller, Igor Gaponenko, John Gates, Nate Pease